Research Scientist

Posted 2026-05-06
Remote, USA | Full-time | Immediate Start


Active Inference Benchmarking Researcher


Description



Overview


Contribute to the design, implementation, and evaluation of benchmarking frameworks for uncertainty-aware autonomy, specifically active inference, within a teleoperation-augmented robotics platform. This role focuses on quantifying how probabilistic decision-making improves human-in-the-loop scalability, safety under uncertainty, and autonomous productivity across real-world robotic systems.


Key Responsibilities

1. Active Inference Benchmark Design & Execution

  • Co-design and implement benchmarking protocols comparing active inference agents to:
    • Conventional reinforcement learning (RL) baselines
    • RL systems augmented with uncertainty estimation
  • Evaluate performance across:
    • Data efficiency
    • Safety under distribution shift
    • Directed exploration
    • Sim-to-real robustness
    • Teleoperation scaling efficiency
    • Explainability


2. Teleoperation-Aware Evaluation Framework


  • Integrate benchmarking into a standardized teleoperation control protocol where agents decide when to:
    • Continue autonomous execution
    • Request human takeover under a constrained intervention budget
  • Develop metrics capturing:
    • Human scalability (operator-to-robot ratio, intervention allocation efficiency)
    • Safety under uncertainty (timeliness and selectivity of handovers)
    • Autonomous work efficiency (task completion under limited supervision)
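
As a rough illustration of the continue-or-handover decision described above, the sketch below uses a simple threshold rule over a per-step uncertainty score with a fixed takeover budget. The class name `HandoverPolicy`, the threshold value, and the budget size are all hypothetical choices made for this example; the actual protocol and decision logic are part of what this role will design.

```python
from dataclasses import dataclass


@dataclass
class HandoverPolicy:
    """Hypothetical rule: request takeover when uncertainty is high and budget remains."""
    threshold: float  # assumed uncertainty cutoff in [0, 1]
    budget: int       # assumed maximum number of takeover requests per episode

    def decide(self, uncertainty: float) -> str:
        # Spend one unit of budget only when the uncertainty crosses the cutoff.
        if uncertainty >= self.threshold and self.budget > 0:
            self.budget -= 1
            return "request_takeover"
        # Otherwise (low uncertainty or budget exhausted) keep acting autonomously.
        return "continue"


def run_episode(uncertainties, threshold=0.7, budget=2):
    """Apply the rule to a sequence of per-step uncertainty scores."""
    policy = HandoverPolicy(threshold=threshold, budget=budget)
    return [policy.decide(u) for u in uncertainties]


decisions = run_episode([0.1, 0.8, 0.9, 0.95, 0.2])
```

In this toy run the budget is exhausted after two requests, so the third high-uncertainty step continues autonomously. That is the kind of allocation tradeoff the selectivity and timeliness metrics are meant to expose.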


3. Platform Integration (Teleoperation Stack)


  • Align benchmarking workloads with the broader teleoperation platform architecture:
    • On-robot control and safety systems
    • Near-edge inference (uncertainty estimation, planning, intervention logic)
    • Cloud-based training, analytics, and fleet management
  • Ensure benchmarks reflect real system constraints:
    • Latency budgets
    • Network degradation and connectivity loss
    • Multi-robot resource sharing


4. Embodiment Ladder Evaluation


  • Execute experiments across a staged pipeline:
    • Tier 1: Controlled simulation (e.g., MuJoCo environments)
    • Tier 2: High-fidelity robotic simulation (e.g., RLBench, ManiSkill)
    • Tier 3: Real-world or dataset-driven validation
  • Maintain consistency via a shared teleoperation surrogate (expert policy / planner) to emulate human intervention


5. Uncertainty & Intervention Analysis


  • Quantify and analyze:
    • Calibration of uncertainty signals
    • Intervention precision/recall
    • Learning from intervention (post-handover improvement)
    • Stability across repeated autonomy–human control cycles
  • Compare which approach best optimizes teleoperation efficiency:
    • Native probabilistic approaches (active inference)
    • Retrofitted uncertainty (ensembles, Bayesian heads, etc.)
    • Heuristic baselines
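
For a concrete sense of two of the quantities listed above, the sketch below computes intervention precision/recall and a binned expected calibration error (ECE). It assumes per-step labels for whether a takeover was actually warranted (for example, from a teleoperation surrogate), and it uses ECE as one common calibration measure; the posting does not prescribe a specific metric, and all function names here are hypothetical.

```python
def intervention_precision_recall(requested, needed):
    """requested[i]: agent asked for takeover; needed[i]: takeover was warranted (assumed label)."""
    pairs = list(zip(requested, needed))
    tp = sum(1 for r, n in pairs if r and n)       # warranted requests
    fp = sum(1 for r, n in pairs if r and not n)   # unnecessary requests
    fn = sum(1 for r, n in pairs if not r and n)   # missed handovers
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into the last bin
        bins[idx].append((c, o))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(o for _, o in b) / len(b)
        ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece


precision, recall = intervention_precision_recall(
    requested=[True, True, False, False],
    needed=[True, False, True, False],
)
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

High precision with low recall would indicate an agent that rarely wastes operator time but misses needed handovers; the reverse indicates over-requesting. Comparing these profiles across native and retrofitted uncertainty methods is one way to frame the comparison.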


6. Systems & Scaling Insights


  • Profile compute and system behavior of active inference workloads within the teleoperation stack:
    • World model rollouts
    • Posterior inference
    • Intervention decision logic
  • Contribute to:
    • Near-edge workload allocation strategies
    • Fleet scaling models (robots per server)
    • Latency vs. safety tradeoffs


7. Deliverables


  • Reproducible benchmarking suite and datasets
  • Technical reports and whitepapers
  • Conference publications (robotics / ML / systems venues)
  • Design recommendations for teleoperation and autonomy stacks
  • Cross-team guidance for infrastructure, controls, and ML teams


Success Criteria

  • Demonstrated improvement in the tradeoff between intervention efficiency and safety
  • Measurable gains in operator scaling (robots per human)
  • Robust performance under distribution shift and real-world noise
  • Clear evidence of when and why uncertainty-aware methods outperform baselines




About the Company



Noumenal Labs is a deep tech AI company closing performance gaps in outdoor robotics. Our uncertainty-aware systems learn and adapt in real time, positioning Noumenal as a core software layer for next-generation robotic hardware operating in uncharted domains.
