FP3 Model Nominated for ICRA 2026 Best Paper

💡 Spirit AI and Tsinghua University's FP3 model targets 3D robotic perception gaps, earning an ICRA 2026 nomination.

FP3 Robot Model Challenges 2D Limits at ICRA 2026

The IEEE International Conference on Robotics and Automation (ICRA) 2026 has announced the nominees for its Best Paper Award, with embodied AI research among the finalists. A team from Tsinghua University and the startup Spirit AI has been nominated for its paper on FP3, a large-scale 3D base policy model.

The recognition underscores the growing importance of 3D geometric understanding in robotics. Whereas most current models take 2D images as input, FP3 takes point cloud data as input to improve spatial reasoning.

Key Takeaways from the FP3 Research

  • Nomination Status: The FP3 paper is a finalist for the Best Paper Award at ICRA 2026 in Vienna.
  • Leadership: Led by Gao Yang, Assistant Professor at Tsinghua IIIS and Chief Scientist at Spirit AI.
  • Model Scale: FP3 features 1.3 billion parameters built on a scalable diffusion Transformer architecture.
  • Training Data: Pre-trained on 60,000 motion trajectories that include rich point cloud observations.
  • Core Innovation: Shifts focus from 2D image inputs to 3D geometric perception for better robot manipulation.
  • Industry Impact: Signals the growing presence of Chinese labs and startups in top-tier global robotics research.

Bridging the Gap Between 2D Vision and 3D Reality

Current robotic foundation models face a critical limitation: they primarily process two-dimensional visual data. This approach fails to capture the depth and spatial relationships necessary for complex physical interactions. Robots operating in unstructured environments require more than just pixel data; they need a true understanding of volume and distance.
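
To make the depth problem concrete, here is a minimal Python sketch of standard pinhole projection. The intrinsics (FX, FY, CX, CY) and the two sample points are made-up values for illustration and do not come from the FP3 paper. The sketch shows two different 3D points landing on the same pixel, which is the kind of ambiguity an image-only model has to resolve from context.

```python
def project_to_pixel(point_xyz, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (metres) to pixel coordinates."""
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 640x480 camera (assumed, not from the source).
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

near_point = (0.25, 0.125, 0.5)  # 0.5 m from the camera
far_point = (0.5, 0.25, 1.0)     # 1.0 m away and twice as far off-axis

pixel_near = project_to_pixel(near_point, FX, FY, CX, CY)
pixel_far = project_to_pixel(far_point, FX, FY, CX, CY)

print(pixel_near)  # (620.0, 390.0)
print(pixel_far)   # (620.0, 390.0)
```

The image alone cannot tell these two points apart; a depth measurement or a point cloud can.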

The FP3 model addresses this deficiency directly. By incorporating 3D geometric information, the system allows robots to perceive and understand real-world spaces with greater accuracy. This shift is crucial for tasks requiring precise manipulation, such as assembling delicate components or navigating cluttered warehouses.

Unlike earlier image-only policies, which struggle with spatial ambiguity, FP3 is built to take 3D geometry as a first-class input alongside its other modalities, so that the robot's actions are grounded in a realistic representation of its surroundings. This capability is widely seen as essential for general-purpose autonomy in dynamic settings.

Technical Architecture and Training Methodology

FP3 is built on a diffusion Transformer architecture, a design known for generating high-quality sequential data; in a policy model, that means sequences of robot actions. At 1.3 billion parameters, it is positioned as a balance between computational cost and representational power.
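
For a sense of what 1.3 billion parameters means in hardware terms, the back-of-the-envelope sketch below estimates the memory needed to hold the weights alone. The numeric precision FP3 runs at is not stated in the source, so the three dtypes are assumptions, and the figures exclude activations, observation buffers, and any optimizer state.

```python
# Bytes per parameter for a few common numeric formats (assumed options).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Decimal gigabytes needed to store the weights alone in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


FP3_PARAMS = 1_300_000_000  # 1.3 billion, as reported

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(FP3_PARAMS, dtype):.1f} GB")
# float32: 5.2 GB
# float16: 2.6 GB
# int8: 1.3 GB
```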

Pre-training used a dataset of 60,000 motion trajectories. Crucially, these trajectories were not just video clips but included point cloud observations. A point cloud is a set of 3D points sampled from the visible surfaces of a scene, typically captured with depth cameras or LiDAR, so it encodes the environment's geometry directly rather than leaving it to be inferred from pixels.
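
For readers new to the format, the sketch below shows one common way a point cloud is obtained: back-projecting a depth image through a pinhole camera model. The function name, the tiny 2x2 depth image, and the intrinsics are hypothetical; the source does not describe how FP3's training point clouds were captured or preprocessed.

```python
def depth_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (rows of metres) into camera-frame XYZ points.

    Pixels with non-positive depth are treated as missing and skipped.
    """
    points = []
    for v, row in enumerate(depth_m):        # v: pixel row
        for u, z in enumerate(row):          # u: pixel column
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 depth image; 0.0 marks a pixel where the sensor returned nothing.
toy_depth = [
    [1.0, 1.0],
    [0.0, 2.0],
]

cloud = depth_to_point_cloud(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
for point in cloud:
    print(point)
# (-0.25, -0.25, 1.0)
# (0.25, -0.25, 1.0)
# (0.5, 0.5, 2.0)
```

Each surviving pixel becomes one metric 3D point, which is why a point cloud carries distance and volume information that a flat image does not.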

Scalable Diffusion Transformers

Diffusion Transformers let the model learn complex distributions over movements. In practice, the policy samples action sequences that resemble the demonstrated behavior for the current 3D observation, rather than collapsing many valid motions into a single averaged one. The architecture is described as scalable, which suggests that future versions could absorb larger datasets.
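
As a rough illustration of how a diffusion policy turns noise into an action, the sketch below runs a deterministic DDIM-style reverse loop in plain Python. Everything in it is an assumption for teaching purposes: `oracle_noise_predictor` is a hand-written stand-in for the trained Transformer (which in a real system would be conditioned on the point cloud observation), the noise schedule values are arbitrary, and this is not FP3's actual sampler.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.2):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise_predictor(noisy_action, alpha_bar, target_action):
    """Hypothetical stand-in for a trained network: returns the exact noise
    for a 'dataset' that contains only `target_action`."""
    return [
        (x - math.sqrt(alpha_bar) * a) / math.sqrt(1.0 - alpha_bar)
        for x, a in zip(noisy_action, target_action)
    ]


def sample_action(target_action, num_steps=50, seed=0):
    """Denoise a random vector into an action, one noise level at a time."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    action = [rng.gauss(0.0, 1.0) for _ in target_action]  # start from noise
    for t in reversed(range(num_steps)):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_noise_predictor(action, ab_t, target_action)
        # Estimate the clean action implied by the predicted noise ...
        clean = [
            (x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
            for x, e in zip(action, eps)
        ]
        # ... then step to the previous, less noisy level (DDIM with eta = 0).
        action = [
            math.sqrt(ab_prev) * c + math.sqrt(1.0 - ab_prev) * e
            for c, e in zip(clean, eps)
        ]
    return action


# A made-up 3-DoF end-effector displacement the "policy" should recover.
demo_target = [0.3, -0.1, 0.25]
print(sample_action(demo_target))
```

With a learned predictor in place of the oracle, the same loop would draw different plausible actions from different starting noise, which is what lets a diffusion policy represent several valid ways of completing a task.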

This methodology contrasts sharply with traditional reinforcement learning approaches, which often require extensive trial-and-error in simulation. FP3's pre-training phase provides a robust baseline, enabling faster adaptation to new tasks with minimal fine-tuning.

Strategic Leadership and Industry Context

Gao Yang, an Assistant Professor at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University, is the central figure behind the FP3 research. His academic work bridges theoretical computer science and practical robotic applications.

Gao Yang is also Co-founder and Chief Scientist at Spirit AI (Qianxun Intelligence), which is regarded as a leading 'unicorn' in China's embodied intelligence sector. The company's involvement reflects a broader trend of academia-industry collaboration accelerating technological deployment.

Global Competition in Embodied AI

The nomination of FP3 alongside other notable papers such as HITTER signals strong competition in the field. Western companies such as Tesla, with its Optimus robot, and Figure AI are also racing to develop advanced robotic brains. FP3's emphasis on 3D perception could give it an edge in precision tasks.

This development reflects a broader shift in the global AI landscape. Researchers are moving beyond pure language processing to integrate sensory-motor skills. The ability to manipulate physical objects intelligently is becoming the next frontier for artificial intelligence investment.

Practical Implications for Developers and Businesses

For developers building robotic systems, the availability of 3D-aware base models reduces the barrier to entry. Instead of creating custom perception pipelines from scratch, teams can leverage pre-trained models like FP3. This accelerates development cycles and lowers costs associated with training data collection.

Businesses in manufacturing and logistics stand to benefit significantly. Enhanced 3D perception allows for more flexible automation. Robots can handle irregularly shaped items or adapt to changing warehouse layouts without reprogramming. This flexibility is vital for maintaining competitiveness in supply chain operations.

Furthermore, the open nature of academic research facilitates knowledge sharing. Other researchers can build upon the FP3 framework, leading to rapid iteration and improvement. This collaborative environment fosters innovation across borders, benefiting the entire tech ecosystem.

Looking Ahead: The Future of Robotic Perception

Looking ahead, 3D perception seems likely to become standard practice. Models that ignore geometric data may struggle to stay competitive on complex manipulation tasks. FP3's nomination at ICRA 2026 lends support to this direction and may encourage further investment in spatial AI.

Next steps for the FP3 team may include scaling up the parameter count and expanding the diversity of training scenarios. Real-world deployment trials will be crucial to validate the model's robustness outside controlled laboratory conditions.

The timeline for widespread adoption depends on hardware advancements as well. As sensors become cheaper and more powerful, the full potential of 3D base models will be realized. We can expect to see more sophisticated robots in homes and factories within the next decade.

Gogo's Take

  • 🔥 Why This Matters: FP3 targets the 'flat world' problem in robotics. By prioritizing 3D geometry over 2D images, it aims to let robots interact with the physical world more safely and precisely, a prerequisite for true general-purpose automation.
  • ⚠️ Limitations & Risks: Processing point clouds requires significant computational power. Smaller startups may struggle with the hardware costs needed to run 1.3B parameter models efficiently. Additionally, reliance on specific sensor types could limit compatibility with existing legacy hardware.
  • 💡 Actionable Advice: Robotics engineers should start experimenting with point cloud data formats now. Evaluate your current perception stack against 3D-native models to identify gaps in spatial reasoning before competitors gain an edge.
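
For engineers acting on the advice above, one low-cost starting point is a plain-text point cloud format. The sketch below writes and reads a minimal ASCII PLY file body using only the Python standard library. The helper names are hypothetical, it handles only x/y/z vertices (no colors, normals, or binary encoding), and nothing in the source says FP3 itself uses PLY.

```python
def point_cloud_to_ply(points):
    """Serialise (x, y, z) tuples as an ASCII PLY document string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


def ply_to_point_cloud(text):
    """Parse the ASCII PLY subset written above back into (x, y, z) tuples."""
    lines = text.strip().split("\n")
    start = lines.index("end_header") + 1
    return [tuple(float(value) for value in line.split()) for line in lines[start:]]


# Three made-up points in metres, e.g. samples from a tabletop scene.
sample_points = [(0.0, 0.0, 1.0), (0.25, -0.5, 1.5), (-0.125, 0.75, 2.0)]

ply_text = point_cloud_to_ply(sample_points)
print(ply_text)
restored_points = ply_to_point_cloud(ply_text)
```

Dedicated libraries handle binary PLY, PCD, and much larger clouds, but a round trip like this is enough to inspect what a sensor or dataset actually provides before committing to a toolchain.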