InstantPose: Zero-Shot Instance-Level 6D Pose Estimation From a Single View
Remus A.; Avizzano C. A.
2025-01-01
Abstract
Object pose estimation from visual data is crucial for robotic interaction with the environment. Many existing instance-level methods require 3D CAD models or multiple views of the object, which limits their flexibility and generalizability. Overcoming this limitation is critical to making pose estimation systems more adaptable. This work presents a novel pipeline that leverages recent advances in reconstruction techniques to address these challenges. In particular, Large Reconstruction Models (LRMs) are neural architectures capable of generating 3D object models from a limited set of views. However, the resulting 3D models often lack relevant geometric and texture detail because the input information is insufficient. This research presents InstantPose, a zero-shot instance-level pose estimation method that builds upon LRMs to determine the pose of unseen objects using as little as a single unposed RGB reference image and RGB-D query images. Extensive experiments on the YCB-V dataset show that InstantPose performs remarkably well compared to methods designed to rely on a geometrically perfect object model. Furthermore, the 6D pose provided by the presented approach enables successful object grasping, highlighting its practical utility in robotic manipulation tasks.

| File | Size | Format |
|---|---|---|
| InstantPose_Zero-Shot_Instance-Level_6D_Pose_Estimation_From_a_Single_View.pdf (open access; type: pre-print/submitted manuscript; license: publisher's copyright) | 1.99 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

