InstantPose: Zero-Shot Instance-Level 6D Pose Estimation From a Single View

Remus A.; Avizzano C. A.
2025-01-01

Abstract

Object pose estimation from visual data is crucial for robotic interaction with the environment. Many existing instance-level methods require 3D CAD models or multiple views of the object, which limits their flexibility and generalizability. Overcoming this limitation is critical to making pose estimation systems more adaptable. This work presents a novel pipeline that addresses these challenges by leveraging recent advances in reconstruction techniques. Large Reconstruction Models (LRMs) are neural architectures capable of generating 3D object models from a limited set of views. However, the resulting 3D models often lack relevant geometric and texture details because the input information is insufficient. This research presents InstantPose, a zero-shot instance-level pose estimation method that builds upon LRMs to determine the pose of unseen objects using as little as a single unposed RGB reference image together with RGB-D query images. Extensive experiments on the YCB-V dataset demonstrate that InstantPose achieves remarkable pose estimation performance when compared with methods designed to rely on a geometrically perfect object model. Furthermore, the 6D pose provided by the presented approach enables successful object grasping, highlighting its practical utility in robotic manipulation tasks.
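As an illustration of how a 6D pose supports grasping, the sketch below treats a pose as a rotation R plus a translation t and maps a point from the object frame into the camera frame via p_cam = R p_obj + t. This is only a minimal sketch of the general rigid-transform convention, not the InstantPose implementation; the function names (`rotation_z`, `apply_pose`) and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_pose(rotation, translation, point):
    """Map a point from the object frame to the camera frame: R p + t."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


# Hypothetical pose: object rotated 90 degrees about z,
# located 0.5 m in front of the camera.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 0.5]

# Hypothetical grasp point expressed in the object frame (metres).
grasp_obj = [0.1, 0.0, 0.0]

# The same grasp point expressed in the camera frame.
grasp_cam = apply_pose(R, t, grasp_obj)
print(grasp_cam)
```

Under these assumed values, a grasp point defined once on the object model can be re-expressed in the camera frame for any estimated pose, which is the role a 6D pose plays when it is handed to a grasp planner.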
2025
Files in this record:
File: InstantPose_Zero-Shot_Instance-Level_6D_Pose_Estimation_From_a_Single_View.pdf
Access: open access
Type: Pre-print/Submitted manuscript
License: Publisher's copyright
Size: 1.99 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11382/587778