InstantPose: Zero-Shot Instance-Level 6D Pose Estimation From a Single View

Di Felice F.; Remus A.; Gasperini S.; Busam B.; Ott L.; Thalhammer S.; Tombari F.; Avizzano C. A.

2025-01-01

Abstract

Object pose estimation from visual data is crucial for robotic interaction with the environment. Many existing instance-level methods require 3D CAD models or multiple views of the object, which limits their flexibility and generalizability. Overcoming this limitation is critical to making pose estimation systems more adaptable. This work presents a novel pipeline that addresses these challenges by leveraging recent advances in reconstruction techniques. Large Reconstruction Models (LRMs) are neural architectures capable of generating 3D object models from a limited set of views. However, the resulting 3D models often lack relevant geometric and texture details because the input information is insufficient. InstantPose is a zero-shot instance-level pose estimation method that builds upon LRMs to determine the pose of unseen objects using as little as a single unposed RGB reference image together with RGB-D query images. Extensive experiments on the YCB-V dataset demonstrate that InstantPose achieves remarkable pose estimation performance compared to methods designed to rely on a geometrically perfect object model. Furthermore, the 6D pose provided by the presented approach facilitates successful object grasping, highlighting its practical utility in robotic manipulation tasks.

Year of publication

2025

Appears in types:

1.1 Journal Article

Files in this item:

File	Size	Format
InstantPose_Zero-Shot_Instance-Level_6D_Pose_Estimation_From_a_Single_View.pdf (open access; Type: Pre-print/Submitted manuscript; License: Publisher's copyright)	1.99 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11382/587778

Warning: the data displayed have not been validated by the university.
