We show that scene graphs (SGs) can encode surgical scenes in a human-readable format. We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned ...
MapAnything is an open-source research framework for universal metric 3D reconstruction. At its core is a simple, end-to-end trained transformer model that directly regresses the factored metric 3D ...