SeeClear: Reliable Transparent Object Depth Estimation via Generative Opacification

Xiaoying Wang1*, Yumeng He1,2*, Jingkai Shi1,2*, Jiayin Lu1, Yin Yang3, Ying Jiang1, Chenfanfu Jiang1
1University of California, Los Angeles   2University of Southern California   3University of Utah
* equal contributions

SeeClear is a novel framework that converts transparent objects into generated opaque counterparts, enabling stable and accurate depth prediction for transparent objects.

[Teaser: paired examples of Transparent Input, Opaque Output, and the resulting Depth]

✨ Abstract ✨

Monocular depth estimation remains challenging for transparent objects, where refraction and transmission are difficult to model and break the appearance assumptions used by depth networks. As a result, state-of-the-art estimators often produce unstable or incorrect depth predictions for transparent materials. We propose SeeClear, a novel framework that converts transparent objects into generated opaque counterparts, enabling stable monocular depth estimation for transparent objects. Given an input image, we first localize transparent regions and transform their refractive appearance into geometrically consistent opaque shapes using a diffusion-based generative opacification module. The processed image is then fed into an off-the-shelf monocular depth estimator without retraining or architectural changes. To train the opacification model, we construct SeeClear-396k, a synthetic dataset containing 396k paired transparent-opaque renderings. Experiments on both synthetic and real-world datasets show that SeeClear significantly improves depth estimation for transparent objects.

🎯 Pipeline 🎯

Starting from an image, we first apply a segmentation model to obtain the transparent object mask. Guided by the mask and the image, a latent diffusion model generates an opacified image of the transparent object. A mask refinement module then predicts a soft blending mask to alpha-composite the generated opaque region with the original background, producing the final composited image. The composited image is finally fed into a depth model to estimate accurate depth.
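As a sketch of this composition, the snippet below wires the four stages together, assuming each stage (segmentation, opacification, mask refinement, depth estimation) is exposed as a callable; the function names and signatures are illustrative assumptions, not the released API.

```python
from typing import Callable
import numpy as np

def seeclear_depth(
    image: np.ndarray,  # (H, W, 3) float RGB in [0, 1]
    segment: Callable[[np.ndarray], np.ndarray],  # image -> (H, W) binary transparent mask
    opacify: Callable[[np.ndarray, np.ndarray], np.ndarray],  # (image, mask) -> opacified image
    refine_mask: Callable[[np.ndarray, np.ndarray], np.ndarray],  # (opaque, mask) -> (H, W) soft alpha
    depth_model: Callable[[np.ndarray], np.ndarray],  # image -> (H, W) depth
) -> np.ndarray:
    """Compose the SeeClear stages; the models themselves are plug-ins."""
    mask = segment(image)                         # 1. localize transparent regions
    opaque = opacify(image, mask)                 # 2. diffusion-based generative opacification
    alpha = refine_mask(opaque, mask)[..., None]  # 3. soft blending mask in [0, 1]
    composite = alpha * opaque + (1.0 - alpha) * image  # alpha-composite with background
    return depth_model(composite)                 # 4. off-the-shelf depth estimator
```

Because the depth estimator only ever sees the composited image, it requires no retraining or architectural changes, as noted in the abstract.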

♣️ Qualitative Comparison on In-the-Wild Images ♣️

MoGe-2 exhibits depth leakage through transparent objects (e.g., a water bottle), while our method produces more accurate and consistent depth predictions.


♠️ Comparison ♠️

We evaluate transparent-object depth estimation on the ClearGrasp and TransPhy3D datasets. Compared with the baselines, SeeClear produces more accurate transparent-object depth.
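For reference, here is a minimal sketch of how masked depth metrics are commonly computed over transparent regions. The affine (scale/shift) alignment step, typically applied to affine-invariant predictors such as Marigold-style models, and all function names are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np

def align_affine(pred, gt, valid):
    # Least-squares scale/shift fit of pred to gt over valid pixels,
    # as commonly done for affine-invariant depth predictors.
    a = np.stack([pred[valid], np.ones(valid.sum())], axis=1)
    scale, shift = np.linalg.lstsq(a, gt[valid], rcond=None)[0]
    return scale * pred + shift

def masked_depth_metrics(pred, gt, mask):
    # AbsRel and delta < 1.25, restricted to the transparent-object mask.
    valid = mask.astype(bool) & (gt > 0)
    pred = align_affine(pred, gt, valid)
    p = np.clip(pred[valid], 1e-6, None)  # guard against non-positive aligned depth
    g = gt[valid]
    abs_rel = float(np.mean(np.abs(p - g) / g))
    delta1 = float(np.mean(np.maximum(p / g, g / p) < 1.25))
    return abs_rel, delta1
```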

[Qualitative comparison grids: Input, Ground Truth, Depth4ToM, MODEST, D4RD, DKT, Marigold, GenPercept, GeoWizard, DA3, MoGe-2, and Ours]

🔷 TDoF20 Comparison 🔷

Comparison on the TDoF20 dataset. Rows show Input, Ground Truth, DKT, and Ours; each column is a different scene.
