With the advent of large-scale 3D datasets, feed-forward 3D generative models, such as the Large Reconstruction Model (LRM), have gained significant attention and achieved remarkable success. However, we observe that RGB images often lead to conflicting training objectives and lack the necessary clarity for geometry reconstruction.
In this paper, we revisit the inductive biases associated with mesh reconstruction and introduce DiMeR, a novel disentangled dual-stream feed-forward model for sparse-view mesh reconstruction. The key idea is to disentangle both the input and the framework into geometry and texture parts, reducing the training difficulty of each part in line with the principle of Occam's razor. Since normal maps are strictly consistent with geometry and accurately capture surface variations, we use them as the exclusive input to the geometry branch, reducing the complexity of the mapping between the network's input and output. Moreover, we improve the mesh extraction algorithm to introduce 3D ground-truth supervision.
For the texture branch, we use RGB images as input to produce the textured mesh. Overall, DiMeR demonstrates robust capabilities across various tasks, including sparse-view reconstruction, single-image-to-3D, and text-to-3D. Extensive experiments show that DiMeR significantly outperforms previous methods, achieving over 30% improvement in Chamfer Distance on the GSO and OmniObject3D datasets.
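For reference, the Chamfer Distance used in the evaluation above can be sketched as a symmetric nearest-neighbour metric between point sets sampled from the predicted and ground-truth meshes. This is a minimal NumPy illustration; the paper's exact sampling density and normalisation are not specified here and are assumptions:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (N,3) and q (M,3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)
    # Average nearest-neighbour distance in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

pts = np.random.rand(128, 3)
shifted = pts + 0.1
print(chamfer_distance(pts, pts))      # identical sets -> 0.0
print(chamfer_distance(pts, shifted))  # shifted copy -> positive distance
```

A lower value means the two surfaces agree more closely; the reported "over 30% improvement" refers to a relative reduction of this distance.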
The core idea of DiMeR is to identify the inductive biases needed at different stages of 3D mesh reconstruction. Specifically, geometry reconstruction does not require texture information, as RGB textures often obscure important geometric cues. Leveraging the inductive bias that normal maps are inherently consistent with the underlying geometry, we learn geometry reconstruction solely from normal maps.
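The disentanglement described above can be illustrated with a toy dual-stream model in which each branch consumes only its own modality: normals for geometry, RGB for texture. All module names, layer sizes, and output heads below are hypothetical stand-ins, not DiMeR's actual architecture:

```python
import torch
import torch.nn as nn

class DualStreamReconstructor(nn.Module):
    """Toy disentangled dual-stream model (illustrative only)."""

    def __init__(self, feat=64):
        super().__init__()
        # Geometry branch sees ONLY normal maps (3 channels).
        self.geo_encoder = nn.Sequential(nn.Conv2d(3, feat, 3, padding=1), nn.ReLU())
        self.geo_head = nn.Conv2d(feat, 1, 1)   # stand-in for a geometry field
        # Texture branch sees ONLY RGB images.
        self.tex_encoder = nn.Sequential(nn.Conv2d(3, feat, 3, padding=1), nn.ReLU())
        self.tex_head = nn.Conv2d(feat, 3, 1)   # stand-in for surface colour

    def forward(self, normals, rgb):
        geometry = self.geo_head(self.geo_encoder(normals))
        texture = self.tex_head(self.tex_encoder(rgb))
        return geometry, texture

model = DualStreamReconstructor()
normals = torch.rand(1, 3, 64, 64)
rgb = torch.rand(1, 3, 64, 64)
geo, tex = model(normals, rgb)
print(geo.shape, tex.shape)
```

The point of the sketch is the routing: the geometry branch never sees RGB, so texture cannot introduce conflicting gradients into the geometry objective.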
Interactive 3D model viewer: drag with the mouse to rotate, use the scroll wheel to zoom, and hold Shift to pan.
A battle mech in a mix of red, blue, and black color, with a cannon on the head.
Detailed facial sculpt, horned head, tapered horns, deep set eyes, prominent cheekbones, furrowed brow.
Pink teapot model, symmetrical, curved spout, rounded body, flat base, circular lid, elongated handle, tapered top.
Charlie Brown, a cartoon character in a yellow and black outfit, upright posture.
A person wearing a virtual reality headset, sitting position, bent legs, clasped hands.
A pink frog wearing a green hat and bow tie, humanoid shape, bulbous hat.
3D reconstruction from the input image
@article{Jiang2025dimer,
title={DiMeR: Disentangled Mesh Reconstruction Model},
author={Jiang, Lutao and Lin, Jiantao and Chen, Kanghao and Ge, Wenhang and Yang, Xin and Jiang, Yifan and Lyu, Yuanhuiyi and Zheng, Xu and Chen, Yingcong},
journal={arXiv preprint arXiv:2504.17670},
year={2025}
}