Test3R: Learning to Reconstruct 3D at Test Time

National University of Singapore

Abstract

Dense matching methods like DUSt3R regress pairwise pointmaps for 3D reconstruction. However, their reliance on pairwise prediction and their limited generalization capability inherently restrict global geometric consistency. In this work, we introduce Test3R, a surprisingly simple test-time learning technique that significantly boosts geometric accuracy. Using image triplets \((I_1, I_2, I_3)\), Test3R generates reconstructions from the pairs \((I_1, I_2)\) and \((I_1, I_3)\). The core idea is to optimize the network at test time via a self-supervised objective: maximizing the geometric consistency between these two reconstructions relative to the common image \(I_1\). This encourages the model to produce cross-pair consistent outputs regardless of the inputs. Extensive experiments demonstrate that our technique significantly outperforms previous state-of-the-art methods on 3D reconstruction and multi-view depth estimation. Moreover, it is universally applicable and nearly cost-free: it is easy to apply to other models and adds minimal test-time training overhead and parameter footprint.

Methods


The primary goal of Test3R is to adapt a pretrained reconstruction model \(f_{s}\) to the specific distribution of test scenes \(f_{t}\). It does so by optimizing a set of visual prompts at test time through a self-supervised training objective that maximizes cross-pair consistency between \(X_1^{ref, ref}\) and \(X_2^{ref, ref}\), the pointmaps of the same reference view predicted from two different image pairs.
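
The idea of tuning only a prompt against a cross-pair consistency loss can be illustrated with a deliberately tiny toy. The sketch below is not the Test3R implementation: `toy_model`, `FROZEN_LEAK`, the scalar prompt, the squared-error loss, and the data are all hypothetical stand-ins, and keeping the pretrained weight frozen is an assumption of this sketch. It only shows the shape of the loop: predict the reference view from two different pairs, measure the gap, and update the prompt alone.

```python
# Toy sketch of test-time prompt tuning with a cross-pair consistency loss.
# All names and values are hypothetical; this is not the Test3R code.

FROZEN_LEAK = 0.5  # frozen "pretrained" weight: how much the source view leaks into the output


def toy_model(ref, src, prompt):
    """Predict one value per pixel of the reference view from a (ref, src) pair."""
    return [r + (FROZEN_LEAK + prompt) * s for r, s in zip(ref, src)]


def consistency_loss(ref, src_a, src_b, prompt):
    """Mean squared gap between two predictions of the same reference view."""
    out_a = toy_model(ref, src_a, prompt)
    out_b = toy_model(ref, src_b, prompt)
    return sum((a - b) ** 2 for a, b in zip(out_a, out_b)) / len(ref)


def tune_prompt(ref, src_a, src_b, steps=200, lr=0.1, eps=1e-4):
    """Gradient descent on the prompt only; FROZEN_LEAK is never updated."""
    prompt = 0.0
    for _ in range(steps):
        # Central-difference estimate of d(loss)/d(prompt).
        grad = (consistency_loss(ref, src_a, src_b, prompt + eps)
                - consistency_loss(ref, src_a, src_b, prompt - eps)) / (2 * eps)
        prompt -= lr * grad
    return prompt


ref = [1.0, 2.0, 3.0, 4.0]       # reference view (shared by both pairs)
src_a = [0.2, -0.1, 0.4, 0.0]    # first source view
src_b = [-0.3, 0.5, 0.1, 0.6]    # second source view

loss_before = consistency_loss(ref, src_a, src_b, 0.0)
prompt = tune_prompt(ref, src_a, src_b)
loss_after = consistency_loss(ref, src_a, src_b, prompt)
```

In this toy the prompt can cancel the source-dependent term exactly, so the loss falls to nearly zero. A real reconstruction network would not admit such a trivial solution.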

Quantitative Results


Comparison on cross-pair consistency


We visualize pointmaps of the same reference view paired with different source views. Compared with vanilla DUSt3R, Test3R is more consistent across pairs. The depth maps predicted by DUSt3R show significant inconsistencies in regions with limited overlap. After optimization with the cross-pair consistency objective, Test3R produces consistent and reliable depth predictions in these regions. Moreover, even in the \((I^{ref}, I^{ref})\) pair case, Test3R still predicts relatively consistent depth maps.
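
The visual comparison above could also be summarized with a number. The sketch below assumes one simple choice, the mean Euclidean distance between corresponding 3D points of two pointmaps of the same reference view. The function name `cross_pair_gap` and the sample points are hypothetical, and this is not necessarily the measure used in the paper.

```python
import math


def cross_pair_gap(pointmap_a, pointmap_b):
    """Mean Euclidean distance between corresponding 3D points of two pointmaps."""
    if not pointmap_a or len(pointmap_a) != len(pointmap_b):
        raise ValueError("pointmaps must be non-empty and cover the same pixels")
    total = sum(math.dist(p, q) for p, q in zip(pointmap_a, pointmap_b))
    return total / len(pointmap_a)


# Two hypothetical predictions of the same reference view, one per source view.
pm_src1 = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
pm_src2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.5)]

gap = cross_pair_gap(pm_src1, pm_src2)
```

A gap of zero means the two pairs agree exactly on the reference view. Larger values indicate stronger cross-pair inconsistency.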


Qualitative Results

[Figure: two side-by-side qualitative comparisons of DUSt3R and Test3R.]

BibTeX

@misc{yuan2025test3rlearningreconstruct3d,
      title={Test3R: Learning to Reconstruct 3D at Test Time}, 
      author={Yuheng Yuan and Qiuhong Shen and Shizun Wang and Xingyi Yang and Xinchao Wang},
      year={2025},
      eprint={2506.13750},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.13750}, 
}