The primary goal of Test3R is to adapt a pretrained reconstruction model \(f_{s}\) into a model \(f_{t}\) suited to the specific distribution of the test scene. It does so by optimizing a set of visual prompts at test time with a self-supervised objective that maximizes cross-pair consistency between \(X_1^{ref, ref}\) and \(X_2^{ref, ref}\), the pointmaps of the same reference view predicted from two different image pairs.
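To make the cross-pair consistency idea concrete, here is a minimal sketch of how the disagreement between \(X_1^{ref, ref}\) and \(X_2^{ref, ref}\) could be measured. It assumes a mean per-pixel L1 distance as the consistency measure and represents pointmaps as plain nested lists. The function name `cross_pair_consistency_loss` is hypothetical and is not taken from the Test3R code base. In the actual method this quantity would be minimized with respect to the visual prompts by backpropagating through the model, which is not shown here.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Pointmap = List[List[Point]]  # H x W grid of 3D points in the reference frame


def cross_pair_consistency_loss(x1: Pointmap, x2: Pointmap) -> float:
    """Mean per-pixel L1 distance between two pointmaps of the same reference view.

    Assumption: L1 distance is used as the consistency measure; lower is more consistent.
    """
    if len(x1) != len(x2) or any(len(r1) != len(r2) for r1, r2 in zip(x1, x2)):
        raise ValueError("pointmaps must share the same H x W shape")

    total = 0.0
    count = 0
    for row1, row2 in zip(x1, x2):
        for p1, p2 in zip(row1, row2):
            # Sum of absolute differences over the x, y, z coordinates of one pixel.
            total += sum(abs(a - b) for a, b in zip(p1, p2))
            count += 1

    if count == 0:
        raise ValueError("pointmaps must contain at least one point")
    return total / count


# Toy 1 x 2 pointmaps of the reference view, predicted from two different pairs.
x1_ref = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]]
x2_ref = [[(0.0, 0.0, 1.0), (1.0, 0.0, 1.5)]]
loss = cross_pair_consistency_loss(x1_ref, x2_ref)
```

A loss of zero means the two pairs agree exactly on the geometry of the reference view; any disagreement, such as the depth offset at the second pixel above, increases the loss.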
We visualize the pointmaps of the same reference view when it is paired with different source views. Compared with vanilla DUSt3R, Test3R is markedly more consistent across pairs. The depth maps predicted by DUSt3R are significantly inconsistent in regions with limited overlap. After optimization with the cross-pair consistency objective, Test3R produces consistent and reliable depth predictions in these regions. Moreover, even in the \((I^{ref}, I^{ref})\) pair case, Test3R still predicts relatively consistent depth maps.
@misc{yuan2025test3rlearningreconstruct3d,
title={Test3R: Learning to Reconstruct 3D at Test Time},
author={Yuheng Yuan and Qiuhong Shen and Shizun Wang and Xingyi Yang and Xinchao Wang},
year={2025},
eprint={2506.13750},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.13750},
}