Multi-View Stereo for Community Photo Collections
Abstract
We present a multi-view stereo system designed to handle the extreme variations in lighting, scale, clutter (such as crowds), and other conditions found in large online community photo collections.
Our approach automatically selects an appropriate subset of images to match, both at a per-view and at a per-pixel level.
We show that this adaptive view selection enables robust matching despite large changes in visual appearance.
The stereo matching technique takes as input sparse 3D points computed by structure-from-motion (SfM) methods and iteratively grows surfaces from these points. Optimizing for the surface normal within the photoconsistency measure significantly improves the matching results.
While the emphasis of our approach is on generating high-fidelity depth maps, we also show examples of merging these depth maps into compelling scene reconstructions. We evaluate our algorithm on standard multi-view stereo datasets and on diverse photo collections of iconic scenes gathered from the Internet.
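The abstract describes the per-view selection step only at a high level. As a minimal sketch of the idea, assuming candidate neighbors are ranked by the SfM points they share with the reference image and that near-degenerate baselines are downweighted, one could write the following (all names here, such as `select_views` and `min_angle`, are illustrative and not taken from the paper):

```python
import numpy as np

def angle_deg(p, c1, c2):
    """Angle (in degrees) at 3D point p between the rays toward cameras c1 and c2."""
    v1, v2 = c1 - p, c2 - p
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def select_views(ref_id, visible, points3d, centers, num_views=10, min_angle=10.0):
    """Rank candidate neighbor views of a reference image by shared SfM points.

    ref_id   : id of the reference view
    visible  : dict  view_id -> set of SfM point ids observed in that view
    points3d : dict  point_id -> np.ndarray(3,) triangulated position
    centers  : dict  view_id -> np.ndarray(3,) camera center

    Each shared point contributes a weight in [0, 1] that penalizes
    near-degenerate (very small) triangulation angles.
    """
    scores = {}
    for v, pts in visible.items():
        if v == ref_id:
            continue
        score = 0.0
        for pid in visible[ref_id] & pts:
            a = angle_deg(points3d[pid], centers[ref_id], centers[v])
            score += min(a / min_angle, 1.0) ** 2  # downweight tiny baselines
        scores[v] = score
    return sorted(scores, key=scores.get, reverse=True)[:num_views]
```

In the full system, this per-view selection is complemented by the per-pixel selection mentioned above and illustrated in Figure 5.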
Figure 1

A community photo collection (CPC) of pictures of the Trevi Fountain gathered online.
Changes in illumination and camera response result in notable appearance variations.
In addition, the images frequently contain clutter (including tourists) that varies considerably from image to image.
Figure 2

Images of Notre Dame with drastically different sampling rates.
All images are shown at their native resolution and cropped to 200×200 pixels, exhibiting a variation in sampling rate of more than three orders of magnitude.
Figure 3

Parametrization for stereo matching.
Left: the window centered at a pixel in the reference view corresponds to a point at some distance (the depth) along that pixel's viewing ray.
Right: a cross-section through the window shows how the window's orientation is parameterized by per-pixel depth offsets.
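One plausible way to write this parametrization explicitly (the symbol names below are chosen for illustration and are not quoted from the paper) is

$$
\mathbf{x}(s,t) \;=\; \mathbf{o}_R + \bigl(h + s\,h_s + t\,h_t\bigr)\,\vec{r}(s,t),
$$

where $\mathbf{o}_R$ is the reference camera center, $\vec{r}(s,t)$ the viewing ray through window pixel $(s,t)$, $h$ the depth of the window center, and $h_s,\,h_t$ the per-pixel depth offsets that tilt the window; these offsets play the role of the surface normal degrees of freedom optimized within the photoconsistency measure, as described in the abstract.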
Figure 4

Sample views from the Trevi Fountain, Statue of Liberty, St. Peter’s Cathedral, and Pisa Duomo datasets, shown with their corresponding depth maps and shaded renderings.
Figure 5

The effect of local view selection (LVS) and optimization of normals (ON) on a depth map of the nskulla model.
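To make the effect of normal optimization concrete, here is a minimal NumPy sketch, under the planar-window assumption of the parametrization above, of how an oriented window and its NCC photoconsistency could be evaluated; the names (`window_points`, `ncc`, `hs`, `ht`) are hypothetical, and the reprojection of the window into neighboring views is omitted:

```python
import numpy as np

def window_points(o, rays, h, hs, ht):
    """3D positions of an n x n matching window (cf. Figure 3).

    o      : (3,) reference camera center
    rays   : (n, n, 3) unit viewing rays of the window pixels
    h      : depth of the window center along its ray
    hs, ht : per-pixel depth offsets tilting the window (the "normal" DOFs)
    """
    n = rays.shape[0]
    half = n // 2
    s, t = np.meshgrid(np.arange(-half, half + 1),
                       np.arange(-half, half + 1), indexing="ij")
    depth = h + s * hs + t * ht              # planar depth across the window
    return o + depth[..., None] * rays       # (n, n, 3) world-space points

def ncc(ref_win, nbr_win, eps=1e-8):
    """Normalized cross-correlation between two equally sized intensity windows."""
    a = (ref_win - ref_win.mean()) / (ref_win.std() + eps)
    b = (nbr_win - nbr_win.mean()) / (nbr_win.std() + eps)
    return float((a * b).mean())
```

Optimizing the offsets together with the depth lets the window foreshorten correctly on slanted surfaces, which is the improvement Figure 5 attributes to normal optimization (ON).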
Figure 6

The merged model comprises 72 depth maps and is shown together with a close-up view.
The merged model of the central portal of Notre Dame Cathedral comprises 206 depth maps.
Figure 7

Comparison of the merged Pisa model (a) with a laser-scanned model (b).
Conclusion
We introduced a novel multi-view stereo method capable of computing high-quality reconstructions of a wide range of scenes from large, shared, multi-user photo collections available online.
Thanks to the rapid growth of imagery available on the Internet, this capability offers extraordinary potential for computing accurate geometric models of the world’s sites, cities, and landscapes.
