Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

arXiv

Shuo Sun¹, Unal Artan¹, Malcolm Mielle², Achim J. Lilienthal¹,³, Martin Magnusson¹
¹ Örebro University, Sweden  |  ² Schindler Group  |  ³ Technical University of Munich, Germany

Given synchronized multi-view videos and camera intrinsics, our pipeline jointly reconstructs dense dynamic 3D scenes and estimates per-camera poses — combining multi-camera SLAM, monocular depth estimation, optical flow, and scale-consistent video depth optimization.


Overview

This repository implements a multi-stage pipeline for dense dynamic scene reconstruction and camera pose estimation from synchronized multi-view videos:

  1. Multi-camera SLAM for per-camera pose estimation (MegaSAM or DROID-SLAM backend)
  2. Monocular depth estimation (UniDepth) as a depth prior
  3. Dense optical flow estimation (UFM or RAFT), including wide-baseline cross-camera matching
  4. Scale-consistent video depth optimization that fuses poses, depth, and flow into a dense dynamic reconstruction


Reconstruction Examples

  • Meeting Scene
  • RoboArm Scene
  • RoboDog Scene
  • Multiple Tracking (with mvtracker)
  • Multi-Camera Scene
  • Human Scene 2

(Animated GIFs for each example are in the gifs/ directory.)

TODO

  • Dataset processing and uploading

Installation

# 1. Clone with submodules
git clone --recursive git@github.com:ljjTYJR/multiple-view-dynamic-reconstruction.git
cd multiple-view-dynamic-reconstruction

# 2. Create environment (uses uv — install via: pip install uv)
uv venv .venv --python=3.10
source .venv/bin/activate

# 3. Install PyTorch (CUDA 11.8)
uv pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118

# 4. Install dependencies
uv pip install -r requirements.txt

# 5. Install local packages
uv pip install --no-build-isolation thirdparty/lietorch
uv pip install --no-build-isolation thirdparty/pytorch_scatter
uv pip install --no-build-isolation .
uv pip install --no-build-isolation -e submodules/ufm/UniCeption/
uv pip install --no-build-isolation -e submodules/ufm/
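
To sanity-check the installation, confirm that the CUDA-enabled PyTorch build is active (this should print the torch version followed by True on a CUDA-capable machine):

# verify that torch was installed with working CUDA support
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"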

Running

Full Pipeline

python demo_all.py \
  --scene_dir /path/to/scene \
  --calib calib/kinect_crop.txt \
  --stride 1 \
  --model_type megasam \
  --prior_depth \
  --depth_model_size l \
  --flow_model_type ufm \
  --flow_model infinity1096/UFM-Base

Key Arguments

Argument | Default | Description
--scene_dir | (required) | Scene directory with cam01/, cam02/, ... subdirectories
--calib | (required) | Camera calibration file with pinhole intrinsics (fx fy cx cy); see the example below
--stride | 1 | Frame stride for SLAM
--model_type | megasam | SLAM backend: megasam or droid
--prior_depth | off | Use monocular depth as SLAM prior
--depth_model_size | l | UniDepth model size: s (faster) or l (better)
--flow_model_type | ufm | Optical flow model: ufm or raft
--filter_thresh | 2.4 | SLAM filter threshold
--keyframe_thresh | 4.0 | SLAM keyframe selection threshold
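
For reference, the calibration file is plain text with pinhole intrinsics in the order fx fy cx cy. The line below is only an illustrative sketch with placeholder values, not the actual contents of calib/kinect_crop.txt:

# fx fy cx cy (placeholder values for illustration)
600.0 600.0 320.0 240.0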

Optimization Weights

Argument | Default | Description
--w_ratio | 1.0 | Scale ratio consistency weight
--w_flow | 0.2 | Optical flow consistency weight
--w_si | 0.1 | Scale-invariant depth weight
--w_grad | 0.2 | Depth gradient smoothness weight
--w_normal | 1.0 | Surface normal consistency weight
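
These weights can be overridden per run. As an illustrative sketch (the values are examples, not tuned recommendations), a run that upweights flow consistency and relaxes the normal term would look like:

python demo_all.py \
  --scene_dir /path/to/scene \
  --calib calib/kinect_crop.txt \
  --w_flow 0.5 \
  --w_normal 0.5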

Skip Stages

Use --skip_slam, --skip_depth, --skip_flow, or --skip_optimization to skip individual pipeline stages when rerunning with cached intermediate results.
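
For example, assuming a previous full run has already produced SLAM, depth, and flow results for the scene, only the final optimization stage needs to be repeated:

python demo_all.py \
  --scene_dir /path/to/scene \
  --calib calib/kinect_crop.txt \
  --skip_slam --skip_depth --skip_flow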


Dataset

We provide demo datasets on HuggingFace:

Dataset | Description | Size
MultiCamRobolab | Multi-camera robotic lab scenes (RoboDog, RoboArm) | ~68 MB
MultiCamVideo-Dataset | Multi-camera video sequences |

Download

# Install huggingface_hub if needed
pip install huggingface_hub

# Download MultiCamRobolab
huggingface-cli download shuooru/MultiCamRobolab --repo-type dataset --local-dir ./data/MultiCamRobolab

Usage

Each scene directory should follow this structure:

scene/
├── cam01/          # images for camera 1 (e.g., 000000.jpg, 000001.jpg, ...)
├── cam02/          # images for camera 2
└── ...
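
Because the pipeline expects synchronized streams, it is worth confirming that every camera directory contains the same number of frames before running. A quick shell check over the layout above:

# print the image count per camera directory; all counts should match
for d in scene/cam*/; do
  echo "$d: $(ls "$d" | wc -l) images"
done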

Then run the pipeline with the provided calibration file:

python demo_all.py \
  --scene_dir ./data/MultiCamRobolab/robotdog_01 \
  --calib calib/kinect_crop.txt \
  --model_type megasam \
  --prior_depth

Visualization

Viser (interactive 3D viewer)

python viser_est.py \
  --traj-ids 1 2 \
  --max_frames 200 \
  --data_path /path/to/scene \
  --pose_dir /path/to/result

Rerun

python rerun_est.py \
  --traj-ids 1 2 \
  --max_frames 200 \
  --data_path /path/to/scene \
  --pose_dir /path/to/result

Citation

If you find this work useful, please cite:

@article{sun2026dense,
  title   = {Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos},
  author  = {Sun, Shuo and Artan, Unal and Mielle, Malcolm and Lilienthal, Achim J. and Magnusson, Martin},
  journal = {arXiv preprint arXiv:2603.12064},
  year    = {2026}
}

Acknowledgements

Our work builds on the following open-source projects:

  • MegaSAM — SLAM backend
  • DROID-SLAM — SLAM backbone
  • UniDepth — Monocular depth estimation
  • UFM — Unified dense image correspondence estimation (optical flow and wide-baseline matching)
  • VGGT — Visual Geometry Grounded Transformer
  • mvtracker — Multi-view tracking