Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

arXiv

Shuo Sun¹, Unal Artan¹, Malcolm Mielle², Achim J. Lilienthal¹,³, Martin Magnusson¹
¹ Örebro University, Sweden  |  ² Schindler Group  |  ³ Technical University of Munich, Germany

Given synchronized multi-view videos and camera intrinsics, our pipeline jointly reconstructs dense dynamic 3D scenes and estimates per-camera poses — combining multi-camera SLAM, monocular depth estimation, optical flow, and scale-consistent video depth optimization.


Overview

This repository implements a multi-stage pipeline for dense dynamic scene reconstruction and camera pose estimation from synchronized multi-view videos:

  1. Multi-camera SLAM for per-camera pose estimation (MegaSAM or DROID-SLAM backend)
  2. Monocular depth estimation (UniDepth) as a depth prior
  3. Dense optical flow estimation (UFM or RAFT), including wide-baseline cross-camera matching
  4. Scale-consistent video depth optimization that fuses poses, depth, and flow into a dense dynamic reconstruction


Reconstruction Examples

  • Meeting Scene
  • RoboArm Scene
  • RoboDog Scene
  • Multiple Tracking (with mvtracker)
  • Multi-Camera Scene
  • Human Scene 2

(Animated GIFs for each example are in the gifs/ directory.)

TODO

  • Dataset processing and uploading

Installation

# 1. Clone with submodules
git clone --recursive git@github.com:ljjTYJR/multiple-view-dynamic-reconstruction.git
cd multiple-view-dynamic-reconstruction

# 2. Create environment (uses uv — install via: pip install uv)
uv venv .venv --python=3.10
source .venv/bin/activate

# 3. Install PyTorch (CUDA 11.8)
uv pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118

# 4. Install dependencies
uv pip install -r requirements.txt

# 5. Install local packages
uv pip install --no-build-isolation thirdparty/lietorch
uv pip install --no-build-isolation thirdparty/pytorch_scatter
uv pip install --no-build-isolation .
uv pip install --no-build-isolation -e submodules/ufm/UniCeption/
uv pip install --no-build-isolation -e submodules/ufm/
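
To sanity-check the installation, confirm that the CUDA-enabled PyTorch build is active (this should print the torch version followed by True on a CUDA-capable machine):

# verify that torch was installed with working CUDA support
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"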

Running

Full Pipeline

python demo_all.py \
  --scene_dir /path/to/scene \
  --calib calib/kinect_crop.txt \
  --stride 1 \
  --model_type megasam \
  --prior_depth \
  --depth_model_size l \
  --flow_model_type ufm \
  --flow_model infinity1096/UFM-Base

Key Arguments

Argument | Default | Description
--scene_dir | (required) | Scene directory with cam01/, cam02/, ... subdirectories
--calib | (required) | Camera calibration file with pinhole intrinsics (fx fy cx cy); see the example below
--stride | 1 | Frame stride for SLAM
--model_type | megasam | SLAM backend: megasam or droid
--prior_depth | off | Use monocular depth as SLAM prior
--depth_model_size | l | UniDepth model size: s (faster) or l (better)
--flow_model_type | ufm | Optical flow model: ufm or raft
--filter_thresh | 2.4 | SLAM filter threshold
--keyframe_thresh | 4.0 | SLAM keyframe selection threshold
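
For reference, the calibration file is plain text with pinhole intrinsics in the order fx fy cx cy. The line below is only an illustrative sketch with placeholder values, not the actual contents of calib/kinect_crop.txt:

# fx fy cx cy (placeholder values for illustration)
600.0 600.0 320.0 240.0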

Optimization Weights

Argument | Default | Description
--w_ratio | 1.0 | Scale ratio consistency weight
--w_flow | 0.2 | Optical flow consistency weight
--w_si | 0.1 | Scale-invariant depth weight
--w_grad | 0.2 | Depth gradient smoothness weight
--w_normal | 1.0 | Surface normal consistency weight
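
These weights can be overridden per run. As an illustrative sketch (the values are examples, not tuned recommendations), a run that upweights flow consistency and relaxes the normal term would look like:

python demo_all.py \
  --scene_dir /path/to/scene \
  --calib calib/kinect_crop.txt \
  --w_flow 0.5 \
  --w_normal 0.5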

Skip Stages

Use --skip_slam, --skip_depth, --skip_flow, or --skip_optimization to skip individual pipeline stages when rerunning with cached intermediate results.
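
For example, assuming a previous full run has already produced SLAM, depth, and flow results for the scene, only the final optimization stage needs to be repeated:

python demo_all.py \
  --scene_dir /path/to/scene \
  --calib calib/kinect_crop.txt \
  --skip_slam --skip_depth --skip_flow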


Dataset

We provide demo datasets on HuggingFace:

Dataset | Description | Size
MultiCamRobolab | Multi-camera robotic lab scenes (RoboDog, RoboArm) | ~68 MB
MultiCamVideo-Dataset | Multi-camera video sequences |

Download

# Install huggingface_hub if needed
pip install huggingface_hub

# Download MultiCamRobolab
huggingface-cli download shuooru/MultiCamRobolab --repo-type dataset --local-dir ./data/MultiCamRobolab

Usage

Each scene directory should follow this structure:

scene/
├── cam01/          # images for camera 1 (e.g., 000000.jpg, 000001.jpg, ...)
├── cam02/          # images for camera 2
└── ...
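
Because the pipeline expects synchronized streams, it is worth confirming that every camera directory contains the same number of frames before running. A quick shell check over the layout above:

# print the image count per camera directory; all counts should match
for d in scene/cam*/; do
  echo "$d: $(ls "$d" | wc -l) images"
done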

Then run the pipeline with the provided calibration file:

python demo_all.py \
  --scene_dir ./data/MultiCamRobolab/robotdog_01 \
  --calib calib/kinect_crop.txt \
  --model_type megasam \
  --prior_depth

Visualization

Viser (interactive 3D viewer)

python viser_est.py \
  --traj-ids 1 2 \
  --max_frames 200 \
  --data_path /path/to/scene \
  --pose_dir /path/to/result

Rerun

python rerun_est.py \
  --traj-ids 1 2 \
  --max_frames 200 \
  --data_path /path/to/scene \
  --pose_dir /path/to/result

Citation

If you find this work useful, please cite:

@article{sun2026dense,
  title   = {Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos},
  author  = {Sun, Shuo and Artan, Unal and Mielle, Malcolm and Lilienthal, Achim J. and Magnusson, Martin},
  journal = {arXiv preprint arXiv:2603.12064},
  year    = {2026}
}

Acknowledgements

Our work builds on the following open-source projects:

  • MegaSAM — SLAM backend
  • DROID-SLAM — SLAM backbone
  • UniDepth — Monocular depth estimation
  • UFM — Unified dense image correspondence estimation (optical flow and wide-baseline matching)
  • VGGT — Visual Geometry Grounded Transformer
  • mvtracker — Multi-view tracking