# Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos

Shuo Sun¹, Unal Artan¹, Malcolm Mielle², Achim J. Lilienthal¹,³, Martin Magnusson¹

¹ Örebro University, Sweden · ² Schindler Group · ³ Technical University of Munich, Germany
Given synchronized multi-view videos and camera intrinsics, our pipeline jointly reconstructs dense dynamic 3D scenes and estimates per-camera poses by combining multi-camera SLAM, monocular depth estimation, optical flow, and scale-consistent video depth optimization.
## Overview

This repository implements a multi-stage pipeline for dense dynamic scene reconstruction and camera pose estimation from synchronized multi-view videos:

1. **Multi-camera SLAM** for camera pose estimation
2. **Monocular depth estimation** (UniDepth) for per-frame depth priors
3. **Optical flow estimation** (UFM or RAFT) for cross-frame correspondence
4. **Scale-consistent video depth optimization** that fuses the above into a dense dynamic reconstruction
## Reconstruction Examples

Qualitative reconstruction results (GIFs) are shown for the following scenes: Meeting, RoboArm, RoboDog, Multiple Tracking (with mvtracker), Multi-Camera, and Human Scene 2.
## TODO
- Dataset processing and uploading
## Installation

```bash
# 1. Clone with submodules
git clone --recursive git@github.com:ljjTYJR/multiple-view-dynamic-reconstruction.git
cd multiple-view-dynamic-reconstruction

# 2. Create environment (uses uv; install via: pip install uv)
uv venv .venv --python=3.10
source .venv/bin/activate

# 3. Install PyTorch (CUDA 11.8)
uv pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118

# 4. Install dependencies
uv pip install -r requirements.txt

# 5. Install local packages
uv pip install --no-build-isolation thirdparty/lietorch
uv pip install --no-build-isolation thirdparty/pytorch_scatter
uv pip install --no-build-isolation .
uv pip install --no-build-isolation -e submodules/ufm/UniCeption/
uv pip install --no-build-isolation -e submodules/ufm/
```
## Running

### Full Pipeline

```bash
python demo_all.py \
    --scene_dir /path/to/scene \
    --calib calib/kinect_crop.txt \
    --stride 1 \
    --model_type megasam \
    --prior_depth \
    --depth_model_size l \
    --flow_model_type ufm \
    --flow_model infinity1096/UFM-Base
```
### Key Arguments

| Argument | Default | Description |
|---|---|---|
| `--scene_dir` | required | Scene directory with `cam01/`, `cam02/`, ... subdirectories |
| `--calib` | required | Camera calibration file (fx fy cx cy) |
| `--stride` | `1` | Frame stride for SLAM |
| `--model_type` | `megasam` | SLAM backend: `megasam` or `droid` |
| `--prior_depth` | off | Use monocular depth as SLAM prior |
| `--depth_model_size` | `l` | UniDepth model size: `s` (faster) or `l` (better) |
| `--flow_model_type` | `ufm` | Optical flow model: `ufm` or `raft` |
| `--filter_thresh` | `2.4` | SLAM filter threshold |
| `--keyframe_thresh` | `4.0` | SLAM keyframe selection threshold |
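The calibration file passed to `--calib` holds pinhole intrinsics as `fx fy cx cy`. A minimal parser sketch (the single-line, whitespace-separated format is an assumption based on the description above; `load_calib` is not part of this repo):

```python
import numpy as np

def load_calib(path):
    """Read `fx fy cx cy` from a calibration file and build the 3x3 intrinsic matrix K.

    Assumes the first line holds four whitespace-separated numbers, as the
    `--calib` description suggests; adapt if your file differs.
    """
    with open(path) as f:
        fx, fy, cx, cy = map(float, f.readline().split()[:4])
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
```

For example, `K = load_calib("calib/kinect_crop.txt")` yields the matrix used to back-project depth into 3D.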
### Optimization Weights

| Argument | Default | Description |
|---|---|---|
| `--w_ratio` | `1.0` | Scale ratio consistency weight |
| `--w_flow` | `0.2` | Optical flow consistency weight |
| `--w_si` | `0.1` | Scale-invariant depth weight |
| `--w_grad` | `0.2` | Depth gradient smoothness weight |
| `--w_normal` | `1.0` | Surface normal consistency weight |
### Skip Stages

Use `--skip_slam`, `--skip_depth`, `--skip_flow`, or `--skip_optimization` to skip individual pipeline stages when rerunning with cached intermediate results.
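As an illustration of the caching behavior these flags enable (names mirror the flags, but this is a sketch, not the actual `demo_all.py` internals):

```python
from pathlib import Path

def run_pipeline(scene_dir, skip=()):
    """Illustrative sketch of stage skipping; the real pipeline may differ."""
    stages = ["slam", "depth", "flow", "optimization"]
    ran = []
    for stage in stages:
        cache = Path(scene_dir) / f"{stage}.done"
        if stage in skip and cache.exists():
            continue                 # reuse the cached intermediate result
        ran.append(stage)            # placeholder for the real stage computation
        cache.write_text("done")     # mark this stage's output as cached
    return ran
```

A skipped stage is only safe to skip when its cached output already exists, which is why the sketch checks both conditions.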
## Dataset

We provide demo datasets on HuggingFace:
| Dataset | Description | Size |
|---|---|---|
| MultiCamRobolab | Multi-camera robotic lab scenes (RoboDog, RoboArm) | ~68 MB |
| MultiCamVideo-Dataset | Multi-camera video sequences | — |
### Download

```bash
# Install huggingface_hub if needed
pip install huggingface_hub

# Download MultiCamRobolab
huggingface-cli download shuooru/MultiCamRobolab --repo-type dataset --local-dir ./data/MultiCamRobolab
```
### Usage

Each scene directory should follow this structure:

```
scene/
├── cam01/   # images for camera 1 (e.g., 000000.jpg, 000001.jpg, ...)
├── cam02/   # images for camera 2
└── ...
```
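Since the pipeline expects synchronized videos, it can help to verify that every camera folder holds the same number of frames before running. A small check, assuming the layout above (the `check_scene` helper is hypothetical, not part of this repo):

```python
from pathlib import Path

def check_scene(scene_dir):
    """Verify that every cam*/ folder holds the same number of image frames.

    Assumes the scene layout shown above (cam01/, cam02/, ... with .jpg or
    .png files); returns the per-camera frame counts.
    """
    cams = sorted(p for p in Path(scene_dir).glob("cam*") if p.is_dir())
    if not cams:
        raise FileNotFoundError(f"no cam*/ folders under {scene_dir}")
    counts = {c.name: len(list(c.glob("*.jpg")) + list(c.glob("*.png")))
              for c in cams}
    if len(set(counts.values())) != 1:
        raise ValueError(f"cameras are not synchronized: {counts}")
    return counts
```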
Then run the pipeline with the provided calibration file:

```bash
python demo_all.py \
    --scene_dir ./data/MultiCamRobolab/robotdog_01 \
    --calib calib/kinect_crop.txt \
    --model_type megasam \
    --prior_depth
```
## Visualization

### Viser (interactive 3D viewer)

```bash
python viser_est.py \
    --traj-ids 1 2 \
    --max_frames 200 \
    --data_path /path/to/scene \
    --pose_dir /path/to/result
```
### Rerun

```bash
python rerun_est.py \
    --traj-ids 1 2 \
    --max_frames 200 \
    --data_path /path/to/scene \
    --pose_dir /path/to/result
```
## Citation

If you find this work useful, please cite:

```bibtex
@article{sun2026dense,
  title   = {Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos},
  author  = {Sun, Shuo and Artan, Unal and Mielle, Malcolm and Lilienthal, Achim J. and Magnusson, Martin},
  journal = {arXiv preprint arXiv:2603.12064},
  year    = {2026}
}
```
## Acknowledgements

Our work builds on open-source projects used throughout this repository, including DROID-SLAM, MegaSaM, UniDepth, UFM (with UniCeption), lietorch, and pytorch_scatter.