# 3D reconstruction from in-the-wild videos

## Overview

Run 3D reconstruction from uncalibrated, in-the-wild videos.

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [Data](#data)
## Installation

### Prerequisites
- CUDA-capable GPU (compute capability 6.0+)
- CUDA Toolkit 11.x or 12.x
- Conda package manager
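
You can check the installed driver and CUDA toolkit versions up front:

```bash
nvidia-smi       # GPU model and driver version
nvcc --version   # CUDA toolkit version
```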
### Setup Steps
1. Create and activate the conda environment:

   ```bash
   conda create -n dpvo python=3.10.14
   conda activate dpvo
   ```
2. Install PyTorch with CUDA support (pick the build that matches your CUDA installation):

   ```bash
   # For CUDA 11.8
   conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
   # Or for CUDA 12.1
   conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
   ```
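
   Afterwards, you can sanity-check that PyTorch sees your GPU:

   ```bash
   # Should print "True" and the CUDA version PyTorch was built against
   python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
   ```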
3. Install other dependencies:

   - Install COLMAP & GLOMAP: refer to https://github.com/colmap/glomap
   - Install `hloc` (for loop closure detection):

     ```bash
     pip install git+https://github.com/cvg/Hierarchical-Localization.git@v1.4
     ```
4. Install the lietorch library:

   ```bash
   pip install thirdparty/lietorch
   ```
5. Build the SLAM extensions:

   ```bash
   pip install -e .
   ```
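
   If the build succeeded, the demo script should print its usage text (assuming `dpvo_demo.py` uses a standard argument parser):

   ```bash
   python dpvo_demo.py --help
   ```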
6. Mask and depth generation tools:

   - We use `UniDepth` to generate dense depth maps
   - We use `MaskRCNN` to generate dynamic object masks (an illustrative sketch follows this list)
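
The exact mask-generation script is not shown in this README, so the following is a minimal sketch of the idea, assuming torchvision's pretrained Mask R-CNN (the authors' model and thresholds may differ) and hypothetical `frames/` and `masks/` paths:

```python
# Minimal sketch: binary dynamic-object mask for one frame using torchvision's
# pretrained Mask R-CNN. Paths, class ids, and thresholds are illustrative.
import os

import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = convert_image_dtype(read_image("frames/000001.jpg"), torch.float)  # [3, H, W] in [0, 1]
with torch.no_grad():
    pred = model([img])[0]  # dict with "boxes", "labels", "scores", "masks"

# COCO category ids commonly treated as dynamic:
# person(1), bicycle(2), car(3), motorcycle(4), bus(6), truck(8)
dynamic_ids = {1, 2, 3, 4, 6, 8}

mask = torch.zeros(img.shape[1:], dtype=torch.bool)  # [H, W]
for label, score, m in zip(pred["labels"], pred["scores"], pred["masks"]):
    if label.item() in dynamic_ids and score.item() > 0.5:
        mask |= m[0] > 0.5  # m is [1, H, W] with soft values

os.makedirs("masks", exist_ok=True)
torchvision.utils.save_image(mask.float().unsqueeze(0), "masks/000001.png")
```

Per-frame masks produced this way can then be passed to the demo through `--maskdir` (see Usage below).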
## Usage

### Running the Demo
Run the SLAM system on an image sequence:
```bash
conda activate dpvo
python dpvo_demo.py \
    --imagedir=/path/to/images \
    --depthdir=/path/to/depths \
    --calib=calib/calibration.txt \
    --stride=1 \
    --skip=0 \
    --buffer=2048 \
    --export_colmap
```
**Required Arguments:**

- `--imagedir`: Path to input image directory

**Optional Arguments:**

- `--depthdir`: Path to depth maps (enables depth-aided tracking)
- `--maskdir`: Path to dynamic object masks (filters dynamic objects)
- `--calib`: Camera calibration file (format: `fx fy cx cy [k1 k2 p1 p2 k3]`; see the example below)
- `--stride`: Frame sampling stride (default: 1)
- `--skip`: Number of initial frames to skip (default: 0)
- `--buffer`: Maximum buffer size for keyframes (default: 2048)
- `--export_colmap`: Export results in COLMAP format for further processing
- `--rerun`: Enable Rerun visualization (https://rerun.io/)
- `--loop_enabled`: Enable loop closure detection
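
For reference, a minimal pinhole calibration file holds the four intrinsics on a single line, with distortion coefficients optionally appended. The values below are placeholders sized for a 512×384 image, not a real calibration:

```
525.0 525.0 256.0 192.0
```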
## Data

Videos used in the paper:

| Name | Link | Description |
|---|---|---|
| Yanshan Park, China | Link | |
| Taicang Park, China | Link | |
| Helsingborg, Sweden | Link | |
### Extract frames from videos

1. Install the YouTube video download tool yt-dlp: https://github.com/yt-dlp/yt-dlp

2. Use ffmpeg to extract frames. We extract at a resolution of 512×384 and at 5 FPS, taking a 15-minute clip from each video.
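
A command sequence along the following lines performs that extraction; the URL and file names are placeholders, not links from the table above:

```bash
# Download the source video (placeholder URL)
yt-dlp -o video.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"

# Extract a 15-minute clip as frames at 5 FPS, scaled to 512x384
mkdir -p frames
ffmpeg -i video.mp4 -t 00:15:00 -vf "fps=5,scale=512:384" frames/%06d.png
```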