Depth Anything: Powerful Monocular Depth Estimation, Model Deployment in Python and C++



Introduction

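Recently I noticed that a monocular depth estimation model called Depth Anything was getting a lot of attention, so I downloaded the code in my spare time and tried it out.

Paper: https://arxiv.org/pdf/2401.10891.pdf
Code: https://github.com/LiheYoung/Depth-Anything
Project page: https://depth-anything.github.io/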

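This post records running the demo with the official code and then deploying the model for inference: with onnxruntime in Python and with ncnn in C++.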


1. Running the Demo with the Official Code

First, clone the code from GitHub:


    git clone https://github.com/LiheYoung/Depth-Anything

Then install the dependencies:


    cd Depth-Anything
    pip install -r requirements.txt

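The project has relatively few dependencies, so setting up the environment is straightforward. Once they are installed, you can run the demo directly. For image input, run: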


    python run.py --encoder <vits | vitb | vitl> --img-path <img-directory | single-img | txt-file> --outdir <outdir>

For example:


    python run.py --encoder vitb --img-path ../test.jpg --outdir output/

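The --img-path argument can be the path to a single image, to a folder containing multiple images, or to a TXT file listing image paths. Three models have been released officially: depth_anything_vits14.pth, depth_anything_vitb14.pth and depth_anything_vitl14.pth, corresponding to the vits, vitb and vitl values of --encoder. Running the command above automatically downloads the matching model from Hugging Face. Note, however, that Hugging Face is currently not accessible from mainland China. Don't worry: we can use a Hugging Face mirror instead. First, set the following environment variable on the command line: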


    export HF_ENDPOINT=https://hf-mirror.com
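If you start inference from a Python script instead of a shell, the same variable can be set in code. A minimal sketch, assuming the download goes through huggingface_hub (which the official model loading relies on) and that huggingface_hub reads HF_ENDPOINT when it is first imported, so the variable has to be set before that import:

```python
import os

# Redirect Hugging Face downloads to the mirror. This must run before
# huggingface_hub (or any module that imports it) is imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Import model code only after this point, e.g. (import shown as an example):
# from depth_anything.dpt import DepthAnything

print(os.environ["HF_ENDPOINT"])
```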

Running run.py, the model executes without problems and saves its predictions to the output directory given by the --outdir argument. Below are inference results on images from the nuScenes dataset.
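The three forms accepted by --img-path (a single image, a folder, or a TXT list) can be resolved into a list of files roughly as follows. This is a minimal sketch of the described behavior, not the code in run.py; collect_image_paths is a hypothetical helper, and it assumes the TXT file holds one image path per line.

```python
import os


def collect_image_paths(img_path: str) -> list[str]:
    """Resolve an --img-path value into a list of image paths (hypothetical helper)."""
    if os.path.isfile(img_path):
        if img_path.endswith(".txt"):
            # TXT file: one image path per line, blank lines skipped
            with open(img_path, encoding="utf-8") as f:
                return [line.strip() for line in f if line.strip()]
        # A single image file
        return [img_path]
    # Otherwise treat it as a folder and return its regular files in sorted order
    return sorted(
        os.path.join(img_path, name)
        for name in os.listdir(img_path)
        if os.path.isfile(os.path.join(img_path, name))
    )
```

Sorting the folder listing keeps the processing order deterministic, since os.listdir returns entries in arbitrary order.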

To process videos, use the run_video.py script:

复制代码

    python run_video.py --encoder vitb --video-path assets/examples_video --outdir output/

2. Python ONNX Model Deployment

2.1 Exporting the ONNX Model

For exporting the ONNX model, you can refer to this repository:

https://github.com/fabio-sim/Depth-Anything-ONNX

After downloading the code, run the export.py script to export the ONNX model:


    python export.py --model s  # s corresponds to the vits model

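The ONNX model is typically saved to the weights/ directory. This script also downloads the PyTorch weights from Hugging Face, so it is best to use the mirror here as well. The change is simple: in the download URL in the code, replace huggingface.co with hf-mirror.com: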


    depth_anything.to(device).load_state_dict(
        torch.hub.load_state_dict_from_url(
            f"https://hf-mirror.com/spaces/LiheYoung/Depth-Anything/resolve/main/checkpoints/depth_anything_vit{model}14.pth",
            map_location="cpu",
        ),
        strict=True,
    )
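If you would rather not hard-code the mirror, a small helper can rewrite any Hugging Face URL before it is passed to the download call. A minimal sketch; to_mirror is a hypothetical name, and it only rewrites URLs whose host is exactly huggingface.co:

```python
from urllib.parse import urlsplit, urlunsplit


def to_mirror(url: str, mirror_host: str = "hf-mirror.com") -> str:
    """Swap the huggingface.co host of a URL for a mirror host (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.netloc != "huggingface.co":
        return url  # leave other URLs untouched
    return urlunsplit(parts._replace(netloc=mirror_host))


model = "s"  # "s", "b" or "l", as in export.py --model
url = f"https://huggingface.co/spaces/LiheYoung/Depth-Anything/resolve/main/checkpoints/depth_anything_vit{model}14.pth"
print(to_mirror(url))
```

The result can be passed straight to torch.hub.load_state_dict_from_url in place of the hand-edited string.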

Note also that this script exports the ONNX model with dynamic axes. If you want a model with static input shapes instead, remove the dynamic_axes argument from the export call. The exported ONNX model can then be simplified further with onnx-simplifier, for example: onnxsim path/to/output.onnx optimized_model.onnx
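The difference between the two export modes comes down to whether dynamic_axes is passed to torch.onnx.export. A minimal sketch, assuming the tensors are named image and depth (the blob names the C++ code in this post uses) and assuming opset 17; onnx_export_kwargs and export_onnx are hypothetical helpers, not functions from the Depth-Anything-ONNX repository:

```python
def onnx_export_kwargs(dynamic: bool, opset: int = 17) -> dict:
    """Keyword arguments for torch.onnx.export (tensor names and opset are assumptions)."""
    kwargs = {
        "input_names": ["image"],
        "output_names": ["depth"],
        "opset_version": opset,
    }
    if dynamic:
        # Mark height and width of the NCHW input as dynamic; output axes
        # can be listed the same way if you want them named explicitly.
        kwargs["dynamic_axes"] = {"image": {2: "height", 3: "width"}}
    # Without dynamic_axes, every dimension is fixed to the dummy input's shape.
    return kwargs


def export_onnx(model, path: str, dynamic: bool = False, size: int = 518) -> None:
    import torch  # imported lazily so the helper above works without torch

    model.eval()
    dummy = torch.randn(1, 3, size, size)  # size assumed to be a multiple of 14
    torch.onnx.export(model, dummy, path, **onnx_export_kwargs(dynamic))
```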

2.2 Deploying the ONNX Model with onnxruntime

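Deploying Depth Anything is much like deploying other models. After the model is loaded, the input image is preprocessed: it is converted from BGR to RGB, resized to the model input size, normalized by subtracting the mean and dividing by the standard deviation, transposed from HWC to CHW, and given a batch dimension. Apart from the mean/std normalization, these steps are the same as for YOLOv8 and the Transformer-based detector RT-DETR. The preprocessing function is implemented as follows: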


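    import cv2
    import numpy as np
    
    
    def preprocess(
        bgr_image,
        width,
        height,
        mean=(123.675, 116.28, 103.53),  # ImageNet mean * 255, RGB order
        std=(58.395, 57.12, 57.375),     # ImageNet std * 255, RGB order
    ):
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)             # BGR -> RGB
        image = cv2.resize(image, (width, height)).astype(np.float32)  # dsize is (w, h)
        image = (image - np.array(mean, np.float32)) / np.array(std, np.float32)
        image = np.transpose(image, (2, 0, 1))                          # HWC -> CHW
        input_tensor = np.expand_dims(image, axis=0).astype(np.float32) # NCHW, batch of 1
        return input_tensor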
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-18/LTbGB6v9m0hpnJPEgKAiSeUtNRCI.png)
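The mean and std values in preprocess are the standard ImageNet statistics (0.485, 0.456, 0.406 and 0.229, 0.224, 0.225) multiplied by 255, so normalizing raw 0 to 255 pixels with them gives the same result as dividing by 255 first and then using the 0 to 1 statistics. A quick numpy check of that equivalence on random pixel values:

```python
import numpy as np

imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

# The values used in preprocess(): the 0-1 statistics scaled by 255
mean_255 = imagenet_mean * 255
std_255 = imagenet_std * 255

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float64)  # fake RGB image

a = (pixels - mean_255) / std_255                    # normalize raw 0-255 pixels
b = (pixels / 255.0 - imagenet_mean) / imagenet_std  # scale to [0, 1] first

print(np.allclose(a, b))
```

Algebraically, (x - 255m) / (255s) = (x / 255 - m) / s, so either form can be used as long as mean, std and pixel range agree.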

Once the input is ready, feed it to the model for inference:


    outputs = session.run(None, {session.get_inputs()[0].name: input_tensor})
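The call above assumes a session and a correctly sized input already exist. For a statically exported model the input size can be read from session.get_inputs()[0].shape; with dynamic axes those dimensions come back as names (or None) instead of integers, and the height and width should then be multiples of 14, the ViT patch size (the 14 in vits14/vitb14/vitl14). A minimal sketch under those assumptions: choose_input_size and lower_bound_size are hypothetical helpers, the lower-bound rule (scale so both sides reach at least 518, then snap to multiples of 14) is intended to approximate the official repository's resize, and the model path in the usage comment is a placeholder.

```python
import math


def constrain_to_multiple_of(x: float, multiple: int = 14, min_val: int = 0) -> int:
    """Round x to the nearest multiple; round up instead if that falls below min_val."""
    y = int(round(x / multiple) * multiple)
    if y < min_val:
        y = int(math.ceil(x / multiple) * multiple)
    return y


def lower_bound_size(width: int, height: int, target: int = 518, multiple: int = 14):
    """Keep the aspect ratio, make both sides >= target, snap to multiples of 14."""
    scale = max(target / width, target / height)
    new_w = constrain_to_multiple_of(scale * width, multiple, min_val=target)
    new_h = constrain_to_multiple_of(scale * height, multiple, min_val=target)
    return new_w, new_h


def choose_input_size(input_shape, width: int, height: int, target: int = 518):
    """Return (w, h) for preprocess(): fixed dims for a static model, else computed."""
    h_dim, w_dim = input_shape[2], input_shape[3]  # NCHW
    if isinstance(h_dim, int) and isinstance(w_dim, int):
        return w_dim, h_dim  # static model: the input must match exactly
    return lower_bound_size(width, height, target)


# Usage with onnxruntime (model path is a placeholder):
#   session = onnxruntime.InferenceSession("weights/model.onnx", providers=["CPUExecutionProvider"])
#   w, h = choose_input_size(session.get_inputs()[0].shape, img.shape[1], img.shape[0])
#   input_tensor = preprocess(img, w, h)
print(choose_input_size([1, 3, 518, 518], 1600, 900))
print(choose_input_size([1, 3, "height", "width"], 1600, 900))
```

For a 1600×900 nuScenes frame the dynamic branch keeps the aspect ratio and yields 924×518 instead of stretching the image to a square.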

Once the inference result is available, only a little simple post-processing is needed:


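    import cv2
    import numpy as np
    
    def postprocess(outputs, image):
        """outputs: list returned by session.run; image: the original BGR image."""
        image_height, image_width = image.shape[:2]
        depth = outputs[0][0]                        # drop the batch dimension
        if depth.ndim == 3:
            depth = np.transpose(depth, [1, 2, 0])   # chw -> hwc
        depth = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1)
        colormap = cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO)
        colormap = cv2.resize(colormap, (image_width, image_height))
        return cv2.hconcat([image, colormap])        # original and depth side by side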

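In post-processing, OpenCV's normalize function first rescales the depth values to the range [0, 255], applyColorMap then renders them as a pseudo-color image, and hconcat places the original image and the colorized depth side by side.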
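NORM_MINMAX with a [0, 255] range is a plain linear rescale, (d - min) / (max - min) * 255. The numpy sketch below reproduces that step without OpenCV; minmax_to_uint8 is a hypothetical name, and returning zeros for a constant map is this sketch's own choice.

```python
import numpy as np


def minmax_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Linearly rescale a float depth map to [0, 255] and convert it to uint8."""
    d = depth.astype(np.float32)
    d_min, d_max = float(d.min()), float(d.max())
    if d_max == d_min:
        return np.zeros(d.shape, dtype=np.uint8)  # constant map: nothing to stretch
    scaled = (d - d_min) / (d_max - d_min) * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)


print(minmax_to_uint8(np.array([[0.0, 1.0], [3.0, 4.0]])))
```

Because the range is recomputed for every image, the colors show relative depth within one frame and are not directly comparable across frames, which matters when colorizing video.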

With the model input size set to $518\times518$ and a batch size of $1$, the FP32 inference times of the three models on a GeForce RTX 3090 are:

- ViT-S/14 (vits14): 16 ms
- ViT-B/14 (vitb14): 42 ms
- ViT-L/14 (vitl14): 90 ms
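The post does not say how these times were measured. To get comparable numbers on your own hardware, one common approach is to run a few warm-up iterations first (early runs typically include one-off initialization such as GPU context setup) and then average over many runs. A minimal, framework-agnostic sketch; time_inference is a hypothetical helper, and with onnxruntime you would pass something like lambda: session.run(None, feed):

```python
import statistics
import time
from typing import Callable


def time_inference(run: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Return the mean latency of run() in milliseconds, measured after warm-up calls."""
    for _ in range(warmup):
        run()  # warm-up calls are not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# A dummy workload stands in for session.run(None, feed) here
print(f"{time_inference(lambda: sum(range(10_000)), warmup=2, iters=20):.3f} ms")
```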

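3. C++ Model Deployment

The C++ code below runs the model with ncnn rather than ONNX Runtime: Dpt::load reads an ncnn .param/.bin model pair, and Dpt::detect scales the longer side of the input to target_size, pads it to a square, normalizes it, runs inference and maps the colorized depth back to the original image size.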


    #include "dpt.h"
    
    Dpt::Dpt()
    {
        // Ratio 0: the pools may reuse any cached buffer that is large enough
        blob_pool_allocator.set_size_compare_ratio(0.f);
        workspace_pool_allocator.set_size_compare_ratio(0.f);
    }
    
    int Dpt::load(std::string param_path, std::string bin_path, int _target_size,
                  const float* _mean_vals, const float* _norm_vals, bool use_gpu)
    {
        dpt_.clear();
        blob_pool_allocator.clear();
        workspace_pool_allocator.clear();
    
        ncnn::set_cpu_powersave(2);  // 2 = run on big cores only
        ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());
    
        dpt_.opt = ncnn::Option();
    #if NCNN_VULKAN
        dpt_.opt.use_vulkan_compute = use_gpu;
    #endif
        dpt_.opt.num_threads = ncnn::get_big_cpu_count();
        dpt_.opt.blob_allocator = &blob_pool_allocator;
        dpt_.opt.workspace_allocator = &workspace_pool_allocator;
    
        // load_param / load_model return 0 on success
        if (dpt_.load_param(param_path.c_str()) != 0 || dpt_.load_model(bin_path.c_str()) != 0)
            return -1;
    
        target_size_ = _target_size;
        for (int i = 0; i < 3; i++)
        {
            mean_vals_[i] = _mean_vals[i];
            norm_vals_[i] = _norm_vals[i];
        }
    
        color_map_ = cv::Mat(target_size_, target_size_, CV_8UC3);
        return 0;
    }
    
    int Dpt::detect(const cv::Mat& rgb, cv::Mat& depth_color)
    {
        int width = rgb.cols;
        int height = rgb.rows;
    
        // Scale the longer side to target_size_, keeping the aspect ratio
        int w = width;
        int h = height;
        float scale = 1.f;
        if (w > h)
        {
            scale = (float)target_size_ / w;
            w = target_size_;
            h = static_cast<int>(h * scale);
        }
        else
        {
            scale = (float)target_size_ / h;
            h = target_size_;
            w = static_cast<int>(w * scale);
        }
    
        ncnn::Mat in = ncnn::Mat::from_pixels_resize(rgb.data, ncnn::Mat::PIXEL_RGB, width, height, w, h);
    
        // Pad to a target_size_ x target_size_ square, centering the image
        int wpad = target_size_ - w;
        int hpad = target_size_ - h;
        ncnn::Mat in_pad;
        ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);
    
        // ncnn computes (x - mean) * norm, so norm_vals_ must hold 1 / std
        in_pad.substract_mean_normalize(mean_vals_, norm_vals_);
    
        ncnn::Extractor ex = dpt_.create_extractor();
        ex.input("image", in_pad);  // blob names must match the converted model
        ncnn::Mat out;
        ex.extract("depth", out);
    
        // Wrap the single-channel output, rescale it to [0, 255] and colorize it
        cv::Mat depth(out.h, out.w, CV_32FC1, (void*)out.data);
        cv::normalize(depth, depth, 0, 255, cv::NORM_MINMAX, CV_8UC1);
        cv::applyColorMap(depth, color_map_, cv::ColormapTypes::COLORMAP_INFERNO);
        // Crop away the padding and resize back to the input resolution
        cv::resize(color_map_(cv::Rect(wpad / 2, hpad / 2, w, h)), depth_color, rgb.size());
    
        return 0;
    }
    
    int Dpt::draw(cv::Mat& rgb, cv::Mat& depth_color)
    {
        // applyColorMap outputs BGR; swap channels and write the result into the RGB frame
        cv::cvtColor(depth_color, rgb, cv::COLOR_RGB2BGR);
        return 0;
    }
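The geometry in Dpt::detect (scale the longer side to target_size, pad symmetrically to a square, then crop the same rectangle back out of the output) can be checked independently of ncnn. A minimal Python sketch that mirrors that arithmetic, including the truncation of the shorter side; letterbox_geometry is a hypothetical name, and because the C++ code computes the scale in float, results could differ by a pixel in rare rounding edge cases.

```python
def letterbox_geometry(width: int, height: int, target: int):
    """Mirror Dpt::detect: resized size, border padding and crop rect (x, y, w, h)."""
    if width > height:
        scale = target / width
        w, h = target, int(height * scale)  # int() truncates like the C++ assignment
    else:
        scale = target / height
        w, h = int(width * scale), target
    wpad, hpad = target - w, target - h
    # Same order as the copy_make_border call: top, bottom, left, right
    pad = (hpad // 2, hpad - hpad // 2, wpad // 2, wpad - wpad // 2)
    crop = (wpad // 2, hpad // 2, w, h)  # the cv::Rect that removes the padding
    return (w, h), pad, crop


print(letterbox_geometry(1600, 900, 518))
```

For a 1600×900 frame this gives a 518×291 resize with 113 rows of padding on top and 114 at the bottom, and the crop rectangle (0, 113, 518, 291) removes exactly that padding again.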
    
    
    
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-08-18/tWhTBIY4XrevnjE0qwgpzxHQl8RN.png)

The final result is shown below:

Source code download: <>
