Table of Contents

[1] ZHENG You, WANG Lei*, YANG Ziwen. Monocular depth estimation based on adaptive fusion of multi-scale depth maps [J]. Journal of Wuhan Institute of Technology, 2024, 46(01): 85-90. [doi:10.19843/j.cnki.CN42-1779/TQ.202306025]

Journal of Wuhan Institute of Technology [ISSN:1674-2869/CN:42-1779/TQ]

Volume: 46
Issue: No. 01, 2024
Pages: 85-90
Section: Mechatronics and Information Engineering
Publication date: 2024-03-12

Article Info

Title:
Monocular depth estimation based on adaptive fusion of multi-scale depth maps
Article number:
1674-2869(2024)01-0085-06
Author(s):
ZHENG You, WANG Lei*, YANG Ziwen
School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, Hubei, China
Keywords:
monocular depth estimation; attention mechanism; multi-scale feature fusion network; multi-scale depth adaptive fusion network
CLC number:
TP391.41
DOI:
10.19843/j.cnki.CN42-1779/TQ.202306025
Document code:
A
Abstract:
Depth estimation networks usually have many layers, and image features lose substantial information during encoding and decoding, so the predicted depth maps lack object structure details and have blurred edge contours. This paper proposes a monocular depth estimation method based on the adaptive fusion of multi-scale depth maps, which effectively preserves object details and geometric contours. First, a squeeze-and-excitation residual network (SE-ResNet) is introduced, using attention mechanisms to encode the features of different channels and thus retain more detail in the depth maps of distant planes. Then, a multi-scale feature fusion network fuses feature maps of different scales to obtain feature maps with rich geometric and semantic information. Finally, a multi-scale adaptive depth fusion network attaches learnable weight parameters to the depth maps generated from feature maps of different scales and fuses these depth maps adaptively, increasing the object information in the predicted depth map. On the NYU Depth V2 dataset, the proposed method predicts depth maps with higher accuracy and rich object information: the absolute relative error is 0.115, the root mean square error is 0.525, and the accuracy reaches up to 99.3%.
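To make the abstract's pipeline concrete, here is a minimal PyTorch sketch of two of its ingredients: an SE-style channel-attention block and the adaptive fusion of multi-scale depth predictions via learnable weights. The module names, tensor shapes, and the softmax weighting are our assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weight channels by global context."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                        # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze: global average pool -> (B, C)
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(w))))  # excitation -> (B, C)
        return x * w[:, :, None, None]           # channel re-weighting

class AdaptiveDepthFusion(nn.Module):
    """Fuse depth maps from different scales with learnable weights."""
    def __init__(self, num_scales: int):
        super().__init__()
        # one learnable scalar per scale, normalized by softmax at run time
        self.logits = nn.Parameter(torch.zeros(num_scales))

    def forward(self, depth_maps):               # list of (B, 1, h_i, w_i)
        target = depth_maps[0].shape[-2:]        # fuse at the finest resolution
        ups = [F.interpolate(d, size=target, mode="bilinear",
                             align_corners=False) for d in depth_maps]
        w = torch.softmax(self.logits, dim=0)    # adaptive, learned weighting
        return sum(wi * di for wi, di in zip(w, ups))

# usage on hypothetical shapes
se = SEBlock(channels=64)
feats = se(torch.rand(2, 64, 60, 80))            # channel-reweighted features
fusion = AdaptiveDepthFusion(num_scales=3)
d1 = torch.rand(2, 1, 240, 320)
d2 = torch.rand(2, 1, 120, 160)
d3 = torch.rand(2, 1, 60, 80)
fused = fusion([d1, d2, d3])                     # (2, 1, 240, 320)

The design point is that the network learns how much each scale should contribute to the final depth map, rather than using a fixed average.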

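The abstract reports absolute relative error, root mean square error, and accuracy. The sketch below shows how these metrics are conventionally computed for monocular depth estimation; the formulas are the standard ones, the function name is hypothetical, and the 99.3% figure presumably refers to a threshold (delta) accuracy of this kind.

import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray):
    """Standard monocular depth metrics over valid ground-truth pixels."""
    mask = gt > 0                                  # ignore pixels without ground truth
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)      # absolute relative error (AbsRel)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))      # root mean square error (RMSE)
    ratio = np.maximum(pred / gt, gt / pred)       # per-pixel max(pred/gt, gt/pred)
    deltas = [float(np.mean(ratio < 1.25 ** k)) for k in (1, 2, 3)]  # threshold accuracy
    return abs_rel, rmse, deltas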
References:

[1] WANG Q C, SHUAI H, LIU Q S. Cumulative monocular depth estimation with recursive feature fusion [J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(10): 1533-1541 (in Chinese).

[2] EIGEN D, PUHRSCH C, FERGUS R. Depth map prediction from a single image using a multi-scale deep network [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS). Cambridge, MA: MIT Press, 2014: 2366-2374.
[3] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[4] WU K W, ZHANG S R, XIE Z. Monocular depth prediction with residual DenseASPP network [J]. IEEE Access, 2020, 8: 129899-129910.
[5] WU J W, ZHOU W J, LUO T, et al. Multiscale multilevel context and multimodal fusion for RGB-D salient object detection [J]. Signal Processing, 2021, 178: 107766.
[6] QI F, LIN C H, SHI G M, et al. A convolutional encoder-decoder network with skip connections for saliency prediction [J]. IEEE Access, 2019, 7: 60428-60438.
[7] ZHAO S Y, ZHANG L, SHEN Y, et al. Super-resolution for monocular depth estimation with multi-scale sub-pixel convolutions and a smoothness constraint [J]. IEEE Access, 2019, 7: 16323-16335.
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[9] QIU C C, ZHANG S Y, WANG C, et al. Improving transfer learning and squeeze-and-excitation networks for small-scale fine-grained fish image classification [J]. IEEE Access, 2018, 6: 78503-78512.
[10] LI L F, FANG Y, WU J, et al. Encoder-decoder full residual deep networks for robust regression and spatiotemporal estimation [J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(9): 4217-4230.
[11] PARK J S, JEONG Y, JOO K, et al. Adaptive cost volume fusion network for multi-modal depth estimation in changing environments [J]. IEEE Robotics and Automation Letters, 2022, 7(2): 5095-5102.
[12] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization [OL]. (2019-01-04) [2023-06-29]. https://arxiv.org/abs/1711.05101.
[13] LOZA A, MIHAYLOVA L, BULL D, et al. Structural similarity-based object tracking in multimodality surveillance videos [J]. Machine Vision and Applications, 2009, 20(2): 71-83.
[14] LIU F Y, SHEN C H, LIN G S, et al. Learning depth from single monocular images using deep convolutional neural fields [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024-2039.
[15] CAO Y Z H, WU Z F, SHEN C H. Estimating depth from monocular images as classification using deep fully convolutional residual networks [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(11): 3174-3182.
[16] XU X F, CHEN Z, YIN F L. Monocular depth estimation with multi-scale feature fusion [J]. IEEE Signal Processing Letters, 2021, 28: 678-682.
[17] KIM D, LEE S, LEE J, et al. Leveraging contextual information for monocular depth estimation [J]. IEEE Access, 2020, 8: 147808-147817.
[18] FU H, GONG M M, WANG C H, et al. Deep ordinal regression network for monocular depth estimation [C]// Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 2002-2011.
[19] YUAN W H, GU X D, DAI Z Z, et al. NeW CRFs: neural window fully-connected CRFs for monocular depth estimation [C]// Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE, 2022: 3906-3915.
[20] LEE J H, HAN M K, KO D W, et al. From big to small: multi-scale local planar guidance for monocular depth estimation [OL]. (2021-09-23) [2023-06-29]. https://arxiv.org/abs/1907.10326v5.

Similar Articles/References:

[1] WANG Liya, LIU Changhui*, CAI Dunbo, et al. Text sentiment analysis based on CNN-BiLSTM network and attention model [J]. Journal of Wuhan Institute of Technology, 2019, (04): 386. [doi:10.3969/j.issn.1674-2869.2019.04.016]
[2] FANG Xiaodong, LIU Changhui*, WANG Liya, et al. Chinese text classification based on BERT's composite network model [J]. Journal of Wuhan Institute of Technology, 2020, 42(06): 688. [doi:10.19843/j.cnki.CN42-1779/TQ.202002009]

Memo:
Received: 2023-06-29
Foundation item: Internal Scientific Research Fund of Wuhan Institute of Technology (21QD23)
Author biography: ZHENG You, master's degree candidate. Email: 898651804@qq.com
*Corresponding author: WANG Lei, Ph.D., lecturer. Email: wanglei@wir.edu.com
Citation: ZHENG You, WANG Lei, YANG Ziwen. Monocular depth estimation based on adaptive fusion of multi-scale depth maps [J]. Journal of Wuhan Institute of Technology, 2024, 46(1): 85-90.
Last Update: 2024-03-01