本期目录/Table of Contents

[1]徐 虹,曾祥进*,华永斌.基于改进近端策略优化算法的在线三维装箱方法[J].武汉工程大学学报,2025,47(05):565-570.[doi:10.19843/j.cnki.CN42-1779/TQ.202410005]
 XU Hong,ZENG Xiangjin*,HUA Yongbin.An online 3D bin packing method using enhanced proximal policy optimization algorithm[J].Journal of Wuhan Institute of Technology,2025,47(05):565-570.[doi:10.19843/j.cnki.CN42-1779/TQ.202410005]

基于改进近端策略优化算法的在线三维装箱方法

《武汉工程大学学报》[ISSN:1674-2869/CN:42-1779/TQ]

Volume:
47
Issue:
2025, No. 05
Pages:
565-570
Section:
Intelligent Manufacturing
Publication Date:
2025-10-31

文章信息/Info

Title:
An online 3D bin packing method using enhanced proximal policy optimization algorithm

Article Number:
1674-2869(2025)05-0565-06
作者:
1.武汉工程大学计算机科学与工程学院,湖北 武汉 430205;
2.智能机器人湖北省重点实验室(武汉工程大学),湖北 武汉 430205

Author(s):
1. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China;
2. Hubei Provincial Key Laboratory of Intelligent Robotics (Wuhan Institute of Technology), Wuhan 430205, China

关键词:
Keywords:
CLC Number:
TP39
DOI:
10.19843/j.cnki.CN42-1779/TQ.202410005
Document Code:
A
摘要:
为解决现有三维装箱算法优化效率低的问题,本文提出了一种改进近端策略优化(PPO)算法的在线三维装箱方法。首先,基于现实装箱的边界约束、支撑约束、重力约束、碰撞约束等条件,在演员-评论家框架中添加可行性掩码预测网络,限制不可行装箱动作点的选取,以满足现实物流过程中的装箱需求。其次,使用长短期记忆网络替换PPO算法神经网络结构中的全连接层,专注学习高奖励值的样本,以便更快速地优化模型。最后,采用两个不同的数据集进行对比实验,其中数据集1采用随机生成的箱子序列,数据集2采用切割库存的箱子序列,保证实验的全面性。实验结果表明,基于改进的PPO算法缩短了强化学习应用于装箱过程中动作节点的盲目搜索时间。在数据集2中,单个箱子平均码放时间缩短了0.3 s,箱子数量增加了2.7个,空间利用率提升了2.2%。本文提出的优化算法能够有效提高三维装箱问题的空间利用率和降低装载时间,为三维装箱问题的工程化应用提供有效的解决方案和参考。
Abstract:
To address the low optimization efficiency of existing 3D bin packing algorithms, this study proposed an online packing method based on an enhanced proximal policy optimization (PPO) algorithm. First, based on practical packing constraints including boundary limitations, support requirements, gravity effects, and collision avoidance, a feasibility mask prediction network was incorporated into the Actor-Critic framework. This network restricted the selection of infeasible packing action nodes to meet real-world logistics demands. Second, the fully connected layers in the neural network architecture of the PPO algorithm were replaced with long short-term memory (LSTM) networks. This modification enabled focused learning from high-reward samples, thereby accelerating model optimization. Finally, comparative experiments were conducted on two distinct datasets (dataset 1: randomly generated box sequences; dataset 2: cutting-stock-derived box sequences) to ensure the comprehensiveness of the experiments. Experimental results demonstrated that the enhanced PPO algorithm reduced the blind search time for action nodes in reinforcement learning-based packing. On dataset 2, the average placement time per box decreased by 0.3 s, the number of packed boxes per container increased by 2.7, and the space utilization rate improved by 2.2%. The proposed algorithm effectively improves spatial efficiency while reducing loading time, providing a viable solution and reference for the engineering application of 3D bin packing.
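The two modifications described in the abstract (an LSTM encoder inside the PPO actor and a feasibility mask that suppresses infeasible placement actions) can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: names such as MaskedLSTMActor, obs_dim, and num_positions are hypothetical, and the feasibility mask is passed in directly here, whereas the paper predicts it with a dedicated prediction network inside the Actor-Critic framework.

```python
import torch
import torch.nn as nn

class MaskedLSTMActor(nn.Module):
    """Sketch of an actor head: an LSTM encoder followed by a linear layer that
    scores each candidate placement position; a feasibility mask (1 = feasible,
    0 = infeasible) zeroes out the probability of infeasible actions."""

    def __init__(self, obs_dim: int, num_positions: int, hidden_dim: int = 128):
        super().__init__()
        # LSTM in place of the fully connected encoder of a vanilla PPO actor.
        self.encoder = nn.LSTM(input_size=obs_dim, hidden_size=hidden_dim,
                               batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_positions)

    def forward(self, obs_seq: torch.Tensor, feasibility_mask: torch.Tensor):
        # obs_seq: (batch, seq_len, obs_dim), the sequence of observed box states
        # feasibility_mask: (batch, num_positions), 1 where placement is feasible
        encoded, _ = self.encoder(obs_seq)
        logits = self.policy_head(encoded[:, -1, :])        # last time step only
        # Drive infeasible-action logits to -inf so their softmax probability is 0.
        masked_logits = logits.masked_fill(feasibility_mask == 0, float("-inf"))
        return torch.distributions.Categorical(logits=masked_logits)


if __name__ == "__main__":
    actor = MaskedLSTMActor(obs_dim=6, num_positions=100)
    obs = torch.randn(2, 5, 6)                   # 2 episodes, 5 observed boxes each
    mask = torch.randint(0, 2, (2, 100))         # hypothetical feasibility mask
    mask[:, 0] = 1                               # keep at least one feasible action
    dist = actor(obs, mask)
    action = dist.sample()                       # only feasible positions are sampled
    print(action, dist.log_prob(action))
```

The masked distribution plugs into a standard PPO update unchanged; masking only removes infeasible placements from the policy's support, which is what shortens the blind search over action nodes described above.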

参考文献/References:

[1] CRAINIC T G,PERBOLI G,TADEI R. Extreme point-based heuristics for three-dimensional bin packing[J]. INFORMS Journal on Computing,2008,20(3):368-384.
[2] 张德富,彭煜,朱文兴,等. 求解三维装箱问题的混合模拟退火算法[J]. 计算机学报,2009,32(11):2147-2156.
[3] VINOD CHANDRA S S,ANAND H S. Nature inspired meta heuristic algorithms for optimization problems[J]. Computing,2022,104(2):251-269.
[4] 李明阳,许可儿,宋志强,等. 多智能体强化学习算法研究综述[J]. 计算机科学与探索,2024,18(8):1979-1997.
[5] HU H Y,ZHANG X D,YAN X W,et al. Solving a new 3D bin packing problem with deep reinforcement learning method[Z/OL]. (2017-08-20)[2025-04-02]. https://doi.org/10.48550/arXiv.1708.05930.
[6] HU R Z,XU J Z,CHEN B,et al. TAP-Net: transport-and-pack using reinforcement learning[J]. ACM Transactions on Graphics,2020,39(6):232.
[7] ZHAO H,SHE Q J,ZHU C Y,et al. Online 3D bin packing with constrained deep reinforcement learning[C]//35th AAAI Conference on Artificial Intelligence. Menlo Park:Association for the Advancement of Artificial Intelligence,2021:741-749.
[8] VERMA R, SINGHAL A, KHADILKAR H,et al. A generalized reinforcement learning algorithm for online 3D bin-packing[Z/OL]. (2020-07-01)[2025-04-02]. https://doi.org/10.48550/arXiv.2007.00463.
[9] 张长勇,刘佳瑜,王艳芳. 求解货物在线装箱问题的融合算法[J]. 科学技术与工程,2021,21(11):4513-4518.
[10] YU C,VELU A,VINITSKY E,et al. The surprising effectiveness of PPO in cooperative multi-agent games[J]. Advances in Neural Information Processing Systems,2022,35:24611-24624.
[11] HUANG N C,HSIEH P C,HO K H,et al. PPO-Clip attains global optimality:towards deeper understandings of clipping[J]. Proceedings of the AAAI Conference on Artificial Intelligence,2024,38(11):12600-12607.
[12] 乐恒韬,赵康康,吴松林,等. 基于LSTM网络的机器人异空间手眼标定方法[J]. 武汉工程大学学报,2024,46(5):574-578.
[13] 高阳,陈世福,陆鑫. 强化学习研究综述[J]. 自动化学报,2004,30(1):86-100.
[14] GILMORE P C,GOMORY R E. A linear programming approach to the cutting stock problem:part II[J]. Operations Research,1963,11(6):863-888.
[15] ZHAO H, ZHU C Y, XU X, et al. Learning practically feasible policies for online 3D bin packing[J]. Science China Information Sciences,2022,65(1):112105.

备注/Memo:
Received: 2024-10-13
Funding: National Natural Science Foundation of China (61502355)
First author: XU Hong, master's student. Email: 1677535561@qq.com
*Corresponding author: ZENG Xiangjin, PhD, associate professor. Email: xjzeng21@163.com
Citation: XU Hong, ZENG Xiangjin, HUA Yongbin. An online 3D bin packing method using enhanced proximal policy optimization algorithm[J]. Journal of Wuhan Institute of Technology, 2025, 47(5): 565-570.

更新日期/Last Update: 2025-11-03