Journal of Nanjing University (Natural Science) ›› 2022, Vol. 58 ›› Issue (3): 420-429. doi: 10.13232/j.cnki.jnju.2022.03.006


Deep reinforcement learning combined with graph attention model to solve TSP

Yang Wang, Zhibin Chen (corresponding author), Xiaoxiao Yang, Zhaorui Wu

  1. Faculty of Science, Kunming University of Science and Technology, Kunming, 650000, China
  • Received: 2022-01-19  Online: 2022-05-30  Published: 2022-06-07
  • Contact: Zhibin Chen, E-mail: chenzhibin311@126.com
  • Supported by: National Natural Science Foundation of China (11761042)

Abstract:

The Traveling Salesman Problem (TSP) is a classic Combinatorial Optimization Problem (COP) that has been studied intensively for decades. In recent years, Deep Reinforcement Learning (DRL) has been widely applied to autonomous driving, industrial automation, games, and other fields, demonstrating strong decision-making and learning ability. This paper combines DRL with a graph attention model to solve the TSP by minimizing tour length. Specifically, an improved REINFORCE algorithm trains the behavior network parameters, which effectively reduces variance and helps avoid local optima; Positional Encoding (PE) is applied in the encoder so that the multiple initial nodes satisfy translation invariance during embedding, improving the stability of the model; further, the Graph Neural Network (GNN) and Transformer architectures are combined, applying GNN aggregation in the Transformer decoding stage for the first time, which effectively captures the topological structure of the graph and the latent relationships between nodes. Experimental results show that on the 100-TSP the model outperforms current DRL-based methods and some traditional algorithms.

Key words: Deep Reinforcement Learning (DRL), Traveling Salesman Problem (TSP), graph attention model, Graph Neural Network (GNN), Combinatorial Optimization (CO)
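The abstract credits Positional Encoding with making the multiple initial nodes translation-invariant during embedding. The paper's exact PE variant is not given on this page; as a reference point, the textbook sinusoidal encoding from the Transformer it builds on [6] can be sketched as follows (function name and plain-list layout are our own):

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encoding as in the original Transformer.

    Returns a num_positions x d_model table; each row encodes one position
    with interleaved sine/cosine values at geometrically spaced frequencies.
    """
    pe = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(0, d_model, 2):
            # Frequency decreases as the embedding dimension index grows.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Because the encoding depends only on the position index, shifting all node orderings by a constant offset changes every row the same way, which is the property the abstract appeals to.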

CLC number:

  • O22

Figure 1

Schematic of the graph attention model proposed in this paper

Figure 2

Schematic of multiple initial points
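The multiple initial points of Figure 2 are also what enable the variance-reduction scheme the abstract alludes to: rolling out one tour per start node on the same instance and baselining each rollout against the mean tour length, as popularized by POMO [5]. The paper's own improvement to REINFORCE is not spelled out on this page, so the following is only a minimal sketch of that shared-baseline advantage (function name is ours):

```python
def reinforce_shared_baseline(tour_lengths):
    """Advantage for N rollouts of one instance, one rollout per start node.

    The baseline is the mean tour length over the rollouts; since we
    minimize length, shorter-than-average tours get negative advantage.
    This removes the need for a learned critic while reducing variance.
    """
    baseline = sum(tour_lengths) / len(tour_lengths)
    # The policy gradient then weights grad(log prob) of each rollout by
    # its advantage (length minus shared baseline).
    return [length - baseline for length in tour_lengths]
```

Because the advantages sum to zero across the rollouts of an instance, the gradient signal comes purely from relative tour quality, not absolute scale.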

Figure 3

Decoding example for a four-node TSP instance

Figure 4

Schematic of the encoder-decoder structure for the TSP
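The abstract's headline contribution is applying GNN aggregation inside the Transformer decoding stage of Figure 4. The exact operator is defined in the full paper, not on this page; a generic mean-neighborhood aggregation step in the spirit of the GNN model of Scarselli et al. [24], without learned weights, might look like this (names and the dict-based adjacency are our own illustration):

```python
def gnn_aggregate(node_embeddings, adjacency):
    """One parameter-free aggregation step over a graph.

    node_embeddings: list of equal-length feature vectors, one per node.
    adjacency: dict mapping node index -> list of neighbor indices.
    Each node's new embedding is the mean of its own and its neighbors'
    embeddings, mixing local topology into every representation.
    """
    new_embeddings = []
    for i, emb in enumerate(node_embeddings):
        stacked = [emb] + [node_embeddings[j] for j in adjacency[i]]
        dim = len(emb)
        new_embeddings.append(
            [sum(v[d] for v in stacked) / len(stacked) for d in range(dim)]
        )
    return new_embeddings
```

In a real model this step would use learned projections and feed the aggregated embeddings into the decoder's attention queries; the sketch only shows how neighborhood structure enters the representation.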

Table 1

Comparison of optimization results of different models on the TSP

Model | 20-TSP (Cost, Gap, Time) | 50-TSP (Cost, Gap, Time) | 100-TSP (Cost, Gap, Time)
Concorde[9] | 3.83, 0.00%, 5 min | 5.69, 0.00%, 13 min | 7.76, 0.00%, 1 h
Gurobi[8] | 3.83, 0.00%, 7 s | 5.69, 0.00%, 2 min | 7.76, 0.00%, 17 min
OR-Tools | 3.86, 0.94%, 1 min | 5.85, 2.87%, 5 min | 8.06, 3.86%, 23 min
LKH3[12] | 3.83, 0.00%, 42 s | 5.69, 0.00%, 6 min | 7.76, 0.00%, 25 min
2-opt[1] | 3.95, 3.13%, 1 s | 6.11, 7.38%, 7 s | 8.50, 9.53%, 33 s
Farthest Insertion[11] | 3.89, 1.56%, 1 s | 5.97, 4.92%, 2 s | 8.34, 7.47%, 10 s
Nearest Neighbor[1] | 4.48, 16.9%, 1 s | 6.94, 21.9%, 3 s | 9.68, 24.7%, 7 s
Kool et al. (Greedy)[19] | 3.85, 0.34%, ≪1 s | 5.80, 1.76%, 2 s | 8.12, 4.53%, 6 s
Kool et al. (Sampling)[19] | 3.84, 0.08%, 5 min | 5.73, 0.52%, 24 min | 7.94, 2.26%, 1 h
Costa et al.[17] | 3.83, 0.00%, 15 min | 5.71, 0.12%, 29 min | 7.83, 0.87%, 41 min
Wu et al.[22] | 3.83, 0.00%, 1 h | 5.70, 0.20%, 1.5 h | 7.87, 1.42%, 2 h
Kwon et al. (single trajec.)[5] | 3.83, 0.12%, ≪1 s | 5.74, 1.03%, 3 s | 7.84, 1.12%, 8 s
Kwon et al. (no augment)[5] | 3.83, 0.04%, ≪1 s | 5.71, 0.35%, 10 s | 7.79, 0.50%, 54 s
Kwon et al. (8×augment)[5] | 3.83, 0.00%, 16 s | 5.69, 0.05%, 1 min | 7.77, 0.14%, 7 min
Ours (single trajec.) | 3.83, 0.13%, ≪1 s | 5.73, 0.70%, 5 s | 7.84, 1.08%, 8 s
Ours (no augment) | 3.83, 0.05%, ≪1 s | 5.70, 0.28%, 10 s | 7.79, 0.47%, 55 s
Ours (8×augment) | 3.83, 0.00%, 17 s | 5.69, 0.03%, 1 min | 7.77, 0.12%, 7 min
Ours (4×augment) | 3.83, 0.00%, 9 s | 5.69, 0.01%, 42 s | 7.76, 0.09%, 3 min
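The "Gap" columns in Table 1 are the standard relative-excess measure in this literature: the method's mean tour length divided by the optimal length (here, the exact solvers' 0.00% rows), minus one. A one-line helper (our own naming, not from the paper) makes the convention explicit; note the rounded table entries may differ from this formula in the last digit:

```python
def optimality_gap(cost, optimal_cost):
    """Relative excess of a tour length over the optimum, in percent."""
    return (cost / optimal_cost - 1.0) * 100.0
```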

Figure 5

Comparison of the optimality gaps of classic models

Table 2

Ablation experiment of the POMO model on the TSP

Model | 20-TSP (Cost, Gap, Time) | 50-TSP (Cost, Gap, Time) | 100-TSP (Cost, Gap, Time)
POMO (8×augment)[5] | 3.83, 0.00%, 16 s | 5.69, 0.05%, 1 min | 7.77, 0.14%, 7 min
POMO (4×augment)[5] | 3.83, 0.00%, 7 s | 5.69, 0.03%, 40 s | 7.77, 0.13%, 3 min

Figure 6

Comparison of training losses of our model and the POMO model on TSP50/100 (a, b)

Table 3

Generalization ability of our model on the TSP (each row: model trained on one size, evaluated on the other sizes)

Model | 20-TSP (Cost, Gap, Time) | 50-TSP (Cost, Gap, Time) | 100-TSP (Cost, Gap, Time)
Ours (TSP20) | — | 5.73, 0.68%, 1 min | 8.05, 3.73%, 5 min
Ours (TSP50) | 3.83, 0.13%, 3 s | — | 7.84, 1.03%, 4 min
Ours (TSP100) | 3.83, 0.05%, 1 s | 5.71, 0.33%, 30 s | —

Table 4

Time cost of our model in the training and inference stages

Stage | TSP20 | TSP50 | TSP100
Training | 3 h | 24 h | 136 h
Inference (single trajec.) | ≪1 s | 5 s | 8 s
Inference (no augment) | ≪1 s | 10 s | 55 s
Inference (8×augment) | 17 s | 1 min | 7 min
Inference (4×augment) | 9 s | 42 s | 3 min
1 Cook W J, Cunningham W H, Pulleyblank W R, et al. Combinatorial optimization. New York, NY, USA: Wiley-Interscience, 2010: 11-22.
2 Papadimitriou C H. The Euclidean travelling salesman problem is NP-complete. Theoretical Computer Science, 1977, 4(3): 237-244.
3 Lin M, Liu B X, Lin X Y. Hybrid discrete cuckoo search algorithm with Metropolis criterion for traveling salesman problem. Journal of Nanjing University (Natural Science), 2017, 53(5): 972-983. (in Chinese)
4 Bengio Y, Lodi A, Prouvost A. Machine learning for combinatorial optimization: A methodological tour d'horizon. European Journal of Operational Research, 2021, 290(2): 405-421.
5 Kwon Y D, Choo J, Kim B, et al. POMO: Policy optimization with multiple optima for reinforcement learning. 2020, arXiv preprint.
6 Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need∥Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017, 30: 6000-6010.
7 Li Y X. Deep reinforcement learning: An overview. 2017, arXiv preprint.
8 Gurobi Optimization. Gurobi optimizer reference manual. https://www.gurobi.com, 2015.
9 Applegate D L, Bixby R E, Chvátal V, et al. The traveling salesman problem: A computational study. Interfaces, 2008, 38(4): 344-345.
10 Christofides N. Worst-case analysis of a new heuristic for the travelling salesman problem. Pittsburgh, PA, USA: Carnegie-Mellon University, 1976.
11 Johnson D S. Local optimization and the traveling salesman problem∥The 17th International Colloquium on Automata, Languages and Programming. Springer Berlin Heidelberg, 1990: 446-461.
12 Helsgaun K. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 2000, 126(1): 106-130.
13 Vinyals O, Fortunato M, Jaitly N. Pointer networks∥Proceedings of the 29th International Conference on Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2015, 28: 2692-2700.
14 Bello I, Pham H, Le Q V, et al. Neural combinatorial optimization with reinforcement learning. 2017, arXiv preprint.
15 Deudon M, Cournut P, Lacoste A, et al. Learning heuristics for the TSP by policy gradient∥Proceedings of the 15th International Conference on the Integration of Constraint Programming, Artificial Intelligence and Operations Research. Springer Berlin Heidelberg, 2018: 170-181.
16 Nazari M, Oroojlooy A, Takáč M, et al. Reinforcement learning for solving the vehicle routing problem∥Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2018, 31: 9861-9871.
17 Costa P R O D, Rhuggenaath J, Zhang Y Q, et al. Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning∥Proceedings of the 12th Asian Conference on Machine Learning. Bangkok, Thailand: JMLR, 2020: 465-480.
18 Dai H J, Khalil E B, Zhang Y Y, et al. Learning combinatorial optimization algorithms over graphs∥Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2017, 30: 6351-6361.
19 Kool W, Van Hoof H, Welling M. Attention, learn to solve routing problems. 2019, arXiv preprint.
20 Chen X Y, Tian Y D. Learning to perform local rewriting for combinatorial optimization∥Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS, 2019: 6278-6289.
21 Li K W, Zhang T, Wang R. Deep reinforcement learning for multiobjective optimization. IEEE Transactions on Cybernetics, 2021, 51(6): 3103-3114.
22 Wu Y X, Song W, Cao Z G, et al. Learning improvement heuristics for solving routing problems. IEEE Transactions on Neural Networks and Learning Systems, 2021: 1-13.
23 Xin L, Song W, Cao Z G, et al. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems∥Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA, USA: AAAI Press, 2021: 12042-12049.
24 Scarselli F, Gori M, Tsoi A C, et al. The graph neural network model. IEEE Transactions on Neural Networks, 2008, 20(1): 61-80.
25 Williams R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3): 229-256.
26 Ho Y, Wookey S. The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access, 2019, 8: 4806-4813.
27 Ma Q, Ge S W, He D Y, et al. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. 2019, arXiv preprint.
