Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control

Mustafa Can Bingol

doi:10.47495/okufbed.1031976

Research Article

Year 2022, Volume: 5 Issue: 2, 783 - 791, 18.07.2022

Cited By: 1

Abstract

Legged robots are a popular research topic in robotics because they can move over difficult terrain. This study aimed to make a bipedal robot, a type of legged robot, walk. For this purpose, the system was examined and an artificial neural network was designed. The neural network was then trained using the Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO) algorithms. The PPO algorithm achieved better training performance than the DDPG algorithm. The best noise standard deviation for the PPO algorithm was also investigated, and the best results were obtained with a value of 0.50. The system was then tested using the artificial neural networks trained by the PPO algorithm with a noise standard deviation of 0.50. In this test, the total reward was 274.334 and the walking task was accomplished with the proposed structure. In conclusion, the study provides a basis for controlling a bipedal robot and for selecting the PPO noise standard deviation.

Keywords

Bipedal Robot, Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), Reinforcement Learning (RL)
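
The role of the noise standard deviation mentioned in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not say so, that the noise is zero-mean Gaussian exploration noise added to the mean action produced by the neural network, and that actions are clipped to the range [-1, 1]. The function names `sample_action` and `ppo_clip_objective`, the four-dimensional example action, and the clipping parameter `eps=0.2` are illustrative assumptions, not details taken from the article. The second function shows the standard clipped surrogate term of PPO for a single sample.

```python
import random

# Noise standard deviation reported as best in the abstract.
NOISE_STD = 0.50


def sample_action(mean_action, noise_std, rng):
    """Add Gaussian noise to each mean action component, then clip to [-1, 1].

    The clipping range is an assumption made for this sketch.
    """
    noisy = [rng.gauss(m, noise_std) for m in mean_action]
    return [max(-1.0, min(1.0, a)) for a in noisy]


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate term of PPO for one sample.

    ratio     : new-policy probability divided by old-policy probability
    advantage : estimated advantage of the sampled action
    eps       : clipping parameter (0.2 is an assumed, commonly used value)
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    mean_action = [0.0, 0.5, -0.5, 0.9]  # hypothetical network output
    print("noisy action:", sample_action(mean_action, NOISE_STD, rng))
    print("clipped objective:", ppo_clip_objective(1.5, 2.0))
```

A larger noise standard deviation spreads the sampled actions further from the network output, which increases exploration but also makes each step less precise. That trade-off is one plausible reason why an intermediate value could perform best.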

Details

Primary Language	English
Subjects	Computer Software
Section	Research Articles
Authors	Mustafa Can Bingol 0000-0001-5448-8281
Publication Date	July 18, 2022
Submission Date	December 3, 2021
Acceptance Date	March 2, 2022
Published in Issue	Year 2022 Volume: 5 Issue: 2

Cite

APA	Bingol, M. C. (2022). Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 5(2), 783-791. https://doi.org/10.47495/okufbed.1031976
AMA	Bingol MC. Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control. Osmaniye Korkut Ata University Journal of The Institute of Science and Techno. July 2022;5(2):783-791. doi:10.47495/okufbed.1031976
Chicago	Bingol, Mustafa Can. “Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 5, no. 2 (July 2022): 783-91. https://doi.org/10.47495/okufbed.1031976.
EndNote	Bingol MC (July 1, 2022) Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 5 2 783–791.
IEEE	M. C. Bingol, “Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control”, Osmaniye Korkut Ata University Journal of The Institute of Science and Techno, vol. 5, no. 2, pp. 783–791, 2022, doi: 10.47495/okufbed.1031976.
ISNAD	Bingol, Mustafa Can. “Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 5/2 (July 2022), 783-791. https://doi.org/10.47495/okufbed.1031976.
JAMA	Bingol MC. Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control. Osmaniye Korkut Ata University Journal of The Institute of Science and Techno. 2022;5:783–791.
MLA	Bingol, Mustafa Can. “Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 5, no. 2, 2022, pp. 783-91, doi:10.47495/okufbed.1031976.
Vancouver	Bingol MC. Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control. Osmaniye Korkut Ata University Journal of The Institute of Science and Techno. 2022;5(2):783-91.