Research Article

DDPG Algoritmasında Bulunan Ornstein–Uhlenbeck Gürültüsünün Standart Sapmasının Araştırılması

Investigation of the Standard Deviation of Ornstein-Uhlenbeck Noise in the DDPG Algorithm

Year 2021, Volume: 9 Issue: 2, 200 - 210, 27.06.2021
https://doi.org/10.29109/gujsc.872646

Abstract

Reinforcement learning is a learning method that many living creatures use, often unknowingly, to acquire skills such as eating and walking. Inspired by this method, machine learning researchers have divided it into two subcategories: value learning and policy learning. In this study, the standard deviation of the noise in the deep deterministic policy gradient (DDPG) method, one of the policy learning algorithms, was examined for the inverse kinematics solution of a 2 degrees-of-freedom planar (RR) robot. For this examination, 8 different functions were defined depending on the maximum output value of the actor (action) artificial neural network. The artificial neural networks were trained with these functions for 1000 iterations of 200 steps each. After training, the statistical differences between the groups were examined, and no statistically significant difference was found among the three best groups. For this reason, the three best groups were retrained for 2500 iterations of 200 steps each and then evaluated on 100 different test scenarios. After testing, the inverse kinematic equation of the 2 degrees-of-freedom planar robot was obtained with minimal error with the help of the artificial neural networks. In light of the results, the importance of the choice of the noise standard deviation, and the range from which it should be selected, is presented for researchers who will work in this field.
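
As a rough illustration of the quantity studied here, the sketch below shows one common way to generate Ornstein-Uhlenbeck exploration noise and add it to an actor output, with the noise standard deviation set as a fraction of the maximum action value. It is not the article's implementation: the class and function names, the Euler-Maruyama discretization, and the values of `theta`, `dt`, `ACTION_MAX`, and `SIGMA_FRACTION` are all assumptions chosen for illustration. Only the 200 steps per iteration and the two joints of the planar robot come from the abstract.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (Euler-Maruyama step).

    All default parameter values are illustrative assumptions.
    """

    def __init__(self, size, sigma, theta=0.15, mu=0.0, dt=1e-2, seed=None):
        self.size = size
        self.sigma = sigma      # noise standard deviation parameter
        self.theta = theta      # mean-reversion rate
        self.mu = mu            # long-run mean
        self.dt = dt            # time step
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its long-run mean."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one step: x <- x + theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def noisy_action(actor_output, noise, action_max):
    """Add exploration noise to the actor output and clip to [-action_max, action_max]."""
    return [
        max(-action_max, min(action_max, a + n))
        for a, n in zip(actor_output, noise.sample())
    ]


ACTION_MAX = 1.0        # hypothetical maximum actor output
SIGMA_FRACTION = 0.2    # hypothetical: sigma as a fraction of ACTION_MAX

# Two noise channels, one per joint of the 2-DOF planar robot.
noise = OUNoise(size=2, sigma=SIGMA_FRACTION * ACTION_MAX, seed=0)
for step in range(200):  # 200 steps per training iteration, as in the abstract
    action = noisy_action([0.0, 0.0], noise, ACTION_MAX)
```

Changing `SIGMA_FRACTION` is the kind of adjustment the study compares: a larger value widens exploration around the actor output, while a smaller value keeps actions close to the deterministic policy.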


Details

Primary Language English
Subjects Engineering
Journal Section Design and Technology (Tasarım ve Teknoloji)
Authors

Mustafa Can Bingol 0000-0001-5448-8281

Publication Date June 27, 2021
Submission Date February 1, 2021
Published in Issue Year 2021 Volume: 9 Issue: 2

Cite

APA Bingol, M. C. (2021). Investigation of the Standard Deviation of Ornstein-Uhlenbeck Noise in the DDPG Algorithm. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 9(2), 200-210. https://doi.org/10.29109/gujsc.872646

Cited By


Evaluation of DDPG and PPO Algorithms for Bipedal Robot Control
Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi
https://doi.org/10.47495/okufbed.1031976

    e-ISSN:2147-9526