Research Article

Perturbation Augmentation for Adversarial Training with Diverse Attacks

Year 2024, Volume: 11 Issue: 2, 274 - 288, 29.06.2024
https://doi.org/10.54287/gujsa.1458880

Abstract

Adversarial Training (AT) aims to alleviate the vulnerability of deep neural networks to adversarial perturbations. However, AT techniques struggle to maintain performance on natural samples while improving the deep model's robustness. The lack of diversity among the perturbations generated during adversarial training degrades the generalizability of the robust models, causing overfitting to particular perturbations and a decrease in natural performance. This study proposes an adversarial training framework that augments the adversarial directions obtained from a single-step attack to address the trade-off between robustness and generalization. Inspired by feature scattering adversarial training, the proposed framework computes a principal adversarial direction with a single-step attack that finds a perturbation disrupting the inter-sample relationships within the mini-batch. The principal direction obtained at each iteration is then augmented by sampling new adversarial directions within a region spanning 45 degrees around it. Because this augmentation requires no extra backpropagation steps, the generalization of the robust model is improved without imposing an additional computational burden on feature scattering adversarial training. Experiments on CIFAR-10, CIFAR-100, SVHN, Tiny-ImageNet, and the German Traffic Sign Recognition Benchmark show consistent improvements in accuracy under adversarial attacks while keeping natural performance almost pristine.
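For intuition, the snippet below is a minimal PyTorch sketch of one plausible way to sample a new perturbation whose direction lies within 45 degrees of a principal adversarial direction without any extra backpropagation. The function name, the cone-rotation construction, and the default epsilon are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def augment_adversarial_direction(delta, max_angle_deg=45.0, eps=8.0 / 255):
    """Sample a perturbation within `max_angle_deg` of the principal
    adversarial perturbation `delta` (shape: [B, C, H, W]) without any
    additional backpropagation. Illustrative sketch only."""
    b = delta.size(0)
    flat = delta.reshape(b, -1)
    principal = F.normalize(flat, dim=1)  # unit vector along the principal direction

    # Draw a random direction and remove its component along the principal
    # direction so the remainder is orthogonal to it (per sample).
    noise = torch.randn_like(flat)
    noise = noise - (noise * principal).sum(dim=1, keepdim=True) * principal
    orthogonal = F.normalize(noise, dim=1)

    # Rotate the principal direction by a random angle in [0, max_angle_deg].
    max_angle = torch.deg2rad(torch.tensor(max_angle_deg, device=delta.device))
    theta = torch.rand(b, 1, device=delta.device) * max_angle
    new_direction = torch.cos(theta) * principal + torch.sin(theta) * orthogonal

    # Preserve the original perturbation magnitude and stay inside the eps-ball.
    magnitude = flat.norm(dim=1, keepdim=True)
    new_delta = (new_direction * magnitude).reshape(delta.shape)
    return new_delta.clamp(-eps, eps)
```

In such a setup, a single-step attack (e.g., the feature scattering step) would first produce `delta`, and the sampled perturbation could then be used in the training objective in place of, or alongside, the principal one.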

References

  • Alzantot, M., Sharma, Y., Elgohary, A., Ho, B., Srivastava, M., & Chang, K. (2018). Generating natural language adversarial examples. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, (pp. 2890–2896).
  • Andriushchenko, M., & Flammarion, N. (2020). Understanding and improving fast adversarial training. In: Proceedings of Advances in Neural Information Processing Systems, 33, (pp. 16048-16059).
  • Athalye, A., Carlini, N., & Wagner, D. (2018, July). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In: International Conference on Machine Learning (pp. 274-283).
  • Baytaş, İ. M., & Deb, D. (2023). Robustness-via-synthesis: Robust training with generative adversarial perturbations. Neurocomputing, 516, 49-60. https://doi.org/10.1016/j.neucom.2022.10.034
  • Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., ... & Zhou, W. (2016). Hidden voice commands. In: 25th USENIX Security Symposium (USENIX Security 16), (pp. 513-530).
  • Carlini, N., & Wagner, D. (2017, May). Towards evaluating the robustness of neural networks. In: Proceedings of the IEEE Symposium on Security and Privacy. (pp. 39-57).
  • Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In: Proceedings of Advances in Neural Information Processing Systems, 26.
  • Etmann, C., Lunz, S., Maass, P., & Schönlieb, C. B. (2019). On the connection between adversarial robustness and saliency map interpretability. In: Proceedings of the 36th International Conference on Machine Learning, 97, (pp. 1823-1832).
  • Finlayson, S. G., Bowers, J. D., Ito, J., Zittrain, J. L., Beam, A. L., & Kohane, I. S. (2019). Adversarial attacks on medical machine learning. Science, 363(6433), 1287-1289. https://doi.org/10.1126/science.aaw4399
  • Fursov, I., Morozov, M., Kaploukhaya, N., Kovtun, E., Rivera-Castro, R., Gusev, G., ... & Burnaev, E. (2021). Adversarial attacks on deep models for financial transaction records. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, (pp. 2868-2878).
  • Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. In: Proceedings of the 3rd International Conference on Learning Representations. https://arxiv.org/abs/1412.6572
  • Houben, S., Stallkamp, J., Salmen, J., Schlipsing, M., & Igel, C. (2013, August). Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In: Proceedings of The 2013 International Joint Conference on Neural Networks. (pp. 1-8).
  • Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). Adversarial examples are not bugs, they are features. In: Proceedings of Advances in Neural Information Processing Systems, 32.
  • Jang, Y., Zhao, T., Hong, S., & Lee, H. (2019). Adversarial defense via learning to generate diverse attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 2740-2749).
  • Kim, H., Lee, W., & Lee, J. (2021). Understanding catastrophic overfitting in single-step adversarial training. In: Proceedings of the AAAI Conference on Artificial Intelligence, (pp. 8119-8127).
  • Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. University of Toronto.
  • Kurakin, A., Goodfellow, I. J. & Bengio, S. (2017). Adversarial machine learning at scale. In: Proceedings of the 5th International Conference on Learning Representations. https://arxiv.org/abs/1611.01236
  • Le, Y., & Yang, X. (2015). Tiny imagenet visual recognition challenge. CS 231N, 7(7), 3.
  • Lee, S., Lee, H., & Yoon, S. (2020). Adversarial vertex mixup: Toward better adversarially robust generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 272-281).
  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In: Proceedings of the International Conference on Learning Representations. https://arxiv.org/abs/1706.06083
  • Moosavi-Dezfooli, S. M., Fawzi, A., & Frossard, P. (2016). DeepFool: A simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2574-2582).
  • Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017, April). Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. (pp. 506-519).
  • Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., & Madry, A. (2018). Adversarially robust generalization requires more data. In: Proceedings of Advances in Neural Information Processing Systems, (pp. 5019-5031).
  • Shafahi, A., Najibi, M., Ghiasi, M. A., Xu, Z., Dickerson, J., Studer, C., ... & Goldstein, T. (2019). Adversarial training for free! In: Proceedings of Advances in Neural Information Processing Systems, 32.
  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In: Proceedings of International Conference on Learning Representations. http://arxiv.org/abs/1312.6199
  • Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I. J., Boneh, D. & McDaniel, P. D. (2018). Ensemble adversarial training: Attacks and defenses. In: Proceedings of the 6th International Conference on Learning Representations. https://arxiv.org/abs/1705.07204
  • Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Lopez-Paz, D., & Bengio, Y. (2019, May). Manifold mixup: Better representations by interpolating hidden states. In: International Conference on Machine Learning, (pp. 6438-6447).
  • Villani, C. (2009). Optimal transport: old and new (Vol. 338, p. 23). Berlin: Springer.
  • Wang, J., & Zhang, H. (2019). Bilateral adversarial training: Towards fast training of more robust models against adversarial attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, (pp. 6629-6638).
  • Wang, K., Li, F., Chen, C. M., Hassan, M. M., Long, J., & Kumar, N. (2021). Interpreting adversarial examples and robustness for deep learning-based auto-driving systems. IEEE Transactions on Intelligent Transportation Systems, 23(7), 9755-9764. https://doi.org/10.1109/TITS.2021.3108520
  • Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., & Gu, Q. (2019, September). Improving adversarial robustness requires revisiting misclassified examples. In: Proceedings of International Conference on Learning Representations. https://openreview.net/forum?id=rklOg6EFwS
  • Wang, Z., Pang, T., Du, C., Lin, M., Liu, W., & Yan, S. (2023). Better diffusion models further improve adversarial training. In: Proceedings of the 40th International Conference on Machine Learning, PMLR 202, (pp. 36246-36263). https://proceedings.mlr.press/v202/wang23ad.html
  • Wong, E., & Kolter, Z. (2018, July). Provable defenses against adversarial examples via the convex outer adversarial polytope. In: Proceedings of the International Conference on Machine Learning, (pp. 5286-5295).
  • Wong, E., Rice, L., & Kolter, J. Z. (2020). Fast is better than free: Revisiting adversarial training. In: Proceedings of the 8th International Conference on Learning Representations. https://arxiv.org/abs/2001.03994
  • Xie, Y., Wang, X., Wang, R., & Zha, H. (2020, August). A fast proximal point method for computing exact Wasserstein distance. In: Proceedings of Uncertainty in Artificial Intelligence (pp. 433-453).
  • Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In: Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In: Proceedings of the British Machine Vision Conference, (pp. 1-12).
  • Zhang, D., Zhang, T., Lu, Y., Zhu, Z., & Dong, B. (2019a). You only propagate once: Accelerating adversarial training via maximal principle. In: Proceedings of Advances in Neural Information Processing Systems, 32.
  • Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). Mixup: Beyond empirical risk minimization. In: Proceedings of the 6th International Conference on Learning Representations. https://arxiv.org/abs/1710.09412
  • Zhang, H., & Xu, W. (2020). Adversarial interpolation training: A simple approach for improving model robustness. https://openreview.net/forum
  • Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., & Jordan, M. (2019b). Theoretically principled trade-off between robustness and accuracy. In: Proceedings of the International Conference on Machine Learning, (pp. 7472-7482).
  • Zhang, H., & Wang, J. (2019). Defense against adversarial attacks using feature scattering-based adversarial training. In: Proceedings of the Advances in Neural Information Processing Systems, 32.
  • Zhang, H. (2019). Feature Scattering Adversarial Training (NeurIPS 2019) (Accessed: 24/03/2024) https://github.com/Haichao-Zhang/FeatureScatter
There are 43 citations in total.

Details

Primary Language English
Subjects Deep Learning
Journal Section Computer Engineering
Authors

Duygu Serbes 0000-0003-1067-866X

İnci M. Baytaş 0000-0003-4765-2615

Early Pub Date June 4, 2024
Publication Date June 29, 2024
Submission Date March 26, 2024
Acceptance Date May 21, 2024
Published in Issue Year 2024 Volume: 11 Issue: 2

Cite

APA Serbes, D., & Baytaş, İ. M. (2024). Perturbation Augmentation for Adversarial Training with Diverse Attacks. Gazi University Journal of Science Part A: Engineering and Innovation, 11(2), 274-288. https://doi.org/10.54287/gujsa.1458880