Using Up-to-Date GAN Methods for Aerial Images

Sara Altun Güven; Buket Toptaş

doi:10.24012/dumf.1386384

Research Article

Year 2024, Volume 15, Issue 1, 87 - 97, 29.03.2024

Abstract

Object detection and segmentation in aerial images is an active and important research area. The iSAID dataset was created for object detection in images captured by aerial vehicles. In this study, semantic segmentation is performed on the iSAID dataset using Generative Adversarial Networks (GANs). Four GAN methods are compared: CycleGAN, DCLGAN, SimDCL, and SSimDCL. All of them operate on unpaired images. DCLGAN and SimDCL are inspired by CycleGAN, and the methods differ in their cost functions and network structures. The study examines the methods in detail and identifies their similarities and differences. The segmentation results are reported both visually and quantitatively, using the FID, KID, PSNR, FSIM, SSIM, and MAE metrics. The experiments show that SSimDCL and SimDCL outperform the other methods on iSAID semantic segmentation, while CycleGAN is less successful than the others. The aim of the study is automatic semantic segmentation of aerial images.

Keywords

deep learning, semantic segmentation, aerial images, GANs
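
Two of the metrics named in the abstract, MAE and PSNR, are simple enough to sketch directly. The snippet below is an illustrative sketch only, not the authors' implementation: it assumes grayscale images stored as lists of rows with pixel values in the range 0 to 255, and the function names `mae` and `psnr` and the sample data are hypothetical. FID, KID, FSIM, and SSIM need feature extractors or windowed statistics and are not covered here.

```python
import math
from typing import Iterator, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def _paired_pixels(reference: Image, generated: Image) -> Iterator[Tuple[float, float]]:
    """Yield (reference, generated) pixel pairs, checking that shapes match."""
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of rows")
    for ref_row, gen_row in zip(reference, generated):
        if len(ref_row) != len(gen_row):
            raise ValueError("image rows must have the same length")
        yield from zip(ref_row, gen_row)


def mae(reference: Image, generated: Image) -> float:
    """Mean absolute error: average of |reference - generated| over all pixels."""
    diffs = [abs(r - g) for r, g in _paired_pixels(reference, generated)]
    return sum(diffs) / len(diffs)


def psnr(reference: Image, generated: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    squared = [(r - g) ** 2 for r, g in _paired_pixels(reference, generated)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    target = [[0, 128], [255, 64]]  # hypothetical ground-truth mask
    output = [[0, 120], [250, 64]]  # hypothetical generator output
    print(f"MAE  = {mae(target, output):.3f}")
    print(f"PSNR = {psnr(target, output):.2f} dB")
```

Lower MAE and higher PSNR both indicate that a generated segmentation map is closer, pixel by pixel, to its reference.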


Turkish title: Havadan Görüntüler İçin Güncel GAN Yöntemlerinin Kullanımı



Details

Primary Language	English
Subjects	Image Processing, Deep Learning, Neural Networks
Section	Articles
Authors	Sara Altun Güven 0000-0003-2877-7105, Buket Toptaş 0000-0003-2556-8199
Early Publication Date	March 29, 2024
Publication Date	March 29, 2024
Submission Date	November 5, 2023
Acceptance Date	February 11, 2024
Published in Issue	Year 2024

Cite

IEEE	S. Altun Güven and B. Toptaş, “Using Up-to-Date GAN Methods for Aerial Images”, DÜMF MD, vol. 15, no. 1, pp. 87–97, 2024, doi: 10.24012/dumf.1386384.

All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided that the original work and source are appropriately cited.