The Effect of Latent Space Vector on Generating Animal Faces in Deep Convolutional GAN: An Analysis

İsa Ataş

doi:10.24012/dumf.1393797

Research Article

Year 2024, pp. 99–106, 29.03.2024

Abstract

Researchers are showing great interest in Generative Adversarial Networks (GANs), which use deep learning techniques to mimic the content of datasets and are particularly adept at data generation. Despite their impressive performance, it is unclear how exactly GANs map latent space vectors to realistic images and how the chosen dimensionality of the latent space affects the quality of the generated images. In this paper, we explore the potential of generative models for generating animal face images, using the Deep Convolutional Generative Adversarial Network (DCGAN) as the reference model. To analyze the impact of the selected latent space vector size, we trained the DCGAN on the well-known AFHQ dataset and synthesized animal face images. We evaluated the generated images quantitatively using the Fréchet Inception Distance (FID) and the Inception Score (IS). The results show that generative models can produce images with latent sizes significantly smaller and larger than the standard size of 100.
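To make the notion of "latent size" concrete, the following is a minimal sketch, not the paper's implementation. It assumes the latent vector is drawn from a standard normal distribution and fed to a bias-free transposed convolution that produces a 4 x 4 feature map, as is common in DCGAN implementations. The function names, the 512 output channels, and the latent sizes 10 and 1000 are hypothetical; only the standard size of 100 comes from the abstract.

```python
import numpy as np


def sample_latent(batch_size, latent_dim, seed=0):
    """Draw latent vectors z ~ N(0, I), shaped (batch, latent_dim, 1, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((batch_size, latent_dim, 1, 1))
    return z.astype(np.float32)


def first_layer_weights(latent_dim, out_channels=512, kernel=4):
    """Weight count of a bias-free transposed convolution that maps z
    to a (out_channels, kernel, kernel) feature map.

    out_channels=512 and kernel=4 are assumed values, not taken from the paper.
    """
    return latent_dim * out_channels * kernel * kernel


if __name__ == "__main__":
    # 100 is the standard size named in the abstract; 10 and 1000 are
    # hypothetical examples of "smaller" and "larger" latent sizes.
    for nz in (10, 100, 1000):
        z = sample_latent(16, nz)
        print(nz, z.shape, first_layer_weights(nz))
```

Under these assumptions, only the first generator layer changes with the latent size, so its weight count grows linearly with the latent dimension while the rest of the network stays the same.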

Keywords

deep learning, FID, GAN, image generation, IS, latent space

Abstract

Researchers have shown great interest in Generative Adversarial Networks (GANs), which use deep learning techniques to mimic the content of datasets and excel particularly at data generation. Despite their impressive performance, it remains unclear how exactly GANs map latent space vectors to realistic images and how the chosen dimensionality of the latent space affects the quality of the generated images. In this study, we analyze the potential of learned data representations to generate different animal face images and examine the impact of the selected latent space dimension on synthesized image quality using a Deep Convolutional GAN (DCGAN). For quantitative evaluation of the generated images, we employ the Fréchet Inception Distance (FID) and the Inception Score (IS). In addition to the quantitative results, we use qualitative evaluation to check for overfitting and to gain an intuitive sense of the samples drawn from the model and of its ability to generalize. Finally, we train the DCGAN on the well-known AFHQ Cat, AFHQ Dog, and AFHQ Wild Animals datasets and compare the generated outputs, measuring the effect of latent space dimension on image feature quality.
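The two metrics named above can be sketched as follows. This is an illustrative NumPy version, not the evaluation code used in the study. It assumes the Inception features (for FID) and the Inception class probabilities (for IS) have already been computed and are passed in as arrays. The function names are hypothetical.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """FID between two feature sets of shape (n_samples, n_features).

    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # Tr((C_r C_f)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of C_r @ C_f, which are real and non-negative for
    # covariance matrices; clip small negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


def inception_score(probs, eps=1e-12):
    """IS from class probabilities p(y|x) of shape (n_samples, n_classes).

    IS = exp( mean_x KL( p(y|x) || p(y) ) )
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

A lower FID indicates that generated and real feature statistics are closer, while a higher IS indicates confident and diverse class predictions. Comparing runs with different latent sizes means computing both values once per trained model on the same number of samples.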

Details

Primary Language	English
Subjects	Image Processing, Deep Learning
Section	Articles
Authors	İsa Ataş 0000-0003-4094-9598
Early Pub Date	March 29, 2024
Publication Date	March 29, 2024
Submission Date	November 21, 2023
Acceptance Date	March 19, 2024
Published in Issue	Year 2024

Cite

IEEE	İ. Ataş, “The Effect of Latent Space Vector on Generating Animal Faces in Deep Convolutional GAN: An Analysis”, DÜMF MD, vol. 15, no. 1, pp. 99–106, 2024, doi: 10.24012/dumf.1393797.

All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided that the original work and source are appropriately cited.