Photo-to-Cartoon Image Translation Using CartoonGAN with a Joint Learning Approach

Muhamad Shiddiq; Ahmad Tri Hidayat

doi:10.58526/jsret.v5i2.1156

Authors

Muhamad Shiddiq, Universitas Teknologi Yogyakarta
Ahmad Tri Hidayat

Keywords:

Photo-to-Cartoon, CartoonGAN, Joint Learning, Image Stylization, Image Translation

Abstract

Photo-to-cartoon translation is a non-photorealistic rendering task that generates illustrative visuals while preserving fundamental object structures. This study proposes a CartoonGAN-based approach employing a joint learning scheme that integrates a lightweight denoising module into the generator. Trained end-to-end alongside the stylization process, this module suppresses noise and irrelevant textures without losing critical semantic information from input photographs. Using unpaired photo and cartoon images from the Hugging Face platform, the model is trained with a combination of adversarial and L1-based content losses to balance style generation and structural preservation. Experimental results indicate a stable and convergent training process, achieving an average content loss of 0.0286 and a generator adversarial loss of 0.3982 at epoch 50. Qualitatively, the generated images exhibit sharper contours, uniform color regions, and reduced fine textures compared to the original photographs. These findings demonstrate that integrating a denoising module via joint learning significantly improves visual consistency and training stability, providing an effective deep learning-based solution for photo-to-cartoon translation.
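
The generator objective summarized in the abstract, an adversarial term combined with an L1-based content term, can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The function names, the non-saturating log form of the adversarial loss, the use of flat feature vectors, and the `content_weight` default of 10.0 are all assumptions made for the example; the abstract does not specify them.

```python
import math


def l1_content_loss(photo_feats, cartoon_feats):
    """Mean absolute difference between two equal-length feature vectors."""
    if len(photo_feats) != len(cartoon_feats):
        raise ValueError("feature vectors must have the same length")
    diffs = (abs(p - c) for p, c in zip(photo_feats, cartoon_feats))
    return sum(diffs) / len(photo_feats)


def generator_adversarial_loss(disc_scores, eps=1e-12):
    """Mean of -log D(G(x)) over discriminator scores in (0, 1].

    Assumed form: the abstract does not state which adversarial loss is used.
    """
    return -sum(math.log(max(s, eps)) for s in disc_scores) / len(disc_scores)


def generator_objective(photo_feats, cartoon_feats, disc_scores,
                        content_weight=10.0):
    """Return (total, adversarial, content) for one batch.

    content_weight is a hypothetical balancing factor between style
    generation and structural preservation.
    """
    adv = generator_adversarial_loss(disc_scores)
    content = l1_content_loss(photo_feats, cartoon_feats)
    return adv + content_weight * content, adv, content


# Example call with made-up numbers (not data from the study).
total, adv, content = generator_objective(
    photo_feats=[0.2, 0.5, 0.9],
    cartoon_feats=[0.25, 0.45, 0.9],
    disc_scores=[0.7, 0.6],
)
```

A larger `content_weight` pushes the generator toward preserving the structure of the input photograph, while a smaller one gives the adversarial (style) term more influence.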
