Photo-to-Cartoon Image Translation Using CartoonGAN with a Joint Learning Approach
DOI: https://doi.org/10.58526/jsret.v5i2.1156
Keywords: Photo-to-Cartoon, CartoonGAN, Joint Learning, Image Stylization, Image Translation
Abstract
Photo-to-cartoon translation is a non-photorealistic rendering task that generates illustrative visuals while preserving fundamental object structures. This study proposes a CartoonGAN-based approach employing a joint learning scheme that integrates a lightweight denoising module into the generator. Trained end-to-end alongside the stylization process, this module suppresses noise and irrelevant textures without losing critical semantic information from input photographs. Using unpaired photo and cartoon images from the Hugging Face platform, the model is trained with a combination of adversarial and L1-based content losses to balance style generation and structural preservation. Experimental results indicate a stable and convergent training process, achieving an average content loss of 0.0286 and a generator adversarial loss of 0.3982 at epoch 50. Qualitatively, the generated images exhibit sharper contours, uniform color regions, and reduced fine textures compared to the original photographs. These findings demonstrate that integrating a denoising module via joint learning significantly improves visual consistency and training stability, providing an effective deep learning-based solution for photo-to-cartoon translation.
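The combination of adversarial and L1-based content losses mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so several things here are assumptions: the adversarial term is taken to be a binary cross-entropy generator loss, the content term is a mean absolute difference between feature vectors of the input photo and the generated image, and the function names and the content_weight value are hypothetical, chosen only for illustration. The toy numbers are not the study's data.

```python
import math


def l1_content_loss(photo_features, generated_features):
    """Mean absolute difference between two equal-length feature vectors."""
    if len(photo_features) != len(generated_features):
        raise ValueError("feature vectors must have the same length")
    total = sum(abs(p - g) for p, g in zip(photo_features, generated_features))
    return total / len(photo_features)


def generator_adversarial_loss(disc_scores, eps=1e-12):
    """Assumed form: mean of -log D(G(x)), where each score lies in (0, 1]."""
    return -sum(math.log(max(s, eps)) for s in disc_scores) / len(disc_scores)


def generator_loss(photo_features, generated_features, disc_scores,
                   content_weight=10.0):
    """Weighted sum of the two terms; content_weight is a hypothetical value."""
    adv = generator_adversarial_loss(disc_scores)
    content = l1_content_loss(photo_features, generated_features)
    return adv + content_weight * content, adv, content


# Toy inputs (illustrative only): features of the photo and of the generated
# image, plus discriminator scores for two generated images.
photo_feats = [0.20, 0.50, 0.90, 0.10]
generated_feats = [0.25, 0.45, 0.80, 0.10]
scores = [0.7, 0.6]

total, adv, content = generator_loss(photo_feats, generated_feats, scores)
print(f"adversarial={adv:.4f} content={content:.4f} total={total:.4f}")
```

In this sketch a larger content_weight pulls the generated image toward the structure of the input photo, while a smaller one lets the adversarial term push harder toward the cartoon style. In a joint learning setup such as the one the abstract describes, the same combined loss would presumably be backpropagated through both the stylization layers and the denoising module, since they are trained end-to-end.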
Copyright (c) 2026 Muhamad Shiddiq, Ahmad Tri Hidayat

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright © 2022. This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-sa/4.0/), which permits unrestricted commercial use, distribution, and reproduction in any medium.


