High-fidelity Diffusion Model for Huizhou Fish Lantern Image Generation

ZHU Lei1, ZHAO Xingchen2, LIU Gang1*

Packaging Engineering ›› 2026, Vol. 47 ›› Issue (8): 392-403. DOI: 10.19554/j.cnki.1001-3563.2026.08.033
Design Discussion


Abstract

The work aims to propose a high-fidelity diffusion model for the controllable generation and creative design of Huizhou fish lantern patterns, in order to address content homogenization in lantern events and enhance the creative potential of cultural and tourism derivative products. An expert-curated Huizhou fish lantern dataset was constructed, and a controllable generation framework was designed that leveraged structured text-based conditional control and a multi-scale feature fusion mechanism to synthesize fish lantern patterns with stable structures and high visual recognizability. Experiments demonstrated that the proposed method achieved high-quality image generation and style reconstruction, and that the generated Huizhou fish lantern patterns exhibited superior realism, stability, novelty, and controllability. For image design and intelligent generation of Huizhou fish lanterns, integrating generative AI methods with a visual feature modeling mechanism for intangible cultural heritage not only improves the efficiency of intelligent image creation and the impact of visualization applications for fish lantern patterns, but also provides a practical reference for the digital transmission and preservation of other intangible cultural heritage.
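To make the abstract's framework concrete, the sketch below illustrates the general pattern it describes, i.e. a latent diffusion backbone guided by a structured text prompt plus an auxiliary structural condition. This is a minimal sketch using the open-source diffusers library with a ControlNet edge condition, not the authors' implementation: the model IDs, prompt template, attribute slots, file paths, and sampling parameters are all illustrative assumptions.

```python
# Minimal sketch: text-conditioned diffusion with a structural (edge-map)
# condition, approximating the "structured text control + stable structure"
# idea in the abstract. All model IDs and parameters are assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Structural condition: an edge/contour map keeps the fish lantern's
# silhouette stable while the text prompt varies the decorative style.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Structured text-based conditional control: a fixed template with
# controllable attribute slots (motif, palette, ornamentation).
prompt_template = (
    "Huizhou fish lantern pattern, {motif}, {palette} color scheme, "
    "{ornament} ornamentation, paper lantern texture, high fidelity"
)
prompt = prompt_template.format(
    motif="carp with layered scales",
    palette="vermilion and gold",
    ornament="cloud-and-wave",
)

edge_map = load_image("fish_lantern_edges.png")  # hypothetical input path
image = pipe(
    prompt,
    image=edge_map,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("generated_fish_lantern.png")
```

In this setup, varying only the template slots changes the style while the edge condition holds the lantern's structure fixed, which is one plausible way to obtain the controllability and structural stability the abstract reports.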

Key words

high-fidelity diffusion model / Huizhou fish lantern / image generation / digital preservation of intangible cultural heritage

Cite this article

ZHU Lei, ZHAO Xingchen, LIU Gang. High-fidelity Diffusion Model for Huizhou Fish Lantern Image Generation[J]. Packaging Engineering, 2026, 47(8): 392-403. https://doi.org/10.19554/j.cnki.1001-3563.2026.08.033
