International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 77
Published: January 2026
Authors: Jundi Yang, Heng Yao
DOI: 10.5120/ijca2026926252
Jundi Yang, Heng Yao. A Knowledge-Graph–Driven Multimodal Large Model for Semantic Understanding and Controllable Generation of Intangible Cultural Heritage. International Journal of Computer Applications. 187, 77 (January 2026), 1-8. DOI=10.5120/ijca2026926252
@article{10.5120/ijca2026926252,
  author    = {Jundi Yang and Heng Yao},
  title     = {A Knowledge-Graph–Driven Multimodal Large Model for Semantic Understanding and Controllable Generation of Intangible Cultural Heritage},
  journal   = {International Journal of Computer Applications},
  year      = {2026},
  volume    = {187},
  number    = {77},
  pages     = {1-8},
  doi       = {10.5120/ijca2026926252},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2026
%A Jundi Yang
%A Heng Yao
%T A Knowledge-Graph–Driven Multimodal Large Model for Semantic Understanding and Controllable Generation of Intangible Cultural Heritage
%J International Journal of Computer Applications
%V 187
%N 77
%P 1-8
%R 10.5120/ijca2026926252
%I Foundation of Computer Science (FCS), NY, USA
Intangible Cultural Heritage (ICH) encompasses complex layers of symbolic meaning expressed through motifs, crafts, rituals, and regional traditions. Contemporary multimodal generative models frequently overlook such domain-specific semantics, leading to visually appealing but culturally inaccurate outputs. To address this limitation, this paper introduces a unified knowledge-graph–driven multimodal generation framework that couples a structured ICH Knowledge Graph (KG), a domain-adapted Large Language Model (LLM), and a controllable diffusion-based text-to-image generator. The KG organizes motifs, techniques, symbolic associations, and regional contexts into a structured semantic space, which the LLM leverages to interpret user queries and retrieve culturally grounded constraints. These constraints are injected into the diffusion model through a multi-stage semantic fusion mechanism, enabling culturally faithful and controllable image synthesis. Experimental results across three curated ICH datasets demonstrate that the proposed framework outperforms representative baselines in cultural semantic accuracy, text–image alignment, and robustness to linguistic variation. The proposed approach provides a principled pathway for integrating symbolic cultural knowledge with modern generative models, supporting large-scale preservation, computational interpretation, and creative revitalization of intangible cultural heritage.
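To make the retrieve-then-ground pipeline in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation: a toy dictionary stands in for the structured ICH Knowledge Graph, a template function stands in for the domain-adapted LLM, and prompt-level conditioning stands in for the multi-stage semantic fusion. All names, the example motif, and its attributes are hypothetical assumptions.

# Illustrative sketch of the KG -> LLM -> diffusion pipeline described in
# the abstract. The toy graph, the motif entry, and all function names are
# hypothetical assumptions for illustration only.

# 1. A toy ICH knowledge graph: a motif linked to its technique,
#    symbolic association, and regional context.
ICH_KG = {
    "cloud-and-thunder motif": {
        "technique": "blue-and-white underglaze porcelain painting",
        "symbolism": "continuity and good fortune",
        "region": "Jingdezhen, Jiangxi",
    },
}

def retrieve_constraints(motif: str) -> dict:
    """KG retrieval step: fetch culturally grounded attributes for a motif."""
    return ICH_KG[motif]

def compose_prompt(query: str, c: dict) -> str:
    """Stand-in for the domain-adapted LLM: fold the retrieved constraints
    into the text prompt so the generator receives explicit cultural
    semantics rather than the bare user query."""
    return (f"{query}, rendered with the {c['technique']} technique, "
            f"symbolizing {c['symbolism']}, in the regional style of "
            f"{c['region']}")

if __name__ == "__main__":
    query = "a ceramic vase decorated with the cloud-and-thunder motif"
    prompt = compose_prompt(query, retrieve_constraints("cloud-and-thunder motif"))
    print(prompt)
    # The grounded prompt would then condition a controllable text-to-image
    # generator, e.g. with the Hugging Face diffusers package (an assumption;
    # the paper's own generator and fusion mechanism are not shown here):
    #   from diffusers import StableDiffusionPipeline
    #   pipe = StableDiffusionPipeline.from_pretrained(
    #       "runwayml/stable-diffusion-v1-5")
    #   image = pipe(prompt).images[0]

Note that the paper's multi-stage semantic fusion injects constraints inside the diffusion process itself; the prompt-level injection above is only the simplest entry point and is shown solely to clarify the flow from KG retrieval to conditioned generation.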