Abstract
Objective: The assembly of complex products generates multi-source heterogeneous data such as images, text, and sensor records. These data are noisy, often incomplete, inconsistent in semantic granularity, and hard to interpret across modalities, which limits the reliability of assembly quality control and fault diagnosis. To improve information utilization and knowledge representation in complex assembly scenarios, a unified knowledge-modeling framework is needed that is interpretable, extensible, and adapted to multi-modal data. Methods: This paper proposes a multi-source information fusion and knowledge graph construction method for complex product assembly. First, rule-based modality discrimination and differentiated preprocessing are designed to standardize and clean visual, structured, and textual data. Second, a knowledge graph-driven late fusion strategy maps the outputs of each modality into "entity-relation-attribute" form, improving the robustness and interpretability of cross-modal reasoning. Finally, a large language model supplements and incrementally updates the assembly knowledge graph through entity recognition and relation extraction, while similarity-based redundancy removal and consistency checking keep the evolving graph accurate. Conclusion: The study provides an interpretable, extensible, and deployable general solution for multi-source fusion and knowledge modeling in complex product assembly.
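The incremental-update step described in the abstract (similarity-based redundancy removal followed by consistency checking) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Triple` fields, the `FUNCTIONAL` relation set, the relation names, and the 0.9 threshold are all hypothetical, and plain string similarity from Python's standard `difflib` stands in for whatever similarity measure the paper actually uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    source: str  # modality that produced the triple (hypothetical field)


# Assumption: relations listed here may hold only one value per head entity.
FUNCTIONAL = {"has_torque_nm"}


def normalize(name: str) -> str:
    """Lowercase and unify separators so surface variants compare equal."""
    return " ".join(name.lower().replace("_", " ").replace("-", " ").split())


def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """String similarity as a stand-in for an embedding-based measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def merge_candidates(graph, candidates, threshold=0.9):
    """Return (accepted, rejected): the updated graph and conflicting triples."""
    accepted = list(graph)
    rejected = []
    for cand in candidates:
        same_head = [
            t for t in accepted
            if t.relation == cand.relation and similar(t.head, cand.head, threshold)
        ]
        # Redundancy removal: skip a candidate that restates an existing triple.
        if any(similar(t.tail, cand.tail, threshold) for t in same_head):
            continue
        # Consistency check: a functional relation cannot gain a second value.
        if cand.relation in FUNCTIONAL and same_head:
            rejected.append(cand)
        else:
            accepted.append(cand)
    return accepted, rejected
```

In this sketch, a candidate such as ("bolt_m8", "fastens", "flange cover") extracted from text would be treated as a restatement of ("Bolt M8", "fastens", "Flange Cover") already obtained from vision and skipped, while a second, different torque value for the same bolt would be set aside for review rather than merged.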
Key words
multi-modal information fusion /
complex product assembly /
assembly knowledge graph
Funding
National Key Research and Development Program of China (2024YFB3311803)