Evaluation Method of Underactuated Robot Design Schemes Based on Vision-Text Multimodal Models

SUN Junqiang, LIU Jianguo, LIU Tao, SHI Qing

Packaging Engineering (Design Section), 2025, Vol. 46, Issue 12: 50-59.

DOI: 10.19554/j.cnki.1001-3563.2025.12.004
Special topic: Integrating Intelligence into Form · Cross-disciplinary Empowerment


Evaluation Method of Underactuated Robot Design Schemes Based on Vision-Text Multimodal Models

  • SUN Junqiang1,2, LIU Jianguo2, LIU Tao1*, SHI Qing2

Abstract

Objective: To address the limitations of traditional evaluation methods for underactuated robot design schemes in efficiency, accuracy, and automation, an evaluation method based on vision-text multimodal large models is proposed. Methods: First, a tri-modal input system comprising structural parameters, design text, and structural images is constructed, and the raw data are cleaned, standardized, and preprocessed in a unified way. Second, features are extracted from the three modalities with a multilayer perceptron (structural parameters), the pretrained language model BERT (text), and a Vision Transformer (images), and are fused through a cross-attention mechanism to capture deep multimodal correlations. A nonlinear mapping network then models the relationship between the fused features and core evaluation metrics such as functionality, safety, and control performance. Finally, a locally deployed DeepSeek-VL R1 7B large language model automatically generates a structured evaluation report, converting "feature understanding" into "semantic output". Results: Experiments used a self-constructed underactuated robot dataset, with 300 design schemes for model training and the design of an underactuated bridge crane system as the validation case. The model's evaluation results deviated from expert scores by 1.8% on average, with a correlation coefficient of 0.94, and the automatically generated text reports showed good professionalism and engineering practicality. Conclusion: By combining multimodal semantic modeling with language generation, the proposed method significantly improves the intelligence, standardization, and interpretability of underactuated robot design evaluation, and provides technical support for design quality control and intelligent decision-making for complex industrial equipment.
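
The fusion and scoring steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which modality acts as the query, how large the features are, or how the deviation is defined. The sketch assumes that the structural-parameter feature attends over text and image tokens, that random vectors stand in for the MLP, BERT, and Vision Transformer outputs, and that "average deviation" means mean relative absolute deviation from the expert score. All function names (`cross_attention`, `init_params`, `fuse_and_score`, `agreement`) and shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, context, w_q, w_k, w_v):
    """Scaled dot-product attention: query rows attend to context rows."""
    q = query @ w_q                                     # (n_q, d)
    k = context @ w_k                                   # (n_c, d)
    v = context @ w_v                                   # (n_c, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (n_q, n_c), rows sum to 1
    return weights @ v, weights


def init_params(d, hidden, n_metrics, rng):
    """Random, untrained weights; shapes are assumptions made for illustration."""
    return {
        "attn": tuple(rng.normal(0, d ** -0.5, (d, d)) for _ in range(3)),
        "w1": rng.normal(0, 0.1, (2 * d, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0, 0.1, (hidden, n_metrics)),
        "b2": np.zeros(n_metrics),
    }


def fuse_and_score(param_feat, text_feat, image_feat, params):
    """Fuse three modality features and map them to metric scores in (0, 1)."""
    # Assumption: the parameter feature is the query; text and image tokens are the context.
    context = np.concatenate([text_feat, image_feat], axis=0)   # (n_t + n_i, d)
    attended, weights = cross_attention(param_feat, context, *params["attn"])
    fused = np.concatenate([param_feat, attended], axis=-1)     # (1, 2d)
    # Nonlinear mapping network: one tanh hidden layer, then a sigmoid per metric.
    hidden = np.tanh(fused @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    scores = 1.0 / (1.0 + np.exp(-logits))
    return scores.ravel(), weights


def agreement(model_scores, expert_scores):
    """Mean relative absolute deviation (percent) and Pearson correlation."""
    model = np.asarray(model_scores, dtype=float)
    expert = np.asarray(expert_scores, dtype=float)
    mean_dev_pct = float(np.mean(np.abs(model - expert) / expert) * 100.0)
    pearson_r = float(np.corrcoef(model, expert)[0, 1])
    return mean_dev_pct, pearson_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    params = init_params(d, hidden=32, n_metrics=3, rng=rng)
    scores, weights = fuse_and_score(
        rng.normal(size=(1, d)),   # stand-in for the MLP output
        rng.normal(size=(8, d)),   # stand-in for BERT token features
        rng.normal(size=(4, d)),   # stand-in for ViT patch features
        params,
    )
    print(dict(zip(["functionality", "safety", "control"], scores.round(3))))
```

With untrained weights the scores carry no meaning; the sketch only shows the data flow. The attention weights indicate how strongly each text or image token contributes to the fused feature, which is one plausible route to the interpretability the abstract mentions.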

Key words

vision-text multimodality / underactuated robots / deep evaluation model / cross-attention mechanism / large language model

Cite this article

SUN Junqiang, LIU Jianguo, LIU Tao, SHI Qing. Evaluation Method of Underactuated Robot Design Schemes Based on Vision-Text Multimodal Models[J]. Packaging Engineering, 2025, 46(12): 50-59. https://doi.org/10.19554/j.cnki.1001-3563.2025.12.004
CLC number: TB482

Funding

Natural Science Foundation of Gansu Province (24JRRA710)
