In the real world, multimodal sentiment analysis (在现实世界中,多模态情感分析(MSA) enables the capture and analysis of sentiments by fusing multimodal information, thereby enhancing the understanding of real-world environments. The key challenges lie in handling the noise in the acquired data and achieving effective multimodal fusion. When processing the noise in data, existing methods utilize the combination of multimodal features to mitigate errors in sentiment word recognition caused by the performance limitations of automatic speech recognition (ASR) models.)通过融合多模态信息来捕获和分析情感,从而增强对真实世界环境的理解。关键挑战在于处理采集数据中的噪声并实现有效的多模态融合。在处理数据中的噪声时,现有方法利用多模态特征的组合来减轻由自动语音识别(ASR)模型的性能限制引起的情感词识别错误。