Lab Paper Accepted at ACM MM 2023
Date: July 29, 2023 | Category: News
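Research on multimodal conversational emotion analysis by Li Bobo (李波波), a PhD student in the lab, was recently accepted at ACM MM 2023. ACM MM is a top conference in the multimedia field and will be held in October 2023 in Ottawa, the capital of Canada.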
Title: Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
Abstract: It has been a hot research topic to enable machines to understand human emotions in multimodal contexts under dialogue scenarios, which is tasked with multimodal emotion analysis in conversation (MM-ERC). MM-ERC has received consistent attention in recent years, where a diverse range of methods has been proposed for securing better task performance. Existing works mostly treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion for feature utility maximization. Yet after revisiting the characteristic of MM-ERC, we argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we target further pushing the task performance by taking full consideration of the above insights. On the one hand, during feature disentanglement, based on the contrastive learning technique, we devise a Dual-level Disentanglement Mechanism (DDM), decoupling the features into both the modality space and utterance space respectively. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively, both of which together schedule the proper integrations of multimodal and context features. Specifically, CFM explicitly manages the multimodal feature contributions dynamically, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system achieves new state-of-the-art performance consistently. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of the multimodal and context features adaptively. Note that our proposed methods have the great potential to facilitate a broader range of other conversational multimodal tasks.