Three papers from our lab accepted by TheWebConf, a CCF Class A conference, and LREC-COLING 2024, a top venue in natural language processing
Date: February 26, 2024 | Category: News
The paper "MMLSCU: A Dataset for Multi-modal Multi-domain Live Streaming Comment Understanding" by lab member Meng Zixiang has been accepted by the CCF Class A conference TheWebConf. The abstract follows.
With the increasing popularity of live streaming, the interactions from viewers during a live stream can provide specific and constructive feedback for both the streamer and the platform. In such a scenario, the primary and most direct form of feedback from the audience is comments. Thus, mining these live streaming comments to unearth the intentions behind them and, in turn, help streamers enhance their live streaming quality is significant for the healthy development of the live streaming ecosystem. To this end, we introduce the MMLSCU dataset, containing 50,129 intention-annotated comments across multiple modalities (text, images, videos, audio) from eight streaming domains. Using a multimodal pretrained large model and drawing inspiration from the Chain of Thought (CoT) concept, we implement an end-to-end model that sequentially performs the following tasks: viewer comment intent detection $\Rightarrow$ intent cause mining $\Rightarrow$ viewer comment explanation $\Rightarrow$ streamer policy suggestion. We employ distinct branches for video and audio to process their respective modalities. After obtaining the video and audio representations, we conduct multimodal fusion with the comment. The integrated representation is then fed into the large language model to perform inference across the four tasks following the CoT framework. Experimental results indicate that our model outperforms three multimodal classification baselines on comment intent detection and streamer policy suggestion, and one multimodal generation baseline on intent cause mining and viewer comment explanation. Compared to models using only text, our multimodal setting yields superior outcomes. Moreover, incorporating CoT allows our model to better interpret comments and provide more precise suggestions for streamers. Our proposed dataset and model will draw new research attention to multimodal live streaming comment understanding.
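The abstract describes a sequential CoT pipeline over four tasks. Below is a minimal, hypothetical sketch of such prompt chaining; the generate() stub stands in for the multimodal large language model, and all function and variable names are illustrative assumptions rather than the paper's released implementation.

```python
# Illustrative sketch of a CoT-style pipeline over the four tasks described in
# the abstract. generate() is a placeholder for the multimodal LLM call; names
# here are hypothetical and not taken from the paper's code.

def generate(prompt: str) -> str:
    """Placeholder for a call to the multimodal large language model."""
    return f"<model output for: {prompt[:40]}...>"

def understand_comment(comment: str, video_repr: str, audio_repr: str) -> dict:
    """Run the four tasks sequentially, feeding each result into the next prompt."""
    context = f"Comment: {comment}\nVideo: {video_repr}\nAudio: {audio_repr}"

    intent = generate(f"{context}\nTask 1: detect the viewer's comment intent.")
    cause = generate(f"{context}\nIntent: {intent}\nTask 2: mine the cause of this intent.")
    explanation = generate(f"{context}\nIntent: {intent}\nCause: {cause}\n"
                           "Task 3: explain the viewer comment.")
    suggestion = generate(f"{context}\nExplanation: {explanation}\n"
                          "Task 4: suggest a policy for the streamer.")
    return {"intent": intent, "cause": cause,
            "explanation": explanation, "suggestion": suggestion}

if __name__ == "__main__":
    print(understand_comment("The new skin looks great!",
                             "<fused video features>", "<fused audio features>"))
```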
实验室高强同学的论文《Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information》被自然语言处理顶级会议LREC-COLING 2024接收。以下是论文的摘要。
Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, arg0, and arg1), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performance in determining coreference for events that are highly similar, events described differently but with the same meaning, or events whose argument information relies on long-distance dependencies. In light of these limitations, we propose constructing document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents separately. We then construct separate graphs for each, process them with a Graph Attention Network (GAT), and finally perform coreference clustering. Additionally, as existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, comprising 53,066 documents. Our model outperforms the baseline by 5.6 and 5.7 $B^3$ $F_1$ points on the English and Chinese datasets, respectively.
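As a rough illustration of the graph-based encoding the abstract describes, the following hypothetical sketch builds a mention graph whose edges come from an RST tree and from cross-document lexical chains, encodes it with a GAT (here via PyTorch Geometric's GATConv), and clusters the resulting mention embeddings. Dimensions, edge construction, and the agglomerative clustering step are assumptions for illustration only, not the authors' implementation.

```python
# Sketch: GAT over a mention graph built from RST-tree and lexical-chain edges,
# followed by coreference clustering. All sizes and choices are illustrative.
import torch
from torch_geometric.nn import GATConv
from sklearn.cluster import AgglomerativeClustering

class MentionGraphEncoder(torch.nn.Module):
    def __init__(self, in_dim: int = 768, hid_dim: int = 256, heads: int = 4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=heads)
        self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: [num_mentions, in_dim] mention representations
        # edge_index: [2, num_edges] edges from RST structure / lexical chains
        h = torch.relu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index)

# Toy example: 5 mentions, edges combining RST-tree links and lexical-chain links.
x = torch.randn(5, 768)
rst_edges = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
chain_edges = torch.tensor([[0, 3, 3, 4], [3, 0, 4, 3]])
edge_index = torch.cat([rst_edges, chain_edges], dim=1)

emb = MentionGraphEncoder()(x, edge_index).detach().numpy()
clusters = AgglomerativeClustering(n_clusters=2).fit_predict(emb)
print(clusters)  # cluster id per mention = predicted coreference chains
```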
实验室陈蕾同学的论文《What Factors Influence LLMs' Judgments? A Case Study on Question Answering》被自然语言处理顶级会议LREC-COLING 2024接收。以下是论文的摘要。
Large Language Models (LLMs) are now being considered as highly efficient judges for evaluating the quality of answers generated by candidate models. However, their judgments may be influenced by complex scenarios and inherent biases, raising concerns about their reliability. This study aims to bridge this gap by introducing four previously unexplored factors, namely answer quantity, inducing statements, judging strategy, and judging style, and examining how they affect the performance of LLMs as judges. Additionally, we introduce a new dimension of question difficulty to provide a more comprehensive understanding of LLMs' judgments across varying question intricacies. We employ ChatGPT and GPT-4 as judges and conduct experiments on the Vicuna Benchmark and MT-bench. Our study reveals that LLMs' judging abilities are susceptible to the influence of these four factors, and that analysis along the newly proposed dimension of question difficulty is highly necessary. We also provide valuable insights into optimizing LLM performance as judges, enhancing their reliability and adaptability across diverse evaluation scenarios.
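To make the four factors concrete, here is a hypothetical sketch of how a judging prompt could be varied along answer quantity, inducing statements, judging strategy, and judging style; the prompt wording and parameter names are illustrative assumptions rather than the paper's actual templates.

```python
# Hypothetical sketch of varying the four studied factors when prompting an LLM
# judge. Wording and option names are illustrative, not the paper's templates.

def build_judge_prompt(question: str, answers: list[str],
                       inducing_statement: str | None = None,
                       strategy: str = "score",      # "score" or "pairwise"
                       style: str = "concise") -> str:
    """Assemble a judging prompt that varies the four factors under study."""
    lines = [f"Question: {question}"]
    # Factor 1: answer quantity (how many candidate answers are judged at once).
    for i, ans in enumerate(answers, 1):
        lines.append(f"Answer {i}: {ans}")
    # Factor 2: inducing statement (a hint that may bias the judge).
    if inducing_statement:
        lines.append(inducing_statement)
    # Factor 3: judging strategy (score each answer vs. pairwise comparison).
    if strategy == "score":
        lines.append("Rate each answer from 1 to 10.")
    else:
        lines.append("Compare the answers and pick the better one.")
    # Factor 4: judging style (how verbose the judgment should be).
    lines.append("Explain briefly." if style == "concise" else "Explain in detail.")
    return "\n".join(lines)

prompt = build_judge_prompt(
    "What causes tides?",
    ["The Moon's gravity.", "Wind patterns."],
    inducing_statement="Note: Answer 2 was written by an expert.",
    strategy="pairwise", style="detailed",
)
print(prompt)  # this prompt would then be sent to ChatGPT / GPT-4 as the judge
```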