MCFC: A Momentum-Driven Clicked Feature Compressed Pre-trained Language Model for Information Retrieval

  • Dongyang Li
  • Ruixue Ding
  • Pengjun Xie
  • Xiaofeng He*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Information Retrieval (IR) pre-trained language models are trained on large-scale retrieval-oriented corpora to strengthen their task-specific knowledge. Previous works focus on general retrieval pre-training datasets, which cover inter-document and intra-document data, and pay less attention to clicked data, an important asset that is commonly used in the recommendation domain. However, exploiting easily accessible clicked data is non-trivial: it is large in volume and insufficiently refined, which reduces model learning efficiency and risks distorting the learning direction. In this paper, we propose a Momentum-Driven Clicked Feature Compressed Pre-trained Language Model for Information Retrieval (MCFC). Specifically, to maintain an effective learning pace on large amounts of data, we generalize multiple similar feature instances and compress the dispersed knowledge together at the query granularity, a component named Multi-Instance Information Integration. Meanwhile, because coarse clicked data calls for more precise relevance detection between queries and documents, we leverage a momentum-driven adjusting mechanism to refine the text representations, named Continuous Debiasing Calibration. Extensive experiments on downstream datasets validate the superiority of our work over other recent strong baselines.

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 13th National CCF Conference, NLPCC 2024, Proceedings
Editors: Derek F. Wong, Zhongyu Wei, Muyun Yang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 69-82
Number of pages: 14
ISBN (Print): 9789819794300
State: Published - 2025
Event: 13th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2024 - Hangzhou, China
Duration: 1 Nov 2024 to 3 Nov 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15359 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2024
Country/Territory: China
City: Hangzhou
Period: 1/11/24 to 3/11/24

Keywords

  • Clicked Feature
  • Information Retrieval
  • Pre-trained Language Model
