Using surrounding text of formula towards more accurate mathematical information retrieval

  • Cheng Chen
  • Yifan Dai
  • Yuqi Shen
  • Jinfang Cai
  • Liangyu Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Formula retrieval is an important research topic in Mathematical Information Retrieval (MIR). Most studies have focused on comparing formulae to determine the similarity between mathematical documents. However, two similar formulae may appear in completely different knowledge domains and carry different meanings. Building on the N-ary Tree-based Formula Embedding Model (NTFEM), we introduce a new hybrid retrieval model that combines a formula with its surrounding text for more accurate retrieval. Using keyword extraction, we extract keywords from the text around the formula, which supplement the semantic information of the formula. We then obtain representation vectors of the keywords with the FastText N-gram embedding model, and representation vectors of the formulae with NTFEM. Finally, documents are first sorted by keyword similarity, and the resulting ranking is then refined by formula similarity. Experimental results show that the accuracy of the top-10 results is at least 20% higher than that of NTFEM and can reach 50% on some specific topics.
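The two-stage ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the keyword ranking is "optimized" by formula similarity, so this sketch assumes one simple reading: keep the top-k documents by keyword similarity, then re-order those candidates by formula similarity. The function names (`cosine`, `two_stage_rank`), the dictionary fields (`kw_vec`, `formula_vec`), and the toy two-dimensional vectors are all hypothetical; real vectors would come from FastText and NTFEM, which are not reproduced here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def two_stage_rank(query, docs, top_k=10):
    """Hypothetical two-stage ranking: keywords first, then formulae.

    Assumption: "optimizing" the keyword ranking is modelled as re-ordering
    the top_k keyword candidates by formula similarity. The paper's actual
    procedure may differ.
    """
    # Stage 1: sort all documents by similarity of their keyword vectors.
    by_keywords = sorted(
        docs,
        key=lambda d: cosine(query["kw_vec"], d["kw_vec"]),
        reverse=True,
    )
    candidates = by_keywords[:top_k]

    # Stage 2: re-order the surviving candidates by formula similarity.
    reranked = sorted(
        candidates,
        key=lambda d: cosine(query["formula_vec"], d["formula_vec"]),
        reverse=True,
    )
    return [d["id"] for d in reranked]


# Toy placeholder vectors, not real FastText or NTFEM embeddings.
query = {"kw_vec": [1.0, 0.0], "formula_vec": [1.0, 0.0]}
docs = [
    {"id": "A", "kw_vec": [1.0, 0.1], "formula_vec": [0.0, 1.0]},
    {"id": "B", "kw_vec": [0.9, 0.2], "formula_vec": [1.0, 0.0]},
    {"id": "C", "kw_vec": [0.0, 1.0], "formula_vec": [1.0, 0.0]},
]

print(two_stage_rank(query, docs, top_k=2))
```

In this toy data, document C has a formula vector identical to the query's but unrelated keywords, so it is filtered out in the first stage when `top_k=2`. This mirrors the motivation in the abstract: a similar formula from a different knowledge domain should not rank highly.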

Original language: English
Title of host publication: Proceedings - SEKE 2021
Subtitle of host publication: 33rd International Conference on Software Engineering and Knowledge Engineering
Publisher: Knowledge Systems Institute Graduate School
Pages: 622-627
Number of pages: 6
ISBN (Electronic): 1891706527
DOIs
State: Published - 2021
Event: 33rd International Conference on Software Engineering and Knowledge Engineering, SEKE 2021 - Pittsburgh, United States
Duration: 1 Jul 2021 to 10 Jul 2021

Publication series

Name: Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE
Volume: 2021-July
ISSN (Print): 2325-9000
ISSN (Electronic): 2325-9086

Conference

Conference: 33rd International Conference on Software Engineering and Knowledge Engineering, SEKE 2021
Country/Territory: United States
City: Pittsburgh
Period: 1/07/21 to 10/07/21

Keywords

  • Extraction
