A Hybrid Model Combining Formulae with Keywords for Mathematical Information Retrieval

  • Yuqi Shen
  • Cheng Chen
  • Yifan Dai
  • Jinfang Cai
  • Liangyu Chen*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Formula retrieval is an important research topic in Mathematical Information Retrieval (MIR). Most studies have focused on comparing formulae to determine the similarity between mathematical documents. However, two similar formulae may appear in entirely different knowledge domains and have different meanings. Building on the N-ary Tree-based Formula Embedding Model (NTFEM, our previous work in [Y. Dai, L. Chen, and Z. Zhang, An N-ary tree-based model for similarity evaluation on mathematical formulae, in Proc. 2020 IEEE Int. Conf. Systems, Man, and Cybernetics, 2020, pp. 2578-2584.]), we introduce a new hybrid retrieval model, NTFEM-K, which combines formulae with their surrounding keywords for more accurate retrieval. Using keyword extraction techniques, we extract keywords from the context of each formula to supplement its semantic information. We then obtain vector representations of the keywords with the FastText N-gram embedding model and vector representations of the formulae with NTFEM. Finally, documents are ranked by keyword similarity, and the ranking is then refined by formula similarity. For evaluation, NTFEM-K is compared not only with NTFEM but also with hybrid retrieval models that combine formulae with long text, and with hybrid retrieval models that combine formulae with keywords produced by other keyword extraction algorithms. Experimental results show that the top-10 accuracy of NTFEM-K is at least 20% higher than that of NTFEM, and as much as 50% higher for some specific topics.
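The abstract describes a two-stage ranking: documents are first ordered by keyword similarity, then refined by formula similarity. It does not say how the refinement is done, how per-keyword vectors are combined, or which similarity measure is used, so the following is only a minimal sketch under stated assumptions: keyword vectors are mean-pooled, similarity is cosine, and "refined" is read as re-ordering the top `pool_size` keyword matches by formula similarity. The names `Doc`, `hybrid_rank`, `mean_vector` and `pool_size` are hypothetical, and the toy vectors merely stand in for FastText keyword embeddings and NTFEM formula embeddings.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Doc:
    # Hypothetical container: one embedding per extracted keyword
    # (standing in for FastText vectors) and one formula embedding
    # (standing in for an NTFEM vector). keyword_vecs must be non-empty.
    keyword_vecs: List[List[float]]
    formula_vec: List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Average equal-length vectors (assumed pooling of keyword embeddings)."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def hybrid_rank(query_keyword_vecs: List[List[float]],
                query_formula_vec: List[float],
                docs: Dict[str, Doc],
                pool_size: int) -> List[str]:
    """Rank by keyword similarity, then re-order the top pool by formula similarity."""
    query_kw = mean_vector(query_keyword_vecs)
    # Stage 1: order every document by keyword similarity, best first.
    by_keywords = sorted(
        docs,
        key=lambda doc_id: cosine(query_kw, mean_vector(docs[doc_id].keyword_vecs)),
        reverse=True,
    )
    # Stage 2 (assumed refinement): re-order only the top `pool_size`
    # candidates by formula similarity; the rest keep their keyword order.
    pool = sorted(
        by_keywords[:pool_size],
        key=lambda doc_id: cosine(query_formula_vec, docs[doc_id].formula_vec),
        reverse=True,
    )
    return pool + by_keywords[pool_size:]


if __name__ == "__main__":
    toy_docs = {
        "d1": Doc(keyword_vecs=[[1.0, 0.0]], formula_vec=[0.0, 1.0]),
        "d2": Doc(keyword_vecs=[[0.9, 0.1]], formula_vec=[1.0, 0.0]),
        "d3": Doc(keyword_vecs=[[0.0, 1.0]], formula_vec=[1.0, 0.0]),
    }
    print(hybrid_rank([[1.0, 0.0]], [1.0, 0.0], toy_docs, pool_size=2))
    # prints ['d2', 'd1', 'd3']
```

In the toy run, `d3` has a formula identical to the query's but keywords from a different context, so it stays last; this mirrors the abstract's point that similar formulae can carry different meanings in different domains. A weighted sum of keyword and formula scores would be an equally plausible alternative reading of "refined".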

Original language: English
Pages (from-to): 1583-1602
Number of pages: 20
Journal: International Journal of Software Engineering and Knowledge Engineering
Volume: 31
Issue number: 11-12
State: Published - 1 Dec 2021

Keywords

  • Formula embedding
  • Formula similarity
  • Keywords extraction
  • Mathematical information retrieval
  • Word embedding
