Fusing Global Domain Information and Local Semantic Information to Classify Financial Documents

Mengzhen Fan, Dawei Cheng, Fangzhou Yang, Siqiang Luo, Yifeng Luo, Weining Qian, Aoying Zhou

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

12 Scopus citations

Abstract

Many institutions provide investment advisory services to help stock investors make sound investment decisions. Industry analysts at these institutions must analyze huge volumes of financial news documents and produce investment advisory reports for service subscribers. Before this analysis, automatic document classification is required to organize the collected financial news documents into pre-defined fine-grained categories. Accurate fine-grained classification over massive financial document collections is challenging because documents from closely related fine-grained categories are highly similar semantically, and existing classification methods may fail to capture the subtle differences between them. In this paper, we implement a document classification framework, named GraphSEAT, to classify financial documents for a leading financial information service provider in China. Specifically, we build a heterogeneous graph to model the global structure of our target financial documents, in which documents and financial named entities are nodes and each document is connected by an edge to every named entity it contains. We then train a graph convolutional network (GCN) with attention mechanisms to learn an embedding representation of each document that carries domain information. We also extract semantic information from a document's word sequence with a neural sequence encoder. Finally, we fuse the two learned representations of the document with attention mechanisms to form an overall embedding representation and make the prediction. We perform extensive experiments on our real-world financial news dataset and three public datasets to evaluate the framework, and the experimental results demonstrate that GraphSEAT outperforms all eight compared baseline models, especially on our dataset.
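
The two structural ideas in the abstract, a document-entity graph and an attention-weighted fusion of two document embeddings, can be illustrated with a minimal sketch. This is not the GraphSEAT implementation: the function names `build_doc_entity_graph` and `attention_fuse` are hypothetical, the GCN and the sequence encoder are omitted, and the `query` vector stands in for parameters that a real model would learn. The abstract does not specify the attention scoring function, so dot-product scoring is assumed here.

```python
import math


def build_doc_entity_graph(doc_entities):
    """Build an undirected bipartite graph as adjacency sets.

    doc_entities maps a document id to the named entities found in it.
    Nodes are tagged tuples, ("doc", id) or ("ent", name), so documents
    and entities never collide. An edge links a document to each entity
    it contains.
    """
    adj = {}
    for doc_id, entities in doc_entities.items():
        doc_node = ("doc", doc_id)
        adj.setdefault(doc_node, set())  # keep documents with no entities
        for name in entities:
            ent_node = ("ent", name)
            adj[doc_node].add(ent_node)
            adj.setdefault(ent_node, set()).add(doc_node)
    return adj


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(graph_vec, seq_vec, query):
    """Fuse two equal-length embeddings with dot-product attention.

    Assumption: each view is scored by its dot product with `query`;
    the softmax of the two scores gives the mixing weights.
    """
    views = [graph_vec, seq_vec]
    scores = [sum(q * x for q, x in zip(query, view)) for view in views]
    weights = softmax(scores)
    fused = [
        sum(w * view[i] for w, view in zip(weights, views))
        for i in range(len(query))
    ]
    return fused, weights


# Toy usage: two documents that share one entity.
graph = build_doc_entity_graph({"d1": ["AAPL", "MSFT"], "d2": ["AAPL"]})
fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0])
```

In the toy usage, documents that mention the same entity become two-hop neighbours in the graph, which is the kind of global signal a graph network can propagate, while the fusion step lets the model weight the graph view and the sequence view differently for each document.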

Original language: English
Title of host publication: CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 2413-2420
Number of pages: 8
ISBN (Electronic): 9781450368599
State: Published - 19 Oct 2020
Event: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 - Virtual, Online, Ireland
Duration: 19 Oct 2020 - 23 Oct 2020

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Country/Territory: Ireland
City: Virtual, Online
Period: 19/10/20 - 23/10/20

Keywords

  • attention
  • financial document classification
  • graph embedding
