TY - GEN
T1 - Fusing Global Domain Information and Local Semantic Information to Classify Financial Documents
AU - Fan, Mengzhen
AU - Cheng, Dawei
AU - Yang, Fangzhou
AU - Luo, Siqiang
AU - Luo, Yifeng
AU - Qian, Weining
AU - Zhou, Aoying
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
AB - Many institutions provide investment advisory services to help stock investors make sound investment decisions. Industry analysts at these institutions must analyze huge volumes of financial news documents and produce investment advisory reports for service subscribers. Before document analysis, automatic document classification is required to organize the collected financial news documents into pre-defined fine-grained categories. Accurate fine-grained classification over massive financial documents is challenging because documents from close fine-grained categories are highly similar semantically, and existing classification methods may fail to capture the subtle differences between them. In this paper, we implement a document classification framework, named GraphSEAT, to classify financial documents for a leading financial information service provider in China. Specifically, we build a heterogeneous graph to model the global structure of our target financial documents, in which documents and financial named entities are nodes and each document is connected by an edge to each named entity it contains. We then train a graph convolutional network (GCN) with attention mechanisms to learn an embedding representation of each document that contains domain information. We also extract semantic information from a document's word sequence with a neural sequence encoder. Finally, we fuse the two learned representations with attention mechanisms to form an overall embedding representation of the document and make the prediction. We perform extensive experiments on our real-world financial news dataset and three public datasets to evaluate the framework, and the experimental results demonstrate that GraphSEAT outperforms all eight compared baseline models, especially on our dataset.
KW - attention
KW - financial document classification
KW - graph embedding
UR - https://www.scopus.com/pages/publications/85095864312
U2 - 10.1145/3340531.3412707
DO - 10.1145/3340531.3412707
M3 - Conference contribution
AN - SCOPUS:85095864312
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2413
EP - 2420
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Y2 - 19 October 2020 through 23 October 2020
ER -