TY - GEN
T1 - Automatic text generation via text extraction based on submodular
AU - Ai, Lisi
AU - Li, Na
AU - Zheng, Jianbing
AU - Gao, Ming
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
AB - Automatic text generation is the generation of natural language texts by computer. It has many applications, including automatic report generation and online promotion. However, it remains a challenging task, because generated text often lacks readability and coherence, even though many existing works have studied the problem. In this paper, we propose a two-phase algorithm, consisting of text cleanup and text extraction, to automatically generate text from multiple texts. In the first phase, we generate paragraphs based on topic modeling and clustering analysis. In the second phase, we find keywords according to their TF-IDF scores, model text extraction as a set covering problem, and solve it using submodular optimization. We conduct a set of experiments to evaluate the proposed method, and the results demonstrate its effectiveness in comparison with several baselines.
KW - Automatic text generation
KW - K-Means
KW - Massive information
KW - Submodular
UR - https://www.scopus.com/pages/publications/85034604144
U2 - 10.1007/978-3-319-69781-9_23
DO - 10.1007/978-3-319-69781-9_23
M3 - Conference contribution
AN - SCOPUS:85034604144
SN - 9783319697802
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 237
EP - 246
BT - Web and Big Data - APWeb-WAIM 2017 International Workshops
A2 - Moon, Yang-Sae
A2 - Song, Shaoxu
A2 - Renz, Matthias
PB - Springer Verlag
T2 - 1st Asia-Pacific Web and Web-Age Information Management Joint Conference on Web and Big Data, APWeb-WAIM 2017 held in Conjunction with the International Workshop on Mobile Web Data Analytics, MWDA 2017, International Workshop on Hot Topics in Big Spatial Data and Urban Computing, HotSpatial 2017, International Workshop on Graph Data Management and Analysis, GDMA 2017, 2nd International Workshop on Data Driven Crowdsourcing, DDC 2017, 2nd International Workshop on Spatio-temporal Data Management and Analytics, SDMA 2017 and International Workshop on Mobility Analytics from Spatial and Social Data, MASS 2017
Y2 - 7 July 2017 through 9 July 2017
ER -