SIG-FIN-018-07

[[18th Research Meeting>第18回研究会]]

*Cross-lingual news article comparison using bi-graph clustering and Siamese-LSTM [#s8f0c98c]

**Authors [#u82a585c]
Enda Liu (The University of Tokyo), Kiyoshi Izumi (The University of Tokyo), Kota Tsubouchi (Yahoo! JAPAN), Tatsuo Yamashita (Yahoo! JAPAN)

**Abstract [#q754ae35]
Computing similarity scores for monolingual text is a widely studied task because such scores are useful in many text mining systems. However, little research has focused on multilingual text resources. Meanwhile, machine learning techniques such as CBOW word embeddings and clustering are widely used to extract features from text. In this research, we develop and train a model that calculates the similarity of two financial news reports by combining CBOW, spherical clustering, bi-graph extraction, and a Siamese-LSTM deep learning model. The model performs well even though the training news articles are all closely related within the financial domain, and it also helps us analyze the relationships among news reports written in different languages.
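
As a rough illustration of the clustering step named in the abstract, the sketch below assumes that "spherical clustering" refers to spherical k-means, i.e. k-means on unit-length vectors with cosine similarity as the assignment criterion. That reading is an assumption, not something the abstract confirms. The function names, the toy 2-D vectors (stand-ins for CBOW word embeddings), and the initial centroids are all hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, init_centroids, n_iter=10):
    """Cluster vectors by cosine similarity (assumed reading of 'spherical clustering').

    vectors:        list of embedding vectors (toy stand-ins for CBOW vectors)
    init_centroids: one starting centroid per cluster (hypothetical choice)
    Returns (labels, centroids), with centroids on the unit sphere.
    """
    points = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: choose the centroid with the highest cosine similarity.
        labels = [
            max(range(len(centroids)), key=lambda k: dot(p, centroids[k]))
            for p in points
        ]
        # Update step: sum the members, then project back onto the unit sphere.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                summed = [sum(col) for col in zip(*members)]
                centroids[k] = normalize(summed)
    return labels, centroids


# Hypothetical toy data: two vectors near the x-axis, two near the y-axis.
toy_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels, centroids = spherical_kmeans(
    toy_vectors, init_centroids=[[1.0, 0.0], [0.0, 1.0]]
)
print(labels)
```

In a pipeline like the one the abstract describes, the resulting cluster labels could serve as language-independent features for each word, but how the authors actually connect clustering to the bi-graph and Siamese-LSTM stages is not stated here.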


**Key Words [#d58afbe6]
text mining, cross-lingual text similarity, natural language processing, machine learning, recurrent neural network, clustering


**Paper [#i3ac442f]

//(To be published on or after March 6)
&ref(SIG-FIN-018-07.pdf);