| Topic path: Top / SIG-FIN-018-17


*Revenue Prediction based on Multi-task Max-margin Topic Models [#tf9cf3aa]

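**Authors [#nf474624]
中川雄太 (Graduate School of System Informatics, Kobe University), 上野良輔 (Graduate School of System Informatics, Kobe University), 江口浩二 (Graduate School of System Informatics, Kobe University)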

**Abstract [#wa52b671]
With the development of information technology in recent years, the forms of information transmission have diversified substantially and the amount of document data in the world has grown exponentially. Such information appears in many fields, including economics and finance, for example as documents on company valuation in online news and as numerical data on company financial indices and global exchange transactions on economic and financial websites. Researchers and practitioners in this field have recently taken a keen interest in discovering new insights by making full use of these data. One promising approach to analyzing large-scale data is topic modeling, typically with Latent Dirichlet Allocation (LDA). This model assumes that each group (e.g., a document) is represented as a mixture of latent topics, where each latent topic is represented as a distribution over data points (e.g., words). In general, real-world document data are associated with side information in discrete and continuous forms. Maximum Entropy Discrimination LDA (MedLDA) is a supervised topic model that can improve the accuracy of latent topic estimation by making use of the side information associated with the documents. In this model, margin maximization as in the Support Vector Machine (SVM) is incorporated into the LDA topic modeling framework, and the estimated topics are used as features for the classifier. However, MedLDA cannot be applied to document data that are associated with both discrete and continuous labels. In this paper, we generalize Multi-task MedLDA (MultiMedLDA), an extension of MedLDA that simultaneously addresses classification and regression tasks. For document data with multiple types of labels, MultiMedLDA introduces an optimization method called dual decomposition to solve the multi-objective optimization problem that arises from multiple tasks involving classification and regression.
Prediction performance is expected to improve when latent topics are estimated using more side information. In this paper, we evaluate the effectiveness of MultiMedLDA through experiments on enterprise evaluation documents associated with continuous labels (the rate of change of operating income) and discrete labels (business categories), and discuss the results in comparison with single-task MedLDA.
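
The LDA assumption stated in the abstract (each document is a mixture of latent topics, and each topic is a distribution over words) can be illustrated with a minimal generative sketch. This is not the authors' MultiMedLDA and includes no max-margin or dual decomposition step; the function names, topic count, vocabulary size, and Dirichlet parameters are all hypothetical choices made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(topic_word, alpha, doc_length, rng):
    """Generate one document (a list of word ids) under the LDA assumptions."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(doc_length):
        z = sample_categorical(theta, rng)          # latent topic of this word
        w = sample_categorical(topic_word[z], rng)  # word drawn from topic z
        words.append(w)
    return theta, words


# Hypothetical sizes and hyperparameters, chosen only for illustration.
rng = random.Random(0)
num_topics, vocab_size, doc_length = 3, 8, 20
topic_word = [sample_dirichlet([0.5] * vocab_size, rng) for _ in range(num_topics)]
theta, words = generate_document(topic_word, [0.5] * num_topics, doc_length, rng)
```

Here `theta` plays the role of the per-document topic proportions. In supervised topic models such as MedLDA, topic representations of this kind serve as the features on which the margin-based predictor is trained.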

**Keywords [#v1220acc]
Topic models, Latent Dirichlet allocation, Multi-tasks

**Paper [#s442d956]
