2013-02-21 (Thu) 18:38:47


Text Processing for Classification, by V. Cho and J. Zhang. Journal of Computational Intelligence in Finance 7 (1999) 2, pp. 6-22.


Textual information is becoming increasingly available through the Web, which makes text an attractive resource from which to mine knowledge. The major difficulty in mining textual data is that the information is unstructured. The data therefore has to be preprocessed first to obtain some form of structured data that is amenable to data mining techniques. This paper focuses on this preprocessing step: it presents methods and techniques that enable text to be used as an information source for solving classification problems. Novel text processing schemes based on keyword record counting are proposed. The classification performance achieved by the various preprocessing techniques is measured and compared on an extremely challenging problem, the forecasting of stock market movements. The prediction accuracy achieved by the best text processing method is very close to what can be expected from human experts.
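The abstract does not spell out the paper's keyword record counting schemes, so the sketch below only illustrates the general preprocessing idea it describes: turning unstructured text into a fixed-length table of keyword counts that a classifier can consume. The keyword list, the tokenizer, and the function names `keyword_counts` and `to_feature_table` are hypothetical assumptions for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical keyword list (assumption); the paper's actual keywords are not given here.
KEYWORDS = ["rise", "fall", "profit", "loss", "strong", "weak"]


def keyword_counts(text: str, keywords: list[str] = KEYWORDS) -> list[int]:
    """Map free text to a fixed-length vector: occurrences of each keyword."""
    # Lowercase and split on non-letters; a deliberately simple tokenizer.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur.
    return [counts[k] for k in keywords]


def to_feature_table(docs: list[str], keywords: list[str] = KEYWORDS) -> list[list[int]]:
    """One row per document, one column per keyword: structured input for a classifier."""
    return [keyword_counts(d, keywords) for d in docs]


if __name__ == "__main__":
    docs = [
        "Profit rose as shares rise on strong demand.",
        "Shares fall after weak results and a loss.",
    ]
    for row in to_feature_table(docs):
        print(row)
```

Each row can then be paired with a label (for example, whether the market went up or down) and passed to any standard classifier; how the keywords are chosen and weighted is exactly the part the paper's own schemes address.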

