Syntax and Semantics based Efficient Text Classification Framework

Suganya. S; Gomathi. C; Mano Chitra. S

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 65 - Issue 15

Published: March 2013

DOI: 10.5120/11000-6182

Suganya. S, Gomathi. C, Mano Chitra. S. Syntax and Semantics based Efficient Text Classification Framework. International Journal of Computer Applications. 65, 15 (March 2013), 18-21. DOI=10.5120/11000-6182

@article{10.5120/11000-6182,
  author    = { Suganya. S and Gomathi. C and Mano Chitra. S },
  title     = { Syntax and Semantics based Efficient Text Classification Framework },
  journal   = { International Journal of Computer Applications },
  year      = { 2013 },
  volume    = { 65 },
  number    = { 15 },
  pages     = { 18-21 },
  doi       = { 10.5120/11000-6182 },
  publisher = { Foundation of Computer Science (FCS), NY, USA }
}

%0 Journal Article
%D 2013
%A Suganya. S
%A Gomathi. C
%A Mano Chitra. S
%T Syntax and Semantics based Efficient Text Classification Framework
%J International Journal of Computer Applications
%V 65
%N 15
%P 18-21
%R 10.5120/11000-6182
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This work proposes an efficient text classification approach based on a two-level representation model and a multi-layer SVM-NN classification framework. Automated text classification is attractive because it frees organizations from manually organizing their document collections, which can be prohibitively expensive. The two-level model represents text data at a syntactic level, using tf-idf values, and at a semantic level, using Wikipedia. A multi-layer classification framework is then designed to exploit both the syntactic and the semantic information. The framework contains three SVM-NN classifiers: two are applied in parallel, one to the syntactic level and one to the semantic level, and their outputs are combined and passed as input to the third classifier, which produces the final result. Experimental results on the benchmark data sets 20Newsgroups and Reuters-21578 show that the proposed model improves text classification performance.
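
The layered design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the SVM-NN classifier or how Wikipedia concepts are extracted, so the sketch assumes a simple cosine nearest-centroid learner as a stand-in for SVM-NN and a small hand-written `concept_map` dictionary as a stand-in for Wikipedia. All function names, class names, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_idf(train_docs):
    """Inverse document frequency of every term seen in the training documents."""
    n = len(train_docs)
    df = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Syntactic level: {term: tf * idf}; terms unseen in training get weight 0."""
    return {t: c / len(doc) * idf.get(t, 0.0) for t, c in Counter(doc).items()}


def concepts(doc, concept_map):
    """Semantic level: counts of the concepts that the document's terms map to."""
    return dict(Counter(concept_map[t] for t in doc if t in concept_map))


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    """Stand-in base learner (NOT SVM-NN): cosine similarity to class centroids."""

    def fit(self, vectors, labels):
        sums = {y: defaultdict(float) for y in set(labels)}
        sizes = Counter(labels)
        for vec, y in zip(vectors, labels):
            for k, v in vec.items():
                sums[y][k] += v
        self.centroids = {y: {k: v / sizes[y] for k, v in s.items()}
                          for y, s in sums.items()}
        self.classes = sorted(self.centroids)
        return self

    def scores(self, vec):
        return {y: cosine(vec, self.centroids[y]) for y in self.classes}

    def predict(self, vec):
        s = self.scores(vec)
        return max(self.classes, key=lambda y: s[y])


def train_framework(train_docs, labels, concept_map):
    """Two parallel first-layer classifiers feeding a third, final classifier."""
    idf = fit_idf(train_docs)
    syn = CentroidClassifier().fit([tfidf(d, idf) for d in train_docs], labels)
    sem = CentroidClassifier().fit([concepts(d, concept_map) for d in train_docs], labels)

    def fuse(doc):
        # Combine the per-class scores of both first-layer classifiers.
        fused = {f"syn:{y}": v for y, v in syn.scores(tfidf(doc, idf)).items()}
        fused.update({f"sem:{y}": v
                      for y, v in sem.scores(concepts(doc, concept_map)).items()})
        return fused

    # Simplification: the third classifier is trained on in-sample first-layer
    # outputs; held-out outputs would be the more careful choice.
    top = CentroidClassifier().fit([fuse(d) for d in train_docs], labels)
    return lambda doc: top.predict(fuse(doc))


# Hypothetical toy data; concept_map stands in for a Wikipedia concept lookup.
train_docs = [["goal", "team", "match"], ["striker", "goal", "league"],
              ["cpu", "memory", "kernel"], ["compiler", "kernel", "cpu"]]
labels = ["sport", "sport", "tech", "tech"]
football = ["goal", "team", "match", "striker", "league", "keeper"]
computing = ["cpu", "memory", "kernel", "compiler", "gpu"]
concept_map = {**{t: "Football" for t in football},
               **{t: "Computing" for t in computing}}

classify = train_framework(train_docs, labels, concept_map)
print(classify(["keeper", "match"]))
print(classify(["gpu", "kernel"]))
print(classify(["keeper"]))
```

In this toy data the document `["keeper"]` shares no term with the training documents, so its tf-idf vector is all zeros and the syntactic classifier cannot separate the classes; the final decision then rests on the semantic scores. This is the kind of case that motivates using both levels.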

Index Terms

Computer Science

Information Sciences

Keywords

Wikipedia, Semantics, Text classification, Text representation, Multi-layer classification, SVM