Research Article

Syntax and Semantics based Efficient Text Classification Framework

by Suganya. S, Gomathi. C, Mano Chitra. S
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 65 - Issue 15
Published: March 2013
10.5120/11000-6182

Suganya. S, Gomathi. C, Mano Chitra. S. Syntax and Semantics based Efficient Text Classification Framework. International Journal of Computer Applications. 65, 15 (March 2013), 18-21. DOI=10.5120/11000-6182

Abstract

This system proposes an efficient text classification approach based on a multi-layer SVM-NN classification framework and a two-level representation model. Automated text classification is attractive because it frees organizations from having to organize document collections manually, which can be prohibitively expensive. The two-level representation model represents text data at two levels: a syntactic level, built from tf-idf term weights, and a semantic level, built using Wikipedia. A multi-layer text classification framework is then designed to exploit both kinds of information. It contains three SVM-NN classifiers. Two of them run in parallel, one on the syntactic representation and one on the semantic representation. Their outputs are combined and passed as input to the third classifier, which produces the final classification. Experimental results on benchmark data sets such as 20Newsgroups and Reuters-21578 show that the proposed model improves text classification performance.
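The layered architecture described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation, and it relies on several explicit assumptions. First, each SVM-NN classifier is replaced by a simple nearest-centroid classifier that uses cosine similarity. Second, the Wikipedia-based semantic level is replaced by a hypothetical hand-written term-to-concept table, `TERM_TO_CONCEPT`. Third, "combined" is read as concatenating the per-class scores of the two first-layer classifiers into a feature vector for the third classifier, which is a standard stacking scheme. All names here (`fit_idf`, `syntactic_vector`, `semantic_vector`, `CentroidClassifier`, `MultiLayerClassifier`) are illustrative only.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for Wikipedia concepts; the paper's actual semantic
# resource is Wikipedia, which this toy table does not reproduce.
TERM_TO_CONCEPT = {
    "goal": "Sport", "match": "Sport", "team": "Sport",
    "vote": "Politics", "election": "Politics", "party": "Politics",
}

def fit_idf(docs):
    """Inverse document frequency, log(N / df), of each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def syntactic_vector(doc, idf):
    """Syntactic level: tf-idf weights; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def semantic_vector(doc):
    """Semantic level: counts of the concepts that the document's terms map to."""
    return dict(Counter(TERM_TO_CONCEPT[t] for t in doc if t in TERM_TO_CONCEPT))

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Stand-in for SVM-NN: scores each class by cosine similarity to its centroid."""
    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.centroids = {}
        for c in self.classes:
            members = [v for v, y in zip(vectors, labels) if y == c]
            total = Counter()
            for v in members:
                total.update(v)  # sums (possibly float) weights per key
            self.centroids[c] = {k: w / len(members) for k, w in total.items()}
        return self

    def scores(self, vector):
        return [cosine(vector, self.centroids[c]) for c in self.classes]

    def predict(self, vector):
        s = self.scores(vector)
        return self.classes[s.index(max(s))]

class MultiLayerClassifier:
    """Two first-layer classifiers in parallel; their scores feed a third one."""
    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        syn = [syntactic_vector(d, self.idf) for d in docs]
        sem = [semantic_vector(d) for d in docs]
        self.syn_clf = CentroidClassifier().fit(syn, labels)
        self.sem_clf = CentroidClassifier().fit(sem, labels)
        # Simplification: the third classifier is trained on in-sample scores;
        # held-out (cross-validated) scores are the usual choice for stacking.
        combined = [self._combine(s, m) for s, m in zip(syn, sem)]
        self.final_clf = CentroidClassifier().fit(combined, labels)
        return self

    def _combine(self, syn_vec, sem_vec):
        """Concatenate both first-layer score vectors into one feature dict."""
        feats = {f"syn{i}": s for i, s in enumerate(self.syn_clf.scores(syn_vec))}
        feats.update({f"sem{i}": s for i, s in enumerate(self.sem_clf.scores(sem_vec))})
        return feats

    def predict(self, doc):
        syn = syntactic_vector(doc, self.idf)
        sem = semantic_vector(doc)
        return self.final_clf.predict(self._combine(syn, sem))

train = [
    "the team scored a late goal in the match".split(),
    "fans cheered as the team won the match".split(),
    "the party won the election after the vote".split(),
    "voters backed the party in the election".split(),
]
labels = ["Sport", "Sport", "Politics", "Politics"]
model = MultiLayerClassifier().fit(train, labels)
print(model.predict("a goal decided the match".split()))  # prints: Sport
```

The toy table keeps the example runnable offline. A faithful version would derive the semantic vector from Wikipedia and would use the SVM-NN classifiers named in the paper.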

Index Terms
Computer Science
Information Sciences
Keywords

Wikipedia, Semantics, Text classification, Text representation, Multi-layer classification, SVM
