International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Issue 37
Published: Nov 2021
Authors: Elaf Alhazmi
DOI: 10.5120/ijca2021921765
Elaf Alhazmi. Fake News Classification in Machine Learning with Different Word Representations. International Journal of Computer Applications. 183, 37 (Nov 2021), 1-7. DOI=10.5120/ijca2021921765
@article{ 10.5120/ijca2021921765,
author = { Elaf Alhazmi },
title = { Fake News Classification in Machine Learning with Different Word Representations },
journal = { International Journal of Computer Applications },
year = { 2021 },
volume = { 183 },
number = { 37 },
pages = { 1-7 },
doi = { 10.5120/ijca2021921765 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2021
%A Elaf Alhazmi
%T Fake News Classification in Machine Learning with Different Word Representations
%J International Journal of Computer Applications
%V 183
%N 37
%P 1-7
%R 10.5120/ijca2021921765
%I Foundation of Computer Science (FCS), NY, USA
Text classification has been applied effectively in a variety of domains, one of which is the detection of fake news. A central step in any classification framework is feature extraction, which converts text into numeric features. In this paper, we compare the effectiveness of three feature extraction approaches: bag of words, TF-IDF, and one-hot encoding. In the experiment, we measured classification accuracy and identified the best and worst classifier for each technique, using three fake news detection data sets and six machine learning classifiers. The results show that, on the selected data, bag of words (also known as CountVectorizer) and TF-IDF outperform one-hot encoding. Although logistic regression and support vector machine both perform well with bag of words and TF-IDF, the random forest classifier is the only algorithm that produces consistently accurate results across all three feature extraction methods. Support vector machine had the lowest accuracy with one-hot encoding, even though it produced strong results with the other two extraction methods.
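To make the three representations concrete, the following is a minimal pure-Python sketch, not the paper's implementation. It rests on several assumptions. The two example documents are invented. The tokenizer is a naive whitespace split. One-hot encoding is read here as binary presence or absence of each vocabulary term, which may differ from the encoding used in the experiments. The TF-IDF weights use one common smoothed IDF variant without length normalization. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Bag of words: raw term counts, one column per vocabulary entry."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


def one_hot(doc, vocab):
    """Assumed reading of one-hot: 1 if the term occurs, else 0."""
    return [1 if count > 0 else 0 for count in bag_of_words(doc, vocab)]


def tf_idf(docs, vocab):
    """TF-IDF with smoothed IDF: count * (ln((1 + n) / (1 + df)) + 1)."""
    n = len(docs)
    counts = [bag_of_words(doc, vocab) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    return [[c * w for c, w in zip(row, idf)] for row in counts]


# Invented toy corpus, for illustration only.
docs = [
    "breaking news shocking shocking claim",
    "official report confirms claim",
]
vocab = build_vocabulary(docs)
bow = [bag_of_words(doc, vocab) for doc in docs]
hot = [one_hot(doc, vocab) for doc in docs]
tfidf = tf_idf(docs, vocab)
```

In this sketch the repeated word "shocking" has a count of 2 under bag of words but collapses to 1 under the binary encoding, while TF-IDF additionally weights it above "claim", which occurs in both documents. The loss of frequency information in the binary case is one plausible reason a representation of this kind could trail the other two, though the abstract itself does not state the cause.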