A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text
Citation
RASHEED, Imran, Haider BANKA & Hamaid Mahmood KHAN. "A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text". Studies in Computational Intelligence, 907 (2021): 3-18.
Abstract
Feature selection plays a crucial role in text classification by
reducing the dimensionality of the feature space and accelerating the learning process
of the classifier. Text classification is the process of assigning texts to different
categories based on their content and subject. Text classification techniques have
been applied to various domains, such as the medical, political, news, and legal domains,
which shows that adapting domain-relevant features could improve classification
performance. Despite the large body of research in the area
of classification in several languages across the world, there is a lack of such work
in Urdu due to the shortage of available resources. In this paper, we first propose a
hybrid feature selection approach (HFSA) for text classification of Urdu
news articles. Second, we incorporate widely used filter feature selection approaches along
with Latent Semantic Indexing (LSI) to extract essential features of Urdu documents.
The hybrid approach is tested with a Support Vector Machine (SVM) classifier
on the Urdu “ROSHNI” dataset. The results are compared with the
results obtained by individual filter feature selection methods. The approach is also
compared to the baseline feature selection method. The proposed approach
yields better classification, with promising accuracy and better efficiency.
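
The general idea of combining a filter method with LSI can be illustrated with a minimal sketch. It is not the authors' HFSA implementation: the abstract does not specify which filter scores are used or how they are combined, so chi-square is assumed here as a representative filter, LSI is approximated by power iteration with deflation, and the toy corpus, function names, and parameter values are all hypothetical. English placeholder tokens stand in for Urdu terms.

```python
import math

# Hypothetical toy corpus: English placeholder tokens stand in for Urdu terms.
DOCS = [
    ("cricket match team win", "sports"),
    ("team score cricket player", "sports"),
    ("football match player goal", "sports"),
    ("election vote party minister", "politics"),
    ("minister party policy vote", "politics"),
    ("election policy government party", "politics"),
]


def chi2_scores(docs, target):
    """Filter stage (assumed): chi-square of term presence vs. one class."""
    n = len(docs)
    token_sets = [(set(text.split()), label) for text, label in docs]
    vocab = sorted(set().union(*(tokens for tokens, _ in token_sets)))
    scores = {}
    for term in vocab:
        a = sum(1 for toks, y in token_sets if term in toks and y == target)
        b = sum(1 for toks, y in token_sets if term in toks and y != target)
        c = sum(1 for toks, y in token_sets if term not in toks and y == target)
        d = n - a - b - c
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def top_components(X, k, iters=200):
    """LSI stage: top-k right singular vectors of X via power iteration."""
    m = len(X[0])
    # Gram matrix X^T X; its eigenvectors are the right singular vectors of X.
    G = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(m)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(m)) for i in range(m))
        # Deflate: remove the component just found before the next pass.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
        comps.append(v)
    return comps


def project(X, comps):
    """Map each document's term vector into the latent space."""
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in X]


scores = chi2_scores(DOCS, "sports")
selected = select_top_k(scores, 6)
X = [[text.split().count(term) for term in selected] for text, _ in DOCS]
components = top_components(X, 2)
Z = project(X, components)

for (text, label), z in zip(DOCS, Z):
    print(label, [round(value, 3) for value in z])
```

In a full pipeline, the low-dimensional vectors in `Z` would be passed to a classifier such as an SVM. The filter stage first discards weakly class-related terms, and the LSI stage then compresses the surviving terms into a few latent dimensions, which is one plausible reading of how such a hybrid reduces dimensionality.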