A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text
Citation: RASHEED, Imran, Haider BANKA & Hamaid Mahmood KHAN. "A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text". Studies in Computational Intelligence, 907 (2021): 3-18.
Feature selection plays a crucial role in text classification by reducing the dimensionality of the feature space and accelerating the learning process of the classifier. Text classification is the process of assigning texts to categories based on their content and subject. Text classification techniques have been applied to various domains, such as medicine, politics, news, and law, and these applications show that adopting domain-relevant features can improve classification performance. Although there is plenty of research on classification in many languages, little such work exists for Urdu because of the shortage of available resources. In this paper, we first present a hybrid feature selection approach (HFSA) for text classification of Urdu news articles. Second, we combine widely used filter selection approaches with Latent Semantic Indexing (LSI) to extract the essential features of Urdu documents. The hybrid approach was tested with a Support Vector Machine (SVM) classifier on the Urdu “ROSHNI” dataset. The results are compared with those obtained by the individual filter feature selection methods and with the baseline feature selection method. The proposed approach yields better classification, with promising accuracy and better efficiency.
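The general shape of such a hybrid pipeline (a filter step, then LSI, then an SVM) can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation. The abstract does not name the filter methods, so the chi-square filter is an assumption, as are the linear SVM, the values of `k` and `n_components`, and the toy English documents, which stand in for the ROSHNI dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (hypothetical; NOT the ROSHNI dataset).
docs = [
    "cricket match team win",
    "team score cricket bat",
    "football match goal team",
    "election vote party minister",
    "minister parliament vote law",
    "party election campaign law",
]
labels = ["sports", "sports", "sports", "politics", "politics", "politics"]

# Stage 1: TF-IDF term weights, then a filter method keeps the k
#          highest-scoring terms (chi-square is an assumed choice).
# Stage 2: LSI, computed as a truncated SVD of the filtered
#          term-document matrix, projects onto a few latent dimensions.
features = make_pipeline(
    TfidfVectorizer(),
    SelectKBest(chi2, k=8),
    TruncatedSVD(n_components=2, random_state=0),
)
reduced = features.fit_transform(docs, labels)

# Stage 3: train a linear SVM on the reduced representation.
classifier = LinearSVC()
classifier.fit(reduced, labels)

# Predict on the training documents only to show the data flow;
# a real evaluation would use held-out data.
predicted = classifier.predict(features.transform(docs))
print(reduced.shape)
print(list(predicted))
```

Applying the filter before the SVD means the decomposition runs on a much smaller matrix, which is one plausible source of the efficiency gain the abstract reports.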