A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text

Rasheed, Imran; Banka, Haider; Khan, Hamaid Mahmood

View/Open

Book Chapter (500.1Kb)

Access

info:eu-repo/semantics/embargoedAccess

Date

2021

Author

Rasheed, Imran
Banka, Haider
Khan, Hamaid Mahmood

Citation

RASHEED, Imran, Haider BANKA & Hamaid Mahmood KHAN. "A Hybrid Feature Selection Approach Based on LSI for Classification of Urdu Text". Studies in Computational Intelligence, 907 (2021): 3-18.

Abstract

Feature selection plays a crucial role in text classification by reducing the dimensionality of the feature space and accelerating the classifier's learning process. Text classification is the process of assigning texts to categories based on their content and subject. Text classification techniques have been applied in domains such as medicine, politics, news, and law, which shows that adopting domain-relevant features can improve classification performance. Although a great deal of classification research exists for many languages, little such work exists for Urdu because of the shortage of available resources. In this paper, we first present a hybrid feature selection approach (HFSA) for text classification of Urdu news articles. Second, we combine widely used filter selection approaches with Latent Semantic Indexing (LSI) to extract the essential features of Urdu documents. The hybrid approach was tested with a Support Vector Machine (SVM) classifier on the Urdu “ROSHNI” dataset. The results were compared with those obtained by the individual filter feature selection methods and with a baseline feature selection method. The proposed approach yields better classification, with promising accuracy and better efficiency.
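As a rough illustration of the filter stage mentioned in the abstract, the sketch below scores terms with the chi-square statistic and keeps the top k. The abstract does not name the filters used, so chi-square is an assumption here, as are the function names `chi_square_scores` and `select_top_k` and the tiny English toy corpus (it is not the ROSHNI dataset). The LSI projection and the SVM classifier are not shown.

```python
from collections import Counter

def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square value over all classes.

    Assumption: terms are whitespace tokens and only presence/absence
    of a term in a document is counted (no term frequencies).
    """
    n = len(docs)
    doc_terms = [set(d.split()) for d in docs]
    vocab = set().union(*doc_terms)
    class_counts = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # 2x2 contingency table for (term present?, document in cls?)
            a = sum(1 for t, y in zip(doc_terms, labels) if term in t and y == cls)
            b = sum(1 for t, y in zip(doc_terms, labels) if term in t and y != cls)
            c = n_cls - a            # term absent, document in cls
            d = n - n_cls - b        # term absent, document not in cls
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:                # skip degenerate tables
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def select_top_k(docs, labels, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus, for illustration only.
docs = [
    "cricket team wins match",
    "cricket match report today",
    "election vote party today",
    "party wins election vote",
]
labels = ["sports", "sports", "politics", "politics"]

print(select_top_k(docs, labels, 5))
```

In a hybrid setup of the kind the abstract describes, the terms kept by such a filter would then be projected into a lower-dimensional LSI space before being passed to the classifier. Terms such as "wins" and "today", which occur equally often in both toy classes, score zero and are dropped.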

Source

Studies in Computational Intelligence

Volume

907

URI

https://hdl.handle.net/11352/3492