Data Valuation with Shapley-based Methods for Medical Image Classification
Abstract
This study introduces two approaches to data valuation for medical image classification: the Gradient Shapley and Improved Gradient Shapley methods. Both aim to reduce the cost of data selection while improving model performance, which makes them practical for model training. The Gradient Shapley method evaluates the contribution of each individual data sample to model performance and rests on a robust theoretical foundation. The Improved Gradient Shapley method increases computational efficiency and demonstrates superior performance, particularly on noisy or imbalanced datasets. Experiments on the MedMNIST datasets show that both methods achieve competitive accuracy and AUC values with significantly reduced data. For instance, on the PathMNIST dataset, using only 10% of the data yielded an AUC of 96.6%, close to the baseline AUC of 98.3% achieved with the full dataset. On some datasets, the Shapley-based methods achieved better classification performance with ≤50% of the full data. The findings highlight the potential of Shapley value-based methods to optimize training without sacrificing performance, offering a scalable and efficient approach for real-world applications in critical domains such as healthcare. Future research could explore integrating these methods with other data selection approaches to further enhance data valuation.
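To make the idea of valuing individual samples concrete, the following is a minimal sketch of generic Monte Carlo (permutation-sampling) Shapley data valuation. It is not the Gradient Shapley or Improved Gradient Shapley procedure from the study, whose details the abstract does not give. The function names (`utility`, `monte_carlo_shapley`), the nearest-centroid classifier used as the performance measure, and the synthetic two-class data standing in for MedMNIST images are all assumptions made for illustration.

```python
import numpy as np

def utility(X_train, y_train, X_val, y_val):
    """Validation accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for "model performance"; a real study would
    train an image classifier here instead.
    """
    classes = np.unique(y_train)
    if len(classes) == 0:
        return 0.0  # an empty training set has zero utility by convention
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every validation point to every centroid.
    dists = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_val).mean())

def monte_carlo_shapley(X, y, X_val, y_val, n_permutations=20, seed=0):
    """Estimate each training sample's Shapley value by permutation sampling.

    For each random ordering, a sample's marginal contribution is the change
    in utility when it is added to the samples that precede it.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    values = np.zeros(n)
    for _ in range(n_permutations):
        perm = rng.permutation(n)
        prev = 0.0  # utility of the empty set
        for k, idx in enumerate(perm, start=1):
            subset = perm[:k]
            cur = utility(X[subset], y[subset], X_val, y_val)
            values[idx] += cur - prev
            prev = cur
    return values / n_permutations

# Synthetic two-class data (assumption: stands in for image features).
data_rng = np.random.default_rng(0)
X = np.concatenate([data_rng.normal(-1, 1, (20, 2)), data_rng.normal(1, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
X_val = np.concatenate([data_rng.normal(-1, 1, (50, 2)), data_rng.normal(1, 1, (50, 2))])
y_val = np.array([0] * 50 + [1] * 50)

values = monte_carlo_shapley(X, y, X_val, y_val)

# Keep the 10% of samples with the highest estimated value.
top = np.argsort(values)[::-1][: len(X) // 10]
```

In this sketch the estimated values sum to the utility of the full training set, which is the efficiency property of the Shapley value. Selecting the highest-valued samples, as in the last line, mirrors the kind of reduced-data training the abstract describes, though the selection rule used in the study may differ.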










