Data Valuation with Shapley-based Methods for Medical Image Classification
Citation
AKÇELİK, Zeliha Kaya, Reyhan HOŞAVCI, Sümeyye Zülal DİK, Musa AYDIN & Zeki KUŞ. "Data Valuation with Shapley-based Methods for Medical Image Classification". 2025 10th International Conference on Machine Learning Technologies, (2025): 461-467.
Abstract
This study introduces novel approaches to data valuation in medical image classification, focusing on the Gradient Shapley and Improved Gradient Shapley methods. These methods aim to reduce data selection costs while improving model performance, which makes them practical for real training pipelines. The Gradient Shapley method estimates the contribution of each individual data sample to model performance and rests on a robust theoretical foundation. The Improved Gradient Shapley method further increases computational efficiency and shows superior performance, particularly on noisy or imbalanced datasets. Experiments on the MedMNIST collection show that both methods achieve competitive accuracy and AUC even with substantially reduced training data. For instance, on PathMNIST, training on only 10% of the data yielded an AUC of 96.6%, close to the 98.3% AUC of the baseline trained on the full dataset. On some datasets, the Shapley-based methods achieved better classification performance using ≤50% of the full data. This study advances data valuation for medical image classification, and the findings highlight the potential of Shapley value-based methods to optimize training without sacrificing performance. These methods offer a scalable and efficient option for real-world applications in critical domains such as healthcare. Future research could explore integrating them with other data selection approaches to further improve data valuation.
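The abstract does not spell out the algorithm, so the following is only a minimal sketch, assuming "Gradient Shapley" follows the G-Shapley approximation from the Data Shapley literature (Ghorbani & Zou): for each random permutation of the training set, the model takes one gradient step per sample in that order, and each sample's marginal contribution is the change in validation score caused by its step; averaging over permutations estimates its Shapley value. The samples with the highest values are then kept, as in the 10% PathMNIST experiment. Everything here is an illustrative assumption, not the authors' implementation: the function names (`gradient_shapley`, `select_top_fraction`, `val_score`), the binary logistic-regression model on feature vectors (instead of an image classifier on MedMNIST), validation accuracy as the performance metric, and the hyperparameters. The Improved Gradient Shapley variant is not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def val_score(w, b, X_val, y_val):
    # Assumed metric: validation accuracy of a binary logistic-regression model.
    preds = (sigmoid(X_val @ w + b) >= 0.5).astype(int)
    return float(np.mean(preds == y_val))

def gradient_shapley(X, y, X_val, y_val, n_perms=50, lr=0.1, seed=0):
    """Hypothetical G-Shapley-style estimator (one SGD step per sample).

    For each random permutation, training restarts from zero weights and
    visits samples in permutation order. A sample's marginal contribution
    is the change in validation score produced by its single gradient step.
    Contributions are averaged over permutations.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    values = np.zeros(n)
    for _ in range(n_perms):
        w, b = np.zeros(d), 0.0
        prev = val_score(w, b, X_val, y_val)
        for i in rng.permutation(n):
            # Gradient of the logistic loss on sample i.
            err = sigmoid(X[i] @ w + b) - y[i]
            w = w - lr * err * X[i]
            b = b - lr * err
            cur = val_score(w, b, X_val, y_val)
            values[i] += cur - prev
            prev = cur
    return values / n_perms

def select_top_fraction(values, fraction):
    # Indices of the highest-valued samples, e.g. fraction=0.1 keeps 10%.
    k = max(1, int(round(fraction * len(values))))
    return np.argsort(values)[::-1][:k]

if __name__ == "__main__":
    # Synthetic toy data (hypothetical), not MedMNIST.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    X_val = rng.normal(size=(100, 5))
    y_val = (X_val[:, 0] + 0.5 * X_val[:, 1] > 0).astype(int)
    values = gradient_shapley(X, y, X_val, y_val, n_perms=20)
    keep = select_top_fraction(values, 0.1)
    print(len(keep), "samples kept")
```

Because each permutation's contributions telescope, their sum equals the final validation score minus the initial one, which mirrors the efficiency property of Shapley values. Training on `X[keep]` instead of `X` corresponds to the reduced-data setting described above; the cost grows with the number of permutations times the number of samples, which is the kind of expense an improved variant would aim to cut.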