Is diabetic retinopathy grading biased by imbalanced datasets?

Diabetic retinopathy (DR) is one of the most severe complications of diabetes and a leading cause of vision loss and blindness. Retinal screening contributes to the early detection and treatment of DR. The disease is graded in five stages: normal, mild, moderate, severe, and proliferative diabetic retinopathy. Grading is usually performed manually by highly trained ophthalmologists, who identify the presence or absence of retinopathy in retinal images. Several automated deep learning (DL) approaches have been proposed and have proven to be powerful tools for DR detection and classification. However, these approaches are usually biased by the cardinality of each grade set, since overall accuracy favors the largest sets to the detriment of the smaller ones. In this paper, we evaluate several state-of-the-art DL approaches using 5-fold cross-validation. The experiments were conducted on a balanced version of the DDR dataset containing 31330 retinal fundus images, obtained by completing the smaller grade sets with samples from other well-known datasets. Balancing the dataset in this way also increases the robustness of the training and testing tasks, because the samples come from several sources and were acquired with different equipment. The results confirm the bias introduced by imbalanced datasets in automatic diabetic retinopathy grading.
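
The claim that overall accuracy favors the largest grade sets can be illustrated with a minimal sketch. The class counts below are hypothetical and chosen only for illustration; they are not the DDR distribution, and the trivial "always predict normal" classifier is not one of the models evaluated in the paper. The sketch compares overall accuracy with the mean per-grade recall (often called balanced accuracy).

```python
from collections import Counter

GRADES = ["normal", "mild", "moderate", "severe", "proliferative"]

# Hypothetical, illustrative grade counts (NOT the real DDR distribution).
counts = {"normal": 700, "mild": 50, "moderate": 200,
          "severe": 20, "proliferative": 30}


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade matches the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_grade_recall(y_true, y_pred):
    """Recall computed separately for every grade present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {grade: hits[grade] / totals[grade] for grade in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-grade recalls."""
    recalls = per_grade_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Build the imbalanced label list and a degenerate classifier that
# always predicts the majority grade.
y_true = [grade for grade in GRADES for _ in range(counts[grade])]
y_pred = ["normal"] * len(y_true)

acc = overall_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"overall accuracy:  {acc:.2f}")
print(f"balanced accuracy: {bal:.2f}")
```

Under these assumed counts, the degenerate classifier reaches an overall accuracy of 0.70 while its balanced accuracy is only 0.20, because four of the five grades are never recognized. On a balanced dataset the two measures coincide, which is why balancing the grade sets removes this source of bias from the reported accuracy.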

Conference Paper