CLASSIFICATION OF HEALTHY AND DIABETIC PATIENT BASED UPON EYE’S RETINA HEALTH CONDITIONS USING MACHINE LEARNING

Abstract: Diabetes has become a common disease worldwide, and its effects are numerous. It can lead to complete blindness if proper medical precautions are not taken in time. The objective of this research is to classify a patient as diabetic or non-diabetic by training a model for binary classification of Diabetic Retinopathy (DR). We used an existing dataset that consists of many eye retina images. The research follows a hybrid methodology that combines histograms with the extraction of first-order and second-order features. We observed marked differences between the histograms of normal retinas and those of affected eyes. We generated histograms of the retina images and extracted their first-order features (mean, median, variance, etc.) and second-order features (energy, entropy, homogeneity, contrast, etc.). Using a decision tree, we achieved a precision of 0.814, an accuracy of 0.821, a recall of 0.821 and an F1 score of 0.817. Hence, the approach can be used effectively for identifying DR and for the early detection of diabetic patients.


INTRODUCTION
It is a well-known proverb that prevention is better than cure. Therefore, to help protect eyesight locally, nationally and internationally, we conducted research to classify a person as healthy or diabetic based on the health condition of the retina, using machine learning. The importance of Machine Learning (ML) is increasing day by day, with a wide range of applications in every field of life, especially in the medical sciences. The use of ML to analyze different types of infections is highlighted by (Ibrahim and Abdulazeez, 2021).
The research conducted by (Taylor et al., 1997) studied and analyzed diabetic eyes. The authors also linked the use of insulin with these parameters. They reported that the populations of developing countries may suffer from blinding retinopathy (BR) because of westernized diets, weight gain and longer lives. As per their findings, 63% of the patients in their United Kingdom study were found to have decreased vision due to diabetes.
The importance of early detection and prevention, given that diabetes can lead to complete blindness, is discussed in the research work of (Priya et al., 2013). The researchers explained how a diabetic patient can progress to complete blindness. They described the types of DR as non-proliferative, pre-proliferative and proliferative DR. According to their findings, DR shows no noticeable indications initially, but with the passage of time it may lead from low vision to complete blindness. The only practical protection against DR and complete blindness is for patients to check their sugar level regularly and to pay attention to any sight problems.
The research of (Rajalakshmi et al., 2018) highlighted the importance of using Artificial Intelligence (AI) for the automatic detection of DR and Sight-Threatening DR (STDR). They used smartphone-based fundus photography together with ophthalmologists' grading. The researchers obtained images of 301 patients with type 2 diabetes for DR assessment. They concluded that smartphone-based AI detection can be used as a preliminary tool to check for the presence of DR in most diabetic patients.
The authors (Bhatia et al., 2016) put forward techniques for developing a system that can automatically detect DR in diabetic patients. Their main contribution is finding ways to detect DR patients early using AI-based systems. The research work of (Chetoui et al., 2018) on DR identification used ML and texture features with the aim of preventing complete blindness. They applied texture features such as the Local Energy-based Shape Histogram (LESH) and the Local Ternary Pattern (LTP). The researchers also used the support vector machine algorithm to classify the obtained histograms.
The author (Ahmed, 2002) conducted research on the history of diabetes mellitus. The word mellitus means honey-sweet. As per the findings, the ancient Egyptians described diabetes mellitus about 3,000 years ago. The term "diabetes" was first used by Aretaeus of Cappadocia (81-133 AD).
Later, the word mellitus was added to the name by Thomas Willis in 1675, following the discovery of sweetness in the urine and blood of patients. Building on this, Dobson confirmed in 1776 that the sweetness was due to excess sugar in urine and blood. The role of the liver in glycogenesis was then established in France in 1857. After that, Mering and Minkowski discovered the role of the pancreas in diabetes in 1889. These findings became the foundation for the discovery and clinical use of insulin in 1921 through the efforts of Best and co-workers. Later on, many research reports addressed dietary management and the chronic complications of the disease.
The research conducted by (Li et al., 2019) suggested early screening of diabetic patients to avoid blindness due to DR, which requires high specificity and sensitivity, where high means greater than 90%. According to this research, the performance of deep learning models was limited by certain restrictions. One of the main limitations was the availability of fundus image datasets from different patients. To assess the deep learning models, they collected 13,673 images from 9,598 patients. These fundus images were divided into 6 classes by 7 graders according to the level of DR and the quality of the images. In addition, 757 images with DR were annotated with 4 types of DR-related lesions. The authors applied state-of-the-art deep learning algorithms for semantic segmentation, object detection and image classification. They achieved an accuracy of 0.8284 for DR classification, but the models did not perform well on lesion segmentation and detection. This clearly indicates that the detection and segmentation of lesions is a difficult task.
The research work of (Suhail and Zwiggelaar, 2020) suggested a new way of segmenting masses in mammographic images. The segmentation results highlighted a better and enhanced area of the mammogram masses, showing the accurate shapes of the irregularities. They tested their proposed method on 233 malignant and 233 benign irregularities. The authors (Sarki et al., 2020) provided material on the early detection of DR and other diabetic eye diseases for patients, healthcare professionals and research communities. (Abinaya et al., 2020) also carried out a systematic analysis of the published literature on ML techniques. The researchers (Aladawi et al., 2019) reviewed and evaluated published methods, specificity, classification and datasets for the automated detection of DR. The recent studies (Mayya et al., 2021), (Atwany et al., 2022) and (Kumar et al., 2022) are surveys on the early detection of DR and the use of artificial intelligence.

Dataset:
We used an existing dataset (Aladawi et al., 2019), known as the Dataset for Diabetic Retinopathy (DDR). Its authors used this dataset for the classification of Diabetic Retinopathy (DR), lesion segmentation and lesion detection. Note, however, that they performed their research using multilevel classification. The multilevel classes and the number of images for DR grading are listed in the accompanying table. The researchers developed a training dataset with these multilevel labels, and we used the same dataset for training. Our purpose, however, differs from that of (Aladawi et al., 2019): we want to classify a patient as healthy or diabetic depending on the condition of the patient's eyes. So, in this research, we used the following labels for binary classification:
0. Non-diabetic patient
1. Diabetic patient
The number of images used in this study is given in the accompanying table.

Methodology:
As mentioned in the dataset section, we perform binary classification of whether a patient is diabetic or not, depending on the DR condition or overall eye condition of the patient. Accordingly, we annotate level 0 of (Aladawi et al., 2019) as 0 and levels 1 to 5 of (Aladawi et al., 2019) as 1. Our problem is therefore a binary problem, not a multilevel one. We reused the dataset developed by (Aladawi et al., 2019) to classify a patient as non-diabetic or diabetic. We used a hybrid approach for the early identification and classification of diabetic patients. This hybrid method combines histograms of the retina images with the extraction of first-order features (mean, median, variance, etc.) and second-order features (energy, entropy, homogeneity, contrast, etc.).
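The label mapping and the first-order histogram features described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `to_binary_label` and `first_order_features` are hypothetical, and the sketch assumes each retina image has already been loaded as an 8-bit grayscale NumPy array.

```python
import numpy as np


def to_binary_label(dr_grade: int) -> int:
    """Map a multilevel DR grade to a binary label.

    Grade 0 becomes 0 (non-diabetic); grades 1 to 5 become 1 (diabetic).
    """
    if not 0 <= dr_grade <= 5:
        raise ValueError(f"DR grade must be in 0..5, got {dr_grade}")
    return 0 if dr_grade == 0 else 1


def first_order_features(gray: np.ndarray) -> dict:
    """Compute the intensity histogram and first-order statistics.

    `gray` is assumed to be a 2-D uint8 array (values 0..255).
    """
    gray = np.asarray(gray)
    if gray.dtype != np.uint8:
        raise TypeError("expected an 8-bit grayscale image (dtype uint8)")
    pixels = gray.ravel().astype(np.float64)
    # 256-bin histogram: hist[v] is the number of pixels with intensity v.
    hist = np.bincount(gray.ravel(), minlength=256)
    return {
        "histogram": hist,
        "mean": float(pixels.mean()),
        "median": float(np.median(pixels)),
        "variance": float(pixels.var()),
        "std": float(pixels.std()),
    }


# Tiny synthetic image used only to show the call pattern.
example = np.array([[0, 0], [255, 255]], dtype=np.uint8)
features = first_order_features(example)
```

The statistics here use the population variance (NumPy's default); whether the study used the population or the sample variance is not stated in the text.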

RESULTS AND DISCUSSIONS
The results of the experiments using Python libraries and the Weka tool are given below. According to Tables 4 and 5, the pictures and histogram representations of the images show clear-cut differences among the multilabel images of the dataset. This further supports the use of these images for the binary classification of a person as diabetic or non-diabetic. As per Table 6, first-order features such as mean, median, variance and standard deviation were extracted. These results clearly show the differences between images of different levels, which helps to separate them. Similarly, second-order features were extracted and are provided in Table 7. We extracted a total of nine GLCM and LBP features, named LBPEnergy, LBPEntropy, GLCM_Energy, GLCM_Homogeneity, GLCM_Contrast, GLCM_Corre, GLCM_Dissimilarity, GLCM_Entropy and GLCM_ASM. The extracted values of the second-order features clearly show the differences between the eye images of different levels in the dataset.
We used five techniques for binary classification: decision tree classifier, linear regression, logistic regression, naïve Bayes and grid search CV. We used four evaluation measures: precision, accuracy, recall and F1 score. In Table 8, the best results are shown in boldface. This table shows that the decision tree classifier performed best for classifying patients as diabetic or non-diabetic, with a precision of 0.814 (81%), an accuracy of 0.821 (82%), a recall of 0.821 (82%) and an F1 score of 0.817 (82%). A precision of 81% indicates that about 81% of the model's positive predictions were correct. An accuracy of 82% means that the model classified 82% of all test images correctly. Similarly, a recall of 82% means that the model retrieved 82% of the relevant instances. The F1 score is the harmonic mean of precision and recall, and its value of about 82% indicates that the two are well balanced. Overall, most of the evaluation scores are 70% or higher across all five classification techniques.
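The four evaluation measures can be illustrated with a short sketch. The function `binary_metrics` and the example label vectors are hypothetical and are not the study's data. The sketch computes the plain positive-class metrics; the reported recall (0.821) equals the reported accuracy, which would be consistent with support-weighted averaging over both classes, but the text does not state which averaging was used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy and F1 for the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

In this toy example there are 4 true positives, 1 false positive, 1 false negative and 4 true negatives, so all four measures come out at 0.8.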

DISCUSSIONS
As per the Table 9 results obtained with the canopy clustering technique in Weka, 97% of the results were successful. Of this 97%, 45% of the results indicate non-diabetic patients and 52% indicate diabetic patients, with different numbers of instances and percentages. According to Table 10, two final clusters were found, with 40% non-diabetic patients in one cluster and 60% diabetic patients in the other. These results also support the performance of the approach for classifying a patient's eye images as diabetic or non-diabetic.
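For readers unfamiliar with canopy clustering, the basic idea can be sketched as below. This is the textbook form of the algorithm with a loose threshold `t1` and a tight threshold `t2`, written with NumPy; it is not Weka's implementation, and the function name, thresholds and example points are all assumptions chosen for illustration.

```python
import numpy as np


def canopy_clusters(points, t1, t2):
    """Textbook canopy clustering with Euclidean distance.

    Returns a list of (center_index, member_indices) pairs.
    t1 is the loose threshold and t2 the tight one (t1 > t2 > 0).
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    points = np.asarray(points, dtype=float)
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        dists = np.linalg.norm(points[remaining] - points[center], axis=1)
        # Every remaining point within t1 joins this canopy.
        members = [i for i, d in zip(remaining, dists) if d < t1]
        canopies.append((center, members))
        # Points within t2 (including the center itself) cannot seed
        # or join later canopies, so they leave the candidate list.
        remaining = [i for i, d in zip(remaining, dists) if d >= t2]
    return canopies


# Two well-separated hypothetical groups of feature vectors.
demo_points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
demo_canopies = canopy_clusters(demo_points, t1=1.0, t2=0.5)
```

With these thresholds the four demo points fall into two canopies, one per group. On real feature vectors the thresholds would need tuning, and features on different scales would normally be standardized first.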

Conclusion:
In this study, we proposed the early identification, detection and classification of diabetic and non-diabetic patients using a hybrid approach. We created and used histograms of the 0-label images (non-diabetic) along with the images of all other classes, levels 1 to 5. Tables 4 and 5 show the pictorial differences between diabetic and non-diabetic patients. The images and their respective histograms both highlight the clear separation of diabetic and non-diabetic patients based on the DR of the retina. The experiments were performed on the features extracted from these images. The results obtained with machine learning methods are given in Table 8, and the results generated with Weka are given in Tables 9 and 10. With the decision tree, precision is 0.814, accuracy is 0.821, recall is 0.821 and F1 score is 0.817. Hence, the approach can be used effectively for identifying DR and for the early detection of diabetic patients. Further experiments were performed with Weka to analyze the outcomes further. As per the clustering results of Table 10, Weka identified 40% non-diabetic patients and 60% diabetic patients. This suggests that clustering can also be used for the binary classification of the images.

Future work and limitations:
The authors would like to create a new dataset of retina images of diabetic patients along with images of non-diabetic patients, to further improve the early identification and classification of diabetic and non-diabetic patients. One of the main limitations of this study is the difficulty of collecting a large dataset of retina images of diabetic and non-diabetic patients in time.

Table 2: Division of Images.
This dataset consists of DR grading, lesion detection and lesion segmentation data, each with training, testing and validation images. We used only the DR grading images for binary classification. After extracting first-order and second-order features from these DR grading images, we split the data into training and testing sets: 80% for training and 20% for testing.
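The 80/20 split of the extracted feature vectors can be sketched as follows. This is only an illustration: the function name, the fixed random seed and the toy feature matrix are assumptions, and the text does not say whether the original split was shuffled or stratified.

```python
import numpy as np


def split_80_20(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training and testing sets."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_test = int(round(test_fraction * len(labels)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (features[train_idx], features[test_idx],
            labels[train_idx], labels[test_idx])


# Toy data: 10 feature vectors with 2 features each, binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)
X_train, X_test, y_train, y_test = split_80_20(X, y)
```

Fixing the seed makes the split reproducible, which helps when comparing several classifiers on the same training and testing data.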

Table 4: Pictorial Comparisons of Multilabel Eye Images (Levels 0 to 5).
Table 5: Histogram-Based Comparisons of Multilabel Eye Images (Levels 0 to 5).
Here, GLCM stands for Gray Level Co-occurrence Matrix and LBP stands for Local Binary Pattern.
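To make the GLCM-based second-order features concrete, the sketch below builds a normalised, symmetric co-occurrence matrix for a single pixel offset and derives the usual texture measures from it. The function names, the number of gray levels and the offset are assumptions, and the formulas are the commonly used Haralick-style definitions; the exact settings used in the study are not given in the text.

```python
import numpy as np


def glcm(gray, levels=8, dx=1, dy=0):
    """Normalised symmetric GLCM of a uint8 image for offset (dx, dy).

    dx and dy are assumed to be non-negative pixel offsets.
    """
    gray = np.asarray(gray)
    # Quantise intensities 0..255 down to 0..levels-1.
    q = (gray.astype(np.int64) * levels) // 256
    h, w = q.shape
    src = q[0:h - dy, 0:w - dx]
    dst = q[dy:h, dx:w]
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    counts += counts.T  # count each pixel pair in both directions
    total = counts.sum()
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return counts / total


def glcm_features(p):
    """Texture measures from a normalised GLCM `p`."""
    i, j = np.indices(p.shape)
    asm = float(np.sum(p ** 2))
    nonzero = p[p > 0]
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i * sd_j > 0:
        corr = float(np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j))
    else:
        corr = 1.0  # convention for a perfectly uniform texture
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "dissimilarity": float(np.sum(np.abs(i - j) * p)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "asm": asm,
        "energy": float(np.sqrt(asm)),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
        "correlation": corr,
    }


# Toy 2x2 image with alternating dark and bright columns.
stripes = np.array([[0, 255], [0, 255]], dtype=np.uint8)
stripe_features = glcm_features(glcm(stripes, levels=2))
```

For the striped toy image every horizontal neighbour pair differs, so contrast is at its maximum for two gray levels and correlation is negative; a constant image would instead give zero contrast and zero entropy.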