SYSTEM ANALYSIS

System analysis is the process of understanding the facts and identifying the problems. Its purpose is to study the system and understand its objectives, which helps to improve the system and accomplish its purpose.

6.4.1 EXISTING SYSTEM

In the existing system, as discussed in the literature survey, the diabetes risk score system was developed with samples collected from a certain region. Various diabetes risk score tools were developed based on age, family history, gender, waist circumference (WC), physical activity, blood pressure (BP), and smoking. Univariate regression analysis is done to identify the variables. To derive the risk score, the β coefficient values are identified using logistic regression. All β coefficients derived from the logistic regression are then added to obtain the cumulative regression coefficient. The authors also find the optimal cutoff value, sensitivity, and specificity; the optimum value is calculated from the ROC analysis. They also validate the system using the AUC.

6.4.2 PROPOSED METHOD

In the proposed method, we used the IDRS as a template for reverse calculation and created an imputed dataset for Asian and European countries. As the aim is to provide an individual, age-specific, personalized T2D risk score, we calculated the β coefficient for each year of age instead of forming age groups. Once this step was completed, the data were imputed according to the average value between the ranges. In doing so, the value of β increases with the individual's specific age; a similar calculation was done for the Chinese, Sri Lankan, Omani, Cambridge, French, UK, and Danish risk scores. A similar approach was taken to personalize BMI and WC. Several risk score tools have been developed, but predicting the correct risk score without losing simplicity is a challenging task. The proposed system is compared with the existing risk score systems for accuracy and performance. It can also be applied to different ethnic groups.
Therefore, the diabetes risk score system is designed using AI techniques, without any laboratory tests.

METHODOLOGY

Machine learning is the scientific field that deals with the ways in which machines learn from experience. Python tools and modules are used. In this case, matplotlib (pyplot) and NumPy are used for plotting the results, and classification algorithms such as logistic regression, decision tree (DT), random forest (RF), linear models, and other algorithms were utilized. Accuracy, the confusion matrix, sensitivity, and specificity are calculated for each machine learning algorithm. Specificity, or the true negative rate, is the proportion of healthy patients who are correctly identified as healthy; (1 − specificity) is the proportion of healthy patients who are mistakenly identified as diseased. Sensitivity, or the true positive rate, is the percentage of diseased patients who are correctly identified as having the disease. In machine learning classification models, one basic measure of model accuracy is the AUC (area under the curve), where the curve is the ROC curve. ROC stands for receiver operating characteristic, and the curve plots sensitivity versus 1 − specificity. The motivation behind this work is to detect T2D in individuals who are interested in knowing their risk score. Therefore, the diabetes risk score system is designed without any laboratory tests. Its design steps are as follows.
The proposed model is shown in Figure 6.1.

FIGURE 6.1 Proposed model diagram.

6.5.1 DATASET SELECTION

To develop a uniform T2D risk scoring system for Asians and Europeans, we used scoring values from the IDRS (India), the Chinese score system, SLDRISK (Sri Lanka), and the Omani, Cambridge, French, UK, and Danish scores. The details of the scores of the four Asian systems are given in Table 6.2, which lists the β coefficient values of the Asian systems for five parameters: age, WC, physical activity, family history, and BMI. Among the different parameters, WC plays a significant role in designing a uniform T2D risk scoring system for Asians. Insulin resistance increases as the individual becomes overweight. The risk of diabetes also increases if an individual has a family history, that is, if a parent or sibling of the subject has or had diabetes. As age increases, the risk of diabetes also increases due to the lack of physical activity, yet T2D is also being observed in young people. Taking all these considerations into account, we identified the strong parameters that affect T2D and developed the diabetes risk score system based on them. The important parameters are age, gender, WC, physical activity, family history, and BMI.
The risk score can be easily calculated using the β coefficient values, so it can be used mostly in developing countries. The β coefficient calculations are explained in Section 6.2. The risk score, expressed as a probability, can be calculated from the β coefficient values as follows:
$$P = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots)}}$$

where $x_{1}, x_{2}, \ldots$ are independent risk factors, $\beta_{0}$ is the intercept, and $\beta_{1}, \beta_{2}, \ldots$ are regression coefficients.

6.5.2 DATA PREPARATION AND

The IDRS is used as a template for reverse calculation to create an imputed dataset. As the aim is to provide an individual, age-specific, personalized T2D risk score, the β coefficient is calculated for each year of age instead of forming age groups. To achieve this, we took the IDRS as the reference and created the imputed dataset. In the IDRS, the β coefficient values for the age groups <35, 35–49, and >50 are 0, 0.84, and 1.47, respectively. We created a continuous dataset for individual ages from 21 to 80 using these β coefficient values. To do so, the lowest value considered for 21–34 years is 0.4, and the highest value for these 21–34 years is calculated based on the next value of the category. Therefore, here, the highest β coefficient value is determined as 0.2. A similar technique is applied for the age categories 35–49 and 50–80. The values obtained for the ages 35, 49, 50, and 80 are therefore 0.699, 1.1, 1.2, and 1.64, respectively. Once this step is completed, the data are imputed according to the average value between the ranges. In doing so, the β value increases with the individual's specific age. The calculation is done similarly for China, Sri Lanka, and Oman. A similar approach was taken to personalize BMI and WC.

6.5.2.1 FOR PHYSICAL ACTIVITY

Physical activity is one of the important parameters for predicting T2D. Three categories of physical activity are considered according to the IDRS: vigorous exercise with a β coefficient of 0, no exercise with a β coefficient of 1.45, and mild exercise with a β coefficient of 1.13. Based on the questions of the International Physical Activity Questionnaire, physical activity was classified as low, moderate, and high. Here, vigorous physical activities are those that require hard physical effort and make you breathe much harder than normal.
Such activities include heavy lifting, digging, aerobics, and fast bicycling. Moderate activities are those that require moderate physical effort and make you breathe somewhat harder than normal.

6.5.2.2 FOR FAMILY HISTORY

Family history is another important parameter for predicting T2D. We considered three categories of family history according to the IDRS: two nondiabetic parents with a β coefficient of 0, either parent diabetic with a β coefficient of 0.54, and both parents diabetic with a β coefficient of 0.83. All these categories are included in the dataset creation.

6.5.3 COMPUTATION FOR DATA IMPUTATION

Once the β coefficients are calculated as explained in the data computation part, the next essential stage is imputing the data. Here, the Python library scikit-learn is used, together with itertools, a Python module dedicated to permutations and combinations. itertools is one of the most useful corners of the Python 3 standard library. It implements a number of iterator building blocks in a form suitable for Python, and its itertools.product() function computes the Cartesian product of the input iterables. This makes it an efficient tool for generating a variety of combinations. Initially, we took four parameters, namely, age, waist, physical activity, and family history; later, BMI was also included in the list. Once all the values were added to their respective lists, product(*) from the itertools module was applied. It acts like a nested for loop and yields all the combinations of the four parameters, so that the total number of samples obtained is 514,384. A similar approach was taken to create the datasets for India (IDRS) with BMI, China, SLDRISK, and Oman. The steps involved in creating the dataset are shown in Figure 6.2.

6.5.4 DESCRIPTION OF THE DATASET

The dataset is created for India, China, Sri Lanka, and Oman based on certain important attributes and does not contain any missing values.
All variables were categorized as follows: age (21–34 years versus 35–49 and >50 years), WC (men: <90 cm, 90–99 cm, >100 cm; women: <80 cm, 80–89 cm, >90 cm), BMI, defined as weight in kg divided by the square of height in m (BMI <25 versus 25–29, 30–34, and >35), family history of diabetes (two nondiabetic parents versus either parent having diabetes and both parents having diabetes), and physical activity (vigorous exercise versus mild and no exercise). The training data are labeled by the diabetes outcome (0/1), such that 0 indicates no diabetes and 1 indicates diabetes. Table 6.3 lists the attributes of the created dataset.

FIGURE 6.2 Steps involved in creating the dataset.

TABLE 6.3 Attributes
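The dataset-creation steps of Sections 6.5.2 and 6.5.3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: linear interpolation inside each age band is assumed, the 21–34 endpoints (0.0 to 0.2) and the waist coefficients are placeholders, and the names `AGE_BANDS`, `age_betas`, and `build_rows` are hypothetical. The age endpoints for 35, 49, 50, and 80 and the physical activity and family history coefficients are the values quoted in the text. The sketch does not attempt to reproduce the 514,384 samples reported above.

```python
from itertools import product

# (start_age, end_age, beta_at_start, beta_at_end) for each age band.
# The 35-49 and 50-80 endpoints are quoted in the text; the 21-34
# endpoints (0.0 to 0.2) are an assumption made for illustration.
AGE_BANDS = [
    (21, 34, 0.0, 0.2),
    (35, 49, 0.699, 1.1),
    (50, 80, 1.2, 1.64),
]

# Category coefficients quoted from the IDRS in the text.
ACTIVITY = {"vigorous": 0.0, "mild": 1.13, "none": 1.45}
FAMILY = {"no_parent": 0.0, "one_parent": 0.54, "both_parents": 0.83}
# Waist coefficients are hypothetical placeholders, not IDRS values.
WAIST = {"low": 0.0, "medium": 0.5, "high": 1.0}


def age_betas(bands):
    """Return {age: beta}, interpolating linearly inside each band (assumed)."""
    betas = {}
    for start, end, low, high in bands:
        step = (high - low) / (end - start)
        for age in range(start, end + 1):
            betas[age] = round(low + step * (age - start), 4)
    return betas


def build_rows():
    """Enumerate every parameter combination, like a nested for loop."""
    ages = age_betas(AGE_BANDS)
    rows = []
    for age, waist, activity, family in product(ages, WAIST, ACTIVITY, FAMILY):
        # Cumulative coefficient: the sum of the individual beta values.
        total = ages[age] + WAIST[waist] + ACTIVITY[activity] + FAMILY[family]
        rows.append((age, waist, activity, family, round(total, 4)))
    return rows


rows = build_rows()
print(len(rows), "combinations; first row:", rows[0])
```

With 60 ages and three categories for each of the other parameters, the sketch yields 60 × 3 × 3 × 3 = 1,620 rows; adding BMI or finer-grained waist values would enlarge the product in the same way.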
6.5.5 DATA MODELING AND ALGORITHMS USED FOR PREDICTION

Different algorithms, namely, multiple logistic regression, Gaussian Bayes (GB), RF, and DT [21], are applied to the imputed Indian, Chinese, SLDRISK, and Oman diabetic datasets. The data were split into training (70%) and test (30%) sets, each comprising 50% T2D cases. Two combinations of parameters, (i) age, gender, physical activity, family history, and WC and (ii) age, gender, physical activity, family history, WC, and BMI, were used to evaluate the efficacy (specificity and sensitivity) of each algorithm using ROC and AUC with a 95% confidence interval. Furthermore, the outcomes of the algorithms were compared with each other, and the best model was selected. In another approach, we used a consensus algorithm [22] to obtain the average of the scores from all the algorithms. Similarly, we developed the consensus-based Asian score, as described in Figure 6.3. The basic problem in determining a consensus ranking is to combine several rankings, made by two or more decision makers, into one agreed ranking. For the different Asian countries, the machine learning algorithms were applied first. The average value of each method is identified; then, the final prediction is made using the consensus-based average rank algorithm, as shown in Figure 6.3.
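The evaluation and averaging described above can be illustrated with a small sketch. It assumes that the consensus step is a plain mean of the per-sample scores from each model, which is a simplification and not the rank-based consensus algorithm of [22]. The function names, the 0.5 cutoff, and the toy scores are all hypothetical. Sensitivity, specificity, and AUC follow the definitions given in the methodology, with the AUC computed as the fraction of (diabetic, nondiabetic) pairs in which the diabetic sample receives the higher score.

```python
def consensus_scores(model_scores):
    """Average the per-sample scores across models (simple mean; assumed)."""
    n_models = len(model_scores)
    return [sum(per_sample) / n_models for per_sample in zip(*model_scores)]


def sensitivity_specificity(y_true, y_score, cutoff=0.5):
    """Return (true positive rate, true negative rate) at the given cutoff."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= cutoff else 0
        if label == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == 0:
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = diabetes, 0 = no diabetes; the scores are made up.
y_true = [1, 1, 1, 0, 0, 0]
model_scores = [
    [0.9, 0.7, 0.4, 0.3, 0.2, 0.6],  # e.g. logistic regression
    [0.8, 0.6, 0.3, 0.1, 0.4, 0.7],  # e.g. random forest
    [1.0, 0.8, 0.2, 0.2, 0.0, 0.5],  # e.g. decision tree
]

combined = consensus_scores(model_scores)
sens, spec = sensitivity_specificity(y_true, combined)
area = auc(y_true, combined)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

On this toy data, two of the three diabetic samples and two of the three nondiabetic samples are classified correctly at the 0.5 cutoff, and eight of the nine positive/negative pairs are ranked correctly.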