Title of Report

Adams Kusi Appiah1, Manisha Singh2, Chi Chen3, Ruyu Tan4, Sharang Chaudhry5, Simona Nallon6, Upeksha Perera7

Problem Presenters:

Agustin Calatroni and Petra LeBeau

Rho, Inc.

Faculty Mentors

Emily Kang

University of Cincinnati

Abstract

The use of machine learning (ML) algorithms for making predictions has been embraced in many applied research areas. Most recently, biomedical research has embraced ML owing to advances in biotechnology such as genomics. Black box machine learning algorithms are known to minimise predictive error from both theoretical and application perspectives. However, black box models such as neural networks, random forests, and other methods do not offer much interpretability. Researchers are therefore often left with a difficult choice between accuracy, which generally requires more complex and less interpretable prediction functions, and ML algorithms that are simple and interpretable but do not make the most accurate predictions. Model interpretability leads to trust in the model. In addition to trust, it also leads to understanding and transparency of the model development process, so that the resulting predictions and decisions are trusted and understandable as well. Today, many researchers are embracing machine learning algorithms but are challenged by the interpretation of the more complex models, which still presents a barrier to the widespread practical use of these techniques. The gradient boosting machine (GBM) is arguably the most natural ML algorithm choice for classification and prediction of binary response variables. In this project, we explore several tools, such as individual conditional expectation (ICE) plots, partial dependence plots (PDP), and other plots, for visualising the model estimated by a GBM learning algorithm.

Keywords

Machine learning, liver disease, gradient boosting machine, LIME, variable importance measures, partial dependence plots, individual conditional expectation plots, surrogate model, sensitivity analysis, misclassification error, Shapley value

1 Introduction

The success of ML algorithms in medicine and multi-omic studies over the last decade has come as no surprise to ML researchers. This can be largely attributed to their superior predictive accuracy and their ability to work on both large-volume and high-dimensional datasets. The key notion behind their performance is self-improvement: these algorithms make predictions and improve by analysing the mistakes made in those predictions. The difficulty with this “predict and learn” paradigm is that these algorithms suffer from diminished interpretability. This is often referred to as the “black-box” nature of ML methods, and, typically, the loss in interpretability is due to the nonlinearity and the high number of interactions embedded within the resulting models.

1. Department of Biostatistics, University of Nebraska Medical Center

2. Department of Information Science, University of Massachusetts

3. Department of Biostatistics, State University of New York at Buffalo

4. Department of Applied Mathematics, University of Colorado at Boulder

5. Department of Mathematical Sciences, University of Nevada, Las Vegas

6. Department of Statistics, California State University, East Bay

7. Department of Mathematics and Statistics, Sam Houston State University

In cases where interpretability is crucial, for instance in studies of disease pathologies, ad hoc methods that leverage the strong predictive nature of these algorithms have to be implemented. These methods are used as aids for ML users to answer questions of the following nature: ‘why did the algorithm make certain decisions?’, ‘what variables were the most important in predictions?’, and/or ‘is the model trustworthy?’. In this work, the interpretability of a particular class of ML methods, gradient boosting machines, is studied in the prediction of the liver disease status of patients. The remainder of this paper is organized as follows: Section 2 describes gradient boosting machines and the dataset, Section 3 presents the interpretation methods, and Section 4 discusses the resulting plots.

1.1 Machine Learning Visualization and Interpretation

Many methods have been developed in response to the need for model interpretation. Typically, each method uses the information required to answer a specific question of interest. Because interpretation is a wide concept with a large scope, several methods often need to be used in conjunction. Some of the most commonly used methods are the Variable Importance Plot [strobl2007bias], the Partial Dependence Plot [goldstein2015peeking], and the Individual Conditional Expectation plot [citation needed], among others. All three of these techniques compute statistics that account for the change in predictions if the value(s) of the input feature(s) were changed. Besides deriving metrics from changes in the predictions, the use of surrogate models has been suggested [citation needed]. Surrogates are interpretable models that produce approximately the same predictions as the non-interpretable model of interest. The rationale is that models with similar predictive patterns may have similar decision-making logic.

Broadly, the scope of interpretation is divided into two areas. The first is global, where the entire dataset is used for interpretation, and the second is local, where a subset of the data is used to derive an interpretive analysis of the model. Variable Importance Plots and Partial Dependence Plots fall in the global category. Surrogate models can be obtained at both the global and the local level. The intricacies of these methods are discussed further in the analysis section of the paper.

1.2 Liver Disease

The liver is the largest internal organ in the human body and performs multiple functions. These functions include emulsification of fats, detoxification of blood, and vitamin and cholesterol storage, among others. Liver disease is defined as any condition that interferes with the normal function of the organ. The most common ailments include nonalcoholic fatty liver disease, hepatitis A, B and C, cirrhosis of the liver, alcoholic hepatitis, and hemochromatosis. On a global scale, over one million new cases of liver cancer are diagnosed every year [luk2007artificial]. Common symptoms of liver disease include nausea, vomiting, jaundice, weakness, and excessive bleeding. In clinical practice, detection of liver disease involves monitoring levels of direct and indirect bilirubin (pigments formed during the breakdown of hemoglobin), alanine aminotransferase and aspartate aminotransferase (enzymes), and albumin (a protein made in the liver). Tests for additional enzymes such as alkaline phosphatase and gamma-glutamyl transpeptidase are often conducted as well. It has previously been reported in the literature that gender and age can also be contributing factors for liver disease [citation needed]. A particular difficulty is that liver disease is not easily discovered, because the liver is capable of maintaining normal function even when partially damaged. Thus, early diagnosis is one of the most important steps in liver disease treatment.

2 Materials

This section describes the gradient boosting machine and the Indian Liver Patient Dataset.


2.1 Gradient Boosting Machines

Gradient boosting machines (GBM) are a class of powerful machine-learning techniques used in regression, classification, and ranking problems. A GBM produces a single prediction model in the form of an ensemble of weaker prediction models such as decision trees. The idea of gradient boosting is that boosting can be interpreted as an optimisation algorithm on a suitable cost function. The main objective of a GBM is to construct new base-learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble. Boosting algorithms add new models to the ensemble sequentially: at each iteration, a new weak base-learner is trained with respect to the error of the whole ensemble learnt so far. The learning procedure of a GBM thus consecutively fits new models to provide a more accurate estimate of the response variable [natekin2013gradient].

2.1.1 Gradient Boosting Algorithm

As presented in [natekin2013gradient], the goal is to construct a functional dependence between the predictors and the response. Let the loss function be $\Psi(y, f)$ and let $z(x, \theta)$ be a custom base-learner. Sometimes the solution to the parameter estimates is difficult to obtain. Therefore, it was proposed to choose a new function $z(x, \theta_t)$ to be parallel to the negative gradient $\{g_t(x_i)\}_{i=1}^{N}$:

\[ g_t(x) = E_y\left[ \frac{\partial \Psi(y, f(x))}{\partial f(x)} \,\middle|\, x \right]_{f(x) = \hat{f}_{t-1}(x)} \]

The new function increment is chosen to be correlated with $-g_t(x)$. Proceeding, it is possible to replace the optimization task with the general least-squares minimization:

\[ (\rho_t, \theta_t) = \arg\min_{\rho, \theta} \sum_{i=1}^{N} \left[ -g_t(x_i) + \rho\, z(x_i, \theta) \right]^2 \]
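As a worked special case (an illustration added for clarity, not part of the cited derivation), assume the squared-error loss. With $g_t$ defined as the gradient evaluated at the current estimate, the quantity $-g_t(x_i)$ that the base-learner is fitted to reduces to the ordinary residual:

\[ \Psi(y, f) = \tfrac{1}{2}(y - f)^2 \quad\Longrightarrow\quad \frac{\partial \Psi(y, f)}{\partial f} = -(y - f) \quad\Longrightarrow\quad -g_t(x_i) = y_i - \hat{f}_{t-1}(x_i) \]

Under this assumption, each boosting iteration fits the new base-learner to the residuals of the current ensemble.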

An algorithmic overview of the gradient boosting algorithm, as originally proposed by Friedman (2001), is shown below.

Algorithm 1 Friedman's gradient boosting algorithm

Require: Input data $(x_i, y_i)_{i=1}^{N}$

Require: Number of iterations $M$

Require: Choice of the loss function $\Psi(y, f)$

Require: Choice of the base-learner model $z(x, \theta)$

1: Initialize $\hat{f}_0$

2: For $t = 1$ to $M$ do

3: Compute the negative gradient $g_t(x)$

4: Fit a new base-learner function $z(x, \theta_t)$

5: Find the best gradient descent step size $\rho_t$: $\rho_t = \arg\min_{\rho} \sum_{i=1}^{N} \Psi\left(y_i, \hat{f}_{t-1}(x_i) + \rho\, z(x_i, \theta_t)\right)$

6: Update the function estimate: $\hat{f}_t = \hat{f}_{t-1} + \rho_t\, z(x, \theta_t)$

7: End for

In practice, an appropriate loss function $\Psi(y, f)$ and base-learner need to be chosen. In this work, the GBM was fit using the H2O package, with a large grid search over the model parameters.
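The steps of the gradient boosting algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the H2O implementation used in this work: it assumes squared-error loss (so the negative gradient is the residual), a single numeric feature, and regression stumps as base-learners. The function names and the toy data are hypothetical.

```python
def fit_stump(x, target):
    """Least-squares regression stump on one feature.

    Returns (threshold, left_value, right_value)."""
    mean = sum(target) / len(target)
    best = (float("inf"), x[0], mean, mean)  # fallback: constant learner
    for thr in sorted(set(x))[:-1]:          # keep both sides non-empty
        left = [t for xi, t in zip(x, target) if xi <= thr]
        right = [t for xi, t in zip(x, target) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]


def stump_predict(stump, xi):
    thr, left_value, right_value = stump
    return left_value if xi <= thr else right_value


def gradient_boost(x, y, n_rounds):
    """Boosting loop for squared-error loss (an assumption of this sketch)."""
    f0 = sum(y) / len(y)                               # initialise with a constant
    f = [f0] * len(y)
    ensemble = []
    for _ in range(n_rounds):
        neg_grad = [yi - fi for yi, fi in zip(y, f)]   # residuals = negative gradient
        stump = fit_stump(x, neg_grad)                 # fit the new base-learner
        z = [stump_predict(stump, xi) for xi in x]
        denom = sum(zi * zi for zi in z)
        # closed-form line search for the step size under squared-error loss
        rho = sum(r * zi for r, zi in zip(neg_grad, z)) / denom if denom > 0 else 0.0
        f = [fi + rho * zi for fi, zi in zip(f, z)]    # update the function estimate
        ensemble.append((rho, stump))
    return f0, ensemble


def predict(model, xi):
    f0, ensemble = model
    return f0 + sum(rho * stump_predict(s, xi) for rho, s in ensemble)


def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical toy data: a noisy step-like response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.0, 3.1, 5.0, 5.2, 4.9]
baseline = gradient_boost(x, y, n_rounds=0)   # constant model only
boosted = gradient_boost(x, y, n_rounds=20)
```

Comparing `mse(baseline, x, y)` with `mse(boosted, x, y)` shows how the sequentially added weak learners reduce the training error. For a binary response such as disease status, a classification loss would replace the squared-error loss assumed here.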

2.2 Data

To demonstrate and interpret the GBM, the Indian Liver Patient Dataset is used. This dataset is openly available for public use [UCIRepository]. The data contain records of 583 patients from the north east of Andhra Pradesh, India. The data comprise eleven variables: ten independent variables and a response variable indicating the disease status of the patient. Out of the 583 patients, 416 have liver disease and 167 do not.


The dataset of liver patients was used to estimate performance (i.e. precision, accuracy, misclassification, and error rate) of classification algorithms. The complete data set is divided into training data and testing data. The following attributes were considered for our experimentation:

• Age of the Patient: The patients range from ages 4 to 90 with a median of 45 years old.

• Gender of the Patient: There are 142 women and 441 men in the study.

• Total Bilirubin: Bilirubin is a yellow pigment that is formed during the breakdown of hemoglobin, is processed by the liver, and is found in blood and stool. Total bilirubin consists of both conjugated and unconjugated bilirubin. The normal levels of total bilirubin are 0.1 to 1.2 mg/dL.

• Direct Bilirubin (Conjugated Bilirubin): Direct bilirubin flows directly into the blood. The normal level for direct bilirubin is up to about 0.3 mg/dL.

• Alkaline Phosphatase (ALP): Alkaline phosphatase is an enzyme that is found in the blood and helps with breaking down proteins. Abnormal levels of ALP can indicate that the liver and gallbladder are not functioning properly. The normal range of alkaline phosphatase is from 44 to 147 IU/L.

• Alanine Aminotransferase (ALT): This enzyme is found in the blood and is a good indicator of whether the liver is damaged, especially due to cirrhosis and hepatitis. ALT can be measured using a test called SGPT. Normal levels of ALT are 20-60 IU/L.

• Aspartate Aminotransferase (AST): AST is an enzyme and can be measured using a test called SGOT. Normal levels of AST range from 10 to 40 units per liter. High levels of AST indicate damage in an organ such as the heart or liver.

• Total Proteins: Total proteins consist of the proteins albumin and globulin. The test for total proteins measures the amount of these proteins in the blood. The normal range is 6-8.3 g/dL.

• Albumin: Albumin is the protein that prevents the fluid in blood from leaking out into the tissues. The normal range for albumin is 35-55 g/liter.

• Albumin to Globulin Ratio: This is a good indicator of the state of the liver. The normal A/G ratio is approximately 0.8 to 2.0. If the A/G ratio is either too high or too low, then more tests need to be completed to diagnose the issue.

3 Analysis

Visual analysis uses visual perception to help solve complex problems with machine learning algorithms. By designing effective visual representations, we can create fast, accurate, and trustworthy interpretations of machine learning models. There are two generic approaches to visual interpretation and analysis: the first visualizes the model structure, and the second visualizes the model behavior (treating the model as a black box). In this project, variable importance plots (VIP), partial dependence plots (PDP), and individual conditional expectation (ICE) plots are used. Local Interpretable Model-agnostic Explanations (LIME) plots are also used to visualize our model.

3.1 Variable Importance

Variable importance measures the effect of each variable in the data on the fitted model. It ranks every predictor by the contribution that the predictor makes to the model. This technique helps data scientists weed out predictors that do not contribute much to the model. Sometimes researchers expect a specific variable to contribute substantially to the model, but the variable importance results show that its contribution is not significant. Variable importance is calculated as the sum of the decrease in error over all splits on the variable. The relative importance is obtained by dividing each variable importance by the highest variable importance value, so that the values are bounded between 0 and 1.

A variable importance plot was created for the liver patient data. This plot helps identify which variables contribute the most in predicting the outcome of the model. The variable importance plot revealed that alkaline phosphatase, age, and albumin are the three variables that contribute the most significantly to the model. These variables also have high predictive power in classifying individuals as cases (liver disease) and controls (non-liver disease).

Algorithm 2 Algorithm for obtaining permutation feature importance

Require: Trained model $\hat{f}$

Require: Feature matrix $X$

Require: Target vector $Y$

Require: Error measure $L(Y, \hat{f}(X))$

1: Estimate the original model error $e_{orig}(\hat{f}) = L(Y, \hat{f}(X))$ (e.g. mean squared error)

2: For each feature $j \in \{1, \dots, p\}$ do

3: Generate feature matrix $X_{new}^{j}$ by permuting feature $X_j$ in $X$

4: Estimate error $e_{new}(\hat{f}) = L(Y, \hat{f}(X_{new}^{j}))$ based on the predictions for the permuted data

5: Calculate permutation feature importance $FI_j = e_{new}(\hat{f}) / e_{orig}(\hat{f})$

6: Alternatively, the difference can be used: $FI_j = e_{new}(\hat{f}) - e_{orig}(\hat{f})$

7: End for

8: Sort variables by descending $FI$
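The permutation procedure can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it uses the difference form of the importance (which avoids dividing by a zero original error), and the toy model, data, and function names are hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, loss, seed=0):
    """Return FI_j = e_new - e_orig for every feature j.

    `predict` maps one row (a list of feature values) to a prediction."""
    rng = random.Random(seed)
    e_orig = loss(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_new = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        e_new = loss(y, [predict(row) for row in X_new])
        importances.append(e_new - e_orig)
    return importances


# Hypothetical toy model that uses only the first feature.
def toy_model(row):
    return 3.0 * row[0]


X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [toy_model(row) for row in X]
fi = permutation_importance(toy_model, X, y, mean_squared_error)
ranking = sorted(range(len(fi)), key=lambda j: fi[j], reverse=True)
```

In this toy setting, permuting the unused second feature leaves the error unchanged, so its importance is zero, while permuting the first feature increases the error.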

3.2 Partial Dependence Plots

A partial dependence plot (PDP) is a global, graphical representation of the marginal effect that a set of variables has on the target, averaging out the remaining variables. Partial dependence plots can be used for both classification and regression ensembles. A PDP represents the ensemble results, not the data values. PDPs marginalize over the values of all other features and show how each predictor affects the model's predictions. Overall, partial dependence plots are a useful way to gain trust in, and understanding of, these complex models.

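The principle of PDP can be represented mathematically as follows. Given an estimator $\hat{f}$ and a point $x_S$ in the feature space of the selected variables $S$, the partial dependence is the prediction averaged over the observed values $x_{Ci}$ of the remaining variables: $\hat{f}_S(x_S) = \frac{1}{N} \sum_{i=1}^{N} \hat{f}(x_S, x_{Ci})$.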

The partial dependence plots are created on the training data. The probability of classifying a patient as having liver disease decreases when the alkaline phosphatase level is greater than 5.2, and then becomes stable when the alkaline phosphatase level is greater than 5.7. As the alanine aminotransferase level increases, its influence on the predicted probability becomes positive. A similar pattern is visible for direct bilirubin.
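The averaging behind a partial dependence curve can be sketched as follows. This is a minimal illustration with a hypothetical two-feature model and toy data, not the fitted GBM: for each grid value, the feature of interest is overwritten in every row and the predictions are averaged.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction over the data with column `feature` set to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value   # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(X))
    return curve


# Hypothetical model with an interaction between the two features.
def toy_model(row):
    return 2.0 * row[0] + row[0] * row[1]


X = [[0.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
pd_curve = partial_dependence(toy_model, X, feature=0, grid=[0.0, 1.0, 2.0])
```

Because the second feature averages to zero in this toy data, the interaction term cancels and the partial dependence curve shows only the average slope of 2.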

3.3 Individual Conditional Expectation

The partial dependence plot, which visualizes the average effect of a feature, is a global method, because it does not focus on specific instances but on an overall average. The local counterpart of the PDP is the individual conditional expectation (ICE) plot. ICE plots are similar to partial dependence plots, but they are newer and less well known. ICE plots provide local explanations and are especially useful when there are strong interactions among the explanatory variables. An ICE plot visualizes the dependence of the predicted response on a feature for every instance separately, resulting in one line per instance, compared with a single line in a partial dependence plot.


Assume we have observations $x_i = (x_{Si}, x_{Ci})$ for $i = 1, \dots, N$ and an estimated response function $\hat{f}$. For a specific subject $i$, with $x_{Ci}$ set to its observed value, a curve of $\hat{f}_S^{(i)}$ against a grid of values of $x_S$ can be plotted, where the x-coordinate represents the value of $x_S$ and the y-coordinate represents the value of $\hat{f}_S^{(i)}$. This curve conveys the information of the conditional expectation of $f_S^{(i)}$. Finally, a cluster of $N$ curves can be aggregated on the plot. The detailed algorithm for plotting the ICE curves is summarized below.

To formulate the algorithm, we assume that $X$ is an $N \times p$ predictor matrix, $S \subset \{1, \dots, p\}$ indexes the features of interest, and $x_C$ denotes the covariates that are held fixed at their observed values.

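Algorithm 3 Algorithm for ICE

1: Function ICE($X$, $\hat{f}$)

2: For $i = 1$ to $N$ do

3: $\hat{f}_S^{(i)} = 0_{N \times 1}$

4: Set $x_C = X_{i,C}$

5: For $l = 1, \dots, N$ do

6: Set $x_S = X_{l,S}$

7: $\hat{f}_{S,l}^{(i)} = \hat{f}(x_S, x_C)$

8: End for

9: End for

10: Return $\hat{f}_S^{(1)}, \dots, \hat{f}_S^{(N)}$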
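The ICE computation can be sketched as follows. This is a minimal illustration with a hypothetical model and toy data, not the fitted GBM: each curve keeps one row's other covariates at their observed values and varies only the feature of interest, and averaging the curves pointwise recovers the partial dependence curve.

```python
def ice_curves(predict, X, feature, grid):
    """One curve per row: vary `feature` over `grid`, keep the rest as observed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def average_curve(curves):
    """Pointwise mean of the ICE curves, which is the partial dependence curve."""
    return [sum(col) / len(col) for col in zip(*curves)]


# Hypothetical model with an interaction between the two features.
def toy_model(row):
    return 2.0 * row[0] + row[0] * row[1]


X = [[0.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
grid = [row[0] for row in X]          # observed values of the feature of interest
curves = ice_curves(toy_model, X, feature=0, grid=grid)
pdp = average_curve(curves)
```

In this toy example the three curves have slopes 3, 1 and 2, a heterogeneity that the averaged curve (slope 2) hides; differing slopes of this kind are what signal an interaction in an ICE plot.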

3.4 Surrogate Models

To overcome the complex nonlinear structure and lack of transparency of ML models, much of the literature suggests using surrogate models. A surrogate is a simple model that imitates the workings of the complex nonlinear ML model as closely as possible. Surrogate models are developed by training a linear regression or a decision tree on the original inputs and the predictions of the complex ML model.

Interpretations generated through surrogate models are in line with the human perception of a real-world problem. The coefficients, variable importance and interactions generated by surrogate models parallel the human understanding of the problem. Surrogate models improve the general understanding of the phenomenon when they are used to illustrate the internal mechanisms of the complex model (Hall, Phan & Ambati, 2017).
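The idea of a global surrogate can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the analysis carried out in this project: a hypothetical black-box function of one input stands in for the complex model, a straight line is fitted to its predictions (not to the true labels), and the R-squared between the two sets of predictions measures how faithfully the surrogate imitates the black box.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y, y_hat):
    """Proportion of the variation in y explained by y_hat."""
    my = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    return 1.0 - sse / sst


# Hypothetical black-box model (stands in for the complex ML model).
def black_box(x):
    return x ** 2


inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
black_box_pred = [black_box(x) for x in inputs]

# The surrogate is trained on the black-box predictions.
slope, intercept = fit_line(inputs, black_box_pred)
surrogate_pred = [intercept + slope * x for x in inputs]
fidelity = r_squared(black_box_pred, surrogate_pred)
```

The surrogate's coefficients are only as trustworthy as its fidelity; a low R-squared means the simple model does not reproduce the behaviour of the complex one.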

3.4.1 Global Interpretability

Global interpretability helps to gain insight into the distribution of the dependent variable based on the predictors. It concerns how the model makes decisions based on the conditional interactions between a set of features and the response. The objective is to interpret the feature interactions and importances as a step towards a global interpretation. In general, global model interpretability is very hard to achieve. A model with a large number of parameters is difficult for humans to understand, since a feature space with more than three dimensions cannot be visualized by the human mind (Molnar, 2018).

In the liver disease data set, there are ten features that relate to the response, liver disease. This number of predictors is too large for a human to visualize directly. Therefore, we believe that better visualization methods are needed in order to make better decisions, increase predictive ability and improve trustworthiness.

3.4.2 Local Interpretability

Local interpretability explains the conditional interaction between features and a response for a particular instance. The objective is to understand what kind of prediction the model makes for a particular input and the reason for this prediction (Molnar, 2018). Local interpretability does not consider the fundamental assumptions or the overall structure of the model. The idea is to focus on a specific data point, identify a neighborhood in the feature space around that point, and examine the model decisions within this region (source: https://towardsdatascience.com/human-interpretable-machine-learning-part-1-the-need-and-importance-of-model-interpretation-2ed758f5f476).

In this project we use LIME as a local surrogate model, applied to our data with the lime package in R. ‘lime’ explains the prediction of a black box model by fitting a local model to the point in question and to perturbations of this point in its neighborhood.

3.5 Local Interpretable Model-agnostic Explanations (LIME)

3.5.1 General Background of LIME

The global interpretation of a black-box model does not represent how the model behaves for every individual patient. LIME, a local surrogate model, is a technique proposed by Ribeiro, Singh & Guestrin (2016) that can be used to explain the predictions of complex ML classifiers whose outcomes are not intuitive to users. The LIME algorithm breaks down complex, less interpretable black-box ML models into interpretable variable importance plots, under the assumption that every complex model is linear on a local scale. Thus, LIME provides a tool for the local interpretation of black-box models.

3.5.2 LIME Algorithm

To analyze the model's prediction behavior and to check which input parameters contribute to the prediction, we perturb the input around its neighborhood. The data points in the neighborhood are weighted by their proximity to the original instance. For these neighborhood data points, the prediction given by the black box model is analyzed. Then a set of features that describe the prediction is selected for the data in the neighborhood. As the next step, a simple model is fitted to this neighborhood. Finally, the feature weights from the simple model are extracted and used as explanations for the complex black box model [citation needed].

The simple model used for feature selection and for weighting the neighborhood observations is implemented in the lime package in R, and its settings can be adjusted by the user.

Source: https://homes.cs.washington.edu/~marcotcr/blog/lime/

The above figure illustrates the LIME algorithm. The blue/pink background represents the decision function of the original black box model, which is clearly nonlinear. The bright red cross is the instance being explained (X). A set of instances in the neighborhood of X is selected and weighted according to proximity to X. Then the original model's predictions on these instances are considered, and a set of features that best describe the predictions is selected for the data in the neighborhood. A simple linear model is then fitted to approximate the original model. Note that the explanation in this case is not faithful globally, but it is faithful locally around X.
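The local-fit idea described above can be sketched as follows. This is a heavily simplified illustration, not the `lime` package: it assumes Gaussian perturbations, an exponential proximity kernel, and an unpenalised weighted linear fit, and it omits the feature selection step. The black-box function, parameter values, and function names are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def local_linear_explanation(predict, x0, n_samples=500, scale=0.3,
                             kernel_width=0.5, seed=0):
    """Perturb x0, weight samples by proximity, fit a weighted linear model.

    Returns the local feature weights (slopes) of the simple model."""
    rng = random.Random(seed)
    p = len(x0)
    rows, preds, weights = [], [], []
    for _ in range(n_samples):
        z = [xj + rng.gauss(0.0, scale) for xj in x0]        # perturbed instance
        dist2 = sum((zj - xj) ** 2 for zj, xj in zip(z, x0))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # proximity weight
        rows.append([1.0] + z)                                # intercept + features
        preds.append(predict(z))                              # black-box prediction
    # Weighted normal equations: (X^T W X) beta = X^T W y
    A = [[sum(w * r[a] * r[b] for w, r in zip(weights, rows)) for b in range(p + 1)]
         for a in range(p + 1)]
    rhs = [sum(w * r[a] * y for w, r, y in zip(weights, rows, preds))
           for a in range(p + 1)]
    beta = solve(A, rhs)
    return beta[1:]


# Hypothetical nonlinear black box; its gradient at (1, 0) is (2, 3).
def black_box(z):
    return z[0] ** 2 + 3.0 * z[1]


local_weights = local_linear_explanation(black_box, [1.0, 0.0])
```

The fitted weights approximate the local slope of the black box around the instance, which is what makes the explanation faithful locally but not globally.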

3.6 Shapley Value

The Shapley value is a key solution concept from cooperative game theory in general and voting games in particular. It is characterized by a collection of desirable properties. In the context of model interpretation, the Shapley value of a feature value is its average marginal contribution to the prediction over all possible coalitions of the other feature values. Because the number of coalitions, and hence the computation time, increases exponentially with the number of features, in practice the contributions are estimated by sampling from all possible coalitions.
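For a small number of features, the average over all coalitions can be computed exactly. The sketch below is a minimal illustration with a hypothetical value function for three players (features); it is not tied to the GBM fitted in this work, where sampling would be needed instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_players):
    """Exact Shapley values: weighted average marginal contribution over all coalitions.

    `value` maps a set of player indices to the payoff of that coalition."""
    phi = [0.0] * n_players
    for j in range(n_players):
        others = [k for k in range(n_players) if k != j]
        for size in range(len(others) + 1):
            # probability weight of a coalition of this size preceding player j
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


# Hypothetical value function: two main effects plus an interaction, third player idle.
def toy_value(coalition):
    v = 0.0
    if 0 in coalition:
        v += 10.0
    if 1 in coalition:
        v += 20.0
    if 0 in coalition and 1 in coalition:
        v += 6.0   # interaction, shared equally between players 0 and 1
    return v


phi = shapley_values(toy_value, n_players=3)
```

The values sum to the payoff of the full coalition, and the idle third player receives zero, two of the properties that characterize the Shapley value.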

3.7 Misclassification Error

Misclassification error measures the error for categorical data, where an error occurs when the predicted category differs from the actual category (source: https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/). Our data set contains 583 cases, of which 416 have liver disease and 167 do not. To check the validity of our methods we use the confusion matrix, each component of which is explained below.

Total number n    Predicted: YES     Predicted: NO
Actual: YES       True Positive      False Negative
Actual: NO        False Positive     True Negative

• True positives (TP): These are cases in which we predicted yes (they have liver disease), and they do have liver disease.

• True negatives (TN): We predicted no, that is we predict they don’t have liver disease, and in fact they don’t have liver disease.

• False positives (FP): We predicted yes, that is we predict they have liver disease, but they don’t actually have liver disease. (Also known as a “Type I error.”)

• False negatives (FN): We predicted no, that is we predict they don’t have liver disease, but they actually do have liver disease. (Also known as a “Type II error.”)

We can also calculate the accuracy, misclassification rate, specificity, precision and recall from the table.

• Accuracy: Accuracy means overall, how often is the classifier correct: (TP+TN)/total.

• Misclassification Rate: Misclassification Rate means overall how often is the result wrong: (FP+FN)/total.

• Specificity: Specificity means when it’s actually no, how often does it predict no: TN/actual no. This is equivalent to 1 − False Positive Rate.

• Precision: Precision means when it predicts yes, how often is it correct: TP/predicted yes.

• Recall: Recall means when it’s actually yes, how often does it predict yes: TP/actual yes. That is, recall = TP/(TP + FN).

We use the misclassification error to assess how often the model classifies correctly. The misclassification error is the sum of the two types of errors (Type I and Type II) divided by the total number of cases. The proportion of correct classifications is the complement of this rate, that is, one minus the misclassification rate. We carry out the calculation for both the training data and the test data. The results are as follows:


Training dataset (N = 116)

              Predicted NO   Predicted YES   Total
Actual NO          14             19           33
Actual YES         19             64           83
Total              33             83          116

Test dataset (N = 467)

              Predicted NO   Predicted YES   Total
Actual NO         128              6          134
Actual YES         11            322          333
Total             139            328          467
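The measures defined earlier can be computed directly from the four cells of a confusion matrix. The sketch below is a minimal illustration that uses the counts from the first table above (N = 116); the function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Summary measures derived from the four cells of a confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "specificity": tn / (tn + fp),   # TN / actual no
        "precision": tp / (tp + fp),     # TP / predicted yes
        "recall": tp / (tp + fn),        # TP / actual yes
    }


# Counts read from the N = 116 table: rows are actual, columns are predicted.
metrics = classification_metrics(tp=64, tn=14, fp=19, fn=19)
```

The accuracy and the misclassification rate always sum to one, which is the complement relationship described above.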

From the tables above, TP, TN, FP and FN can be read off directly. To understand the model, we calculate the misclassification rate, accuracy, recall, null error rate and precision for our data; the figure below summarizes the results.

Figure 1: GBM Model Performance

4 Discussion

4.1 Variable Importance Plots

Based on Algorithm 2 in the analysis section, the variable importance of each predictor can be calculated. The variable importance plot below shows that three variables contribute significantly to the GBM model. Alkaline phosphatase has the highest contribution to the model, while age and albumin are the second and third most important factors, respectively. Gender contributes the least to the GBM model.


Figure 2: Variable Importance Plot

4.2 Partial Dependence Plots

The partial dependence plot of direct bilirubin shows that this variable has a positive marginal effect on the predicted probability of liver disease, given that the other variables are held constant at their averages.

Likewise, given that the other variables are held constant at their averages, age has a positive marginal effect on the predicted probability of liver disease. Thus, age has a noticeable influence on the GBM model.

4.3 Individual Conditional Expectation

ICE plots show the predicted response for every patient in the dataset while varying only the variable of interest. The ICE plots below concern two variables: direct bilirubin and age. ICE plots are similar to PDP, but they show all the instances. Each case has its own trend, which does not entirely follow the average pattern. Figure 5 shows that each case's predicted probability of liver disease tends to increase in a non-linear manner between about 0.25 and 1. We can also see that the prediction for each case is affected in a different manner. For example, the case predictions at the top of the plot do not increase as much as those at the bottom. The differences among the curves indicate that there are interactions between direct bilirubin and the other features. Similarly, the ICE plot in Figure 6 indicates that age interacts with the other features.


Figure 3: PDP for Direct Bilirubin

Figure 4: PDP for Age

Figure 5: ICE Plot for Direct Bilirubin


Figure 6: ICE Plot for Age

4.4 Surrogate Models

4.4.1 Global

4.4.2 Local Interpretable Model-agnostic Explanations (LIME)

The individual feature importance plots allow us to understand which features contributed most strongly to the classification of whether a person has liver disease, which features contradicted it, and how they influenced the prediction. The individual LIME feature importance visualization shows the five most important features for six cases (subjects) in the liver dataset. The header of each facet gives the case number, the predicted class label, and the predicted probability. The length of each bar indicates the weight of the feature: green (positive) weights support the prediction ‘Yes’ (the subject has liver disease), while red (negative) weights contradict it. Thus, the larger the weight of a feature with a red bar, the smaller the predicted probability. The explanation fit is the R-squared value of the ridge regression model, indicating the proportion of the variation in the complex GBM that is explained by the simple ridge regression model. The features age, alanine aminotransferase, and bilirubin (both direct and total) are important for many subjects. When the value of direct bilirubin is greater than 1.58, the bar is green, since this increases the predicted probability that the subject has liver disease (see case 74). Other cases for which direct bilirubin is also important but smaller (i.e., direct bilirubin ≤ 0.20) have a red bar, since this decreases the predicted probability of liver disease (see case 92). Thus, subjects with higher levels of direct bilirubin have an increased predicted probability of liver disease.

Cases 74 and 76 behave similarly. However, case 76 has low values of albumin and of the albumin to globulin ratio, which detract from its status as ‘Yes’ without affecting the final prediction. In case 92, the features total bilirubin, direct bilirubin and albumin reduce the predicted probability of liver disease, but unusual levels of alkaline phosphatase and aspartate aminotransferase make up for them. Hence it is clear that high values of these features are indicative of liver disease, and only two features seemed to be enough. It is also interesting to compare Case 28 (age ≤ 30) and Case 46 (age > 60). Case 28 could be considered a young adult, which decreases the predicted probability of liver disease. Case 46 is probably an elderly subject, and this has a negative influence on the prediction of liver disease.

The R-squared values for all cases lie between 0.08 and 0.26. This means that the simple ridge regression explanations capture only a small proportion of the variance of the complex model. For this reason, LIME may not be an appropriate tool for explaining this GBM model.


5 Results and Conclusions

6 Summary and Future Work
