In this section, we compare the performance of several machine learning algorithms: Naive Bayes (NB), Random Forest (RF), Logistic Regression (LR), and MLFNN. MLFNN has been explained in the theoretical background section. Here we briefly describe the remaining algorithms and then compare the results.

The Naive Bayes classifier is a straightforward and robust algorithm for classification. To understand it in depth, we first need to understand Bayes' theorem, which is built on the concept of conditional probability. Conditional probability is the probability that an event will happen given that another event has already occurred. It allows the probability of an event to be updated using prior knowledge about related events.

Equation () represents Bayes' theorem for two events A and B:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

P(A|B): the conditional probability that event A occurs, given that B has already occurred. It is called the posterior probability.

P(A) and P(B): the probabilities of event A and event B, respectively. P(A) is called the prior probability.

P(B|A): the conditional probability that event B occurs, given that A has already happened.
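As a worked illustration, the quantities above combine as P(A|B) = P(B|A) P(A) / P(B). The following minimal sketch evaluates this for a hypothetical spam-filter scenario. The function name `bayes_posterior`, the event names, and all numbers are assumptions chosen only for illustration; they are not values from this study.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, prob_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if prob_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / prob_b


# Hypothetical numbers (assumptions for illustration only):
# A = "message is spam", B = "message contains a given word".
p_spam = 0.2               # P(A), the prior
p_word_given_spam = 0.6    # P(B|A)
p_word_given_ham = 0.05    # P(B|not A)

# P(B) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

posterior = bayes_posterior(p_spam, p_word_given_spam, p_word)
print(round(posterior, 2))
```

With these assumed numbers the posterior is roughly 0.75: observing the word raises the probability of spam from the prior of 0.2 to about three in four.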

NB is a classifier based on Bayes' theorem. It is called naive because it assumes that the features are conditionally independent given the class. NB predicts a membership probability for each class, that is, the probability that a given data point belongs to that class. Once these probabilities have been computed, the class with the highest probability is taken as the most likely class. This decision rule is known as Maximum A Posteriori (MAP).
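The MAP rule can be sketched from scratch for categorical features. This is a minimal illustration, not the implementation used in the experiments. The function names, the toy weather-style data, and the use of Laplace (add-one) smoothing are all assumptions introduced here; smoothing is added only so that a feature value never seen with a class does not force that class's probability to zero.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values_seen = defaultdict(set)         # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen


def predict_map(model, row):
    """Return the class with the highest (log) posterior score, plus all scores."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(feature_i = value | class).
            numerator = feature_counts[(label, i)][value] + 1
            denominator = count + len(values_seen[i])
            log_score += math.log(numerator / denominator)
        scores[label] = log_score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_nb(rows, labels)
predicted, scores = predict_map(model, ("sunny", "hot"))
print(predicted)
```

The scores are unnormalized log posteriors: the evidence term P(B) is the same for every class, so it can be dropped without changing which class attains the maximum.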

Random Forest (RF) is a supervised learning algorithm. RF builds a forest of several decision trees and combines their predictions. In general, the more trees the forest contains, the more stable and accurate its predictions tend to be, although the gains diminish beyond a certain number of trees. The RF classifier has several advantages. First, the RF algorithm can be used for both classification and regression tasks. Second, it can counter overfitting, which decreases the testing accuracy of a model: with a sufficient number of trees, combining the trees' predictions makes the model much less likely to overfit the data. Third, RF implementations can often handle missing values, so missing values in the data need not severely impact the accuracy of the random forest. Finally, the RF classifier can also be built for categorical values. The RF algorithm has two stages: the first is the creation of the random forest, and the second is making predictions with the random forest classifier built in the first stage.
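The two stages, building the forest and then predicting with it, can be sketched as follows. This is a deliberately simplified illustration and not a full random forest: each "tree" here is a one-split decision stump trained on a bootstrap sample with one randomly chosen feature, and the forest predicts by majority vote. The function names, the toy data, and the choice of stumps instead of full decision trees are assumptions made to keep the example short.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (feature, float("inf"), majority, majority)
    return best[1:]


def build_forest(X, y, n_trees=25, seed=0):
    """Stage 1: train each stump on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        feature = rng.randrange(n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Stage 2: every stump votes; the majority class wins."""
    votes = Counter(left if row[feature] <= threshold else right
                    for feature, threshold, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated toy data with two numeric features.
X = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = build_forest(X, y)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```

Because each stump sees a different bootstrap sample, individual stumps can be poor, but the vote over many of them is far more stable, which is the intuition behind the claim that more trees give more reliable results.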

Logistic Regression (LR) is a supervised classification model built on a linear discriminant. Given a set of inputs, LR does not try to predict the value of a numeric variable. Instead, its output is the probability that the given input point belongs to a particular class. The central assumption of logistic regression is that, in the binary case, the input space can be separated into two regions, one for each class, by a linear boundary. This dividing plane is called a linear discriminant because it is a linear function of the inputs, and it helps the model distinguish between points belonging to different classes. Logistic regression models are categorized by the number of target classes: LR uses the sigmoid function for binary classification tasks and the softmax function for multiclass classification tasks.
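The two output functions mentioned above can be sketched in a few lines. The weights, bias, and inputs below are hypothetical values chosen for illustration, not parameters learned in this study; in practice they would be fitted to training data.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Map one score per class to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max improves numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_binary(weights, bias, x):
    """Binary LR: linear score w.x + b, then sigmoid, then a 0.5 threshold."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    return p, int(p >= 0.5)


# Hypothetical parameters: the decision boundary is the line x1 - x2 = 0.
weights, bias = [1.0, -1.0], 0.0
prob, label = predict_binary(weights, bias, [2.0, 1.0])
print(round(prob, 3), label)

# Multiclass case: one linear score per class, then softmax.
class_probs = softmax([2.0, 1.0, 0.1])
print(class_probs.index(max(class_probs)))
```

Points with a positive linear score fall on one side of the boundary and receive a probability above 0.5, while points with a negative score fall on the other side, which is what makes the discriminant linear.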

Deep Learning (DL) is a sub-domain of Machine Learning, which is itself a sub-domain of Artificial Intelligence. DL is a set of algorithms that try to model high-level abstractions in data. While many traditional machine learning algorithms are linear or shallow, DL algorithms are stacked in a hierarchy of increasing complexity and abstraction. DL extracts high-level abstract data representations hierarchically, combining simple features into more complex features layer by layer. The depth of a neural network allows it to construct a feature hierarchy of increasing abstraction, with each subsequent layer acting as a filter for increasingly complex features that combine those of the previous layer.
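The layer-by-layer combination of features can be illustrated with a tiny feed-forward pass. The sketch below uses hand-picked, hypothetical weights, not trained ones. Its hidden layer computes two simple features of the inputs, and the output layer combines them into XOR, a function that no single linear boundary can represent. The function names and the network layout are assumptions made for illustration.

```python
def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sums followed by an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Feed x through the stacked layers, keeping every intermediate output."""
    activations = [x]
    for weights, biases, activation in layers:
        activations.append(dense(activations[-1], weights, biases, activation))
    return activations


# Hand-picked (hypothetical) weights that make the network compute XOR.
layers = [
    # Hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    # Output layer: y = h1 - 2 * h2
    ([[1.0, -2.0]], [0.0], identity),
]

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    hidden, output = forward(x, layers)[1:]
    print(x, hidden, output)
```

The hidden activations are the simple features, and the output is a more complex feature built from them. Deeper networks repeat this pattern across many layers.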