Binary cross entropy is a loss function used for binary classification tasks (tasks with only two outcomes/classes). It works by calculating the following average:

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right]$$

where $y_i$ is the actual label and $\hat{y}_i$ is the predicted probability of the positive class.
The above equation can be split into two parts to make it easier to understand:

$$\mathrm{loss}(y_i, \hat{y}_i) = \begin{cases}-\log(\hat{y}_i) & \text{if } y_i = 1\\ -\log(1-\hat{y}_i) & \text{if } y_i = 0\end{cases}$$
Plotting these two cases shows that the further away the prediction is from the actual y value, the bigger the loss gets.
That means that if the correct answer is 0, then the cost function will be 0 if the prediction is also 0. If the prediction approaches 1, then the cost function will approach infinity.
If the correct answer is 1, then the cost function will be 0 if the prediction is 1. If the prediction approaches 0, then the cost function will approach infinity.
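As a quick illustration, this average can be computed in a few lines of NumPy; the function name, the clipping used to avoid log(0), and the example values are my own choices and only sketch the idea:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross entropy over all samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.3])
print(binary_cross_entropy(y_true, y_pred))  # roughly 0.41
```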
Categorical cross entropy is a loss function used for multi-class classification tasks. The output loss is the negative average of the sum of the true values multiplied by the log of the predicted values:

$$\mathrm{CCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log(\hat{y}_{i,c})$$
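A similar NumPy sketch works for the multi-class case, assuming one-hot encoded targets and predicted class probabilities (the helper name and values are made up for illustration):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot targets, y_pred: predicted class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(y_true, y_pred))  # roughly 0.29
```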
A confusion matrix is a table that summarises the predictions of a classifier or classification model. By definition, the entry $C_{i,j}$ in a confusion matrix is the number of observations actually in group $i$, but predicted to be in group $j$.
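For example, scikit-learn's confusion_matrix follows this convention, with rows corresponding to the actual groups and columns to the predicted groups (the labels below are invented):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(confusion_matrix(y_true, y_pred))
# [[2 1]
#  [1 2]]
```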
Precision is a metric for classification models that identifies the frequency with which a model was correct when predicting the positive class. Precision is defined as the number of true positives over the number of true positives plus the number of false positives.
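Assuming scikit-learn is available, precision (and the closely related recall, which the F-scores below build on) can be computed directly from the predicted labels; the example values are invented:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75
print(recall_score(y_true, y_pred))     # 3 TP / (3 TP + 1 FN) = 0.75
```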
The F1-Score is the harmonic mean of precision and recall. A perfect model will have an F1-Score of 1.
It's also possible to weight precision or recall differently using the $F_\beta$-Score. Here a real factor $\beta$ is used to weight the recall $\beta$ times as much as the precision:

$$F_\beta = (1+\beta^2)\cdot\frac{\mathrm{precision}\cdot\mathrm{recall}}{\beta^2\cdot\mathrm{precision}+\mathrm{recall}}$$
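As a small sketch with scikit-learn and invented labels, f1_score and fbeta_score cover both cases:

```python
from sklearn.metrics import f1_score, fbeta_score

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
print(f1_score(y_true, y_pred))               # harmonic mean of precision and recall, ~0.57
print(fbeta_score(y_true, y_pred, beta=2.0))  # beta=2 weights recall twice as much, ~0.53
```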
The ROC curve (receiver operating characteristic curve) is a graph that illustrates the performance of a classification model as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
AUC stands for "Area under the ROC Curve". AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. - Google Developers Machine Learning Crash Course
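A minimal scikit-learn sketch, with made-up scores, that produces both the ROC curve points and the AUC:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = roc_curve(y_true, y_scores)  # points of the ROC curve
print(roc_auc_score(y_true, y_scores))              # 0.75
```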
The Kullback-Leibler divergence, $D_{\mathrm{KL}}(P \parallel Q)$, often shortened to just KL divergence, is a measure of how one probability distribution $P$ is different from a second, reference probability distribution $Q$. For discrete distributions it is defined as:

$$D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}$$
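For two discrete distributions this can be sketched in a few lines of NumPy; the helper name and values are my own, and this naive version assumes both distributions are strictly positive:

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) between two discrete distributions, in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

print(kl_divergence([0.4, 0.6], [0.5, 0.5]))  # roughly 0.02
```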
The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions. For unidimensional predictions, it is strictly equivalent to the mean squared error as applied to predicted probabilities. - Wikipedia
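Assuming scikit-learn, the Brier score of a set of predicted probabilities can be computed with brier_score_loss (example values invented):

```python
from sklearn.metrics import brier_score_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]
print(brier_score_loss(y_true, y_prob))  # mean squared error of the probabilities, 0.0375
```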
The mean squared error (MSE) or mean squared deviation (MSD) measures the average of the squares of the errors - that is, the average squared difference between the estimated and actual values.
The mean absolute error (MAE) measures the average of the absolute values of the errors - that is, the average absolute difference between the estimated and actual values.
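Both metrics are available in scikit-learn; a tiny sketch with invented values:

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5
```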
Mean absolute percentage error (MAPE) is an extension of the mean absolute error (MAE) that divides the absolute difference between the predicted value and the actual value by the actual value. The main idea of MAPE is to be sensitive to relative errors. It is, for example, not changed by a global scaling of the target variable.
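A short sketch of that scale invariance, assuming scikit-learn's mean_absolute_percentage_error (available in newer releases) and the same invented values as above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mean_absolute_percentage_error(y_true, y_pred))            # roughly 0.33
print(mean_absolute_percentage_error(10 * y_true, 10 * y_pred))  # same value: scale invariant
```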
The median absolute error, also often called median absolute deviation (MAD), is a metric that is particularly robust to outliers. The loss is calculated by taking the median of all absolute differences between the target and the prediction.
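With scikit-learn and the same invented values, this looks like:

```python
from sklearn.metrics import median_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(median_absolute_error(y_true, y_pred))  # median of |y - y_hat| = 0.5
```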
The coefficient of determination, denoted as $R^2$, is the proportion of the variation in the dependent variable that has been explained by the independent variables in the model:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
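Again with scikit-learn and invented values:

```python
from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(r2_score(y_true, y_pred))  # roughly 0.95
```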
The Tweedie distributions are a family of probability distributions, which include the purely continuous normal, gamma and inverse Gaussian distributions and more.
The unit deviance of a reproductive Tweedie distribution with power parameter $p$ is given by:

$$d(y,\mu)=\begin{cases}(y-\mu)^2 & p=0\\ 2\left(y\log\frac{y}{\mu}+\mu-y\right) & p=1\\ 2\left(\log\frac{\mu}{y}+\frac{y}{\mu}-1\right) & p=2\\ 2\left(\frac{\max(y,0)^{2-p}}{(1-p)(2-p)}-\frac{y\,\mu^{1-p}}{1-p}+\frac{\mu^{2-p}}{2-p}\right) & \text{otherwise}\end{cases}$$
The $D^2$-Score computes the percentage of deviance explained. It is a generalization of $R^2$, where the squared error is replaced by the Tweedie deviance. - Scikit Learn
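As a sketch, assuming a recent scikit-learn release (d2_tweedie_score was added in version 1.0) and invented values:

```python
from sklearn.metrics import mean_tweedie_deviance, d2_tweedie_score

y_true = [0.5, 1.0, 2.5, 7.0]
y_pred = [1.0, 1.0, 5.0, 3.5]
print(mean_tweedie_deviance(y_true, y_pred, power=1))  # mean Poisson deviance
print(d2_tweedie_score(y_true, y_pred, power=1))       # fraction of that deviance explained
```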
The pseudo-$R^2$, also known as McFadden's likelihood ratio index, is calculated as:

$$R^2_{\mathrm{McFadden}} = 1 - \frac{\ln \hat{L}(M_{\mathrm{full}})}{\ln \hat{L}(M_{\mathrm{intercept}})}$$

where $\hat{L}(M_{\mathrm{full}})$ is the maximized likelihood of the fitted model and $\hat{L}(M_{\mathrm{intercept}})$ is the likelihood of a model with only an intercept.
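Since this only needs the two log-likelihoods, a minimal sketch is enough; the helper name and the log-likelihood values below are hypothetical:

```python
def mcfadden_pseudo_r2(loglik_model, loglik_null):
    """McFadden's likelihood ratio index computed from two log-likelihoods."""
    return 1.0 - loglik_model / loglik_null

# hypothetical log-likelihoods of a fitted classifier and an intercept-only model
print(mcfadden_pseudo_r2(loglik_model=-120.3, loglik_null=-180.7))  # roughly 0.33
```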