Getting log loss score in scikit-learn


According to this wiki, “Logarithmic loss measures the performance of a classification model where the prediction input is a probability value between 0 and 1”.

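Log loss is a useful measure of how well a classifier's predicted probabilities match the true labels. The goal is to minimize it: a score of 0 is perfect, meaning the model assigned a probability of 1 to the correct class for every sample, and the score grows without bound as predictions become confidently wrong.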
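To make the definition concrete, here is a minimal sketch of binary log loss computed by hand as the mean of -[y·log(p) + (1 - y)·log(1 - p)], checked against sklearn.metrics.log_loss. The helper name binary_log_loss and the sample values are made up for illustration; they are not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import log_loss


def binary_log_loss(y_true, p_pred):
    """Hypothetical helper: mean negative log-likelihood for binary labels.

    y_true holds 0/1 labels, p_pred holds the predicted probability of class 1.
    Assumes every probability is strictly between 0 and 1 (no clipping is done).
    """
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    per_sample = -(y_true * np.log(p_pred) + (1.0 - y_true) * np.log(1.0 - p_pred))
    return float(per_sample.mean())


# Illustrative values only
y_true = [1, 0, 1, 1]
p_pred = [0.9, 0.2, 0.6, 0.8]

manual = binary_log_loss(y_true, p_pred)
library = log_loss(y_true, p_pred)  # 1-D input is read as P(class 1)

print(manual, library)
```

The two numbers should agree to floating-point precision, which is a quick sanity check that you are feeding log_loss() the probabilities you think you are.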

“Log Loss takes into account the uncertainty of your prediction based on how much it varies from the actual label. This gives us a more nuanced view into the performance of our model.”
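One way to see what that means in practice: the sketch below compares two made-up sets of predictions that get every label right at a 0.5 threshold, so their accuracy is identical, yet their log loss differs because one set is far more confident. The arrays are invented for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([1, 0, 1, 0])

# Predicted probability of class 1 from two hypothetical models
confident = np.array([0.95, 0.05, 0.90, 0.10])
hesitant = np.array([0.60, 0.40, 0.55, 0.45])

# Thresholding at 0.5 turns probabilities into hard labels
acc_confident = accuracy_score(y_true, (confident >= 0.5).astype(int))
acc_hesitant = accuracy_score(y_true, (hesitant >= 0.5).astype(int))

loss_confident = log_loss(y_true, confident)
loss_hesitant = log_loss(y_true, hesitant)

print("accuracy:", acc_confident, acc_hesitant)
print("log loss:", loss_confident, loss_hesitant)
```

Accuracy cannot tell these two apart, while log loss rewards the model that is confident and right.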

It’s easy to get a log loss score in scikit-learn using sklearn.metrics.log_loss. What may be less obvious is how to get your classifier’s predictions as probability values, which log_loss() needs, rather than the hard class labels that predict() returns.

Enter predict_proba(), a method that most scikit-learn classifiers implement. You get your predictions using predict_proba(), and use those to get the log loss score, like so:

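from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# X, y are your training data; X_test, y_test are held-out data
clf = LogisticRegression()
clf.fit(X, y)
clf_probs = clf.predict_proba(X_test)  # one column of probabilities per class
log_loss_score = log_loss(y_test, clf_probs)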

As simple as that!
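If you want something you can paste and run, here is a self-contained sketch of the same steps. The synthetic dataset from make_classification, the 75/25 split, and max_iter=1000 are my own assumptions for the sake of a runnable example; swap in your own data and settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# predict_proba returns one row per sample and one column per class,
# with columns ordered as in clf.classes_
clf_probs = clf.predict_proba(X_test)

# Passing labels explicitly keeps the column order unambiguous
log_loss_score = log_loss(y_test, clf_probs, labels=clf.classes_)

print("probabilities shape:", clf_probs.shape)
print("log loss:", log_loss_score)
```

Note that the score is computed on held-out data (X_test, y_test), not on the data the model was fit on, so it reflects how well the predicted probabilities generalize.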
