How to Use Independent Validation in Python
Authors
Abstract
To statistically test whether two groups or models differ, classifier accuracy is compared. However, common accuracy estimates like cross-validation have unknown distributions, making them unsuitable for statistical inference. Alternatives like permutation tests or train-test splits are computationally expensive and limited to frequentist tests against chance. Independent Validation (IV) is a more flexible alternative providing a known estimate distribution. This enables both conventional hypothesis testing and Bayesian analysis of classifier performance. Although Python is most widely used for machine learning, a Python implementation of IV has been lacking so far. This article introduces such an implementation; beyond the core IV algorithm, the package allows to: (1) plot accuracy against training set size, (2) estimate the posterior distribution of the asymptotic accuracy, and (3) query the posterior for statistics and credible intervals. This makes it easy to apply IV when comparing accuracy posteriors across classes, datasets, or classifiers on the same data.