MINE: Maximal Information-based Nonparametric Exploration

Technical Information


The calculation of MIC and other MINE statistics

MIC and the other MINE statistics are calculated from a matrix of scores generated from a given set of two-variable data. This matrix, called the characteristic matrix, is created by searching for grids that maximize the penalized mutual information of the distribution induced on each grid's cells by the data. Different relationship types give rise to characteristic matrices with different properties. For instance, strong relationships yield characteristic matrices with high peaks, monotonic relationships yield symmetric characteristic matrices, and complex relationships yield characteristic matrices whose peaks are far from the origin.
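
To make the idea of a characteristic matrix concrete, the following is a minimal sketch and not the MINE implementation. It makes several assumptions that are not stated on this page: the penalty is taken to be the normalization by log(min(columns, rows)) used in the paper's definition, grid sizes are limited to columns * rows <= n^0.6, and only equal-frequency grids are tried instead of searching for the grid that maximizes the score. Each entry is therefore a lower bound on what a full search would find. All function names are hypothetical.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of points.

    Ties are broken by input order, so heavily tied data can give misleading bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) of the distribution induced on the grid cells."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        mi += (count / n) * math.log((count * n) / (px[a] * py[b]))
    return mi


def approx_characteristic_matrix(x, y, exponent=0.6):
    """Map (columns, rows) to a normalized score, for grids with columns * rows <= n ** exponent.

    Assumption: equal-frequency grids stand in for the maximizing grid.
    """
    n = len(x)
    budget = n ** exponent
    matrix = {}
    cols = 2
    while cols * 2 <= budget:
        x_bins = equal_frequency_bins(x, cols)
        rows = 2
        while cols * rows <= budget:
            y_bins = equal_frequency_bins(y, rows)
            score = mutual_information(x_bins, y_bins) / math.log(min(cols, rows))
            matrix[(cols, rows)] = score
            rows += 1
        cols += 1
    return matrix


def approx_mic(x, y):
    """Highest entry of the approximate characteristic matrix."""
    return max(approx_characteristic_matrix(x, y).values())


# A noiseless linear relationship on 100 points.
x = list(range(100))
y = [2 * v + 1 for v in x]
matrix = approx_characteristic_matrix(x, y)
mic = approx_mic(x, y)
```

In this sketch the 2-by-2 grid already separates a noiseless linear relationship perfectly, so the highest entry sits at the corner of the matrix nearest the origin. A relationship that needs finer grids to be captured would instead peak at larger column and row counts.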

Statistical significance

As with any data exploration technique, it is important to address multiple testing concerns thoroughly when using MINE statistics. We suggest doing so by controlling the false discovery rate as in the paper describing MINE. For this purpose, we have pre-computed tables of uncorrected p-values of various MIC scores at different sample sizes.
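
As an illustration of false discovery rate control, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of uncorrected p-values. The choice of Benjamini-Hochberg is an assumption made for this example; consult the paper for the exact procedure used there. In practice each p-value would be looked up from the pre-computed table matching the sample size, whereas the values below are made up for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, in ascending p-value order)
    with p_(k) <= k * fdr / m, then reject every hypothesis ranked k or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Hypothetical uncorrected p-values for ten variable pairs.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_values, fdr=0.05)
```

With these made-up values only the first two pairs pass at a 5% false discovery rate, even though five of the uncorrected p-values fall below 0.05.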

Publication

More detailed information is contained in the published paper describing MINE:

D. Reshef*, Y. Reshef*, H. Finucane, S. Grossman, G. McVean, P. Turnbaugh, E. Lander, M. Mitzenmacher**, P. Sabeti**. Detecting novel associations in large data sets. Science 334(6062), 1518-1524 (2011).

*, ** Authors sharing the same mark contributed equally to this work and are listed alphabetically.