Quantifying the separability of data classes in neural networks

Achim Schilling; Andreas Maier; Richard Gerum; Claus Metzner; Patrick Krauss

doi:10.1016/j.neunet.2021.03.035

Neural Netw. 2021 Jul:139:278-293. doi: 10.1016/j.neunet.2021.03.035. Epub 2021 Apr 5.

Authors

Achim Schilling¹, Andreas Maier², Richard Gerum³, Claus Metzner⁴, Patrick Krauss⁵

Affiliations

¹ Laboratory of Sensory and Cognitive Neuroscience, Aix-Marseille University, Marseille, France; Neuroscience Lab, University Hospital Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg (FAU), Germany.
² Chair of Machine Intelligence, University Erlangen-Nürnberg (FAU), Germany.
³ Department of Physics and Center for Vision Research, York University, Toronto, Ontario, Canada.
⁴ Neuroscience Lab, University Hospital Erlangen, Germany; Chair of Biophysics, University Erlangen-Nürnberg (FAU), Germany.
⁵ Neuroscience Lab, University Hospital Erlangen, Germany; Cognitive Computational Neuroscience Group, University Erlangen-Nürnberg (FAU), Germany; Cognitive Neuroscience Center, University of Groningen, The Netherlands. Electronic address: patrick.krauss@fau.de.

PMID: 33862387
DOI: 10.1016/j.neunet.2021.03.035

Abstract

We introduce the Generalized Discrimination Value (GDV), which measures, in a non-invasive manner, how well different data classes separate in each layer of an artificial neural network. At the end of the training period, the GDV in each layer L attains a highly reproducible value, irrespective of the initialization of the network's connection weights. In the case of multi-layer perceptrons trained with error backpropagation, we find that classification of highly complex data sets requires a temporary reduction of class separability, marked by a characteristic 'energy barrier' in the initial part of the GDV(L) curve. Even more surprisingly, for a given data set, GDV(L) follows a fixed 'master curve', independent of the total number of network layers. Finally, due to its invariance with respect to dimensionality, the GDV may serve as a useful tool to compare the internal representational dynamics of artificial neural networks with different architectures, for example in neural architecture search or network compression, or even to compare them with brain activity in order to decide between different candidate models of brain function.
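The abstract does not spell out how the GDV is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the GDV is obtained by z-scoring each dimension of a layer's activations, scaling by 0.5, and taking the mean intra-class Euclidean distance minus the mean inter-class distance, divided by the square root of the number of dimensions D. The function names `gdv` and `pairwise` are hypothetical. Under this assumed definition, more negative values indicate better class separation.

```python
import numpy as np


def pairwise(a, b):
    """Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gdv(points, labels):
    """Sketch of a GDV-style separability score (assumed definition).

    points: array of shape (n_samples, n_dims), e.g. one layer's activations
    labels: array of shape (n_samples,), class label of each sample
    """
    x = np.asarray(points, dtype=float)
    y = np.asarray(labels)
    n_dims = x.shape[1]

    # Assumption: z-score every dimension, then scale by 0.5.
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant dimensions at zero instead of dividing by 0
    z = 0.5 * (x - x.mean(axis=0)) / std

    groups = [z[y == c] for c in np.unique(y)]
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    if any(len(g) < 2 for g in groups):
        raise ValueError("need at least two samples per class")

    # Mean distance between distinct points of the same class, averaged over classes.
    intra = np.mean([
        pairwise(g, g)[np.triu_indices(len(g), k=1)].mean() for g in groups
    ])

    # Mean distance between points of different classes, averaged over class pairs.
    inter = np.mean([
        pairwise(groups[i], groups[j]).mean()
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    ])

    # Assumption: normalizing by sqrt(n_dims) is what makes values comparable
    # across layers of different width.
    return (intra - inter) / np.sqrt(n_dims)
```

To trace a GDV(L) curve with such a function, one would call it once per layer on that layer's activations for a fixed labeled input set, which requires only reading activations and no change to the network itself.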

Keywords: Data class separability; Deep learning interpretability; Discrimination value; Neural architecture search; Neural network analysis; Representational similarity analysis.

MeSH terms

Knowledge Bases
Machine Learning*