Canonical correlation analysis for elliptical copulas

J Multivar Anal. 2021 May:183:104715. doi: 10.1016/j.jmva.2020.104715. Epub 2020 Nov 23.

Abstract

Canonical correlation analysis (CCA) is a common method used to estimate the associations between two different sets of variables by maximizing the Pearson correlation between linear combinations of the two sets of variables. We propose a version of CCA for transelliptical distributions with an elliptical copula using pairwise Kendall's tau to estimate a latent scatter matrix. Because Kendall's tau relies only on the ranks of the data this method does not make any assumptions about the marginal distributions of the variables, and is valid when moments do not exist. We establish consistency and asymptotic normality for canonical directions and correlations estimated using Kendall's tau. Simulations indicate that this estimator outperforms standard CCA for data generated from heavy tailed elliptical distributions. Our method also identifies more meaningful relationships when the marginal distributions are skewed. We also propose a method for testing for non-zero canonical correlations using bootstrap methods. This testing procedure does not require any assumptions on the joint distribution of the variables and works for all elliptical copulas. This is in contrast to permutation tests which are only valid when data are generated from a distribution with a Gaussian copula. This method's practical utility is shown in an analysis of the association between radial diffusivity in white matter tracts and cognitive tests scores for six-year-old children from the Early Brain Development Study at UNC-Chapel Hill. An R package implementing this method is available at github.com/blangworthy/transCCA.

Keywords: Canonical correlation analysis; Kendall’s tau; resampling methods; robust methods; transelliptical distributions.