The identification of the dependent components in multiple data sets is a
fundamental problem in many practical applications. The challenge in these
applications is that often the data sets are high-dimensional with few
observations or available samples and contain latent components with unknown
probability distributions. A novel mathematical formulation of this problem is
proposed, which enables the inference of the underlying correlation structure
with strict false positive control. In particular, the false discovery rate is
controlled at a pre-defined threshold on two levels simultaneously. The
deployed test statistics originate in the sample coherence matrix. The required
probability models are learned from the data using the bootstrap. Local false
discovery rates are used to solve the multiple hypothesis testing problem.
Compared to the existing techniques in the literature, the developed technique
does not assume an a priori correlation structure and works well when the number
of data sets is large while the number of observations is small. In addition,
it can handle the presence of distributional uncertainties, heavy-tailed noise,
and outliers.
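
As a rough illustration of the first two ingredients named above (the sample coherence matrix and bootstrap resampling), the following Python sketch may help. It is not the authors' implementation: the function names `sample_coherence_matrix` and `bootstrap_coherence`, the data layout (variables in rows, paired observations in columns), and the assumption that every per-set sample covariance is invertible (more observations than variables per set) are all choices made here for illustration. The local false discovery rate step is not shown.

```python
import numpy as np


def sample_coherence_matrix(datasets):
    """Block coherence matrix of several data sets (illustrative sketch).

    datasets: list of arrays, each of shape (n_k, M), with the same number
    of observations M in every set (assumed layout, not from the source).
    Each set is whitened with its own sample covariance, so the diagonal
    blocks of the result are identity matrices and the off-diagonal blocks
    carry the cross-set correlation structure.
    """
    whitened = []
    for X in datasets:
        Xc = X - X.mean(axis=1, keepdims=True)   # remove the sample mean
        M = Xc.shape[1]
        R = Xc @ Xc.T / M                        # per-set sample covariance
        # Inverse matrix square root via eigendecomposition.
        # Assumes R has full rank (all eigenvalues strictly positive).
        w, V = np.linalg.eigh(R)
        R_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
        whitened.append(R_inv_sqrt @ Xc)
    Z = np.vstack(whitened)                      # stack the whitened sets
    return Z @ Z.T / Z.shape[1]


def bootstrap_coherence(datasets, n_boot=200, seed=0):
    """Recompute the coherence matrix on resampled observations.

    The same resampled column indices are used for every data set so that
    the pairing of observations across sets is preserved.
    """
    rng = np.random.default_rng(seed)
    M = datasets[0].shape[1]
    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, M, size=M)         # sample with replacement
        replicates.append(
            sample_coherence_matrix([X[:, idx] for X in datasets])
        )
    return np.stack(replicates)                  # shape (n_boot, N, N)
```

The singular values of an off-diagonal block of the returned matrix are the sample canonical correlations between the corresponding pair of data sets, and the bootstrap replicates give an empirical picture of their variability without assuming a parametric distribution.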