\begin{abstract}
This paper addresses the estimation of the effective dimension of a sample of dependent random vectors. The proposed method uses the principal component decomposition of the sample covariance matrix to construct a low-rank approximation that helps uncover hidden structure. The number of principal components retained in the decomposition is determined via probabilistic principal component analysis embedded in a penalized profile likelihood criterion. The choice of the penalty parameter is guided by a data-driven procedure that is justified by analytical derivations and extensive finite-sample simulations. The proposed penalized approach is illustrated with three gene expression datasets, in which the number of cancer subtypes is estimated from all expression measurements. The analyses point towards hidden structures in the data, e.g., additional subgroups, that could be of scientific interest.
\end{abstract}