\begin{abstract}
A new methodology is proposed for generating realizations of a random
vector with values in a finite-dimensional Euclidean space that are
statistically consistent with a data set of observations of this vector.
The probability distribution of this random vector, while not known
a priori, is presumed to be concentrated on an unknown subset of the
Euclidean space. A random matrix is introduced whose columns are
independent copies of the random vector and whose number of
columns equals the number of data points in the data set. The approach is
based on the use of (i) the multidimensional kernel-density estimation
method for estimating the probability distribution of the random
matrix, (ii) an MCMC method for generating realizations of the random
matrix, (iii) the diffusion-maps approach for discovering and
characterizing the geometry and the structure of the data set, and
(iv) a reduced-order representation of the random matrix, which is
constructed using the diffusion-maps vectors associated with the first
eigenvalues of the transition matrix relative to the given data
set. The convergence aspects of the proposed methodology are
analyzed and a numerical validation is explored through three
applications of increasing complexity. The proposed method is found to
be robust to noise levels and data complexity as well as to the intrinsic
dimension of data and the size of experimental data sets. Both the
methodology and the underlying mathematical framework presented in
this paper contribute new capabilities and perspectives at the
interface of uncertainty quantification, statistical data analysis,
stochastic modeling, and associated statistical inverse problems.
% for boundary value problems, in the design of experiments for random
% parameters, and in signal processing and machine learning.
\end{abstract}
