1: \begin{abstract}
2: Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable
3: analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived.
4: Experimental results support our theoretical claims,
5: and the proposed estimator shows its merits compared to other existing methods.
6: The proposed method is applied to analyze the National Health and Nutrition Examination Survey data.
7: \end{abstract}
8: