\begin{abstract}
  Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable
  analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, whose entries follow different canonical exponential distributions and are subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood under a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and we rigorously derive an upper bound on the estimation error of the proposed method.
  Experimental results support our theoretical claims,
  and the proposed estimator compares favorably with existing methods.
  We apply the proposed method to data from the National Health and Nutrition Examination Survey.
\end{abstract}