
\begin{abstract}
A recent literature in econometrics models unobserved cross-sectional heterogeneity in panel data by assigning each cross-sectional unit a one-dimensional, discrete latent type.
Such models have been shown to admit estimation and inference via regression clustering methods.
This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspecified, even when the panel has significant discrete cross-sectional structure.
To address this issue, we generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly correlated latent variables that describe its response type to different covariates.
We establish inference results for a k-means style estimator of our model and develop information criteria to jointly select the number of clusters for each latent variable.
Monte Carlo simulations confirm our theoretical results and provide intuition about the finite-sample performance of estimation and model selection.
We also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting.
Our results suggest that overfitting can be severe in k-means style estimators when the number of clusters is over-specified.
\end{abstract}
