236353895db25649.tex
1: \begin{abstract}
2: The cost of both generalized least squares (GLS) and Gibbs sampling in a crossed
3: random effects model can easily grow faster than $N^{3/2}$ for $N$ observations. \cite{ghos:hast:owen:2021} develop a backfitting algorithm that reduces
4: the cost to $O(N)$.
5: Here we extend that method to
6: a generalized linear mixed model for logistic regression.
7: We use backfitting within an iteratively
8: reweighted penalized least squares algorithm.
9: The specific approach is a version of penalized quasi-likelihood
10: due to \cite{scha:1991}. A straightforward version of Schall's algorithm
11: would also cost more than $N^{3/2}$ because it requires computing the
12: trace of the inverse of a large matrix.
13: We approximate that quantity at cost $O(N)$
14: and prove that this substitution makes an asymptotically negligible difference.
15: % {\color{orange} The weights do
16: % not change the convergence of backfitting, because as Art hinted you
17: % can absorb the weights into $y$ and $Z$ and you are back to an
18: % unweighted problem, for example. But perhaps you refer to something else?}
19: % {\color{blue} Indeed, backfitting would converge. To show that the number of iterations does not grow with problem size, the $O(1)$ iteration analysis uses concentration inequalities for quantities like $\sum_{i=1}^{R}Z_{ij}, \sum_{j=1}^{C}Z_{ij}, (Z^{\tran}Z)_{js}$. With $\cw$ coming in, we would need concentration of $\cw$ also if we include $\cw$ in $Z$.}
20: Our backfitting
21: algorithm also collapses the fixed effect with one random effect at a time
22: in a way that is analogous to the collapsed
23: Gibbs sampler of \cite{papa:robe:zane:2020}.
24: We use a symmetric operator that facilitates efficient covariance computation.
25: We illustrate our method on a real dataset from Stitch Fix.
26: By properly accounting for crossed random
27: effects, we show that a naive logistic
28: regression could underestimate sampling
29: variances by a factor of several hundred.
30: \end{abstract}
31: