\begin{abstract}
Cryo-electron microscopy (cryo-EM) is an increasingly popular experimental technique for estimating the 3D structure of macromolecular complexes, such as proteins, from 2D images.
These images are notoriously noisy, and the pose of the structure in each image is unknown \textit{a priori}.
\textit{Ab initio} 3D reconstruction from 2D images therefore requires estimating the pose of every image in addition to the structure itself.
In this work, we propose a two-stage approach to this problem.
In the first stage, a multi-head pose encoder infers multiple plausible poses per image in an amortized fashion.
Predicting several candidates mitigates the high uncertainty of pose estimation by encouraging exploration of pose space early in reconstruction.
Once the uncertainty has been reduced, the second stage refines poses in an auto-decoding fashion:
we initialize each image's pose with its most likely candidate and iteratively update it using stochastic gradient descent (SGD).
Through evaluation on synthetic datasets, we demonstrate that our method handles multi-modal pose distributions during the amortized inference stage, while the later, more flexible stage of direct pose optimization yields faster and more accurate pose convergence than the baselines.
Finally, on experimental data, we show that our approach is faster than the state-of-the-art method cryoAI and achieves higher-resolution reconstructions.
\end{abstract}