\begin{abstract}
We study Bayesian histograms for distribution estimation on $[0,1]^d$ under the Wasserstein distance $W_v$, $1 \leq v < \infty$, in the i.i.d.\ sampling regime. We show that when $d < 2v$, histograms possess a special \textit{memory efficiency} property: relative to the sample size $n$, only order $n^{d/2v}$ bins are needed to obtain minimax rate optimality. This result holds both for the posterior mean histogram and for posterior contraction, over the class of Borel probability measures and over some classes of smooth densities. The attained memory footprint improves on that of existing minimax optimal procedures by a polynomial factor in $n$; for example, it is smaller by a factor of $n^{1 - d/2v}$ than the footprint of the empirical measure, a minimax estimator over the class of Borel probability measures. Additionally, constructing both the posterior mean histogram and the posterior itself can be done super-linearly in $n$. Due to the popularity of the $W_1$ and $W_2$ metrics and the coverage provided by the $d < 2v$ case, our results are of most practical interest in the $(d=1, v=1,2)$, $(d=2, v=2)$, and $(d=3, v=2)$ settings, and we provide simulations demonstrating the theory in several of these instances.
%In proving posterior contraction, we give an example of how to leverage conjugacy to deal with metrics, such as $W_1$, whose optimal convergence rates are faster than those attained under the Kullback–Leibler divergence and metrics, such as $W_v$ ($v \geq 2$), that are not dominated by the Hellinger metric.
%to avoid the \cite{ghosal2000} method, which can be desirable when dealing with a metric such as $W_v$ where the optimal convergence rate is faster than under $KL$.
\end{abstract}
