1: \begin{abstract}
2: Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image.
3: Taking inspiration from cognitive work on reference games and dialogue alignment, this paper analyzes the dynamics of user prompts across such iterations.
4: We compile a dataset of iterative interactions of human users with Midjourney.\footnote{Code and data are available at \url{https://github.com/shachardon/Mid-Journey-to-alignment}}
5: Our analysis then reveals that prompts predictably converge toward specific traits along these iterations.
6: We further study whether this convergence arises because human users realize they missed important details, or because they adapt to the model's ``preferences'', that is, its tendency to produce better images for a specific language style. We show initial evidence that both possibilities are at play.
7: The possibility that users adapt to the model's preferences raises concerns about reusing user data for further training: the prompts may be biased towards the preferences of a specific model, rather than aligned with human intentions and a natural manner of expression.
8:
9: \end{abstract}
10: