\begin{abstract}

Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image.

Taking inspiration from cognitive work on reference games and dialogue alignment, this paper analyzes the dynamics of user prompts across such iterations.

We compile a dataset of iterative interactions of human users with Midjourney.\footnote{Code and data are available at \url{https://github.com/shachardon/Mid-Journey-to-alignment}}

Our analysis then reveals that prompts predictably converge toward specific traits along these iterations.

We further study whether this convergence is due to human users realizing they missed important details, or due to adaptation to the model's ``preferences'', where the model produces better images for a specific language style. We show initial evidence that both possibilities are at play.

The possibility that users adapt to the model's preferences raises concerns about reusing user data for further training. The prompts may be biased towards the preferences of a specific model, rather than aligned with human intentions and a natural manner of expression.

\end{abstract}
