\begin{abstract}
%%%%%%%%% Version 3
% \jc{
Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance.
Existing reinforcement learning from human feedback methods usually rely on predefined heuristic reward functions or pretrained reward models built on large-scale datasets, limiting their applicability to scenarios where collecting such data is costly or difficult.
To utilize human feedback effectively and efficiently, we develop a framework, \myshorttitle{},
which leverages online human feedback collected on the fly during model learning. Specifically, \myshorttitle{} features two key mechanisms: (1) \emph{Feedback-Aligned Representation Learning}, an online training method that captures human feedback and provides informative learning signals for fine-tuning,
and (2) \emph{Feedback-Guided Image Generation}, which generates images from SD's refined initialization samples, enabling faster convergence towards the evaluator's intent.
We demonstrate that \myshorttitle{} is $4\times$ more feedback-efficient than the best existing method on body part anomaly correction.
Additionally, experiments show that \myshorttitle{} can effectively handle tasks such as reasoning, counting, personalization, and reducing NSFW content with only 0.5K online feedback samples.
% }
% The experimental results on various tasks show that \myshorttitle{} is $4\times$ more feedback-efficient than the best existing method.
% Moreover, the model fine-tuned with \myshorttitle{} demonstrates its transferability of learned concepts to previously unseen prompts.
% We demonstrate \myshorttitle{} across various tasks, enabling effective human-controllable generation with significantly reduced online human feedback.
\end{abstract}