\begin{abstract}
%%%%%%%%% Version 3
% \jc{
Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance.
Existing reinforcement learning from human feedback (RLHF) methods usually rely on predefined heuristic reward functions or on pretrained reward models trained on large-scale datasets. This limits their applicability in scenarios where collecting such data is costly or difficult.
To use human feedback effectively and efficiently, we develop \myshorttitle{}, a framework that leverages online human feedback collected on the fly during model training.
Specifically, \myshorttitle{} features two key mechanisms: (1) \emph{Feedback-Aligned Representation Learning}, an online training method that captures human feedback and provides informative learning signals for fine-tuning, and (2) \emph{Feedback-Guided Image Generation}, which generates images from SD's refined initialization samples, enabling faster convergence toward the evaluator's intent.
We demonstrate that \myshorttitle{} is $4\times$ more feedback-efficient than the best existing method on body part anomaly correction.
Additionally, experiments show that \myshorttitle{} can effectively handle tasks such as reasoning, counting, personalization, and NSFW content reduction with only 0.5K pieces of online feedback.
% }

% The experimental results on various tasks show that \myshorttitle{} is $4\times$ more feedback-efficient than the best existing method.
% Moreover, the model fine-tuned with \myshorttitle{} demonstrates its transferability of learned concepts to previously unseen prompts.
% We demonstrate \myshorttitle{} across various tasks, enabling effective human-controllable generation with significantly reduced online human feedback.

\end{abstract}