\begin{abstract}
Virtual try-on of eyeglasses places eyeglasses of different shapes and styles onto a face image, so that users need not try them on physically.
While existing methods have shown impressive results, they support only a limited variety of eyeglasses styles, and their interactions are not always intuitive or efficient.
To address these limitations, we propose a Text-guided Eyeglasses Manipulation method that controls the eyeglasses shape through a binary mask and the eyeglasses style through text.
Specifically, we introduce a mask encoder to extract mask conditions and a modulation module that injects the text and mask conditions simultaneously.
This design allows for fine-grained control of the eyeglasses' appearance based on both textual descriptions and spatial constraints.
Our approach also includes a disentangled mapper and a decoupling strategy that preserves regions irrelevant to the eyeglasses, resulting in better local editing.
Because the conditions from different modalities converge at different speeds, we employ a two-stage training scheme, which enables control over both the shape and style of eyeglasses.
Extensive comparison experiments and ablation analyses demonstrate the effectiveness of our approach in achieving diverse eyeglasses styles while preserving irrelevant areas.
\end{abstract}