abstract:fcb64a8b951f408b.tex

1: \begin{abstract}

2:

3:

4: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable.

5: Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations.

6: In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs.

7: Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings.

8: We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at \url{https://github.com/zoom-wang112358/MOLLEO}.

9:

10:

11:

12:

13:

14: \end{abstract}

15: