A Human Feedback Strategy for Photoresponsive Molecules in Drug Delivery: Utilizing GPT-2 and Time-Dependent Density Functional Theory Calculations

Hu, Junjie; Wu, Peng; Wang, Shiyi; Wang, Binju; Yang, Guang

doi:10.3390/pharmaceutics16081014


by Junjie Hu ¹, Peng Wu ²,*, Shiyi Wang ³, Binju Wang ⁴ and Guang Yang ³,⁵,⁶,⁷,*

¹ Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
² School of Chemistry and Chemical Engineering, Ningxia University, Yinchuan 750014, China
³ Bioengineering Department and Imperial-X, Imperial College London, London W12 7SL, UK
⁴ College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
⁵ National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK
⁶ Cardiovascular Research Centre, Royal Brompton Hospital, London SW3 6NP, UK
⁷ School of Biomedical Engineering & Imaging Sciences, King’s College London, London WC2R 2LS, UK
* Authors to whom correspondence should be addressed.

Pharmaceutics 2024, 16(8), 1014; https://doi.org/10.3390/pharmaceutics16081014

Submission received: 28 May 2024 / Revised: 11 July 2024 / Accepted: 19 July 2024 / Published: 31 July 2024

(This article belongs to the Special Issue Advanced Materials Science and Technology in Drug Delivery)


Abstract:

Photoresponsive drug delivery stands as a pivotal frontier in smart drug administration, leveraging the non-invasive, stable, and finely tunable nature of light-triggered methodologies. The generative pre-trained transformer (GPT) has been employed to generate molecular structures. In our study, we harnessed GPT-2 on the QM7b dataset to refine a UV-GPT model with adapters, enabling the generation of molecules responsive to UV light excitation. Utilizing the Coulomb matrix as a molecular descriptor, we predicted the excitation wavelengths of these molecules. Furthermore, we validated the excited state properties through quantum chemical simulations. Based on the results of these calculations, we summarized some tips for chemical structures and integrated them into the alignment of large-scale language models within the reinforcement learning from human feedback (RLHF) framework. The synergy of these findings underscores the successful application of GPT technology in this critical domain.

Keywords:

drug delivery; photoresponsive molecule; GPT; TDDFT; RLHF

1. Introduction

Smart drug delivery systems, which have gained significant attention in pharmaceutical research, enhance patient health by ensuring targeted therapeutic delivery [1]. In recent decades, artificial intelligence (AI) has demonstrated its capability to address complex challenges within pharmaceutical research, particularly in smart drug delivery, from the observable to the micro-/nanoscale [2,3,4]. These advanced drug delivery systems are designed to be self-regulating, responding to a range of stimuli associated with disease pathology [5,6]. Light stands out as an external actuation method for therapeutic applications due to its non-invasive characteristics, stability in biological settings, adjustable intensity, and unparalleled temporal and spatial precision [7,8,9,10,11,12].

Light is widely used in smart delivery systems for various therapeutics, with different wavelengths ranging from ultraviolet (UV, 100–400 nm) to visible (400–750 nm) and near-infrared (NIR, 750–2000 nm) light eliciting different responses [13]. Short-wavelength light, or UV light, has enough energy to alter chemical bonds and configurations, such as breaking covalent bonds or changing cis–trans conformations, effectively triggering drug delivery mechanisms [14,15]. Owing to these advantages, UV light is frequently used as a stimulus in diverse research and applications [16,17].

Deep learning is expected to be applied to a growing range of problems in drug delivery, including the design of stimulus-responsive molecules [4,18]. The prediction of temporal dynamics in drug delivery has been accomplished through the application of convolutional neural networks and long short-term memory networks [19]. When deep learning is used to solve molecular design problems in chemistry and materials, the lack of specialized datasets often leads to task failure. Advancements in pre-training and fine-tuning methods for large language models (LLMs) have facilitated the creation of chemical molecules [20,21]. The generative pre-trained transformer has significant potential for designing light-responsive molecules, yet research in this specific domain is currently limited.

Causal language models such as GPT, GPT-2, and GPT-3 are trained to predict the probability of the next token given all preceding tokens, making them ideal for text generation [22,23,24]. After character-level RNNs and masked transformer language models were used to capture Simplified Molecular Input Line Entry System (SMILES) language patterns [21,25], Sanjar Adilov constructed a GPT-2-like language model to learn SMILES representations and transfer knowledge to downstream molecular generative tasks [26]. In subsequent studies, GPT has attracted the attention of more researchers [27,28]. Additionally, instruction tuning using human experience to enhance large models has proven to be a highly effective method for improving the quality of generated content. This approach includes the use of Direct Preference Optimization (DPO) algorithms [29], Contrastive Preference Optimization (CPO) algorithms [30], and others. Recently, researchers have also proposed Kahneman–Tversky Optimization (KTO) algorithms to simplify dataset preparation for this purpose [31]. Our integration of GPT-2 with these techniques for applications in light-responsive drug delivery represents a very promising and meaningful exploration.

The open-source dataset QM7b, provided by the Quantum Machine Project, encompasses a variety of physicochemical properties of molecules [32,33]. In order to ultimately achieve the generation of light-responsive drug delivery molecules, these data were used to fine-tune the pre-trained language model [33,34]. Among the physicochemical properties in the dataset are the excitation energies of the molecules. Our UV-GPT inherits the transformer structure from GPT-2 and utilizes the tokenizer from SMILES-GPT, facilitating the downstream generation of UV-responsive molecules. By integrating predictive modeling and TDDFT calculations, we discovered that the fine-tuned UV-GPT model generates molecules with UV photoresponsive properties. The molecules generated were positively influenced by the pre-trained SMILES-GPT, as evidenced by the statistics on drug-like properties and synthesizability. This evidence shows that our application of the combined GPT and TDDFT calculations in designing stimulus-responsive molecules holds practical value for drug delivery. Additionally, we explored various implementations of RLHF, utilizing structural knowledge generated by computational chemistry as human feedback to enhance the quality of the generated content. However, our current model does not fully integrate the chemists’ extensive theoretical knowledge of chemistry.

2. Materials and Methods

Pre-trained language model and adapter tuning

We trained our model for 6 epochs on the SMILES strings of the QM7b dataset [32,33]. We used AdamW for optimization and cosine annealing for learning-rate scheduling. The initial and final learning rates were set to $5 \times 10^{-4}$ and $5 \times 10^{-8}$, respectively. We kept the default Adam hyperparameters and optimized the batch size (128) and maximum sequence length (512).
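
The learning-rate schedule described above can be sketched in a few lines. This is a minimal illustration of a single cosine decay between the two stated rates, not the authors' training code; the number of optimizer steps (`TOTAL_STEPS`) is a hypothetical value, since the text does not report it.

```python
import math

LR_INITIAL = 5e-4  # initial learning rate stated in the text
LR_FINAL = 5e-8    # final learning rate stated in the text


def cosine_annealed_lr(step: int, total_steps: int,
                       lr_max: float = LR_INITIAL,
                       lr_min: float = LR_FINAL) -> float:
    """Learning rate at `step` for one cosine decay from lr_max to lr_min."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so the schedule stays flat outside [0, total_steps].
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule length; the actual number of steps is not given.
TOTAL_STEPS = 600
schedule = [cosine_annealed_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

The schedule starts at the initial rate, passes through the arithmetic mean of the two rates at the halfway point, and ends at the final rate.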

Our SMILES tokenizer was taken from the pre-trained SMILES-GPT. This transformer decoder replicates GPT-2, except that its tokenization uses character-level byte-pair encoding instead of byte-level encoding. We reserved 72 characters from the SMILES alphabet as an initial vocabulary and supplemented the vocabulary with up to 1000 of the most frequent merges. The model uses parameterized token and position embeddings, 8 attention heads, and 4 attention blocks. With an embedding/hidden dimension of 512, it has 13.4 M parameters.
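
As a plausibility check on the stated model size, the parameter count of a GPT-2-style decoder can be tallied from the figures above. The sketch below rests on several assumptions that the text does not confirm: a vocabulary of exactly 72 + 1000 tokens, 512 learned positions (taken from the maximum sequence length), an MLP inner width of four times the hidden dimension, and output weights tied to the token embeddings. The number of attention heads does not affect the count.

```python
def gpt2_parameter_count(vocab_size: int, n_positions: int,
                         d_model: int, n_layers: int) -> int:
    """Count trainable parameters of a GPT-2-style decoder with tied output weights."""
    d_ff = 4 * d_model  # assumed MLP inner width (GPT-2 convention)
    # Token and position embedding tables.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Fused query/key/value projection plus output projection, with biases.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer feed-forward network, with biases.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_block = attention + mlp + layer_norms
    final_layer_norm = 2 * d_model
    return embeddings + n_layers * per_block + final_layer_norm


# Assumption: all 1000 merges are used on top of the 72 reserved characters.
VOCAB_SIZE = 72 + 1000
total = gpt2_parameter_count(vocab_size=VOCAB_SIZE, n_positions=512,
                             d_model=512, n_layers=4)
print(f"{total / 1e6:.1f} M parameters")
```

Under these assumptions the tally comes to 13,421,568 parameters, which rounds to the 13.4 M quoted in the text.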

Prediction model for excitation energy

The Coulomb matrix of the molecule was generated using DeepChem computation. The training set and test set were divided in an 80:20 ratio. Implementation of Support Vector Regression (SVR) with a quantum kernel was achieved with sklearn and qiskit, while SVR with alternative kernels was implemented solely with sklearn.
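
For orientation, the Coulomb matrix descriptor and the 80:20 split can be sketched in plain Python. This is an illustration of the standard definition from the QM7 literature [32] (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|, in atomic units), not the DeepChem routine used in this work; the H2 geometry and the helper names are illustrative assumptions.

```python
import math
import random


def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| elsewhere."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def split_80_20(samples, seed=0):
    """Shuffle reproducibly and return (train, test) in an 80:20 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]


# Illustrative input: H2 with a 1.4 bohr bond length (not a molecule from the study).
cm = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
train, test = split_80_20(range(10))
```

In practice the matrices are padded to a common size and flattened or sorted before being passed to a regressor; those steps are omitted here.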

Drug-likeness and evaluation metrics for generative molecules

Drug-likeness is a consideration when evaluating the generative molecules for photoresponsive drug delivery. We utilized the quantitative estimate of drug-likeness (QED) as a metric, as introduced in [35]. The QED metric yields a numerical score ranging from 0 to 1, where elevated scores correspond to an increased probability of drug-likeness.

The Synthetic Accessibility Score (SAscore) [36], used to assess the ease of synthesizing drug-like molecules, rates molecules from 1 to 10 based on historical synthetic data and molecular complexity. Fragment contributions and a complexity penalty form the basis, derived from PubChem’s vast molecule database. Validation against expert chemists’ estimations showed strong agreement (r² = 0.89). This method harnesses big data to streamline and enhance the synthesis evaluation process in molecular design.

DFT and TDDFT simulation

Density functional theory (DFT) calculations were performed on the molecules using ORCA 5.0.4 [37]. Optimization of molecules was performed at the PBE0/6-311G* level of theory [38]. Time-dependent density functional theory (TDDFT) calculations were performed at the PBE0/TZVP level [38,39]. The gas phase, water, and chlorobenzene solvents were modeled using the implicit solvent polarizable continuum model (PCM) [40,41] with Grimme’s D3 [42,43,44] dispersion corrections during optimization and TDDFT calculations.

RLHF

We utilized the transformer reinforcement learning package for reinforcement learning with human feedback. In the fine-tuning process, we applied Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and Kahneman–Tversky Optimization (KTO) with the chemical knowledge datasets.

The default sigmoid loss was used in DPO, where the beta factor was set at 0.1. Similarly, the loss type of CPO was sigmoid, and its beta factor was set at 0.1. The beta factor in the KTO loss was set at 0.1, with a higher value meaning less divergence from the initial policy. The desirable and undesirable losses of KTO are weighted by the desirable weight and the undesirable weight, respectively. Both of them were set at 1.0.
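
To make the role of the beta factor concrete, the sigmoid form of the DPO loss [29] for a single preference pair can be written out directly. The sketch below covers DPO only (not CPO or KTO) and is a plain-Python illustration rather than the trainer's implementation; the argument names and the log-probability values are illustrative assumptions.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> float:
    """-log(sigmoid(margin)), where margin is the beta-scaled log-ratio difference."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Illustrative sequence log-probabilities (made-up values).
loss_at_start = dpo_sigmoid_loss(-12.0, -12.0, -12.0, -12.0)  # policy == reference
loss_improved = dpo_sigmoid_loss(-10.0, -14.0, -12.0, -12.0)  # policy prefers chosen
```

When the policy equals the reference model the loss is log 2; it decreases as the policy raises the chosen sequence relative to the rejected one, and beta scales how strongly a given log-ratio gap is rewarded.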

The preference dataset used in DPO and CPO is a dictionary object with the keys ‘prompt’, ‘chosen’, and ‘rejected’. The binary signal dataset used in KTO is a dictionary object with the keys ‘prompt’, ‘completion’, and ‘label’.
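
The two dataset layouts can be illustrated with plain dictionaries. This is a minimal sketch: the prompt string and the example SMILES (pyridine and ethanol) are placeholders chosen for illustration, not entries from the authors' datasets, and the helper names are hypothetical.

```python
def make_preference_dataset(prompts, chosen, rejected):
    """Column-style dictionary with the keys used for DPO and CPO."""
    if not (len(prompts) == len(chosen) == len(rejected)):
        raise ValueError("all columns must have the same length")
    return {"prompt": list(prompts), "chosen": list(chosen), "rejected": list(rejected)}


def make_binary_signal_dataset(prompts, completions, labels):
    """Column-style dictionary with the keys used for KTO."""
    if not (len(prompts) == len(completions) == len(labels)):
        raise ValueError("all columns must have the same length")
    return {"prompt": list(prompts), "completion": list(completions),
            "label": [bool(flag) for flag in labels]}


PROMPT = "<bos>"  # hypothetical prompt; the one used in the study is not stated

preference_data = make_preference_dataset(
    prompts=[PROMPT], chosen=["c1ccncc1"], rejected=["CCO"])
binary_data = make_binary_signal_dataset(
    prompts=[PROMPT, PROMPT], completions=["c1ccncc1", "CCO"], labels=[True, False])
```

The preference layout needs a matched chosen/rejected pair for every prompt, whereas the binary layout only needs one completion and one label per row.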

3. Results and Discussion

3.1. Generative Workflow for Photoresponsive Drug Delivery

Numerous studies have focused on utilizing AI-based databases to scale up, optimize, and accelerate the development of nanocarrier drug delivery systems that are safe, effective, and stable. Endogenous triggers like pH variations, hormone levels, enzymatic actions, overexpression of biomarkers, glucose, or redox gradients are intrinsic to the body’s disease state. These triggers can externally prompt or amplify drug release in affected areas [6]. UV light’s high energy can modify chemical bonds, facilitating drug delivery mechanisms. Its versatility makes UV a common stimulus in research and applications for drug delivery. To design more effective UV-responsive drug delivery molecules, we employed a pre-trained language model, fine-tuned it on a dataset of target molecules, and used machine learning models to predict molecular excitation energies.

To date, no studies have established a methodology for applying GPT technology to drug delivery molecules and validating its efficacy. Here, we opted for a transformer structure with adapter layers, specifically the GPT-2 model. Pre-trained on the PubChem dataset, this GPT-2 model aims to generate molecules with high drug-likeness and easy synthesis. A SMILES tokenizer was developed by Sanjar Adilov, based on atomic dictionaries and linked characters. The linked characters of molecular SMILES provide information about chemical bonds and molecular configurations. The SMILES tokenizer also incorporates Multi-Layer Perceptron (MLP) and attention mechanism networks. Our goal is to adapt a pre-trained GPT for generating effective photoresponsive molecules using adapter layers. Once this workflow is proven to work, we can integrate various pre-trained language models through Hugging Face. This workflow for generating stimulus-responsive molecules via a pre-trained language model is shown in Figure 1.

The QM7b dataset provides excitation energies for molecules, which were utilized as input for fine-tuning our UV-GPT and training our screening model. This open-source dataset also includes the Coulomb matrix and physicochemical properties of molecules. Prof. Alexandre Tkatchenko’s group shared the SMILES of molecules with us [32,33], which are crucial for our adapter training and serve as the foundation of our workflow. Our generative pre-trained transformer for UV light-responsive drug delivery (UVGPT) utilized training datasets containing molecules with excitation energies ranging from 4.13 to 12.41 eV. Understanding the differences in the properties of various molecules based on chemical bonds, atomic potentials, and molecular conformations is a direct manifestation of the quantitative structure–activity relationship (QSAR) of molecules. Experienced chemical researchers can design molecules with specific properties based on their intuition. Our UVGPT learns the QSAR of UV light-responsive molecules from training datasets.

3.2. Screening the Generative Molecules

After training our UVGPT on UV light-responsive molecules, a total of one thousand molecules were generated by the model. Among these, 443 possessed valid chemical structures processed with RDKit. Dylan M. Anstine and Olexandr Isayev [45] reviewed generative methods in chemical sciences and proposed evaluation metrics. The methods for calculating drug-likeness and SAscore, as outlined in their publications [45], were employed in our research. QED estimated the drug-likeness of molecules using a machine learning model trained on a dataset of drug-like compounds. Higher QED values suggest an increased likelihood of drug-likeness. Similarly, the Synthetic Accessibility Score calculation methods aimed to systematically assess the ease of synthesizing drug-like molecules, aiding in prioritizing compounds for molecular design.

In the following workflow, we evaluated the excitation energy, drug-likeness, and SAscore of these molecules, as illustrated in Figure 2. We utilized the Coulomb matrix of molecules as the molecular descriptor in our prediction model. Support Vector Regression (SVR) served as the predictive machine learning model. We methodically compared various sets of parameters for SVR, encompassing different kernels (rbf, sigmoid, and quantum), regularization parameters (from 0.01 to 80), and kernel coefficients (auto, scale, 0.8, 0.84, and 2.3). Based on the Mean Squared Error (MSE) of both the training and test sets, our predictive model selected the rbf kernel, a regularization parameter of 80, and a kernel coefficient of 0.84. Prior to utilizing SVR for predicting the excitation energies of generative molecules, we employed DeepChem to compute the Coulomb matrix of these generative molecules. The results are shown in Figure 2a.
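
To clarify what the kernel coefficient and the selection metric refer to, the rbf kernel and the mean squared error can be written out explicitly. This is a plain-Python sketch of the two formulas only, assuming the usual definition k(x, y) = exp(−γ‖x − y‖²); it does not reproduce the SVR fit or the hyperparameter search, and the input vectors are illustrative.

```python
import math

GAMMA = 0.84  # kernel coefficient selected in the text


def rbf_kernel(x, y, gamma=GAMMA):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative descriptor vectors, not Coulomb matrices from the study.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0], [1.0])
```

A larger kernel coefficient makes the similarity decay faster with distance between descriptors, while the regularization parameter (80 in the selected model) controls how strongly training errors are penalized.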

When analyzing the density distribution plots of excitation energies depicted in Figure 2a, we observe a range spanning from 5 to 11 eV, with the majority concentrated between 8 and 9 eV. This distribution closely aligns with that of the training samples, indicating UVGPT’s success in learning the QSAR from molecular data and facilitating molecular design.

To refine the assessment of molecule excitation energies via precise TDDFT quantum chemical calculations, we conducted additional screening of the molecules. Given that drug delivery molecules, while not directly engaged in target binding at the lesion site, still exert direct effects on the human body, drug-like characteristics are equally crucial for the generated molecules. The parameter-tuned UVGPT inherited the pre-trained model’s performance on PubChem, and the distribution calculated using the QED method is illustrated in Figure 2b.

Based on the QED ranking, we identified the nine molecules with the highest degree of drug-like properties, and their corresponding SMILES representations and QED values are shown in Figure 3. These nine molecules exhibited QED values ranging from 0.528 to 0.57. Notably, the molecule with the SMILES notation OC1CC1OC(C)C demonstrated the highest degree of drug-like properties and was absent from the PubChem database. Additionally, the compound (1S,2S)-2-methylcyclopropan-1-ol in PubChem shared similarities with OC1CC1OC(C)C. Both findings further contribute to the evidence demonstrating the effectiveness of UVGPT.

Similarly, based on the density distribution of the SAscore in Figure 2c, we identified eight recommended molecules after filtering out mediocre results. The SAscore values of these eight molecules ranged from 5.47 to 5.92. Notably, the molecule with the SMILES notation C=CC=NSN=C, a conjugated alkene, achieved an SAscore value of 5.537. The molecule with the SMILES notation CC(C(Cl)N)S=N, which we classified as a cumulene, also achieved an SAscore value of 5.537. These are shown in Figure 4.

3.3. Quantum Chemical Calculations

To enhance our understanding and validate the outcomes generated by UVGPT, we conducted quantum chemical simulations to compute the physicochemical properties of the screened molecules, focusing primarily on their excitation energies. Additionally, we assessed the rationality and stability of the molecular structures within the chemical context.

To validate the excited-state properties, vertical excitation energies were computed with DFT calculations. As shown in Table 1, nearly all (16/17) generated molecules contained heteroatoms, including oxygen, sulfur, chlorine, and especially nitrogen, suggesting potential for biological applications. Six molecules were saturated, with a first excitation energy greater than 6.199 eV (putative corresponding maximum absorption peak wavelength < 200 nm). Seven molecules exhibited a maximum absorption peak wavelength between 200 nm and 400 nm. The remaining five molecules were in the visible and near-infrared bands. Enhancements in the generalization of predictive models could contribute to increased efficiency in our smart workflow.
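
The correspondence between excitation energy and absorption wavelength used above follows from λ = hc/E. The sketch below applies that conversion and sorts the result into the bands quoted in the Introduction (UV 100–400 nm, visible 400–750 nm, NIR 750–2000 nm); the function names and the handling of band edges are illustrative choices.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV·nm


def ev_to_nm(energy_ev: float) -> float:
    """Convert a photon energy in eV to a wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("energy must be positive")
    return HC_EV_NM / energy_ev


def classify_band(wavelength_nm: float) -> str:
    """Assign a wavelength to the UV, visible, or NIR band."""
    if 100 <= wavelength_nm < 400:
        return "UV"
    if 400 <= wavelength_nm < 750:
        return "visible"
    if 750 <= wavelength_nm <= 2000:
        return "NIR"
    return "outside 100-2000 nm"


threshold_nm = ev_to_nm(6.199)         # the 6.199 eV threshold quoted in the text
band_at_4_13 = classify_band(ev_to_nm(4.13))  # lower end of the training range
```

The 6.199 eV threshold maps to approximately 200 nm, and the 4.13 eV lower bound of the training data maps to roughly 300 nm, well inside the UV band.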

Moreover, as discussed earlier, the molecule with the SMILES notation OC1CC1OC(C)C is significant for drug application. Unfortunately, this molecule exhibits far-ultraviolet absorption and is unlikely to be used in drug delivery applications. Additionally, we did not find useful analogs in the databases. Therefore, we propose that modifying the isopropyl group by introducing double bonds (e.g., aldehyde, nitro, etc.) could decrease the excitation energy, potentially making it suitable for photoresponsive molecules with special drug applications.

It is widely recognized that conjugated alkenes are more stable than cumulenes. The cumulene molecule generated by UVGPT does not conform to an optimal structure within the bounds of established chemical knowledge. For example, the molecule with the SMILES notation CC1NC(C)=C=C1 (cumulene) lies several kcal/mol higher in Gibbs energy compared to its isomer. Addressing this issue requires incorporating additional chemists’ intuition and improving the quality of the datasets utilized.

Additionally, the conjugated alkene with the SMILES notation C=CC=NSN=C exhibits ultraviolet absorption properties. Its ability to undergo photochemical reactions upon UV light exposure renders it potentially useful in drug delivery applications.

3.4. Fine-Tuning with Human Feedback

By combining the results of our quantum chemical simulations shown in Table 1 with the molecular structures shown in Figure 3 and Figure 4, we have leveraged our chemical knowledge to establish three structural criteria for identifying molecules with longer excitation wavelengths:

The molecule has a polyatomic ring structure.
The atomic ring structure contains more than one unsaturated chemical bond.
The ring structure includes non-carbon atoms, such as nitrogen (N), sulfur (S), oxygen (O), etc.

We filtered the molecular data according to these three criteria, resulting in 121 molecules that meet a weakened criterion and 251 molecules that do not meet the criteria at all. The weakened criterion means satisfying at least two of the three conditions.

As shown in Figure 5, the KTO instruction fine-tuning relies on a binary signal dataset labeled based on human experience. We labeled 121 molecules that meet the weakened criterion as ‘True’ and those that do not meet the criteria at all as ‘False’, resulting in a binary signal dataset of 372 entries. Additionally, the DPO and CPO algorithms rely on a preference dataset, which consists of 121 molecules for the recommended samples and another set for the rejected data. As noted by the proposer of the KTO algorithm, the training set for KTO is much easier to obtain under these conditions. Our findings in our application were consistent with this.
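
The labeling rule can be expressed as a small function. This sketch assumes the three structural flags have already been computed for each molecule (for example with a cheminformatics toolkit); the `RingFeatures` container, the prompt string, and the example molecules are hypothetical. Treating molecules that satisfy exactly one criterion as unlabeled is one reading of the text, which describes only the "at least two" and "none" groups.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RingFeatures:
    """Hypothetical pre-computed structural flags for one molecule."""
    smiles: str
    has_polyatomic_ring: bool
    ring_unsaturated_bonds: int
    ring_has_heteroatom: bool


def criteria_met(features: RingFeatures) -> int:
    """Number of the three structural criteria that the molecule satisfies."""
    return sum([
        features.has_polyatomic_ring,
        features.ring_unsaturated_bonds > 1,
        features.ring_has_heteroatom,
    ])


def kto_label(features: RingFeatures) -> Optional[bool]:
    """True for the weakened criterion (>= 2 of 3), False for none, else unlabeled."""
    met = criteria_met(features)
    if met >= 2:
        return True
    if met == 0:
        return False
    return None  # exactly one criterion: assumed to be left out of the dataset


PROMPT = "<bos>"  # hypothetical prompt; the one used in the study is not stated
molecules = [
    RingFeatures("c1ccncc1", True, 3, True),  # pyridine: all three criteria
    RingFeatures("C1CC1", True, 0, False),    # cyclopropane: one criterion only
    RingFeatures("CCO", False, 0, False),     # ethanol: no criterion
]
kept = [(m, kto_label(m)) for m in molecules if kto_label(m) is not None]
kto_dataset = {
    "prompt": [PROMPT] * len(kept),
    "completion": [m.smiles for m, _ in kept],
    "label": [label for _, label in kept],
}
```

Applied to the full set, the same rule would produce the 121 'True' and 251 'False' rows described above, giving 372 entries in total.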

We further introduced a low-rank adapter based on the pre-trained GPT-2 to implement the fine-tuned model, aiming to make the generated molecular content more aligned with chemical experience and intuition. The structure of the fine-tuning model is shown in Figure 5, in which we updated the parameters of the adapter within the RLHF framework. The training parameters were primarily sourced from the relevant functional layers of the GPT-2 attention mechanism. Additionally, we fine-tuned the GPT-2-based model using KTO, DPO, and CPO trainers, respectively. We then counted the number of molecules in the generated content that satisfied the aforementioned weakened condition, with the results shown in Table 2. It was found that CPO produced a higher total number and proportion of valid molecular data. Although the dichotomous dataset for the KTO algorithm is more readily available, the quality of its generated content was significantly inferior to that produced by the other models in our scenario. Nonetheless, we recognize the potential of these three types of trainers, along with additional RLHF algorithms, for developing more effective language models for our task.
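
For readers unfamiliar with low-rank adapters, the usual formulation is reproduced here as a reminder. The rank r and scaling factor α are generic symbols; the values used in this work are not stated in the text, so this is the standard form rather than a description of the authors' exact configuration.

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Here W_0 is a frozen pre-trained weight matrix of size d × k (for example, an attention projection), and only A and B are updated during fine-tuning, so the number of trainable parameters per adapted matrix is r(d + k) instead of dk.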

4. Future Work

Previous studies have highlighted the limitations of UV-responsive materials for in vivo applications, citing concerns like potential cellular photodamage and inadequate penetration [10]. In subsequent investigations, researchers have turned their attention to visible and near-infrared light-responsive molecular materials [11,46]. However, the development of molecules utilizing LLM modeling for this purpose remains hindered by the absence of high-quality datasets. We aim to tackle this challenge to enhance the applicability and credibility of our workflow for medical clinical applications.

The remarkable success of the generative pre-trained transformer in various applications has garnered significant attention from both academia and industry. Communities like AdapterHub and Hugging Face have amassed numerous open-source pre-trained transformer structures, facilitating the design of light-responsive molecules for drug delivery systems. This diversity of molecular generation tools and content enhances the applicability of these technologies. Moreover, these advancements can seamlessly integrate into the workflows discussed in this paper.

While our current models are proficient in inheriting properties from training datasets, they have yet to reach a level of innovative insight that could rival human chemists. There is a need to further explore methods for refining the model parameters using additional knowledge and more effective techniques. Reinforcement learning from human feedback (RLHF) methods provide an exciting opportunity to incorporate more theoretical chemical knowledge into generating high-quality molecular content. To achieve this goal, we need to invest further efforts into designing the entire algorithmic framework of the generative pre-trained transformer (GPT) from scratch.

Author Contributions

Conceptualization, J.H. and G.Y.; Methodology, J.H., P.W., S.W., B.W. and G.Y.; Software, J.H. and P.W.; Validation, J.H.; Formal analysis, J.H. and P.W.; Investigation, J.H.; Data curation, J.H. and P.W.; Writing—original draft, J.H.; Writing—review & editing, J.H. and G.Y.; Visualization, J.H.; Project administration, J.H. and G.Y.; Funding acquisition, G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported in part by the ERC IMI (101005122), the H2020 (952172), the MRC (MC/PC/21013), the Royal Society (IEC/NSFC/211235), the NVIDIA Academic Hardware Grant Program, the SABER project supported by Boehringer Ingelheim Ltd., NIHR Imperial Biomedical Research Centre (RDA01), Wellcome Leap Dynamic Resilience, UKRI guarantee funding for Horizon Europe MSCA Postdoctoral Fellowships (EP/Z002206/1), and the UKRI Future Leaders Fellowship (MR/V023799/1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data files and code can be found at https://github.com/jhu22/Pharmaceutics2024.

Acknowledgments

Many thanks to Alexandre Tkatchenko and Leonardo Medrano Sandonas for their important help in understanding and using QM7b (http://quantum-machine.org). We extend our gratitude to Qi Li for collecting the literature on drug delivery.

Conflicts of Interest

The authors declare no conflict of interest.

References

Vargason, A.M.; Anselmo, A.C.; Mitragotri, S. The evolution of commercial drug delivery technologies. Nat. Biomed. Eng. 2021, 5, 1–17.
Hassanzadeh, P.; Atyabi, F.; Dinarvand, R. The significance of artificial intelligence in drug delivery system design. Adv. Drug Deliv. Rev. 2019, 151–152, 169–190.
Meenakshi, D.U.; Nakumar, S.; Francis, A.P.; Sweety, P.; Fuloria, S.; Fuloria, N.K.; Subramaniyan, V.; Khan, S.A. Deep Learning and Site-Specific Drug Delivery; Wiley: Hoboken, NJ, USA, 2022; pp. 1–38.
Vora, L.K.; Gholap, A.D.; Jetha, K.; Thakur, R.R.S.; Solanki, H.K.; Chavda, V.P. Artificial Intelligence in Pharmaceutical Technology and Drug Delivery Design. Pharmaceutics 2023, 15, 1916.
Tao, Y.; Chan, H.F.; Shi, B.; Li, M.; Leong, K.W. Light: A Magical Tool for Controlled Drug Delivery. Adv. Funct. Mater. 2020, 30, 2005029.
Liu, D.; Yang, F.; Xiong, F.; Gu, N. The Smart Drug Delivery System and Its Clinical Potential. Theranostics 2016, 6, 1306–1323.
Lan, G.; Ni, K.; Lin, W. Nanoscale metal–organic frameworks for phototherapy of cancer. Coord. Chem. Rev. 2019, 379, 65–81.
Bouchaala, R.; Anton, N.; Anton, H.; Vamme, T.; Vermot, J.; Smail, D.; Mély, Y.; Klymchenko, A.S. Light-triggered release from dye-loaded fluorescent lipid nanocarriers in vitro and in vivo. Colloids Surfaces B Biointerfaces 2017, 156, 414–421.
Son, J.; Yi, G.; Yoo, J.; Park, C.; Koo, H.; Choi, H.S. Light-responsive nanomedicine for biophotonic imaging and targeted therapy. Adv. Drug Deliv. Rev. 2019, 138, 133–147.
Jia, S.; Fong, W.-K.; Graham, B.; Boyd, B.J. Photoswitchable Molecules in Long-Wavelength Light-Responsive Drug Delivery: From Molecular Design to Applications. Chem. Mater. 2018, 30, 2873–2887.
Cho, H.J.; Chung, M.; Shim, M.S. Engineered photo-responsive materials for near-infrared-triggered drug delivery. J. Ind. Eng. Chem. 2015, 31, 15–25.
Liu, J.; Kang, W.; Wang, W. Photocleavage-based Photoresponsive Drug Delivery. Photochem. Photobiol. 2021, 98, 288–302.
Barhoumi, A.; Liu, Q.; Kohane, D.S. Ultraviolet light-mediated drug delivery: Principles, applications, and challenges. J. Control. Release 2015, 219, 31–42.
Weissleder, R. A clearer vision for in vivo imaging. Nat. Biotechnol. 2001, 19, 316–317.
Bagheri, A.; Arandiyan, H.; Boyer, C.; Lim, M. Lanthanide-Doped Upconversion Nanoparticles: Emerging Intelligent Light-Activated Drug Delivery Systems. Adv. Sci. 2016, 3, 1500437.
Karimi, M.; Zangabad, P.S.; Baghaee-Ravari, S.; Ghazadeh, M.; Mirshekari, H.; Hamblin, M.R. Smart Nanostructures for Cargo Delivery: Uncaging and Activating by Light. J. Am. Chem. Soc. 2017, 139, 4584–4610.
Linsley, C.S.; Wu, B.M. Recent advances in light-responsive on-demand drug-delivery systems. Ther. Deliv. 2017, 8, 89–107.
Gao, J.; Karp, J.M.; Langer, R.; Joshi, N. The Future of Drug Delivery. Chem. Mater. 2023, 35, 359–363.
Harrison, P.J.; Wiesler, H.; Sabirsh, A.; Karlsson, J.; Malmsjö, V.; Hellander, A.; Wählby, C.; Spjuth, O. Deep-learning models for lipid nanoparticle-based drug delivery. Nanomedicine 2021, 16, 1097–1110.
Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C.A.; Bekas, C.; Lee, A.A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5, 1572–1583.
Chithrananda, S.; Grand, G.; Ramsundar, B. ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction. arXiv 2020, arXiv:2010.09885.
Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 1 July 2024).
Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9.
Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
Gupta, A.; Müller, A.T.; Huisman, B.J.H.; Fuchs, J.A.; Schneider, P.; Schneider, G. Generative Recurrent Networks for De Novo Drug Design. Mol. Inform. 2017, 37, 1700111.
Adilov, S. Generative Pre-Training from Molecules; Cambridge Engage Preprints: Cambridge, UK, 2021; Volume 16. Available online: https://chemrxiv.org/engage/chemrxiv/article-details/6142f60742198e8c31782e9e (accessed on 25 June 2024).
Haroon, S.; Hafsath, C.A.; Jereesh, A.S. Generative Pre-trained Transformer (GPT) based model with relative attention for de novo drug design. Comput. Biol. Chem. 2023, 106, 107911.
Jablonka, K.M.; Schwaller, P.; Ortega-Guerrero, A.; Smit, B. Is GPT-3 All You Need for Low-Data Discovery in Chemistry; Cambridge Engage Preprints: Cambridge, UK, 2023; Volume 14. [Google Scholar]
Rafailov, R.; Sharma, A.; Mitchell, E.; Manning, C.D.; Ermon, S.; Finn, C. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Adv. Neural Inf. Process. Syst. 2023, 36, 53728–53741.
Xu, H.; Sharaf, A.; Chen, Y.; Tan, W.; Shen, L.; Van Durme, B.; Murray, K.; Kim, Y.J. Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation. arXiv 2024, arXiv:2401.08417.
Ethayarajh, K.; Xu, W.; Muennighoff, N.; Jurafsky, D.; Kiela, D. KTO: Model Alignment as Prospect Theoretic Optimization. arXiv 2024, arXiv:2402.01306.
Rupp, M.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O.A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301.
Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K.R.; Von Lilienfeld, O.A. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 2013, 15, 095003.
Blum, L.C.; Reymond, J.-L. 970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733.
Bickerton, G.R.; Paolini, G.V.; Besnard, J.; Muresan, S.; Hopkins, A.L. Quantifying the chemical beauty of drugs. Nat. Chem. 2012, 4, 90–98.
Ertl, P.; Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 2009, 1, 8.
Neese, F. The ORCA program system. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011, 2, 73–78.
Adamo, C.; Barone, V. Toward reliable density functional methods without adjustable parameters: The PBE0 model. J. Chem. Phys. 1999, 110, 6158–6170.
Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Phys. Chem. Chem. Phys. 2005, 7, 3297.
Marenich, A.V.; Cramer, C.J.; Truhlar, D.G. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B 2009, 113, 6378–6396.
Skyner, R.E.; McDonagh, J.L.; Groom, C.R.; van Mourik, T.; Mitchell, J.B.O. A review of methods for the calculation of solution free energies and the modelling of systems in solution. Phys. Chem. Chem. Phys. 2015, 17, 6174–6191.
Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J. Chem. Phys. 2010, 132, 154104.
Grimme, S. Semiempirical GGA-type density functional constructed with a long-range dispersion correction. J. Comput. Chem. 2006, 27, 1787–1799.
Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 2011, 32, 1456–1465.
Anstine, D.M.; Isayev, O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J. Am. Chem. Soc. 2023, 145, 8736–8750.
Olejniczak, J.; Carling, C.-J.; Almutairi, A. Photocontrolled release using one-photon absorption of visible or NIR light. J. Control. Release 2015, 219, 18–30.

Figure 1. The workflow for generating UV light-responsive molecules. This workflow includes an LLM model based on GPT-2, a SMILES tokenizer, pre-training on PubChem datasets, fine-tuning on UV molecule datasets, and a screening model incorporating the Coulomb matrix descriptor. Molecules excited by ultraviolet light have the potential to become stimuli-responsive materials in drug delivery systems. The dark-green areas represent the pre-trained transformer. The indigo areas represent the pre-training of GPT-2 combined with the SMILES tokenizer on the PubChem dataset. The orange-yellow areas indicate that a new adapter was fine-tuned using the ultraviolet light-excited molecule dataset on the pre-trained GPT-2. The orange-red areas show the prediction model, trained on the QM7b dataset and Coulomb matrix features, which predicts the properties of molecules generated by the fine-tuned GPT-2.
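The Coulomb matrix descriptor used by the screening model in Figure 1 can be illustrated with a minimal sketch. It follows the standard definition from Rupp et al. (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|) and assumes coordinates in atomic units (bohr). The function name `coulomb_matrix` and the H2 geometry are illustrative assumptions, not taken from the authors' code.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix (list of lists) for one molecule.

    charges: nuclear charges Z_i, one per atom.
    coords:  (x, y, z) positions in bohr, one per atom (assumed unit).
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: fitted approximation to the atomic energy.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear Coulomb repulsion between atoms i and j.
                matrix[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return matrix


# Hypothetical example: H2 with a 1.4 bohr bond length.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

In practice the matrix is usually padded to a fixed size and sorted or flattened before it is passed to a regressor such as the SVM mentioned in Figure 2; those steps are omitted here.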

Figure 2. The excitation energy (a), drug-likeness (b), and SAscore (c) of the generated molecules. These molecules were generated by GPT-2, fine-tuned on the ultraviolet light-excited molecule dataset. (a) shows the excitation energies predicted by an SVM model. (b) shows the distribution of the drug-likeness scores of the molecules, obtained from DeepChem’s calculation of the quantitative estimate of drug-likeness (QED) values. (c) presents the synthetic accessibility scores of the molecules, calculated using RDKit, with higher values indicating that the molecules are easier to synthesize.

Figure 3. The QED values and SMILES representations of nine selected molecules generated by UV-GPT.

Figure 4. The SAscore values and SMILES representations of eight selected molecules generated by UV-GPT.

Figure 5. Instruction tuning with chemical datasets. The details of the pre-trained GPT-2 model are provided, including the implementation process of RLHF, which involves KTO, DPO, and CPO. The characteristics of the binary signal dataset and the preference dataset are also shown. During training and fine-tuning of the GPT-2 PEFT model, only the parameters in the orange-yellow area were trained, while the parameters of the other layers retained their pre-trained values. The KTO algorithm used the binary signal dataset, while both DPO and CPO used the preference dataset. Refer to the Methods section for descriptions of the binary signal and preference datasets. In the binary signal dataset, experts label each molecule as either True or False. In the preference dataset, experts select recommended molecules and mark unreasonable molecules as restricted.
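The difference between the two dataset shapes described in Figure 5 can be sketched as plain Python records. The class names, field names, prompt string, and example SMILES pairing below are hypothetical placeholders chosen for illustration; they are not the authors' data or expert judgments.

```python
from dataclasses import dataclass


@dataclass
class BinarySignalRecord:
    """One KTO-style example: a single molecule with a True/False expert label."""
    prompt: str
    completion: str  # generated SMILES string
    label: bool      # True = acceptable, False = not acceptable


@dataclass
class PreferenceRecord:
    """One DPO/CPO-style example: a recommended and a restricted molecule."""
    prompt: str
    recommended: str  # SMILES the expert prefers
    restricted: str   # SMILES the expert marks as unreasonable


def preference_to_binary(record):
    """Split one preference pair into two binary-signal records."""
    return [
        BinarySignalRecord(record.prompt, record.recommended, True),
        BinarySignalRecord(record.prompt, record.restricted, False),
    ]


# Placeholder pair; the recommended/restricted assignment is illustrative only.
pair = PreferenceRecord(prompt="<generate>", recommended="CCC(N)CC", restricted="C1=C=C=N1")
binary_records = preference_to_binary(pair)
```

The sketch highlights the design difference: a binary signal needs only one judged molecule per record, whereas a preference record always needs a matched pair for the same prompt.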

Table 1. Molecular structures and first excitation energies converted to wavelengths in the gas phase, water, and organic solvents.

Molecular SMILES	Gas Phase (nm)	Water (nm)	Organic Solvents (nm)
CCC(N)CC	190	178	181
OC1CC1OC(C)C	191	177	180
CCC(N)C(C)C	193	182	185
CC(CN)C(C)	193	181	184
CC(C)C(C)CN	197	187	190
CCC(N)CCO	199	181	185
COCC1CCN1	201	189	192
CNC=CCC	231	219	222
CC(C(Cl)N)S=N	226	305	218
C1=C=C=N1	235	235	235
NCCCC1SC1	251	246	247
C=CC=NSN=C	286	288	291
C=C=C=NN=C	468	445	451
CC1CC=C=C1	492	696	605
CC1C=C=NC1	631	644	648
CC1NC(C)=C=C1	733	481	529
CC1=NC=C=N1	749	736	739
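The conversion behind Table 1 (excitation energy to wavelength) follows λ = hc/E. The sketch below assumes energies in eV and wavelengths in nm, which is consistent with the magnitude of the tabulated values but is an assumption about units; the function names are illustrative.

```python
# Planck constant times speed of light, in eV·nm.
HC_EV_NM = 1239.841984


def ev_to_nm(energy_ev):
    """Convert an excitation energy in eV to a wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("excitation energy must be positive")
    return HC_EV_NM / energy_ev


def nm_to_ev(wavelength_nm):
    """Convert a wavelength in nm back to an excitation energy in eV."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def solvent_shift(gas_nm, solvent_nm):
    """Wavelength shift on solvation; negative means a blue shift."""
    return solvent_nm - gas_nm


# First row of Table 1, CCC(N)CC: gas phase 190, water 178.
shift_water = solvent_shift(190, 178)   # -12, i.e. a blue shift in water
energy_gas = nm_to_ev(190)              # roughly 6.5 eV
```

Applied row by row, `solvent_shift` makes it easy to spot the molecules in Table 1 whose excitation wavelength changes strongly between the gas phase and solution.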

Table 2. Analysis of the generated content from the RLHF fine-tuned model.

RLHF Trainer	Number of Molecules Satisfying at Least Two Judgments	Ratio of Molecules Satisfying at Least Two Judgments
KTO	16	21.05%
DPO	124	35.13%
CPO	131	43.96%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
