Publications

Chemprop: A Machine Learning Package for Chemical Property Prediction

Published in ChemRxiv, 2023

Deep learning, particularly with directed message-passing neural networks (D-MPNNs), has emerged as a robust method for predicting molecular properties. Chemprop is an open-source software package that implements the D-MPNN architecture and provides fast, straightforward access to machine-learned molecular properties. The latest version adds support for multi-molecule properties, reactions, atom/bond-level properties, and spectra. It also incorporates uncertainty quantification, transfer learning, improved hyperparameter optimization, and other customization options. Benchmarked on datasets such as MoleculeNet and SAMPL, Chemprop demonstrates state-of-the-art performance in predicting a range of molecular properties. The package is designed to make training D-MPNN models efficient and user-friendly.

Recommended citation: Heid, Esther; Greenman, Kevin P; Chung, Yunsie; Li, Shih-Cheng; Graff, David E; Vermeire, Florence H; Wu, Haoyang; Green, William H; McGill, Charles J. (2023). "Chemprop: A Machine Learning Package for Chemical Property Prediction." ChemRxiv. https://chemrxiv.org/engage/chemrxiv/article-details/64d1f13d4a3f7d0c0dcd836b

Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning

Published in Journal of Chemical Information and Modeling, 2023

Critical properties (critical temperature, pressure, and density) and the acentric factor are essential for evaluating the thermophysical properties of chemical compounds. Because experimental measurements are costly and time-consuming, we developed a machine learning model that predicts these properties from the SMILES representation of a chemical. We compared a directed message-passing neural network (D-MPNN) and a graph attention network as model architectures and improved performance with additional features, multitask training, and pretraining. Our optimized D-MPNN model, augmented with Abraham parameters, predicts multiple properties and achieves high accuracy across several evaluations. The dataset, covering 1144 compounds, is publicly available along with the source code for further research.

Recommended citation: Biswas, Sayandeep; Chung, Yunsie; Ramirez, Josephine; Wu, Haoyang; Green, William H. (2023). "Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning." Journal of Chemical Information and Modeling. 63(15). 4574-4588. https://pubs.acs.org/doi/full/10.1021/acs.jcim.3c00546

Automated reaction kinetics and network exploration (Arkane): A statistical mechanics, thermodynamics, transition state theory, and master equation software

Published in International Journal of Chemical Kinetics, 2023

We introduce Arkane, an open-source statistical mechanics software package that computes thermodynamic properties of chemical species and reaction rate coefficients over molecular potential energy surfaces (PES). It can account for collisional energy transfer effects and provides estimates where quantum chemistry data are unavailable. The software supports multiple electronic structure packages (e.g., Gaussian, Molpro, Orca). Notable features include handling of various internal rotation modes, quantum tunneling corrections, and rate calculations using transition state theory and Rice-Ramsperger-Kassel-Marcus (RRKM) theory. Arkane's output includes thermodynamic properties with energy corrections, and the software offers automated PES exploration and sensitivity analysis. Arkane is part of the RMG-Py suite and is available on GitHub.

Recommended citation: Grinberg Dana, Alon; Johnson, Matthew S; Allen, Joshua W; Sharma, Sandeep; Raman, Sumathy; Liu, Mengjie; Gao, Connie W; Grambow, Colin A; Goldman, Mark J; Ranasinghe, Duminda S; Gillis, Ryan J; Payne, A Mark; Li, Yi‐Pei; Dong, Xiaorui; Spiekermann, Kevin A; Wu, Haoyang; Dames, Enoch E; Buras, Zachary J; Vandewiele, Nick M; Yee, Nathan W; Merchant, Shamel S; Buesser, Beat; Class, Caleb A; Goldsmith, Franklin; West, Richard H; Green, William H. (2023). "Automated reaction kinetics and network exploration (Arkane): A statistical mechanics, thermodynamics, transition state theory, and master equation software." International Journal of Chemical Kinetics. 55(6). 300-323. https://onlinelibrary.wiley.com/doi/full/10.1002/kin.21637

Autonomous, multi-property-driven molecular discovery: from predictions to measurements and back

Published in ChemRxiv, 2023

We developed an autonomous molecular discovery platform, powered by machine learning, to accelerate the design of molecules with specific properties. Two case studies on dye-like molecules targeted absorption wavelength, lipophilicity, and photo-oxidative stability. In the first study, the platform identified 312 new molecules over three automatic cycles, broadening its knowledge of the structure–property space with each iteration. The second study leveraged pre-existing property models to pinpoint 6 top-performing molecules. By integrating prediction, synthesis, measurement, and model retraining in a single loop, the platform demonstrates the promise of automated systems for understanding and innovating within a chemical space.

Recommended citation: Koscher, Brent; Canty, Richard B; McDonald, Matthew A; Greenman, Kevin P; McGill, Charles J; Bilodeau, Camille L; Jin, Wengong; Wu, Haoyang; Vermeire, Florence H; Jin, Brooke; Hart, Travis; Kulesza, Timothy; Li, Shih-Cheng; Jaakkola, Tommi S; Barzilay, Regina; Gómez-Bombarelli, Rafael; Green, William H; Jensen, Klavs F. (2023). "Autonomous, multi-property-driven molecular discovery: from predictions to measurements and back." ChemRxiv. https://chemrxiv.org/engage/chemrxiv/article-details/6435f8c5a41dec1a56e64577

Kinetic Modeling of API Oxidation: (2) Imipramine Stress Testing

Published in Molecular Pharmaceutics, 2022

Studying the chemical stability of active pharmaceutical ingredients (APIs) is vital for ensuring drug quality and safety. Traditional stress testing, though crucial, is resource-intensive and can be limited by API availability. Our research combines quantum chemical calculations with automated reaction mechanism generation to better understand API degradation. Building on a prior study, we created the first ab initio predictive chemical kinetic model for free-radical oxidative degradation in API stress testing. We focused on imipramine oxidation; the model predictions matched experimental results and correctly identified the major degradation products. This work demonstrates the value of combining quantum chemical computations with predictive modeling for API stability research. Ultimately, we aim to develop a digital workflow that integrates first-principles models, data-driven methods, and high-throughput experiments for future API stability assessments.

Recommended citation: Wu, Haoyang; Grinberg Dana, Alon; Ranasinghe, Duminda S; Pickard IV, Frank C; Wood, Geoffrey PF; Zelesky, Todd; Sluggett, Gregory W; Mustakis, Jason; Green, William H. (2022). "Kinetic Modeling of API Oxidation: (2) Imipramine Stress Testing." Molecular Pharmaceutics. 19(5). 1526-1539. https://pubs.acs.org/doi/full/10.1021/acs.molpharmaceut.2c00043

Group contribution and machine learning approaches to predict Abraham solute parameters, solvation free energy, and solvation enthalpy

Published in Journal of Chemical Information and Modeling, 2022

We present a group contribution method (SoluteGC) and a machine learning model (SoluteML) to predict the Abraham solute parameters, as well as a machine learning model (DirectML) to predict solvation free energy and enthalpy at 298 K. Extensive data sets containing 8366 solute parameters, 20,253 solvation free energies, and 6322 solvation enthalpies are compiled in this work to train the models. Data sets and tools to make solvation predictions are made open-access.

Recommended citation: Chung, Yunsie; Vermeire, Florence H; Wu, Haoyang; Walker, Pierre J; Abraham, Michael H; Green, William H. (2022). "Group contribution and machine learning approaches to predict Abraham solute parameters, solvation free energy, and solvation enthalpy." Journal of Chemical Information and Modeling. 62(3). 433-446. https://pubs.acs.org/doi/full/10.1021/acs.jcim.1c01103

Kinetic Modeling of API Oxidation: (1) The AIBN/H2O/CH3OH Radical “Soup”

Published in Molecular Pharmaceutics, 2021

We used ab initio electronic structure calculations to study the stress testing used to assess the chemical stability of active pharmaceutical ingredients (APIs). By examining an azobis(isobutyronitrile) (AIBN) system with different water/methanol ratios at various pH values and temperatures, we identified the dominant radicals under each set of conditions. The work offers insight into API oxidation and introduces tools for automatic kinetic model development.

Recommended citation: Grinberg Dana, Alon; Wu, Haoyang; Ranasinghe, Duminda S; Pickard IV, Frank C; Wood, Geoffrey PF; Zelesky, Todd; Sluggett, Gregory W; Mustakis, Jason; Green, William H. (2021). "Kinetic Modeling of API Oxidation: (1) The AIBN/H2O/CH3OH Radical “Soup”." Molecular Pharmaceutics. 18(8). 3037-3049. https://pubs.acs.org/doi/full/10.1021/acs.molpharmaceut.1c00261

Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors

Published in Chemical Science, 2021

We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules and train a multi-task constrained model to calculate the required descriptors on the fly. The proposed platform improves both interpolation and extrapolation performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples.

Recommended citation: Guan, Yanfei; Coley, Connor W; Wu, Haoyang; Ranasinghe, Duminda; Heid, Esther; Struble, Thomas J; Pattanaik, Lagnajit; Green, William H; Jensen, Klavs F. (2021). "Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors." Chemical Science. 12(6). 2198-2208. https://pubs.rsc.org/en/content/articlelanding/2021/SC/d0sc04823b

Enhanced light extraction from free-standing InGaN/GaN light emitters using bio-inspired backside surface structuring

Published in Optics Express, 2017

Light extraction from InGaN/GaN-based multiple-quantum-well (MQW) blue light emitters is enhanced using a simple, scalable, and reproducible method to create hexagonally close-packed conical nano- and micro-scale features on the backside outcoupling surface. A 4.8-fold overall enhancement in light extraction (9-fold at normal incidence) relative to a flat outcoupling surface was achieved using a feature pitch of 2530 nm.

Recommended citation: Pynn, Christopher D; Chan, Lesley; Gonzalez, Federico Lora; Berry, Alex; Hwang, David; Wu, Haoyang; Margalith, Tal; Morse, Daniel E; DenBaars, Steven P; Gordon, Michael J. (2017). "Enhanced light extraction from free-standing InGaN/GaN light emitters using bio-inspired backside surface structuring." Optics Express. 25(14). 15778-15785. https://opg.optica.org/oe/fulltext.cfm?uri=oe-25-14-15778&id=368404