Gawehn, E., Hiss, J. A. & Schneider, G. Deep learning in drug discovery. Mol. Inform. 35, 3–14 (2016).
Schmidt, J., Marques, M. R., Botti, S. & Marques, M. A. Recent advances and applications of machine learning in solid-state materials science. Npj Comput. Mater. 5, 83 (2019).
von Lilienfeld, O. A. Quantum machine learning in chemical compound space. Angew. Chem. Int. Ed. 57, 4164–4169 (2018).
von Lilienfeld, O. A., Müller, K.-R. & Tkatchenko, A. Exploring chemical compound space with quantum-based machine learning. Nat. Rev. Chem. 4, 347–358 (2020).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning, 9323–9332 (PMLR, 2021).
Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning, 9377–9388 (PMLR, 2021).
Huang, B. & von Lilienfeld, O. A. Quantum machine learning using atom-in-molecule-based fragments selected on the fly. Nat. Chem. 12, 945–951 (2020).
Christensen, A. S., Bratholm, L. A. & Faber, F. A. & Anatole von Lilienfeld, O. FCHL revisited: Faster and more accurate quantum machine learning. J. Chem. Phys. 152, 044107 (2020).
Heinen, S., von Rudorff, G. F. & von Lilienfeld, O. A. Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space. J. Chem. Phys. 155, 064105 (2021).
Heinen, S., Schwilk, M., von Rudorff, G. F. & von Lilienfeld, O. A. Machine learning the computational cost of quantum chemistry. Mach. Learn.: Sci. Technol. 1, 025002 (2020).
Christensen, A. S., Faber, F. A. & von Lilienfeld, O. A. Operators in quantum machine learning: Response properties in chemical space. J. Chem. Phys. 150, 064105 (2019).
Faber, F. A., Christensen, A. S. & Huang, B. & Von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. J. Chem. Phys. 148, 241717 (2018).
Balcells, D. & Skjelstad, B. B. tmQM dataset-quantum geometries and properties of 86k transition metal complexes. J. Chem. Inf. Model. 60, 6135–6146 (2020).
Unke, O. et al. SE(3)-equivariant prediction of molecular wavefunctions and electronic densities. Advances in Neural Information Processing Systems 34 (2021).
Schütt, K., Gastegger, M., Tkatchenko, A., Müller, K.-R. & Maurer, R. J. Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions. Nat. Commun. 10 (2019).
Grisafi, A. et al. Transferable machine-learning model of the electron density. ACS Cent. Sci. 5, 57–64 (2018).
Fabrizio, A., Grisafi, A., Meyer, B., Ceriotti, M. & Corminboeuf, C. Electron density learning of non-covalent systems. Chem. Sci. 10, 9424–9432 (2019).
Ramakrishnan, R., Dral, P. O. & Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data 4, 170193 (2017).
Nakata, M. & Shimazaki, T. PubChemQC project: A large-scale first-principles electronic structure database for data-driven chemistry. J. Chem. Inf. Model. 57, 1300–1308 (2017).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
Nakata, M., Shimazaki, T., Hashimoto, M. & Maeda, T. PubChemQC PM6: Data sets of 221 million molecules with optimized molecular geometries and electronic properties. J. Chem. Inf. Model. 60, 5891–5899 (2020).
Glavatskikh, M., Leguy, J., Hunault, G., Cauchy, T. & Da Mota, B. Dataset’s chemical diversity limits the generalizability of machine learning predictions. J. Cheminformatics 11, 1–15 (2019).
Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R. & Miller, T. F. III Orbnet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 153, 124111 (2020).
Grimme, S., Bannwarth, C. & Shushkov, P. A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). J. Chem. Theory Comput. 13, 1989–2009 (2017).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB-An accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
Grimme, S. Exploration of chemical compound, conformer, and reaction space with meta-dynamics simulations based on tight-binding quantum chemical calculations. J. Chem. Theory Comput. 15, 2847–2862 (2019).
Bannwarth, C. et al. Extended tight-binding quantum chemistry methods. WIREs Comput. Mol. Sci. 11, e1493 (2021).
Rezac, J., Fanfrlik, J., Salahub, D. & Hobza, P. Semiempirical quantum chemical PM6 method augmented by dispersion and H-bonding correction terms reliably describes various types of noncovalent complexes. J. Chem. Theory Comput. 5, 1749–1760 (2009).
Folmsbee, D. & Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. Int. J. Quantum Chem. 121, e26381 (2021).
Bolton, E. E., Kim, S. & Bryant, S. H. PubChem3D: Conformer generation. J. Cheminformatics 3, 4 (2011).
Axelrod, S. & Gomez-Bombarelli, R. GEOM: Energy-annotated molecular conformations for property prediction and molecular generation. arXiv preprint arXiv:2006.05531 (2020).
Mendez, D. et al. ChEMBL: Towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Chai, J.-D. & Head-Gordon, M. Long-range corrected hybrid density functionals with damped atom–atom dispersion corrections. Phys. Chem. Chem. Phys. 10, 6615–6620 (2008).
Weigend, F. & Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Phys. Chem. Chem. Phys. 7, 3297–3305 (2005).
Smith, D. G. et al. Psi4 1.4: Open-source software for high-throughput quantum chemistry. J. Chem. Phys. 152, 184108 (2020).
Meyers, J., Carter, M., Mok, N. Y. & Brown, N. On the origins of three-dimensionality in drug-like molecules. Future Med. Chem. 8, 1753–1767 (2016).
Sauer, W. H. & Schwarz, M. K. Molecular shape diversity of combinatorial libraries: A prerequisite for broad bioactivity. J. Chem. Inf. Comput. Sci. 43, 987–1003 (2003).
Moss, G. et al. Basic terminology of stereochemistry (IUPAC recommendations 1996). Pure Appl. Chem. 68, 2193–2222 (1996).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Bento, A. P. et al. An open source chemical structure curation pipeline using rdkit. J. Cheminformatics 12, 1–16 (2020).
Christensen, A. S. et al. Orbnet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 155, 204103 (2021).
Riniker, S. & Landrum, G. A. Better informed distance geometry: Using what we know to improve conformation generation. J. Chem. Inf. Model. 55, 2562–2574 (2015).
Tosco, P., Stiefl, N. & Landrum, G. Bringing the MMFF force field to the RDKit: Implementation and validation. J. Cheminformatics 6, 37 (2014).
Lloyd, S. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–137 (1982).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Isert, C., Atz, K., Jiménez-Luna, J. & Schneider, G. QMugs: Quantum Mechanical Properties of Drug-like Molecules., ETH Zurich, https://doi.org/10.3929/ethz-b-000482129 (2021).
Dalby, A. et al. Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J. Chem. Inform. Comput. Sci. 32, 244–255 (1992).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
NIST Standard Reference Database 101. Computational Chemistry Comparison and Benchmark DataBase, Release 21. https://cccbdb.nist.gov/expbondlengths1.asp August 2020.
Bach, R. D. Ring strain energy in the cyclooctyl system. the effect of strain energy on [3 + 2] cycloaddition reactions with azides. J. Am. Chem. Soc. 131, 5233–5243 (2009).
Goulet-Hanssens, A. et al. Electrocatalytic Z/E isomerization of azobenzenes. J. Am. Chem. Soc. 139, 335–341 (2017).
Roca-Lopez, D., Tejero, T. & Merino, P. DFT investigation of the mechanism of E/Z isomerization of nitrones. J. Org. Chem 79, 8358–8365 (2014).
Berthold, M. R. et al. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007) (Springer, 2007).
Schrödinger, L. L. C. The PyMOL Molecular Graphics System, Version 2.3.5.
Nakata, M., Maeda, T., Shimazaki, T., Hashimoto, M. The PubChemQC Project. http://pubchemqc.riken.jp/ Accessed Sept. 2020.