Machine-learning-assisted materials discovery using failed experiments

Abstract
Inorganic-organic hybrid materials(1-3) such as organically templated metal oxides(1), metal-organic frameworks (MOFs)(2) and organohalide perovskites(4) have been studied for decades, and hydrothermal and (non-aqueous) solvothermal syntheses have produced thousands of new materials that collectively contain nearly all the metals in the periodic table(5-9). Nevertheless, the formation of these compounds is not fully understood, and development of new compounds relies primarily on exploratory syntheses. Simulation- and data-driven approaches (promoted by efforts such as the Materials Genome Initiative(10)) provide an alternative to experimental trial-and-error. Three major strategies are: simulation-based predictions of physical properties (for example, charge mobility(11), photovoltaic properties(12), gas adsorption capacity(13) or lithium-ion intercalation(14)) to identify promising target candidates for synthetic efforts(11,15); determination of the structure-property relationship from large bodies of experimental data(16,17), enabled by integration with high-throughput synthesis and measurement tools(18); and clustering on the basis of similar crystallographic structure (for example, zeolite structure classification(19,20) or gas adsorption properties(21)). Here we demonstrate an alternative approach that uses machine-learning algorithms trained on reaction data to predict reaction outcomes for the crystallization of templated vanadium selenites. We used information on 'dark' reactions (failed or unsuccessful hydrothermal syntheses) collected from our laboratory's archived notebooks, and added physicochemical property descriptions to the raw notebook information using cheminformatics techniques. We used the resulting data to train a machine-learning model to predict reaction success.
When carrying out hydrothermal synthesis experiments using previously untested, commercially available organic building blocks, our machine-learning model outperformed traditional human strategies, and successfully predicted conditions for new organically templated inorganic product formation with a success rate of 89 per cent. Inverting the machine-learning model reveals new hypotheses regarding the conditions for successful product formation.
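
The workflow summarized above (label each archived reaction as a success or failure, describe it with numerical descriptors, fit a classifier, then read the fitted model back as a hypothesis) can be illustrated with a minimal sketch. The abstract does not specify the learning algorithm, so a one-split decision stump is substituted here purely for illustration. Every reaction record, descriptor name (ph, temp_c, amine_mw) and value below is invented and is not data from the study.

```python
# Toy sketch only: the reactions, descriptor names, and values are invented
# for illustration and are NOT taken from the study. A decision stump stands
# in for whatever model the authors actually trained.
from dataclasses import dataclass


@dataclass
class Stump:
    feature: str            # descriptor used for the split
    threshold: float        # split point on that descriptor
    above_is_success: bool  # True if values above the threshold predict success

    def predict(self, reaction):
        above = reaction[self.feature] > self.threshold
        return int(above == self.above_is_success)


def fit_stump(reactions, outcomes):
    """Return the single feature/threshold split with the fewest training errors."""
    best, best_errors = None, len(outcomes) + 1
    for feature in reactions[0]:
        values = sorted({r[feature] for r in reactions})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for above_is_success in (True, False):
                stump = Stump(feature, threshold, above_is_success)
                errors = sum(
                    stump.predict(r) != y for r, y in zip(reactions, outcomes)
                )
                if errors < best_errors:
                    best, best_errors = stump, errors
    return best


def describe(stump):
    """Read the fitted split back as a human-readable hypothesis."""
    direction = ">" if stump.above_is_success else "<="
    return f"predict success when {stump.feature} {direction} {stump.threshold}"


# Hypothetical notebook records: failures (0) are kept alongside successes (1).
reactions = [
    {"ph": 1.0, "temp_c": 90, "amine_mw": 60.0},
    {"ph": 2.0, "temp_c": 110, "amine_mw": 88.0},
    {"ph": 3.0, "temp_c": 90, "amine_mw": 102.0},
    {"ph": 5.0, "temp_c": 110, "amine_mw": 60.0},
    {"ph": 6.0, "temp_c": 90, "amine_mw": 88.0},
    {"ph": 7.0, "temp_c": 110, "amine_mw": 102.0},
]
outcomes = [0, 0, 0, 1, 1, 1]

model = fit_stump(reactions, outcomes)
print(describe(model))
print(model.predict({"ph": 6.5, "temp_c": 100, "amine_mw": 74.0}))
```

The failed reactions are what let the stump place a boundary at all: with only successful records there would be no negative class to separate from. The describe helper is a much-simplified stand-in for the model inversion mentioned above, in which the fitted model is turned back into a testable statement about reaction conditions.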