IBM RXN for Chemistry meets Science of Synthesis and Synfacts from Thieme.
In digital chemistry, machine learning applications have a single point of failure: data quality.
The primary impediment to the widespread and successful application of machine learning in day-to-day laboratory operations is data quality. Machine learning places stringent requirements on its training data, and incorrect data can degrade the performance of trained prediction models and, ultimately, the end-user experience.
To effectively train prediction models, historical chemical data must meet extremely broad and stringent quality standards. To begin with, the data must be correct, properly labeled, and deduplicated. Furthermore, complex inference tasks, such as the prediction of retrosynthetic routes or experimental procedures, require not just more data, but data that is more diverse and detailed, free of bias across the entire range of inputs for which the predictive models are being developed.
IBM Research has been developing data-driven chemistry solutions based on language models for over four years. The team has relied on chemical reaction records derived from patents and has come to appreciate both the benefits and drawbacks of these freely accessible databases. Science of Synthesis (sos.thieme.com) and Synfacts (both maintained by Thieme) provide an unprecedented level of human curation among commercially available datasets, establishing them as the gold standard for chemical reaction records.
During this webinar, the IBM Research and Thieme teams will present the outcome of their collaboration. The teams will compare the performance of language models trained on the highest-quality commercially available datasets (Science of Synthesis and Synfacts) to that of models trained on publicly available patent reaction records, with a specific focus on retrosynthetic and chemical prediction tasks.
Seven eminent synthetic chemistry experts from China, Germany, Switzerland, New Zealand, and the USA (together with their groups) provided insightful feedback to IBM and Thieme during this collaboration, creating a unique forum for exchange between machine learning experts and the synthetic organic chemistry community. Their valuable work will also be presented.