Published in Applied AI Letters, the study by collaborators Optibrium, Takeda and Intellegens demonstrates the application of deep learning imputation to a global pharma dataset, offering new insights that transform how complex drug discovery data can be used to identify opportunities for new drugs and streamline the drug discovery process.

CAMBRIDGE, UK, 29 June 2021: Optibrium Limited, a leading provider of software and artificial intelligence (AI) solutions for drug discovery, today announced the publication of a peer-reviewed study in Applied AI Letters, “Deep Imputation on Large-Scale Drug Discovery Data” [1].

Working with Takeda Pharmaceuticals’ proprietary global dataset, the team applied Optibrium’s Augmented Chemistry® platform, demonstrating the potential of deep learning imputation to reduce the cost and improve the success rate of drug discovery. The platform leverages the Alchemite™ deep learning method [2], developed by Intellegens, which has been shown to deliver more accurate and reliable predictions of complex biological properties of potential drugs, enabling more effective design decisions.

The study demonstrated that deep learning imputation generates new and valuable insights on global pharma-scale, high-value and proprietary datasets. Such datasets are complex, with data deriving from many different experiments, including compound activities in biochemical and phenotypic assays, high-throughput screening data and absorption, distribution, metabolism, elimination, and toxicity (ADMET) endpoints.

Making the best decisions on project progression from such data is further complicated by the fact that most potential drug compounds are measured in only a small subset of the experiments that pharmaceutical and biotech companies routinely use, resulting in datasets where only a few per cent of the possible measurements have been made. Furthermore, measurements are very noisy due to the complexity of biological experiments. While these characteristics limit the effectiveness of most machine learning methods, the study confirmed that Augmented Chemistry® provided valuable insights on such challenging data.

The study also found that deep learning imputation made more accurate predictions of compounds’ biological properties, including prospective prediction of compound activities in the context of projects. In particular, it showed substantial advantages in predicting complex endpoints, such as cell-based assays, that are resource-intensive and where more accurate predictions result in substantial time and cost reductions.

Furthermore, the method reliably identified the most accurate predictions on which to base decisions, which is essential to avoid missing valuable opportunities because of inaccurate predictions. It also highlighted where more experimental data are required to make a confident decision, setting it apart from other machine learning and AI methods that struggle to provide reliable confidence information on individual predictions [3].
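As a rough illustration of how per-prediction confidence can guide decisions, the sketch below splits predictions into those confident enough to act on and those where a measurement would be more informative. It is not the Alchemite™ or Augmented Chemistry® interface: the `Prediction` class, the `split_by_confidence` function, the compound identifiers, the values and the 0.6 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    compound_id: str  # hypothetical compound identifier
    mean: float       # predicted property value (illustrative units)
    std: float        # model's estimated uncertainty (one standard deviation)


def split_by_confidence(predictions, max_std):
    """Separate predictions with uncertainty at or below max_std from the rest."""
    confident = [p for p in predictions if p.std <= max_std]
    needs_data = [p for p in predictions if p.std > max_std]
    return confident, needs_data


# Made-up example values, not results from the study.
predictions = [
    Prediction("cmpd-1", 7.2, 0.3),
    Prediction("cmpd-2", 5.1, 1.4),
    Prediction("cmpd-3", 6.8, 0.5),
    Prediction("cmpd-4", 8.0, 2.0),
]

# The threshold is an assumption; in practice it would depend on the endpoint.
confident, needs_data = split_by_confidence(predictions, max_std=0.6)
print("Act on predictions for:", [p.compound_id for p in confident])
print("Measure experimentally:", [p.compound_id for p in needs_data])
```

In this toy setting, a compound with a promising predicted value but a large uncertainty (here `cmpd-4`) is flagged for measurement rather than being accepted or discarded on the strength of an unreliable prediction.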

Following on from a previous study [4], which demonstrated the effectiveness of deep learning imputation on smaller project-specific datasets, this new study showed that the same method scales to global pharma datasets. The described model was built on 1.8 million data points relating to approximately 700,000 compounds and 1,200 experimental endpoints. When applied at this scale, the insights into high-value compounds and research strategies grow substantially.

Matthew Segall, CEO at Optibrium, said: “This study corroborates the tremendous results we have seen in many collaborations with pharma, biotech and not-for-profit organisations. We are excited to see the significant benefits our AI technology is producing and the enthusiasm for its uptake in the pharma community.”

To view a webinar summarising the key results of this study, visit

For further information on Optibrium, please visit, contact or call +44 1223 815900.

[1] Irwin et al. (2021) Appl. AI Lett. DOI: 10.1002/ail2.31

[2] Whitehead et al. (2019) J. Chem. Inf. Model. 59(3) pp. 1197–1204

[3] Hirschfeld et al. (2020) J. Chem. Inf. Model. 60(8) pp. 3770–3780

[4] Irwin et al. (2020) J. Chem. Inf. Model. 60(6) pp. 2848–2857