Finding Hidden Treasure – how AI/ML can harness existing data to accelerate drug discovery

For 2020 the BIO International Convention, usually one of the world’s largest gatherings of the global biotech industry, is transitioning to a new, virtual event format. Drug Discovery Today’s Steve Carney will moderate a discussion on how advanced data science technologies such as artificial intelligence and machine learning (AI/ML) can be developed to assist drug discovery. The discussion is part of the main BIO programme, which will be aired on Wednesday 10th June, with live Q&A.

**The panel discussion will be aired at 10.30 BST on Wednesday 10th June with live Q&A and will be available on demand after that date for anyone registered with BIO.**

Satnam Surae, Chief Product Officer at Aigenpulse, based in Oxford, UK, is a speaker on the panel. Steve Carney caught up with him to discuss some of the key issues ahead of the event. 

Steve: Shall we start with a bit of background? Who are Aigenpulse and how do you approach the problem of getting more out of existing data?
Satnam: We provide a next-generation life science AI/ML SaaS (software as a service) platform for biotechs and pharma enterprises. SaaS is a method of software delivery and licensing in which software is accessed online via subscription, rather than bought and installed on individual computers. We specialise in harmonising and integrating lots of different types of data including proteomics, cytometry, EHR/EMR (electronic health records and electronic medical records), assays, genomics and more.
Steve: But you can’t get over the old adage of ‘rubbish in, rubbish out’?
Satnam: Absolutely right. Without lots of high-quality data that is machine readable, structured, connected and clean, you can’t really leverage advanced tools such as AI/ML. Our platform unifies multi-omics silos, promoting data re-use, and provides automated processing, analysis and report templating, which boosts efficiency. It comes with in-built statistics, visualisation and ML tools that enable high-quality outputs to be generated by bench scientists, bioinformaticians/data scientists and others in the organisation who don’t need to know how to code. It can be useful at multiple stages of the R&D life cycle but it can only be as good as the data provided.
Steve: So what are the main advantages for drug development?
Satnam: The drug development R&D process generates lots of big data and lots of small data in different shapes, formats and files. This is a real bugbear of scientists in organisations because very often researchers can’t find the data that they need. This can lead to wasted effort and work being repeated a number of times.
It’s been estimated that two thirds of researchers’ time is spent on processing data - time that could be used for higher value scientific analysis and the complex tasks that researchers do best. Typical tools sever the link between raw data and analysis and may compromise data integrity, as each step is prone to errors and manual processes may not be controlled. Errors, inconsistencies and uncontrolled changes cause significant delays to decision making and analysis. Given that the total cost per approved drug is now around $2.6 bn (compared to $0.2 bn in the 1980s) and that only around one in ten drugs is successful, the high cost and low predictability of drug research worldwide isn’t sustainable, and there is a clear need for better procedures to improve efficiency.
Steve: I try to avoid mentioning the ‘c word’ but what impact is Covid-19 having?
Satnam: It’s really impossible to get away from Covid. We’re living in an interesting time - the lockdown and the response to the Covid pandemic provide multiple challenges. Efforts are being refocused on fighting the virus and that can have a negative impact on other life science research. Clearly, at a human level, there is serious disruption, and timescales for research programmes may be lengthened by downtime and changing priorities. There will almost certainly be increased competition for patient enrolment and patients may find it difficult to keep appointments. Huge efforts are being made to ensure the security of supply chains for drugs and to maintain quality.
People have been talking about use of AI for a long time but it has tended to go in fits and starts. We now have a real impetus in the post-lockdown world where digital first (enabling and enabled by AI/ML) has to be the one central strategy. This is underpinned by technologies that already exist such as cloud and automation technologies which enable people to work from home, from the lab or wherever they want. The potential pitfalls are still the same - data harmonisation, data connection and availability of good quality data which is well labelled. 

Steve: How can this be achieved?
Satnam: Ultimately the end goal is to iterate faster. Large public data resources are available, such as Genomics England and UK Biobank, as well as real-world EHR/EMR data. But the data is often messy and hard to work with, so finding the needle in the haystack requires a lot of work.
The key challenges are not just the quality of datasets but harmonisation of data sources (internal, public) and types (multi-omics), for which specialist knowledge is needed. For internal application you need buy-in from the whole organisation and willingness to build capabilities, which will be dependent on a modernised tech stack. It may involve outsourcing or in-licensing agreements, and companies such as Precision Life or Benevolent work as partners with companies and research organisations to develop capabilities. Aigenpulse comes in from a different standpoint: we want to be an enabling technology for companies to do this themselves. They can harness the data they need with the vast expertise that they already have, equipped with AI/ML tools that let them focus fully on scientific analysis rather than on onerous, manual data processing or on building in-house tools which aren’t maintained and get lost in the ether. The business uses are there in all parts of the value chain from R&D (for example, lead/target identification) to clinical trials (biomarkers and stratification) to manufacturing (quality control and monitoring). Diagnostics such as imaging and blood tests are where we’ve already seen some real wins.
Steve: So is the hype around AI and ML justified or will the bubble burst? 
Satnam: There’s no getting away from the amount of hype in this area and it’s true that not all of the claims that have been made have been justified and that companies have sometimes referenced AI in order to increase valuations. But I think the hype is now probably coming into line with reality. It will always be important to be aware of bias in data and in the resulting models, and that’s why it’s absolutely critical to collaborate with domain experts and leverage their experience. In my view, this is the most important aspect. But I’m confident that, moving forward, the adoption of the technologies will make a significant difference to drug discovery and innovation. We can now say that these technologies offer some of the best tools available for real-world drug discovery and we have some great examples of how AI/ML has assisted the whole process from start to finish.
Steve: Does this mean that drug discovery companies are now ready to embrace the technology?
Satnam: This seems to be a case in which ‘no one wants to be first but everyone wants to be second’ rings true, although we are now getting to the stage where more companies are willing to use the tools that are available. This is not always straightforward as there may be fundamental disconnects in smaller biotech and pharma companies where some structural elements may need to change - the rest of the organisation must have the correct infrastructure so that all of the pieces of the puzzle can be put in place. Really understanding the limitations, completeness and inherent bias contained within the data, and matching the right tools to the job at hand, is critical, but the interest is there and it will be possible to overcome these hurdles with time and the sharing of experiences.
Steve: Looking ahead, what will be the biggest impact in the long term? Could AI/ML become crucial in repurposing of drugs, reducing the number of iterations to get to the lead molecule more quickly or looking for very novel chemical space? 
Satnam: There’s clear potential for chemical synthesis and optimisation of later stage drug discovery, and unmet medical need exists in all disease areas. Being able to stratify for different patient groups could give huge benefits for both very early stage and very late stage drug discovery. If you look at where most money is spent, it’s getting through Phase II and Phase III clinical trials, so taking what you learn in R&D and making sure that you have the right patient cohort to test will greatly improve the chances of success, as many of these new molecules will work in some patients but not in others. This may be where the biggest gains will be realised but changes in infrastructure will be needed to underpin these gains.
