Why better databases are key to chemistry’s digital future

The Royal Society of Chemistry has announced that its nearly 180-year-old archive of research insights has been made available for Text and Data Mining projects, but why is this important? Richard Kidd, Head of Chemistry Data at the Royal Society of Chemistry, explores the significance.

Machine learning and artificial intelligence have gone from being the dreams of science fiction to something we now talk about every day without raising an eyebrow.

We encounter these technologies every day in our normal routines. Whether we are searching for the nearest store or looking for the perfect Christmas gift, internet search engines are now so capable that they can predict what we’re looking for before we’ve finished typing the search term.
The secret to this success is perhaps no secret at all: data. Over the years, search giants such as Google, Yahoo and Bing have compiled huge databases of user interactions. By finding patterns and correlations between what other users have entered and your own searches, they can appear more intuitive than ever.
Similar techniques are already being used in scientific research, where Text and Data Mining of documents feeds breakthroughs in machine learning and AI. By harnessing similar search technology and the speed and power of digital systems, teams can get a crucial head start on a project by having computers identify the key pointers that set them off in the right direction.
In fact, smart technology can now identify patterns across millions of scientific papers, even across disciplines, in the blink of an eye, unveiling insights that might never have been uncovered using more traditional techniques. It can even distinguish between different meanings of a word, which is crucial: knowing what kind of mole you’re looking for could be the difference between accurate data and a wasted resource.
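To make the "mole" example concrete, here is a minimal sketch of one simple way a program might guess which sense of a word is meant: counting how many context words in a sentence match a list of cue words for each sense. The cue lists and the `guess_sense` function are hypothetical and written purely for illustration; they do not describe how any particular Text and Data Mining system works, and real systems typically learn such cues from large volumes of text rather than from a hand-written list.

```python
import re

# Hypothetical cue words for each sense of "mole" (illustrative only;
# a real system would learn these from data rather than a fixed list).
SENSE_CUES = {
    "unit of amount of substance": {"mol", "molar", "concentration", "reagent", "solution", "avogadro"},
    "burrowing mammal": {"burrow", "soil", "tunnel", "garden", "mammal", "fur"},
    "skin lesion": {"skin", "melanoma", "lesion", "dermatology", "biopsy"},
}


def guess_sense(sentence):
    """Return the sense whose cue words overlap most with the sentence, or None."""
    # Lower-case the sentence and split it into a set of alphabetic words.
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Score each sense by the number of cue words present in the sentence.
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no matching cue words there is no evidence, so make no guess.
    return best if scores[best] > 0 else None


print(guess_sense("One mole of the reagent was dissolved in solution"))
print(guess_sense("The mole dug a tunnel through the garden soil"))
```

Even this toy version shows why context matters: the word "mole" alone tells the program nothing, and it is the surrounding vocabulary that separates a chemistry paper from a gardening guide.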
However, having the smarts is nothing if you don’t have the data. That’s where scientific publishers such as ourselves have a crucial role to play if the potential of these smart systems is to be realised.
The Royal Society of Chemistry has been publishing high-quality research for nearly 180 years, with hundreds of thousands of findings and studies contained within our archives, spanning several eras of scientific discovery. However, if we are to advance, we have to keep looking forward.
That’s why we have recently completed digitising this extensive archive, making millions of papers searchable for Text and Data Mining projects. 
In effect, this allows our collection of research to ‘plug and play’ with other resources, so that connections can be made across disciplines and research results can be reached faster and more accurately.
The future of journal publications will of course evolve towards capturing research results and reports alongside FAIR data, ensuring research is findable, accessible, interoperable and reusable. But the complexity of language and expression means text analysis will still be needed to feed machine learning and AI, and to draw insights from a broad spectrum of evidence.
What’s clear is that this technology is a fantastic addition to a research team’s toolkit and can provide a crucial head start by pointing your resources in the right direction. As more and more resources such as our own become available, the functionality – and impact – of these processes can only improve. 
More information on the Royal Society of Chemistry’s Text and Data Mining service can be found at:



This article is featured in:
Medicinal Chemistry

