The View From Here

Manuel C. Peitsch

Text Mining

We are witnessing an exponential growth in available publications, patents and other scientific documents. The complexity of the information landscape in life sciences is further enhanced by the availability of an increasing number of diverse databases and information resources. As a consequence, much of the scientific information could go unnoticed and hence untapped by a large portion of the scientific community. Therefore, scientists crucially need new methods and tools to cope with this ever-growing world of information. Several fields of research and development activity are contributing to designing the future of information management: i) the development of increasingly powerful text mining methods; ii) the development and generalization of open-access-like principles in scientific publishing; and iii) the gradual convergence of databases and literature.

Text mining methods are making considerable progress, and increasingly sophisticated approaches are being developed to improve both the sensitivity and the precision of text analysis. In this context, biomedical text analysis, although very complex, offers an excellent environment in which to develop and test reliable text mining methods.

Unhindered access to published scientific results has become a major concern for the scientific community. While this is an important aspect of everyday life for a scientist, it is of even greater importance to the text mining community. Indeed, not only is mining the abstracts available in Medline insufficient from a content perspective, but most tools are developed to deal with these abstracts only, which inherently limits their generalization and applicability to more complex documents. While working with abstracts is a good start, this limitation is a major issue that will need to be addressed jointly by the text mining and scientific publishing communities.

Current and traditional scientific publishing practices have other shortcomings that need addressing and are discussed in some of the papers listed below. Integration of diverse information sources and the gradual convergence of publications and databases are crucial steps towards breaking the shackles of space limits in traditional publishing and will drive the scientific community to think differently about publications, peer review and eventually about what makes a “good CV”.

The increasingly universal use of text mining and its potential consequences for scientific publishing make this topic interesting reading in this issue of Drug Discovery Today E-Choice.

Further to the publications listed below, I would suggest you watch out for one of the upcoming issues of Genome Biology, which is devoted to BioCreative 2.

Enjoy the read!

Prof. Manuel C. Peitsch, Ph.D.
Director Computational Sciences and Bioinformatics
Philip Morris R&D
Quai Jeanrenaud 3
2000 Neuchatel
Switzerland

Short Biography
Manuel Peitsch is Director for Computational Sciences and Bioinformatics with Philip Morris International R&D. Manuel joined PMI R&D from Novartis, where he spent seven years and successively led “Informatics and Knowledge Management” and “Systems Biology”. Prior to joining Novartis in 2001, Manuel held several leadership positions in bioinformatics, scientific computing and knowledge management with GlaxoWellcome and GlaxoSmithKline.

Manuel obtained his Ph.D. in biochemistry from the University of Lausanne (Switzerland) and spent his post-doctoral years at the National Cancer Institute (Dr. J. V. Maizel Jr) and the University of Lausanne (Prof. J. Tschopp). Since 2002 he has been Professor of Bioinformatics at the University of Basel.

Manuel also serves on several advisory boards and councils. For instance, he is the Chairman of the Executive Board of the Swiss Institute of Bioinformatics, which he co-founded in 1998, and a member of the Swiss National Research Council.