During the past decade, virtual screening (VS) has evolved from traditional
similarity searching, which utilizes single reference compounds, into an
advanced application domain for data mining and machine-learning
approaches, which require large and representative sets of training
compounds to learn robust decision rules. The explosive growth in the
amount of publicly available chemical and biological data has
generated a huge effort to design, analyze, and apply novel learning
methodologies. This article focuses on machine-learning techniques within the
context of ligand-based VS (LBVS). In addition, several relevant
VS studies from recent publications are analyzed, providing a detailed view of the
current state of the art in this field and highlighting not only the
problematic issues, but also the successes and opportunities for further
advances.