Review
Receptor-based virtual screening protocol for drug discovery

https://doi.org/10.1016/j.abb.2015.05.011

Highlights

  • Computer-aided drug design (CADD) in drug discovery and development.

  • Review of the main stages of a virtual screening campaign, with their strengths and limitations.

  • Practical issues and the Achilles heel of the virtual screening protocols.

Abstract

Computer-aided drug design (CADD) is now a key component of drug discovery and development, as it promises to drastically reduce cost and time requirements.

In the pharmaceutical arena, virtual screening is normally regarded as the top CADD tool for screening large libraries of chemical structures and reducing them to a key set of likely drug candidates for a specific protein target. This review provides a comprehensive overview of the receptor-based virtual screening process and of its importance in the present drug discovery and development paradigm. After a focused contextualization of the subject, it examines the main stages of a virtual screening campaign, including their strengths and limitations. In all of these stages, special consideration is given to the practical issues that are normally the Achilles heel of the virtual screening process.

Introduction

The process of drug discovery is very complex and requires an interdisciplinary effort to design effective and commercially feasible drugs. The objective of drug design is to find a drug that can interact with a specific drug target and modify its activity. The drug targets are generally proteins that perform most of the tasks needed to keep cells alive. Drugs are small molecules that bind to a specific region of a protein and can turn it on or off. Some very powerful drugs, such as antibiotics or anticancer drugs, are used to completely disable a critical protein in the cell. These drugs can kill bacteria or cancer cells.

It is generally recognized that drug discovery and development are very time- and resource-consuming, and the whole process is often compared to searching for a needle in a haystack. It is estimated that a typical drug discovery cycle, from lead identification to clinical trials, can take 17 years and cost 800 million US dollars. It is also estimated that only five out of 40,000 compounds tested in animals eventually reach human testing, and that only one in five compounds that enter clinical studies is approved. This represents an enormous investment of time, money and human resources. It includes the chemical synthesis, purchase and biological screening of hundreds of thousands of compounds to identify hits, followed by their optimization to generate leads, which requires further synthesis. In addition, the predictive value of animal studies for both efficacy and toxicity is frequently suboptimal. New approaches are therefore needed to facilitate, expedite and streamline drug discovery and development, and to save time, money and resources.
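The two attrition figures quoted above can be combined into a single overall success rate. The short sketch below does that arithmetic using only the numbers given in the text; treating the two stages as independent and multiplying them is an assumption made here for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def overall_success_rate(reach_human: Fraction, approved: Fraction) -> Fraction:
    """Chance that a compound tested in animals is eventually approved.

    Assumes (for illustration only) that the two stages are independent,
    so their success rates can simply be multiplied.
    """
    return reach_human * approved


# Figures quoted in the text.
reach_human = Fraction(5, 40_000)  # animal testing -> human testing
approved = Fraction(1, 5)          # clinical studies -> approval

rate = overall_success_rate(reach_human, approved)
print(rate)  # 1/40000
print(f"about 1 approval per {rate.denominator // rate.numerator:,} compounds tested in animals")
```

Under these assumptions, roughly one compound in 40,000 tested in animals ends up as an approved drug.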

On October 5, 1981, Fortune magazine published a cover article entitled “Next Industrial Revolution: Designing Drugs by Computer at Merck”. Some have credited this as being the start of intense interest in computer-aided drug design (CADD) [1].

CADD is defined by the IUPAC as all computer assisted techniques used to discover, design and optimize compounds with desired structure and properties. CADD has emerged from recent advances in computational chemistry and computer technology, and promises to revolutionize the design of functional molecules. The ultimate goal of CADD is to virtually screen a large database of compounds to generate a set of hit compounds (active drug candidates), lead compounds (most likely candidates for further evaluation), or optimize known lead compounds, i.e. transform biologically active compounds into suitable drugs by improving their physicochemical, pharmaceutical and ADMET/PK (pharmacokinetic) properties [2].

The fast expansion and popularity of this field of research have been made possible in part by advances in software, hardware and computational power. In addition, knowledge of the 3D shapes of proteins, nucleic acids and complex assemblies is fundamental to understanding all aspects of potential drug targets. It is remarkable that 50,000 structures were deposited in the Protein Data Bank from 1970 to 2004, that this number had tripled to 150,000 by 2014, and that it is expected to double again by 2018. Moreover, the growing digital repositories containing detailed information on potential drugs and other useful compounds are goldmines for the design of new drugs.

CADD is widely used in the pharmaceutical industry to improve the efficiency of the drug discovery and development pipeline. One method that was quickly adopted was the virtual screening of large compound databases against drug targets. The goal is to select a set of molecules with desirable properties (active, drug-like, lead-like) targeting a specific protein and eliminate compounds with undesirable properties (inactive, reactive, toxic, poor ADMET/PK). The computational methodologies used for this purpose are known as virtual screening methodologies.
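As an illustration of eliminating compounds with undesirable properties, the sketch below applies a simple drug-likeness filter to a toy library. The text does not name a specific filter; Lipinski's rule of five is assumed here as one widely used heuristic, the tolerance of one violation is a common convention exposed as a parameter, and the compound names and descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def drug_like(library, max_violations=1):
    """Keep compounds with at most `max_violations` violations (assumed cutoff)."""
    return [c for c in library if rule_of_five_violations(c) <= max_violations]


# Hypothetical compounds with made-up descriptor values.
library = [
    Compound("cmpd_A", 320.4, 2.1, 2, 5),   # no violations
    Compound("cmpd_B", 612.7, 6.3, 1, 8),   # two violations: weight and logP
    Compound("cmpd_C", 540.0, 3.0, 3, 9),   # one violation: weight
]

kept = drug_like(library)
print([c.name for c in kept])  # ['cmpd_A', 'cmpd_C']
```

In a real campaign the descriptors would be computed by a cheminformatics toolkit rather than entered by hand, and the filter would be one of several applied before docking.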

The generic definition of virtual screening encompasses many different methodologies, which are generally divided into two main classes: ligand-based virtual screening methods and receptor-based virtual screening methods.

Ligand-based virtual screening methods aim to identify molecules that share common chemical and physical features, on the assumption that similar compounds can have similar effects on a drug target [3]. These methods normally discard all information related to the drug target and focus exclusively on the ligand. Within the lock-and-key paradigm, these approaches compare different keys and neglect the lock. Thus, the model of the receptor is only implicitly built based on what binds to it [4]. The main downside of these methods is that they need substantial activity data on the compounds under study to give reasonable results.
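The idea of comparing keys while ignoring the lock can be sketched as a similarity ranking against a known active compound. The text does not prescribe a similarity measure; the Tanimoto coefficient on binary fingerprints is assumed here as a common choice, and the fingerprints (sets of "on" bit positions) and compound names are toy values.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Toy fingerprints: each set holds the positions of the bits that are on.
known_active = frozenset({1, 4, 7, 9, 15})
candidates = {
    "cand_1": frozenset({1, 4, 7, 9, 20}),  # 4 shared bits out of 6 in total
    "cand_2": frozenset({2, 3, 15}),        # 1 shared bit out of 7 in total
}

# Rank candidates by similarity to the known active, most similar first.
ranked = sorted(
    candidates,
    key=lambda name: tanimoto(known_active, candidates[name]),
    reverse=True,
)
print(ranked)  # ['cand_1', 'cand_2']
```

No information about the target enters the calculation, which is why these methods depend on having known actives to compare against.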

Receptor-based virtual screening methods, also called structure-based methods, require a 3D structure of the target. These methods involve explicit molecular docking of each ligand into the binding site of the target, producing a predicted binding mode for each database compound together with a measure of the quality of the fit of the compound in the target-binding site. This information is then used to separate ligands that bind strongly to the target protein from ligands that do not. Receptor-based approaches are gaining considerable importance over ligand-based techniques, particularly as more and more 3D structures of target proteins are determined and become available, and also because the results tend to be more reliable and accurate. The current state of the art of receptor-based virtual screening is reviewed here, and the general approaches, successes and pitfalls associated with the technology are highlighted.
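The sorting step described above can be sketched as a ranking of docking results by score. This is a minimal illustration, not the output format of any particular docking program: the ligand names and scores are hypothetical, and the convention that a more negative score means a better predicted fit is an assumption (scoring conventions differ between programs).

```python
from typing import NamedTuple


class DockingResult(NamedTuple):
    ligand: str
    score: float  # hypothetical score; more negative = better predicted fit


def top_hits(results, n):
    """Return the n best-scoring ligands, best first (lowest score first)."""
    return sorted(results, key=lambda r: r.score)[:n]


# Hypothetical docking output for four database compounds.
results = [
    DockingResult("lig_1", -7.2),
    DockingResult("lig_2", -9.8),
    DockingResult("lig_3", -5.1),
    DockingResult("lig_4", -8.4),
]

for hit in top_hits(results, 2):
    print(hit.ligand, hit.score)
# lig_2 -9.8
# lig_4 -8.4
```

The predicted binding mode of each ligand would normally be stored alongside its score so that top-ranked compounds can be inspected before selection.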

The screening process

Receptor-based virtual screening encompasses a variety of sequential computational stages, including target and database preparation, docking and post-docking analysis, and prioritization of compounds for experimental testing. A typical workflow of a receptor-based virtual screening is presented in Fig. 1. All stages of this workflow depend on sound implementation of a wide range of computational techniques that will be discussed in detail in the following sections. In each section special

Target selection

Target selection is among the first stages of a virtual screening campaign and it is pivotal for a successful drug development process. Among the four types of macromolecules that can be targeted (proteins, polysaccharides, lipids and nucleic acids) with small-molecule compounds, proteins, and within those enzymes, are generally the first choice, since their binding pocket properties allow for high specificity, potency and low toxicity. When considering a potential protein target to modify a

Ligand selection

With virtual screening we have the possibility to quickly test a great number of compounds without much effort. However, it is obviously impossible to screen the entire chemical space for a single target in a timely manner, which means that we must somehow restrict the number of compounds to be tested. In order to achieve a manageable library we need to filter out some molecules in advance, which can be achieved with the help of different methods. Still, it is important to bear in mind that for

Molecular docking

Once the target protein and a database of compounds have been selected, molecular docking can be run. This is the most computationally expensive and time-consuming stage of virtual screening (VS), and for this reason it is regarded as the heart of any virtual screening campaign.

Molecular docking is a computational method that allows predicting the preferred pose and conformation of one molecule (ligand) in relation to a second one (often larger and called receptor), when the binding between the two forms a

Validation of the VS

Since the VS as a whole, and each of its stages, relies on many parameters, it is important to design a protocol to validate it. This gives confidence in the computational results and also decreases the number of false positives in the final results. Generally, the procedures employed to validate the VS campaign, and in particular the molecular docking stage, are classified in three groups: (1) quality of docked poses, (2) accuracy of scores or affinity estimates and (3)

Post-processing stage

Once all of the compounds from the library database have been docked into the binding pocket of the drug target, it is time to select which ones should go on to experimental testing. The easiest way to do this is simply to take the scores from the scoring function implemented in the docking algorithm, rank the compounds by these values and take the top scorers for experimental testing. However, this is not a straightforward process. At the end there are still too many

Future developments and perspectives

It is generally recognized that drug discovery and development are time- and resource-consuming processes. There is an ever-growing effort to apply computational power to the combined chemical and biological space in order to streamline drug discovery, design, development and optimization. In the pharmaceutical industry, CADD is being utilized to expedite and facilitate hit identification, hit-to-lead selection, optimize the absorption, distribution, metabolism, excretion and toxicity profile

Acknowledgements

This work has been funded by FEDER/COMPETE and the Fundação para a Ciência e a Tecnologia (FCT) through projects EXCL/QEQ-COM/0394/2012, PTDC/QUI-QUI/121744/2010, IF/01310/2013 and PEST-C/EQB/LA0006/2011.

References (111)

  • Y.Y. Ke et al., Eur. J. Med. Chem. (2014)
  • S. Kalyaanamoorthy et al., Drug Discov. Today (2011)
  • H. Park et al., Bioorg. Med. Chem. Lett. (2009)
  • D.G. Levitt et al., J. Mol. Graphics (1992)
  • C.A. Reynolds et al., J. Mol. Graphics (1989)
  • S. Kortagere et al., J. Pharmacol. Toxicol. Methods (2010)
  • B. Nisius et al., J. Biotechnol. (2012)
  • E.E. Bolton et al., Annu. Rep. Comput. Chem. (2008)
  • C.A. Lipinski et al., Adv. Drug Deliver Rev. (1997)
  • I.D. Kuntz et al., J. Mol. Biol. (1982)
  • H.A. Gabb et al., J. Mol. Biol. (1997)
  • C.M. Venkatachalam et al., J. Mol. Graph. Model. (2003)
  • M. Rarey et al., J. Mol. Biol. (1996)
  • M.Y. Mizutani et al., J. Mol. Biol. (1994)
  • W. Welch et al., Chem. Biol. (1996)
  • Z. Zsoldos et al., J. Mol. Graph. Model. (2007)
  • G. Jones et al., J. Mol. Biol. (1995)
  • G. Jones et al., J. Mol. Biol. (1997)
  • R.M.A. Knegtel et al., J. Mol. Biol. (1997)
  • D.K. Gehlhaar et al., Chem. Biol. (1995)
  • H. Gohlke et al., J. Mol. Biol. (2000)
  • S. Ghosh et al., Curr. Opin. Chem. Biol. (2006)
  • J.H. Van Drie, J. Comput. Aided Mol. Des. (2007)
  • W.L. Jorgensen, Science (2004)
  • Y. Tanrikulu et al., ChemMedChem (2009)
  • B. Budzik et al., ACS Med. Chem. Lett. (2010)
  • M. Hendlich et al., J. Mol. Graph. Model. (1997)
  • R.A. Laskowski, J. Mol. Graph. (1995)
  • R.L. Desjarlais et al., J. Med. Chem. (1988)
  • V. Le Guilloux et al., BMC Bioinformatics (2009)
  • A.T. Laurie et al., Bioinformatics (2005)
  • R.C. Wade et al., J. Med. Chem. (1993)
  • M. Weisel et al., Chem. Cent. J. (2007)
  • S. Henrich et al., J. Mol. Recogn.: JMR (2010)
  • T. Kortvelyesi et al., Proteins (2003)
  • P. Schmidtke et al., J. Med. Chem. (2010)
  • G.M. Sastry et al., J. Comput. Aided Mol. Des. (2013)
  • B.C. Roberts et al., J. Chem. Inf. Model. (2008)
  • T. Beuming et al., Proteins-Structure Function and Bioinformatics (2012)
  • A. Khandelwal et al., J. Med. Chem. (2005)
  • N.C. Strynadka et al., Nat. Struct. Biol. (1996)
  • A.J. Williams, Curr. Opin. Drug Discov. Devel. (2008)
  • J.J. Irwin et al., J. Chem. Inf. Model. (2012)
  • J. Chen et al., Bioinformatics (2005)
  • R.P. Ghose et al., Ann. Intern. Med. (1999)
  • D.F. Veber et al., J. Med. Chem. (2002)
  • D.W. Ritchie, Curr. Protein Pept. Sci. (2008)
  • I. Hashmi et al., Proteome Sci. (2013)
  • R. Chen et al., Proteins-Structure Function and Genetics (2003)
  • Y.P. Pang et al., J. Comput. Aided Mol. Des. (1994)