Biology group issues challenge to computing
And what is true of architecture is true of chemistry and structural biology as well: a molecule’s function (how it behaves, and what it causes to be produced) can be deduced from its form. Knowing a molecule’s precise three-dimensional geometry tells researchers how and where it will bind to another molecule, and what effects the two will have once they are united.
Or, as Timothy Lovell, a computational chemist at a pharmaceutical company, summed it up: “Once you have the structure down on paper, you can begin to understand the function.”
Practical applications include the modeling of everything from catalysts to superconductors. The technique is particularly useful in the biological sciences, as it can predict the three-dimensional structure of macromolecules in solution, including substances that are key to understanding how the human body works.
But determining the exact shape of a molecule can be hard, and the problem grows more complex as molecules get larger, for example when a small molecule binds to a larger one. As with a giant Rubik’s Cube, there is only one correct answer among an enormous number of possibilities: a single molecule with just four rotatable bonds, each searched in 60-degree increments (six positions per bond), yields 6 × 6 × 6 × 6 = 1,296 separate conformations. And that covers just one aspect of a simple molecule; add more atoms or more features and the number of possibilities grows exponentially.
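The arithmetic behind the 1,296 figure can be checked with a short sketch of a systematic grid search. It assumes, for illustration only, that every rotatable bond turns independently through a full 360 degrees and ignores symmetry and steric clashes; the function names are hypothetical and not taken from any eNMR software.

```python
from itertools import product


def torsion_grid(step_deg: int) -> list[int]:
    """Dihedral angles visited for one rotatable bond, e.g. 0, 60, ..., 300."""
    # Assumption: each bond can rotate freely through a full 360 degrees.
    return list(range(0, 360, step_deg))


def count_conformations(n_bonds: int, step_deg: int) -> int:
    """Closed-form count: (positions per bond) ** (number of bonds)."""
    return len(torsion_grid(step_deg)) ** n_bonds


def enumerate_conformations(n_bonds: int, step_deg: int):
    """Yield every combination of dihedral angles, one tuple per conformation."""
    return product(torsion_grid(step_deg), repeat=n_bonds)


if __name__ == "__main__":
    # The example from the text: four rotatable bonds, 60-degree increments.
    print(count_conformations(4, 60))                      # 1296
    print(sum(1 for _ in enumerate_conformations(4, 60)))  # 1296, by brute force
    # Each extra bond multiplies the search space by six.
    for n in range(1, 9):
        print(n, count_conformations(n, 60))
```

With eight bonds the same grid already has 1,679,616 points, and halving the step to 30 degrees turns the four-bond case into 12 ** 4 = 20,736 conformations, which is the exponential growth the paragraph describes.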
According to the eNMR project, the only technique available for determining the three-dimensional structure of large, complex molecules such as proteins and DNA in solution is nuclear magnetic resonance (NMR) spectroscopy, and the field needs improved computational modeling methods. Analyses are currently extremely labor-intensive; automation would accelerate the pace of research, helping scientists identify molecules more quickly.
To stimulate these advances, eNMR has launched a new initiative. In the September issue of Nature Methods, the project issued a ‘manifesto’ inviting the entire biomolecular NMR community to take part in a large-scale test of modern computing algorithms. This community-wide endeavor could improve the efficiency, reproducibility and reliability of NMR structure determination. eNMR will use the Enabling Grids for E-sciencE (EGEE) infrastructure to power its analysis.
“If we can improve this technology, it will help researchers in structural biology to be more productive. This could help shorten the whole process of designing new drugs,” said Alexandre Bonvin of Utrecht University, The Netherlands, a member of the eNMR project and one of the paper’s authors. “Insight into the shape of biomolecules is the starting point for designing new drugs.”
The small molecule ABT-737, for example, was found by screening a chemical library with NMR-based techniques. The 2005 Nature paper “An inhibitor of Bcl-2 family proteins induces regression of solid tumors” described its discovery and presented it as a promising cancer-fighting compound, although it has not yet been marketed.
The next step
Since late 2007, the eNMR project has been working to improve the computational methods used for automation, using EGEE’s computational resources to calculate molecular structures from NMR data. Its next step is to involve all interested stakeholders. Through this challenge, called “Critical Assessment of automated Structure Determination of proteins by NMR” or CASD-NMR, the team invites laboratory researchers to submit molecules (technically, the spatial coordinates of the molecule’s atoms together with the associated NMR data) to help the global eNMR team improve its algorithms.
The CASD-NMR challenge will help computer scientists automate NMR calculations and test them against blind datasets. The eNMR project and the U.S. National Institutes of Health (NIH) Protein Structure Initiative are providing data for the challenge, and the CASD-NMR team hopes that other researchers will contribute additional datasets.
The long-term aim is for automated NMR to produce ‘unsupervised’ results that the community accepts as correct and viable, ready for immediate inclusion in the Protein Data Bank (PDB), a freely and publicly available database of macromolecular structural data for further research.
“At this time fully automated methods are not reliable enough to be used blindly; this CASD-NMR experiment will be a valuable tool to see where we stand in automation and improve our methods,” says Bonvin.
Under CASD-NMR, participating teams have eight weeks to apply automated methods and generate structures of a quality comparable to those deposited in the PDB. An assessment meeting to review the results is planned for mid-2010. Data for CASD-NMR participants are available through the eNMR project’s webpage.
Danielle Venton, EGEE. Details of the paper: “CASD-NMR: critical assessment of automated structure determination by NMR,” Nature Methods, Vol. 6, No. 9, September 2009, p. 625.