“I believe all human disease has a genetic component,” said David Valle, professor of genetic medicine at Johns Hopkins University, USA, who presented during the July 2011 Annual Short Course on Medical and Experimental Mammalian Genetics at The Jackson Laboratory (JAX), established in 1929 in Maine, USA. In many cases, he said, one gene or a small group of genes can be the catalyst for life-threatening disease in humans.
“Today, molecular biologists perhaps know only how 50% of the human genome works,” said Carol Bult, a bioinformatics and developmental genomics researcher at JAX. She is on the hunt for disease-causing genes; math, mice, and computational molecular analysis are the tools that will bring her closer to understanding how human genes work.
“The genome is now the basic unit of biology,” said Jonathan Flint of the Wellcome Trust Centre for Human Genetics. Genes are subunits of DNA (deoxyribonucleic acid) or RNA (ribonucleic acid) that provide the genetic blueprint our bodies use to build proteins for everything from brain cells to skin cells. If damaged or mutated, the same genes that help regulate bodily functions can also harm them.
Bult and her collaborators at Massachusetts General Hospital analyzed genomic data of normal lung and diaphragm development to identify over 20 genes that might have a role in causing one of the most common human birth defects, congenital diaphragmatic hernia (CDH). This disorder causes a life-threatening hole to form in the diaphragm as a baby develops in the womb. It occurs in about one in every 2,500 births, according to the Association of Congenital Diaphragmatic Hernia Research, Awareness and Support.
Bult and her colleagues had to figure out which genes were involved and how they work when functioning normally. “Understanding how genes work together to build a normal lung and diaphragm provides a natural framework for discovering genes involved in complex disease. We used biology to guide us and to provide a genetic picture,” she said.
First, the research group used DNA microarray technology to perform gene expression profiling of normal lung and diaphragm development in a JAX mouse. A DNA microarray is a collection of microscopic DNA spots attached to a solid surface, such as a quartz chip. Researchers use this device to measure the expression of large numbers of genes simultaneously.
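The output of such an experiment is essentially a matrix of expression values, with one row per gene and one column per sample or time point. As a minimal sketch of what the downstream analysis looks like (the file name, time-point labels, and two-fold threshold here are illustrative assumptions, not Bult's actual pipeline):

```python
import csv

# Load a hypothetical tab-separated matrix of normalized, log2-scale
# expression values: one row per gene, one column per developmental sample.
expression = {}
with open("expression_matrix.tsv") as f:
    rows = csv.reader(f, delimiter="\t")
    header = next(rows)               # e.g. ["gene", "E11.5", "E12.5", ...]
    for row in rows:
        expression[row[0]] = [float(v) for v in row[1:]]

# Flag genes whose expression changes more than two-fold (1 unit in log2
# space) between the earliest and latest time points profiled.
for gene, values in expression.items():
    change = values[-1] - values[0]
    if abs(change) > 1.0:
        print(f"{gene}\tlog2 fold change = {change:+.2f}")
```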
“I did much of the analysis on our High Performance Computing cluster. Then we used these data to identify key developmental genes and pathways. We combined the results of the expression analysis with predictions of functional relationships among genes, and with information on genes already known to be involved in CDH,” said Bult.
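In outline, this kind of integration can be as simple as scoring each gene by how many independent lines of evidence point to it. A sketch under loose assumptions (the gene names and evidence sets below are placeholders, not the study's actual data):

```python
# Three hypothetical evidence sets: genes differentially expressed during
# lung/diaphragm development, genes functionally linked to known CDH genes,
# and genes already implicated in CDH in the literature.
expressed = {"GeneA", "GeneB", "GeneC", "GeneD"}
linked    = {"GeneB", "GeneC", "GeneE"}
known_cdh = {"GeneC", "GeneF"}

evidence = [expressed, linked, known_cdh]

# Score each gene by how many independent evidence sets contain it, then
# rank: the top of the list becomes the candidate set to validate in mice.
all_genes = set().union(*evidence)
scores = {g: sum(g in e for e in evidence) for g in all_genes}
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, score)
```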
They identified a network of genes not previously thought to be involved in CDH and generated a list of candidate genes that they could study further. They chose one target gene to validate experimentally.
Bult needed a real-world test to back up her numbers, and for this she used mice. A mouse provides a physically realistic model for studying how genes produce specific observable characteristics (the phenotype), something that is not possible in humans.
“You can’t do these tests in humans by turning off their genes because that’s unethical,” said Bult. Mouse genomes are actually very similar to human genomes; mice get many of the same diseases that we do. Their life cycles are very short too: five generations can be born in one year, so genetic changes can be studied at a much faster rate than in other animals.
“It would not be possible to do the research I do without the mice. They are key to understanding mammalian biology and disease processes,” Bult said.
While many researchers, such as Bult, have used mice extensively in their research, there is a debate in the scientific community about just how applicable such results are to disease in humans. A recent study by researchers at Washington University in St. Louis, USA, called into question the reliance on animal models in cardiovascular research. “The problem is the difference in gene expression between the mouse and the human is very, very large,” said Igor Efimov, PhD, a biomedical engineer at Washington University.
To these sorts of arguments, Joyce Peterson, public information manager at JAX, said: “Mouse genetics is cheap compared to using other animals. [Mice] are fundamental to understanding not just human diseases, but those of other animals, such as your pet dog.”
Bult ordered a custom mouse from JAX’s repository of more than 5,000 genetic mouse strains and more than 3.2 million mouse embryos and sperm samples cryogenically frozen in liquid nitrogen (including strains frozen as early as the 1970s).
This particular mouse had the target gene switched off, and it developed CDH as a foetus. “This particular potential CDH gene was known to cause heart defects, but it also resulted in a hole in the diaphragm, just as we had predicted,” said Bult. The research paper describing this study will be published later this year. In the near future, she will create a mouse with multiple genes switched off to test how the CDH genes interact. If successful, the research may lead to new diagnostic methods and therapies.
Now, her biggest challenge is not the type of mouse she creates, but the computing power at her disposal.
“When I did my PhD in the 1980s, my desktop was adequate to perform the analysis. This is not the case anymore.” To understand the target gene, Pbx1, and its associated genes, Bult used a cluster of high-performance computers to analyze the genetic data. But even these computer networks are reaching their limit. To understand how hundreds or thousands of genes work, a technique called high-throughput sequencing is required: sequencing many DNA fragments in parallel, producing millions of sequences at once.
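The raw output of such a sequencer is typically a text file of millions of short “reads”, often in the FASTQ format, which stores each read as four lines. A minimal sketch of tallying that output, assuming a hypothetical file name:

```python
# Count reads and total bases in a FASTQ file, where each read occupies
# four lines: header, sequence, separator ("+"), and per-base qualities.
reads, bases = 0, 0
with open("run_output.fastq") as f:
    for i, line in enumerate(f):
        if i % 4 == 1:                # the sequence line of each record
            reads += 1
            bases += len(line.strip())

print(f"{reads} reads, {bases} bases "
      f"(~{bases / 3e9:.4f} human genome equivalents)")
```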
A large computing infrastructure is required to handle these larger data sets. At the moment, Bult is using Amazon’s EC2 cloud service. “The output of our data is already in the gigabytes and moving into the petabyte scale [one petabyte is a million gigabytes]. I don’t need a lot of computing power every day, but whenever I do, it has to be scalable. We’re creating a new biology and we need better computation to support our work,” she said.
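The appeal of a service like EC2 is that capacity is rented only for the hours an analysis actually runs. A sketch of that pattern using the boto3 library (the machine image ID, instance type, and instance count are placeholders; a real analysis would also attach storage and security settings):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Spin up a batch of worker instances just for the duration of one analysis.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="r5.4xlarge",         # memory-heavy type suits genomics
    MinCount=1,
    MaxCount=8,                        # scale out as the dataset demands
)
ids = [inst["InstanceId"] for inst in resp["Instances"]]

# ... dispatch the sequencing analysis to the workers here ...

# Tear everything down so nothing is billed between analyses.
ec2.terminate_instances(InstanceIds=ids)
```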
Steven Salzberg, professor of medicine, biostatistics, and computer science at Johns Hopkins University, is doing just that. “The efficiency of sequencing has increased 250,000-fold since the human genome was completed 10 years ago. The fastest DNA sequencers today can produce 600 billion bases of DNA in one week. That's 200 human genome equivalents,” said Salzberg. In comparison, the original Human Genome Project took 13 years to complete.
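The arithmetic behind that equivalence: the human genome is roughly three billion bases long, so a week’s output of 600 billion bases covers it about 200 times over. As a quick check:

```python
# Back-of-the-envelope check on the sequencing figures quoted above.
human_genome_bases = 3e9      # the human genome is ~3 billion bases
weekly_output_bases = 600e9   # fastest sequencers, per Salzberg

print(weekly_output_bases / human_genome_bases)  # -> 200.0 genome equivalents
```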
“Now analysis has to be automated on a large scale to make any sense of all the data,” Salzberg said. He is currently building a computing grid dedicated to genomic research.
"We will initially have three large servers, each with 512 GBs of RAM and 48 cores, for doing large genome assemblies. We will also have a grid with 2,400 cores, initially with 200 TBs of high-speed disk space. This will all be on a high-speed research network, which we need in order to move very large files around,” he said.
When fully operational, this infrastructure will be available to Salzberg’s group and others in his institute. Salzberg thinks that all genomic research groups will need computing experts. “Without advanced computing expertise, you'll have to just watch while others make the discoveries,” he said.