In 2008, molecular biologists around the world joined forces and set out on an ambitious three-year project: to sequence the genomes of 1,000 people. Through the effort, called the 1000 Genomes Project, they hoped to identify the gene variants common across nationalities and to pinpoint the genetic susceptibility to many diseases.
What they didn’t fully anticipate was just how rapidly sequencing technology would advance. “Our institute joined the project because we wanted to [perform] second generation sequencing technology. Now, our genome tool kit can analyze thousands more bytes of data – cheaper and faster – than three years ago,” said Li Yingrui from the BGI (formerly the Beijing Genomics Institute, which dropped the name when the headquarters moved to Shenzhen).
So, when they finished sequencing the first 1,000 genomes in mid-2010, they raised the target and are now aiming to sequence more than double the original number: 2,500 genomes. The new bottleneck in the project, though, is the efficient transfer and analysis of genetic data after a genome has been sequenced.
The data generated by the project, which is co-led by David Altshuler from the Broad Institute in Cambridge, USA, and Richard Durbin from the Sanger Institute near Cambridge in the UK, is held by and distributed from the European Bioinformatics Institute (EBI) and the US National Center for Biotechnology Information (NCBI), which is part of the US National Institutes of Health. There will also be a mirror website for data access in Shenzhen (China).
But for now, the largest sequenced data are often shipped between sites by mail.
“I know this is absurd”
“One genetic sequencer can generate half a terabyte of nucleotides [basic structural unit of DNA] per run in one week. There are thousands of sequencers producing data throughout the world,” said Li.
“Once genetic data is processed, it is copied to hard disks and sent via mail to another institute for analysis synchronization. I know this is absurd, but this is a fact,” said Li.
It may take only a week to generate half a terabyte of data, but once it is generated, researchers can spend up to two weeks copying the data to disk, mailing it and then having it uploaded onto a new machine for analysis. This is because current Internet connections are too slow for transfers of this size.
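To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration, not a calculation from the project itself: it assumes decimal terabytes, a single half-terabyte run per shipment and the full two-week turnaround quoted above, and the function names and the example link speeds are made up for the example.

```python
TB = 10**12  # one decimal terabyte, in bytes (assumption: decimal, not binary, units)
SECONDS_PER_DAY = 24 * 60 * 60


def effective_mbps(size_bytes: float, days: float) -> float:
    """Average throughput, in megabits per second, of moving size_bytes in `days` days."""
    return size_bytes * 8 / (days * SECONDS_PER_DAY) / 1e6


def transfer_days(size_bytes: float, link_mbps: float) -> float:
    """Days needed to move size_bytes over a link that sustains link_mbps."""
    return size_bytes * 8 / (link_mbps * 1e6) / SECONDS_PER_DAY


run_size = 0.5 * TB   # one sequencer run, as quoted in the article
shipping_days = 14    # "up to two weeks" to copy, mail and upload

# Sustained network rate that would match one shipped run.
break_even = effective_mbps(run_size, shipping_days)
print(f"Shipping one run equals about {break_even:.1f} Mbit/s sustained")

# Hypothetical sustained link speeds, for comparison only.
for mbps in (1, 10, 100):
    print(f"{mbps:>4} Mbit/s link: {transfer_days(run_size, mbps):.1f} days")
```

Under these assumptions, mailing a single run works out to roughly 3.3 Mbit/s of sustained throughput. A parcel carrying several disks would raise that figure in proportion, which is one reason shipping can still compete with a slow or congested link.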
When cloud stops being cost effective
“The main issue for us is that our data sizes are so large, that the cost and difficulty of moving the data to the cloud stops it being cost effective for many jobs. We do use the cloud for the Ensembl genomes database, but only to provide [data] mirrors that are closer to users,” said Phil Butcher, Head of IT at the Sanger Institute, one of the major research institutes involved in the project and located near Cambridge in the UK.
“We have looked at volunteer computing, but it has never seemed sensible because of data and network issues. We distressingly often resort to shipping hard disks around to transfer data between centers, rather than use the internet, or even via Aspera which is faster than ftp [file transfer protocol],” Richard Durbin said.
A team of scientists from labs around the world – a type of academic social network – is the way forward, Li said. Data could then be stored and analyzed in an academic computing cloud which researchers could access remotely. It’s such an issue for them that the BGI has an open access journal dedicated to the topic: GigaScience.
The show must go on
While data transfer issues continue to distress those in charge, the science coming out of the project nevertheless continues at a fast pace. From the first phase of the project – when the 1,000 genomes were sequenced – the teams found that each person carries approximately 250 to 300 loss-of-function variants, which result in the gene having less or no function, and 50 to 100 variants previously implicated in inherited disorders.
More basically, though, the project plans to characterize over 95% of variants that have a frequency of 1% or higher in each of five major population groups (populations in or with ancestry from the Americas, East Asia, South Asia, Europe, and West Africa).
This will form a “high-resolution genetic map,” said Li. This map will then serve as a baseline for future studies, such as the identification of genetic susceptibility to disease.
In fact, the leaps in sequencing technology have allowed the project to increase in scope. “The 1000 Genomes Project is now sampling from several more populations than were originally proposed,” said Thomas Keane, researcher at the Sanger Institute.
“Now we can focus on individual ethnic groups,” Li said. BGI contributes the genomes of two Chinese ethnic groups to the 1000 Genomes Project: the Han (the largest ethnic group in the world), sampled as northern Han and southern Han, and the far less numerous Dai people.
The field of molecular biology won’t stop here. Next week, iSGTW will carry a feature about a more focused project with even more genomes, the UK10K project, which will sequence parts of the genome of 10,000 people in the UK.
Comments
@anon
Yeah, but internet speeds here in Shenzhen are bad, while speeds in neighboring Hong Kong are extremely fast.
It's an unfortunate consequence of government control of the internet. Choosing between good internet speed for BGI (and everyone else) and political control, they prefer political control.
Knowledge gap in data transfers
It's astounding how large the knowledge gap is between different scientific disciplines with regard to data transfers.
E.g. the data transferred across the global network for CMS ranges from about 50-250 TB per day within the last 90 days. This includes single source and destination sites of 50-80 TB/day. A 10 Gbps network link can theoretically transfer about 100 TB per day. These numbers seem to indicate that it's not hard to transfer data at rates that would satisfy the genomics community.