Respiratory infections are the main reason why children under five end up in hospital. However, in up to 40% of cases the exact cause of the disease cannot be identified, often because the viruses responsible are still unknown to science.
Identifying as many viruses as possible improves the chances of a correct diagnosis and helps to determine the best treatment for patients. Knowing which virus is responsible for which disease is also important for detecting potential epidemics and for assessing the seriousness of viral infections.
Lia van der Hoek and colleagues from the Virus Discovery Unit at the Academic Medical Center of the University of Amsterdam (AMC) have been working on VIDISCA – a method to spot new viruses from previously unidentified genetic sequences.
The hunt for new viruses starts at the hospitals, with the collection of nose and throat swabs from affected patients. Back at the lab, the first step of the VIDISCA method is to remove residual cells and other biological material, to enrich the sample’s viral genetic material. The genetic sequences are then amplified with standard laboratory techniques.
Success first came in 2004, when the team reported the discovery of the coronavirus NL63, which is implicated in croup – a disease that causes throat swelling and coughing in children under six years old. The virus was spotted in samples taken from a seven-month-old baby who had been admitted to a Dutch hospital with symptoms of acute respiratory infection.
The genome of the coronavirus NL63 was sequenced, and the analysis showed that the virus was a new species with distinctive features, as the team reported in the journal Nature Medicine.
Over the past six years, van der Hoek’s work on the VIDISCA method has benefitted from developments in next-generation sequencing techniques. The improved version, dubbed VIDISCA-454, was introduced in 2009.
The result of a VIDISCA-454 analysis is a haystack of information that includes a 'needle', or genetic sequence of the unknown virus. As always, finding the needle is difficult. One way of solving the conundrum is to compare the mystery sequences to known viruses catalogued in massive reference databases, such as GenBank.
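The comparison step can be pictured as a triage: reads that confidently match a catalogued virus are set aside, and whatever remains is the pile in which a needle might be hiding. The Python sketch below illustrates that idea only. The function name `triage_reads`, the `(read_id, reference, e_value)` hit layout and the `1e-5` cutoff are assumptions made for illustration, not details of the VIDISCA-454 analysis.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit layout: (read_id, matched_reference_name, e_value).
Hit = Tuple[str, str, float]


def triage_reads(
    read_ids: List[str],
    hits: Iterable[Hit],
    max_evalue: float = 1e-5,  # assumed cutoff, chosen only for illustration
) -> Tuple[Dict[str, str], List[str]]:
    """Split reads into those with a confident database match and the rest."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        read_id, _reference, evalue = hit
        if evalue > max_evalue:
            continue  # too weak to count as a match
        # Keep only the strongest (lowest e-value) hit per read.
        if read_id not in best or evalue < best[read_id][2]:
            best[read_id] = hit

    known = {rid: best[rid][1] for rid in read_ids if rid in best}
    unexplained = [rid for rid in read_ids if rid not in best]
    return known, unexplained
```

A read that ends up in the unexplained list is only a candidate: it may equally be background material or sequencing noise, so it still needs follow-up in the lab.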
The US National Center for Biotechnology Information (NCBI) has created the BLAST tool to compare given sequences with databases via the web. But uploading data from VIDISCA-454 to this portal proved to be virtually impossible, given that the average experiment produces approximately 400,000 sequences.
Looking for a solution, van der Hoek contacted Antoine van Kampen, head of the AMC bioinformatics department, who assigned Barbera van Schaik to the problem. Van Schaik developed a workflow – the sequence of computational steps required to perform an analysis – to allow BLAST to run on grid computing resources of the Dutch e-science grid.
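One reason this kind of analysis suits a grid is that each sequence can be compared with the database independently of the others, so a large input can be cut into batches that run side by side. The sketch below shows that splitting step in Python, assuming the reads arrive as FASTA text. The function names and the batch size of 1,000 are hypothetical, and the code is not the workflow van Schaik built.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(parts)


def batch_records(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most batch_size, one list per job."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def jobs_needed(total_sequences: int, batch_size: int) -> int:
    """Number of independent jobs required to cover every sequence."""
    return math.ceil(total_sequences / batch_size)
```

With the assumed batch size of 1,000, `jobs_needed(400_000, 1_000)` gives 400 independent jobs for an experiment of the size quoted above. How many of those run at once depends on the resources available.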
The workflows and databases were made available to the Virus Discovery Unit via the e-BioInfra platform developed and operated by the e-bioscience group of Silvia Olabarriaga.
With these tools at hand, van der Hoek’s team is able to analyse a VIDISCA-454 experiment within 24 hours, compared with weeks of intensive manual work. A test with 1,444 samples produced 4,783,684 sequences and showed that the analysis can be repeated within 14 hours, compared with 17 days if it were run sequentially on a local server.
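To put the test figures in proportion, the short Python calculation below converts them into a speed-up factor and an average number of sequences per sample. It uses only the numbers quoted above. Because the grid run finished "within" 14 hours, the speed-up it reports should be read as a lower bound.

```python
HOURS_PER_DAY = 24

sequential_hours = 17 * HOURS_PER_DAY  # 17 days on a local server
grid_hours = 14                        # upper bound reported for the grid run
samples = 1_444
sequences = 4_783_684

speedup = sequential_hours / grid_hours
sequences_per_sample = sequences / samples
sequences_per_grid_hour = sequences / grid_hours

print(f"Sequential run: {sequential_hours} hours")
print(f"Speed-up on the grid: about {speedup:.0f}x")
print(f"Average sequences per sample: about {sequences_per_sample:,.0f}")
print(f"Sequences processed per grid hour: about {sequences_per_grid_hour:,.0f}")
```

In other words, 17 days is 408 hours, so the grid run was roughly 29 times faster than the sequential estimate.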
Research is now under way into the meaning of the more than 4,000,000 sequences identified by VIDISCA-454. Van der Hoek hopes to find new viruses linked to respiratory infections, diarrhoea, meningitis, encephalitis and other serious diseases.
This is an edited version of an article that first appeared on the EGI website.