Hidden relationships in extreme data — Sherlock investigates

“I’ll know it when I see it.” It’s a valid method of inquiry — a uniquely human line of investigation, playing to the strengths of human intuition and pattern recognition. It’s also something even the largest, fastest supercomputers can’t handle; they need us to tell them what to find.

With the February launch of Sherlock, its modified YarcData uRiKA graph analytics appliance, the Pittsburgh Supercomputing Center (PSC) in Pennsylvania, US, is now giving researchers the ability to search extremely large and complex bodies of information using a straightforward command similar to ‘find something important’.

Investigating. Image courtesy Rafael Mendoza.

“The fundamental challenge with large-scale, complex networks is that they can’t be investigated piecemeal,” says Nick Nystrom, director of strategic applications at PSC. “Their dense connections make it impossible to divide the data for independent analysis.” For complex networks, this leads to an extreme case of what is called ‘the memory wall’.

Sherlock’s strengths stem from its architecture – 128 hardware threads per processor, dedicated hardware to speed memory access, and a terabyte of global shared memory. Sherlock is supported by a grant worth over $1 million from the Strategic Technologies for Cyberinfrastructure program of the US National Science Foundation (NSF). The award aims to extend these graph analytics techniques to a wide range of scientific research projects.

Cutting the problem down to size

Songjian Lu and Xinghua Lu of the Department of Biomedical Informatics at the University of Pittsburgh in Pennsylvania, US, are using Sherlock in research on genetic signaling pathways in tumors. “Cancers are mainly caused by the perturbation of a signaling pathway. When the pathway is disturbed, then the expression levels of genes change significantly. This may turn a normal cell into a tumor cell,” explains Songjian Lu.

Scientists are uncovering ever-increasing numbers of cancer-associated genes and chains of signals (employed by the genes to communicate with the cell’s machinery, and with each other). To expand understanding of these relationships, Songjian Lu’s team worked to model them computationally – and immediately hit the memory wall. “Theoretically, we know how to solve them, but practically, we cannot obtain the answers in a thousand or a million years — the running time is an exponential function of the problem input size.”

By applying insightful algorithms and randomly splitting the computing task into a series of short hops through the network, the team can recast the problem as a series of parallel investigations into the relationships. Sherlock can potentially tackle this in hours.
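The article does not describe the team's algorithm in detail, so the following is only a minimal Python sketch of the general idea: many independent, randomly chosen short walks through a network, each of which can run without waiting for the others. The toy network, the function names `short_walk` and `sample_paths`, and all parameters are hypothetical. Python threads are used here purely to show that the walks are independent; they are not a model of Sherlock's hardware threads.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy signaling network: node -> downstream neighbors.
NETWORK = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def short_walk(start, max_hops, seed):
    """Follow up to max_hops randomly chosen edges from start; return the path."""
    rng = random.Random(seed)  # a private generator keeps each walk independent
    path = [start]
    for _ in range(max_hops):
        neighbors = NETWORK[path[-1]]
        if not neighbors:  # dead end: stop early
            break
        path.append(rng.choice(neighbors))
    return tuple(path)


def sample_paths(start, max_hops, n_walks, workers=4):
    """Run n_walks independent walks concurrently and count each distinct path."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = pool.map(lambda s: short_walk(start, max_hops, s), range(n_walks))
        return Counter(paths)


if __name__ == "__main__":
    counts = sample_paths("A", max_hops=3, n_walks=200)
    for path, n in counts.most_common():
        print(" -> ".join(path), n)
```

Because no walk depends on another's result, the work divides cleanly across however many threads are available, even though the underlying network itself cannot be partitioned.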

Simple epidemic simulation. Video courtesy Pascal Ballet.

Modeling all of us

Working with YarcData, PSC customized Sherlock to include additional Cray nodes with commodity x86 processors. This enables researchers to tackle multi-step calculations whose stages work best on different computer architectures. Compared with relying on brute force, speed, and size alone, heterogeneous computing is a more sophisticated way to gain performance, ease of use, and application flexibility.

“We want to see how the Sherlock platform performs when running agent-based models of infectious disease,” says Shawn Brown, director of public health applications at PSC. “These don’t require a lot of computation, but they’re very memory-based. Sherlock’s architecture has the potential to deliver.”

Sherlock provides uniform access to memory, effectively hiding memory latency. This could enable agent-based models, in which “agents” act and interact autonomously, to run orders of magnitude faster than is possible on traditional supercomputers. In the case of epidemics, the “agents” are computer models of individual people.
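To make the idea of an agent-based epidemic model concrete, here is a minimal Python sketch. It is not PSC's model: the susceptible/infected/recovered states, the random-contact rule, the function name `simulate`, and every parameter value are illustrative assumptions. It does show why such models lean on memory rather than arithmetic: each step does little computation but reads and updates the state of agents scattered unpredictably across the population.

```python
import random

SUSCEPTIBLE, INFECTED, RECOVERED = "S", "I", "R"


def simulate(n_agents=500, contacts_per_day=4, p_transmit=0.05,
             days_infectious=5, n_days=60, n_seed=5, seed=0):
    """Return a list of (S, I, R) counts, one tuple per simulated day.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    state = [SUSCEPTIBLE] * n_agents
    days_left = [0] * n_agents  # remaining infectious days per agent

    # Seed the outbreak with a few randomly chosen infected agents.
    for i in rng.sample(range(n_agents), n_seed):
        state[i] = INFECTED
        days_left[i] = days_infectious

    history = []
    for day in range(n_days):
        newly_infected = set()
        for i in range(n_agents):
            if state[i] != INFECTED:
                continue
            # Each infected agent meets random others: irregular memory access.
            for _ in range(contacts_per_day):
                j = rng.randrange(n_agents)
                if state[j] == SUSCEPTIBLE and rng.random() < p_transmit:
                    newly_infected.add(j)

        # Existing infections progress toward recovery.
        for i in range(n_agents):
            if state[i] == INFECTED:
                days_left[i] -= 1
                if days_left[i] == 0:
                    state[i] = RECOVERED

        # New infections take effect at the end of the day.
        for j in newly_infected:
            state[j] = INFECTED
            days_left[j] = days_infectious

        history.append((state.count(SUSCEPTIBLE),
                        state.count(INFECTED),
                        state.count(RECOVERED)))
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate(), start=1):
        if day % 10 == 0:
            print(f"day {day}: S={s} I={i} R={r}")
```

Scaling a model like this from a few hundred agents to every person in a country multiplies the number of scattered lookups rather than the difficulty of each one, which is the kind of workload a large shared memory is meant to serve.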

“With the speed Sherlock offers, we can run a real-time simulation of the entire United States,” says Brown. “This is a model that has a computational representation of every single person in the country. Right now it takes about six hours to run; we want to see if we can get it down to the order of minutes with Sherlock. The faster model would allow for a rapidly paced reaction by public health officials to stem emerging epidemics.”
