It’s an interesting irony of computing that programs – and the programming languages in which they are written – are often much longer-lived than the computing platforms they run on. Computers are constantly evolving as hardware continues to advance, and one purpose of higher-level programming languages is to insulate programmers from those changes. There is a tension, however. While programmers need to be able to express their programs independently of particular and ever-changing computers, they also want to take advantage of new features on new computing platforms. Adding new capabilities to existing languages in order to keep up with hardware can result in unwieldy systems.
One important architectural development is the incorporation of graphics processing units (GPUs) into high-performance computing platforms. GPUs will likely be an important technology in the move to exascale computing.
Harlan, developed by Eric Holk, a computer science Ph.D. candidate at Indiana University, US, is a newly released high-level language for general-purpose GPU computing. “The programming models that we have been using are becoming increasingly harder to manage when you scale them up, so we need to be looking at new ways of programming machines,” Holk says.
“One thing researchers who want to use GPUs struggle with is how to write applications for machines with two different processors and multiple types of parallelism,” says Andrew Lumsdaine, director of the Center for Research in Extreme Scale Technologies (CREST), part of the Pervasive Technology Institute at Indiana University.
“GPUs are highly parallel, but have a different kind of parallelism than multicore CPUs. We’re investigating productive mechanisms for programming machines overall and integrating these approaches into programming,” Lumsdaine says.
Harlan makes it easy for programmers to point out where parallelism is available, what computation they may need, and what data they will likely work with. This gives the language runtime maximum flexibility in mapping programs to different devices.
GPUs and abstraction
“The existing GPU programming models are still very low-level, requiring the programmer to manage many of the details that take attention away from their applications or algorithms,” says Holk. “I’m working on ways to raise the abstraction level, so the programmer doesn’t have to worry about those low-level tasks.”
“Adding layers of abstraction is a tried and true way of making things simpler for the programmer. But it's not simply a matter of adding the layer of abstraction. It’s also creating the right sets of abstractions, related to how particular scientists or scientific programmers think and what they focus on,” adds Lumsdaine.
Like CPUs, GPUs are tuned to execute certain types of problems well. A CPU is built for single-thread performance; it takes one thread of execution and runs it as quickly as it can. GPUs, however, sacrifice single-thread performance and make up for it with better parallel performance.
“CPUs do a lot of out-of-order execution,” Holk notes. “CPUs will reorder a program as it’s executing to make the best use of the hardware. GPUs don't do as much reordering. Instead they switch between many threads that are all running concurrently.”
GPUs have much higher memory bandwidth than CPUs, so they are good for streaming processing, where you read in a large amount of data but do relatively simple computations with it. CPUs tend to do better when you are doing more complicated computations on a single piece of data.
Typically, GPU languages have given programmers one-dimensional or multi-dimensional arrays of scalar values, which allow for relatively simple computations. “Harlan has vectors (or arrays) built in, but they are more flexible,” Holk says. “If you have a matrix, or array of arrays, most GPU languages restrict the inner arrays to the same length. But in Harlan they don't, which gives the programmer more flexibility.”
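To make the idea of inner arrays with different lengths concrete, here is a minimal sketch in Python rather than Harlan. It shows one common way to hold such “ragged” data in flat memory: a single list of values plus a list of row offsets. The article does not say how Harlan stores these arrays, so the layout and the function names (flatten_ragged, row_sums) are assumptions made purely for illustration.

```python
# Illustrative sketch only: Python, not Harlan. All names here are hypothetical.

def flatten_ragged(rows):
    """Pack rows of differing lengths into one flat list plus row offsets."""
    values = []
    offsets = [0]  # offsets[i] is where row i starts; offsets[-1] is the total length
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets


def row_sums(values, offsets):
    """Sum each row. Rows are independent, so each could be handled in parallel."""
    return [
        sum(values[offsets[i]:offsets[i + 1]])
        for i in range(len(offsets) - 1)
    ]


# An "array of arrays" whose inner arrays have different lengths.
ragged = [[1, 2, 3], [4], [], [5, 6]]
values, offsets = flatten_ragged(ragged)
sums = row_sums(values, offsets)
```

Because every row is described only by a pair of offsets, a row of length zero or length one costs nothing extra, which is the flexibility a fixed-width matrix cannot offer.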
In computer science, graphs are very general, non-application-specific abstractions – a set of things and the relationships between those things. A map such as Google Maps, for example, can be represented this way. Solving problems using graphs in the abstract allows you to solve all kinds of concrete problems: internet routing, planning the best route on a map, laying out integrated circuits on a chip, or finding the most highly ranked page in a Google search.
Harlan also has a region-based memory system where it assigns related data to the same logical object, called a region. This streamlines situations where a computation and the data it depends on move from the CPU to the GPU. “The nice thing about having region-based memory is that moving between CPU and GPU memory is essentially the same problem as memory that might be on a different node in a cluster system. So I think a lot of the ideas would work well in a distributed environment, but I haven’t explored it yet,” says Holk.
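As a small illustration of “a set of things and the relationships between those things,” the sketch below stores a graph as a Python dictionary and finds a route with the fewest hops using breadth-first search. This is generic Python, not Harlan code, and the toy road network and the function name shortest_path are made up for the example.

```python
# Illustrative sketch only: a generic graph search in Python, not Harlan.
from collections import deque


def shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    previous = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk backwards from the goal to rebuild the route.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical one-way road network: each key lists the places reachable from it.
roads = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}
route = shortest_path(roads, "A", "F")
```

The same function works unchanged whether the nodes stand for road junctions, network routers or web pages, which is the sense in which the abstraction is not tied to any one application.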
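The following toy model, written in Python rather than Harlan, sketches why grouping related data into a region can simplify moving it between memories. It assumes a region is one contiguous block and that objects inside it are referred to by region-relative offsets; the article does not describe Harlan's actual implementation, so the Region class and its methods are hypothetical.

```python
# Illustrative sketch only: a toy model of region-based memory, not Harlan's design.

class Region:
    """One contiguous block that holds every object allocated in the region."""

    def __init__(self):
        self.buffer = []

    def allocate(self, values):
        """Append values to the block; return an (offset, length) handle."""
        offset = len(self.buffer)
        self.buffer.extend(values)
        return offset, len(values)

    def read(self, handle):
        """Look up an object through its region-relative handle."""
        offset, length = handle
        return self.buffer[offset:offset + length]

    def copy_to_device(self):
        """Stand-in for a CPU-to-GPU transfer: one bulk copy of the whole block."""
        device = Region()
        device.buffer = list(self.buffer)
        return device


host = Region()
a = host.allocate([1.0, 2.0, 3.0])
b = host.allocate([4.0, 5.0])

# Handles are relative to the region, so they stay valid after the copy.
device = host.copy_to_device()
```

In this model the destination of the copy could just as well be another node in a cluster, which mirrors the parallel Holk draws between GPU memory and distributed memory.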
Accelerating the future
So far, Harlan is one of only two systems that have higher-order procedures running on the GPU. At present it is still a research-quality language, but scientists interested in doing machine learning, as well as in running Harlan on other processors such as field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), have already contacted Holk. Holk adds that Harlan is well-suited for SIMD-style (single instruction multiple data) programs as well.
Holk is also thinking about the possibility of writing computational kernels so that the compiler could look at a kernel and determine whether the computation is better suited to a CPU. Even if the kernel was asked to run on a GPU, it would then automatically run on the CPU instead, leaving the GPU available to run other code. “You could also schedule different computations on different hardware resources. That type of flexibility would be extremely useful the more heterogeneous compute accelerators are in play. If the language can handle it, it’s one more thing programmers won’t have to worry about,” Holk says.