Feature - Joe Hellerstein on cloud programming
Earlier this year, we discussed the possibilities raised by working with parallel programming languages in an interview with John Shalf. This week, iSGTW interviews Joe Hellerstein, the principal investigator for the Berkeley Orders Of Magnitude project, about cloud programming languages. Hellerstein is a professor of computer science at the University of California, Berkeley, where he focuses his work on data-centric systems and the ways in which they drive computing.
Why do we need a language for cloud programming, what makes a cloud programming language different from a parallel programming language, and how do cloud programming languages work? Read on for answers to these and other questions.
iSGTW: What is cloud programming?
Hellerstein: I think of “the cloud” as a new computing platform, the way that the PC or the mobile phone were new platforms when they were introduced. In general, a new platform takes off when creative software engineers find a way to exploit the unique properties that the platform offers. The unique property of cloud computing is to commoditize programmer access to thousands of CPUs and disks.
The next step for cloud computing is to enable creative people to translate their ideas for the platform into reality, by providing a programming environment.
iSGTW: Why do programmers need cloud programming?
Hellerstein: Of course it’s possible to write cloud software with existing programming environments and languages. That’s precisely what goes on inside large clusters at the big hosted services. The problem is that the environments at hand today were designed for single-node programming, and using them to write scalable, robust cloud services is incredibly difficult. As a result, there are very few programmers with the competency to write cloud services that take advantage of the platform’s potential. And even those programmers are only a fraction as productive as they could be. In general, the potential for creativity in developing new technologies on the cloud is being stymied by an inability to quickly write software, learn from the experience, and move forward.
iSGTW: How does cloud programming differ from parallel programming?
Hellerstein: Parallel computation is certainly a big part of the intellectual hurdle in cloud programming. Cloud programming requires programmers to manage not only parallelism, but also the way it interacts with the complexities of distributed computing and database management. Because a cloud platform is made up of clusters of thousands of machines, the platform per se displays component failures on a regular basis. So in addition to parallelism, the programmer has to deal with node failures and large variations in compute and messaging speeds. And it’s likely that any service that uses thousands of CPUs will also manage terabytes to petabytes of storage. For both technical and economic reasons, this typically requires today’s programmers to cobble together custom solutions for data management: dealing with data replication and consistency, and writing custom data analysis and transformation logic.
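To see why replication and consistency become hard once message delivery order varies, here is a small Python simulation. It is our own illustration, not code from the BOOM project: the same three updates reach replicas in every possible order. A replica that overwrites a single value ends up holding whichever update happened to arrive last, while a replica that merges updates into a set reaches the same state no matter the order.

```python
from itertools import permutations

def apply_overwrites(updates):
    """Replica holding a single value: each update overwrites the previous one."""
    value = None
    for v in updates:
        value = v
    return value

def apply_union(updates):
    """Replica holding a set: each update is merged in by set union."""
    state = set()
    for v in updates:
        state |= {v}
    return frozenset(state)

updates = ["a", "b", "c"]
# Every delivery order that some replica might observe.
orders = list(permutations(updates))

overwrite_results = {apply_overwrites(o) for o in orders}
union_results = {apply_union(o) for o in orders}

print(sorted(overwrite_results))  # ['a', 'b', 'c']: replicas disagree
print(len(union_results))         # 1: every replica converges
```

Set union is commutative, associative and idempotent, which is why arrival order and duplicate delivery do not matter for the second replica.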
iSGTW: How does Bloom approach the problem of cloud programming?
Hellerstein: Bloom starts by throwing out the classical von Neumann model of the computer that underlies traditional programming: the array of memory, and step-by-step recipes for a CPU to follow. With a little distance, it’s obvious that this is a horrible abstraction for thousands of machines working in parallel on massive, highly-replicated data sets. And yet all the popular programming environments are built on the von Neumann model. So most of the effort in cloud programming today is wasted bridging this gap. Programmers are forced to write ordered, step-by-step programs for individual machines, and they have to reason about and control how those ordered threads of computation might interleave when run on multiple machines.
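The interleaving problem can be made concrete with a short Python simulation, again an illustration we constructed rather than anything from Bloom. Two threads each increment a shared memory cell using a separate read step and write step, and the code enumerates every order in which those steps can interleave.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of step lists a and b that preserves each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        merged, ia, ib = [], 0, 0
        for i in range(n):
            if i in positions:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged

def run(schedule):
    """Execute read/write steps against one shared memory cell."""
    memory = {"counter": 0}
    local = {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = memory["counter"]
        else:  # "write": store this thread's incremented local copy
            memory["counter"] = local[thread] + 1
    return memory["counter"]

# Each thread performs counter = counter + 1 as two separate steps.
t1 = [("T1", "read"), ("T1", "write")]
t2 = [("T2", "read"), ("T2", "write")]

outcomes = {run(s) for s in interleavings(t1, t2)}
print(sorted(outcomes))  # [1, 2]: some interleavings lose an update
```

Of the six possible schedules, those in which both reads happen before either write leave the counter at 1. Ruling them out requires the programmer to add explicit ordering such as a lock.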
Bloom starts from the other direction. Its building blocks provide disordered computation as a default. Instead of an array of memory, it provides unordered sets of data as its core storage abstraction. Instead of ordered computational steps, programmers assemble unordered collections of logical expressions. Using this “disorderly” baseline, Bloom provides additional tools for programmers to identify points in their program where ordering is required to clarify the meaning of their computational properties, independent of parallelism. Once identified, these “points of order” can be imposed via well-known coordination protocols that are provided in the language, and it is relatively easy to build new ones in Bloom. The important point is that programmers are not enticed into introducing order into their programs unless it’s absolutely needed, and as a result they naturally write code that parallelizes well. For similar reasons, Bloom code makes replication of data and computation relatively easy to achieve, which attacks the challenges of distributed databases and fault-tolerant computing.
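The "disorderly" style can be sketched in Python. This is not Bloom syntax; it is a minimal illustration, assuming a hypothetical `link` relation from which a `path` relation is derived. All state lives in unordered sets, and two logical rules are re-applied until no new facts appear (a fixpoint). Facts are fed in one at a time in every possible arrival order to check that the answer never depends on ordering.

```python
from itertools import permutations

def add_link(links, path, new_link):
    """Insert one link fact, then re-apply the rules until nothing new appears:
         path(X, Y) :- link(X, Y)
         path(X, Z) :- link(X, Y), path(Y, Z)
    """
    links.add(new_link)
    path.add(new_link)
    while True:
        derived = {(x, z) for (x, y) in links for (y2, z) in path if y == y2}
        if derived <= path:   # fixpoint: no new facts derived
            return
        path |= derived

def run(arrival_order):
    """Process link facts in the given arrival order and return the final path set."""
    links, path = set(), set()
    for link in arrival_order:
        add_link(links, path, link)
    return frozenset(path)

facts = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "a")]
results = {run(order) for order in permutations(facts)}

print(len(results))               # 1: same answer for all 24 arrival orders
print(len(next(iter(results))))   # 10 derived path facts
```

Nothing in the program says which fact or rule is handled first, so the work could be split across machines without the programmer coordinating them.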
iSGTW: Are there other approaches to cloud programming that you’ve considered, or heard of others using?
Hellerstein: Bloom came out of the research tradition of logic programming and deductive database languages. The other tradition that’s important in this domain is functional programming. Languages like Erlang and Scala come from that lineage, and are being targeted at cloud computing. There are some similarities to logic programming in these approaches: for example, both traditions try to do away with the von Neumann idea of computations “modifying” cells in a memory. But I think Bloom’s embrace of logic has given us unique advantages, exactly because the core of logic is unordered, and logic provides some powerful tools to reason about when ordering is necessary (when programs require what is called non-monotonic reasoning). Functional programming is still fundamentally an exercise in writing ordered computational elements.
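The distinction between monotonic and non-monotonic reasoning can be seen in a small Python example of our own, assuming the same hypothetical `link` facts arriving one at a time. Reachability is monotonic: new facts only ever add answers. "Nodes with no outgoing link" uses negation (here, set difference), so an answer reported early can later turn out to be wrong. That is where a program must wait for its input to be complete, i.e. a point where ordering matters.

```python
def reachable_from_a(links):
    """Monotonic: adding links can only add answers, never remove them."""
    found, frontier = {"a"}, ["a"]
    while frontier:
        node = frontier.pop()
        for (src, dst) in links:
            if src == node and dst not in found:
                found.add(dst)
                frontier.append(dst)
    return found

def sinks(links):
    """Non-monotonic: a node with no outgoing link may gain one later."""
    nodes = {n for link in links for n in link}
    sources = {src for (src, _) in links}
    return nodes - sources    # set difference plays the role of negation

arrivals = [("a", "b"), ("b", "c"), ("c", "a")]

seen = []
for i in range(1, len(arrivals) + 1):
    prefix = arrivals[:i]
    seen.append((reachable_from_a(prefix), sinks(prefix)))

for reach, sink in seen:
    print(sorted(reach), sorted(sink))
# ['a', 'b'] ['b']
# ['a', 'b', 'c'] ['c']
# ['a', 'b', 'c'] []
```

The reachability answers only grow, so they can be emitted as soon as they are derived. The sink answers "b" and then "c" are both retracted, so a correct program has to delay that query until it knows no more links are coming.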
iSGTW: To your knowledge, is anyone else working on a cloud programming language?
Hellerstein: In addition to the functional languages like Erlang and Scala, there are languages and language environments targeted at one or more aspects of the platform. One common example is the MapReduce paradigm from Google, which is supported by the open-source Hadoop project funded by Yahoo. But it’s not a general-purpose cloud programming environment: it’s a narrowly-targeted domain-specific environment for batch analytics and data transformation. Most of the other efforts are more like libraries or programmer services: this includes cloud environments like Microsoft Azure and Google AppEngine. These focus on simplifying access to distributed storage from traditional languages like Java, C# and Python, but provide little assistance with coordinating parallel computation. Microsoft’s LINQ extensions to C# and other languages are a step on the road to a language like Bloom, and bridge some gaps between functional and logic programming. If Microsoft decided to push them through to a full-service cloud language, that would be interesting.
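For readers unfamiliar with MapReduce, the paradigm can be sketched with the customary word-count example. This is a single-process Python simulation of the three phases, not Hadoop's actual API. A framework would run the map and reduce calls on many machines and perform the grouping ("shuffle") over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit (word, 1) for every word; each document can be mapped independently."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key; each key can be reduced independently."""
    return key, sum(values)

documents = ["the cloud is a platform", "the cloud has many disks"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"], counts["cloud"], counts["disks"])  # 2 2 1
```

The programmer writes only the two small functions, which is what makes the model productive for batch analytics. It also explains the narrowness Hellerstein points to: anything that does not fit a map-then-reduce pipeline has to be forced into it or built elsewhere.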
iSGTW: Would a cloud programming language such as Bloom be suited to run on a grid? More suited than a parallel programming language?
Hellerstein: I think it would make some things easier, and some harder. Our target with Bloom today is to enable programmers to easily harness many machines to a task; we’re less concerned with getting maximal performance out of each machine. As a result, the language is quite high-level, and many issues related to raw performance are taken out of the programmer’s hands. That is great for productivity, but can be frustrating for achieving high performance. In some cases, the programmer may have low-level ideas for performance improvements – especially on a single machine – that the language might prevent them from expressing. Most grid computing applications I’m aware of start from the other end of the spectrum: they are very carefully engineered to maximize performance, but the process of developing them is terribly time-consuming because so many details need to be explicitly laid out. So I think Bloom might make a great rapid prototyping language for grid developers, as an agile “playpen” for the kind of experimentation and lateral thinking that is hard to do if you’re trying to control every aspect of a computation.
As languages mature, these kinds of performance-control issues tend to get ironed out, and the high-productivity languages tend to push aside the low-level languages in all but the most performance-critical cases. So I’m optimistic about the long-term prognosis for a language like Bloom in high-performance environments. But in the short term it’s important to understand the practical tradeoffs.
iSGTW: What’s going on with the BOOM team, and with Bloom specifically, right now?
Hellerstein: To clarify, BOOM is the acronym of our research project: the Berkeley Orders Of Magnitude project, which is trying to enable people to build Orders Of Magnitude more powerful software with Orders Of Magnitude less effort. It is a team of about seven of us at Berkeley, with three collaborators at Yahoo Research. Bloom is the language we’re developing in the BOOM project.
We spent the last two years doing two things. First, we wrote a body of large-scale cloud infrastructure called BOOM Analytics in one of our earlier prototype languages, to get some experience with cloud programming. Second, based on that experience, we developed a new formal logic we call Dedalus that cleans up the math behind our previous ideas, providing a sound theoretical bridge between the tradition of logic programming and the realities of distributed computation.
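The interview does not go into Dedalus's details. As we understand its published description, a central idea is to attach a logical timestep to every fact, so that mutable state is expressed as rules that carry facts from one timestep to the next. The Python sketch below illustrates only that idea for a single hypothetical relation `p`. The rule in the docstring uses generic Datalog-style notation, not Dedalus's exact syntax.

```python
def run_timeline(events, steps):
    """Evaluate a single persisted relation p over discrete timesteps.

    events maps a timestep t to (inserts, deletes). A persistence rule of the form
        p(X, t+1) :- p(X, t), not del_p(X, t)
    carries every fact forward unless it is deleted at step t, and facts
    inserted at step t become visible at step t+1.
    """
    history = [frozenset()]  # p is empty at timestep 0
    for t in range(steps):
        inserts, deletes = events.get(t, (set(), set()))
        current = history[-1]
        history.append(frozenset((current - deletes) | inserts))
    return history

events = {
    0: ({"x"}, set()),   # insert x at t=0; visible from t=1
    1: ({"y"}, set()),   # insert y at t=1; visible from t=2
    2: (set(), {"x"}),   # delete x at t=2; gone from t=3
}
history = run_timeline(events, 4)
for t, facts in enumerate(history):
    print(t, sorted(facts))
# 0 []
# 1 ['x']
# 2 ['x', 'y']
# 3 ['y']
# 4 ['y']
```

Treating "the state at time t+1" as just another logical consequence lets updates, and the delays of messages between machines, be reasoned about with the same logic as ordinary queries.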
With that work behind us, we’re in the midst of three main efforts. First, I took the Dedalus foundation logic and wrote a first-generation Bloom programming environment that we’ll be sharing with friends this fall. Second, my team is moving ahead with a higher-performance, much more sophisticated Bloom runtime (or “virtual machine,” to use Java’s terminology) that will replace my prototype when we do a public release of Bloom next spring. And third, we’ve been developing the analysis tools for helping programmers do precision identification of the “points of order” in their program, to minimize programmer effort on parallel coordination, and maximize opportunities for programs to exploit parallelism. While that work continues, I’m also writing a programmer-centric book on Bloom to accompany the language when it’s released next year.
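A crude version of such an analysis can be sketched as a search over a program's dependency graph. The rule set below is hypothetical and the approach is our simplification, not the BOOM team's tool. Each derived relation records which relations it reads and whether it uses a non-monotonic operator such as negation or aggregation. Those operators are the candidate points of order, and any relation downstream of them inherits sensitivity to ordering.

```python
from collections import deque

# Hypothetical program: each derived relation lists the relations it reads
# and whether its rule uses a non-monotonic operator (negation or aggregation).
rules = {
    "path":      {"reads": ["link"],          "nonmonotonic": False},
    "sinks":     {"reads": ["link"],          "nonmonotonic": True},   # negation
    "num_paths": {"reads": ["path"],          "nonmonotonic": True},   # count
    "report":    {"reads": ["sinks", "path"], "nonmonotonic": False},
    "pairs":     {"reads": ["path"],          "nonmonotonic": False},
}

def order_sensitive(rules):
    """Return relations that are non-monotonic or depend on one that is."""
    downstream = {}
    for head, rule in rules.items():
        for src in rule["reads"]:
            downstream.setdefault(src, []).append(head)
    flagged = {head for head, rule in rules.items() if rule["nonmonotonic"]}
    queue = deque(flagged)
    while queue:  # breadth-first propagation along read dependencies
        rel = queue.popleft()
        for nxt in downstream.get(rel, []):
            if nxt not in flagged:
                flagged.add(nxt)
                queue.append(nxt)
    return flagged

print(sorted(order_sensitive(rules)))  # ['num_paths', 'report', 'sinks']
```

Here `path` and `pairs` need no coordination at all, so they can run fully in parallel. Only `sinks` and `num_paths` require a decision about when their input is complete, and `report` must wait on that decision.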
That’s the targeted work. We’re also doing a lot of more speculative and academic research on how Bloom and its logic programming roots relate to problems of interest in distributed and parallel computing. For example, how it sheds light on computing folklore about data replication and consistency, how it informs notions of computational complexity in a CPU-rich environment, and the role of time and space in distributed programming.
iSGTW: A lot of niche programming languages fade into obscurity fairly rapidly. Why will Bloom be different?
Hellerstein: Well, I never promised it would! To be clear, Bloom is an academic project, and our ultimate goal is not to sell a language, but rather to push the boundaries of how people understand computing. That said, we very much want our science to have impact on practice, and the most effective way I know to do that is to build something real, see how it gets used, and improve it. So Bloom is the vehicle for our ideas. If the end result is that Bloom becomes a popular programming language, that would be gratifying. But I’d be very happy to see the ideas from Bloom form the basis of other solutions.
To answer your question at a higher level, I think our work on Bloom is going to make a difference because of the always-sensitive issue of timing across science and engineering: it’s the right idea at the right time. There are decades of theoretical papers on logic programming that Bloom is building on. Looking back from where we are now, some of that work is fascinating and relevant, some just fascinating, and some neither. That earlier work was not connected to a major technology trend, which deprived it of feedback and focus. But now we are culling that literature for lessons that apply to one of today’s critical problems: parallel and distributed programming with massive compute and storage resources. And as we find connections between the older ideas and the current needs, we are getting practical feedback to help us develop new ideas that push the science even further, and better address the practical needs. It’s exciting to have this kind of positive feedback loop emerge with such a wealth of older theoretical research in the library for inspiration. And our results show genuine benefits: we’re able to build and extend complex distributed software with an order of magnitude less effort than traditional languages allow.
—Interview by Miriam Boon, iSGTW