Q & A - Arie Shoshani talks about Scientific Data Management
Welcome to the petascale era, where virtually every field of science is hungry for computational power, and if we’re not careful, we could drown in the deluge of data.
Under these circumstances, it becomes supremely important to manage data effectively. This in turn suggests the need for more scientists and software developers to receive training in how to defeat the data deluge.
That’s why it’s an auspicious time for Arie Shoshani and Doron Rotem to launch their book, Scientific Data Management: Challenges, Technology, and Deployment.
To learn more about it, iSGTW caught up with Shoshani in his office at Lawrence Berkeley National Laboratory.
iSGTW: What made you get started on this book?
Shoshani: I gave a talk about everything in the scientific data management center at a SIAM conference. After that talk I was approached by two publishers basically saying, “Gee, you know, this area is not well covered in general, and would you consider doing it?”
iSGTW: How did you react to that?
Shoshani: It was kind of scary at first. I told them that if it was an edited book then it could make sense for me, because I wanted to draw on all the people that I know in the domain. My colleague Doron Rotem agreed to work on this with me. Basically we split the work.
iSGTW: Was it difficult balancing your normal work at Berkeley Lab with the work you were doing on this book?
Shoshani: It would not be appropriate to do all that work under DOE funding and then we get the royalties. But we didn’t want to get our own computers and our own emails to separate the two. So we assigned all of the royalties to the R&D fund here at Lawrence Berkeley Lab. This book was not put together in order to make money; it’s in order to get the technology out there organized and visible to others.
iSGTW: What about the time you spent working on the book?
Shoshani: It’s been hard. It’s in addition to our regular work; given the regular workload, I probably spent time almost every weekend for the last two years.
iSGTW: So in a nutshell, what is the concept behind the book?
Shoshani: The book was from the beginning intended to be not a textbook in an undergraduate class, but rather a collection of topics that are all related from end to end. That is, all aspects of data management in the scientific domain. So we’ve gone all the way from hardware to file systems to data simulations, analysis, visualization, collecting metadata, workflow, all that you would find when you’re dealing with scientific data.
iSGTW: If each chapter has a different author, is it more like a collection of papers?
Shoshani: No. We looked at a lot of situations to find the connections between different chapters so that they refer to each other when you’re talking about different types of technology. But we also tried to structure every chapter such that it could be standalone in the sense that it gives you an introduction to the topic, it brings up the issues, talks about solutions, and gives at least one example if not more of applying it.
iSGTW: Was it difficult getting your authors to research and cover the field as a whole, rather than simply writing from their own existing knowledge?
Shoshani: The lead authors are typically people who not only have done some of their own work, but many of them are teaching courses and thus they do cover the field. Each chapter would have an introductory part that talks about, “These are the problems that exist here, this is what other people have done.” Some of the chapters have a list of something like 80 references that cover the field. We made it a condition for people to take on, so it’s very much covering the field. It’s not a paper on their own work at all.
iSGTW: But was it difficult finding authors who could abide by that?
Shoshani: Yes; people are very busy. We had to sometimes negotiate with people and convince them that this is a worthwhile endeavor. In some cases we needed to get on the phone and discuss with them what it is that we want and have them think about it for a little bit.
iSGTW: What topic did you have the most trouble finding an author for?
Shoshani: We wanted a chapter on emerging and new technologies, database technologies, and those are in the commercial field. Those people don’t want to talk about their stuff. So you have to find other people at the universities who can talk about that.
—Miriam Boon, iSGTW