Opinion - From Moore to Metcalfe: the network as the next database platform
Innovation in database systems technology has traditionally been driven by the push and pull between Moore’s law and Shugart’s law, which describe the competing exponential growth of computing power and of the volume of data to which that computing power must be applied.
Increasingly, however, it is Metcalfe’s law that is putting pressure on state-of-the-art data management.
Metcalfe’s law, which holds that the value of a network grows with the square of the number of connected nodes, describes the network effects that cause networks to continually expand.
The practical impact of this law is that data-intensive applications are becoming increasingly distributed. Nowhere is this trend more apparent than in the area of scientific grid computing.
As science has become more data-centric, scientific users have come to rely upon relational database technology and the SQL query language as key tools in their data analysis processes.
The inherent benefits of SQL include: productivity due to powerful support for bulk data operations; ease of maintenance and evolution due to declarative programming; and efficiency due to sophisticated optimization techniques developed over several decades.
In distributed environments, however, current database technology acts only as an endpoint.
Thus, many users still rely on hand-coded solutions for tasks such as filtering, cleaning, and event detection/response in high-volume data streams flowing through networks.
Stream Query Processing turns tradition on its head
An emerging database technology, called Stream Query Processing, has the potential to change all this.
With Stream Query Processing, the traditional database arrangement of persistent data waiting for queries to arrive is turned on its head. In a stream processing system, it is the queries that are persistent, and processing that is initiated by the arrival of new data.
This inverse structure allows queries to continuously generate incremental answers based on the data that have been seen “so far”.
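The inversion can be sketched in a few lines of Python. This is a toy illustration with invented names (`ContinuousQuery`, `StreamEngine`), not the API of any actual stream processing product: queries are registered once and stay resident, and each arriving tuple pushes an incremental update through them.

```python
# Toy sketch of stream query processing: queries are persistent,
# and processing is initiated by the arrival of new data.
# (Names here are invented for illustration, not a real product API.)

class ContinuousQuery:
    """A standing query: a filter plus a running aggregate, updated per tuple."""
    def __init__(self, predicate, initial, step):
        self.predicate = predicate  # which tuples this query cares about
        self.state = initial        # the incremental answer "so far"
        self.step = step            # how one new tuple updates the state

    def on_tuple(self, t):
        if self.predicate(t):
            self.state = self.step(self.state, t)

class StreamEngine:
    """Holds the persistent queries; data arrival drives all processing."""
    def __init__(self):
        self.queries = []

    def register(self, q):
        self.queries.append(q)
        return q

    def arrive(self, t):
        # A new tuple arrives: every standing query is updated incrementally.
        for q in self.queries:
            q.on_tuple(t)

engine = StreamEngine()
# Roughly: a continuous SELECT COUNT(*) FROM readings WHERE value > 100
count_hot = engine.register(ContinuousQuery(
    predicate=lambda t: t["value"] > 100,
    initial=0,
    step=lambda acc, t: acc + 1))
# Roughly: a continuous SELECT SUM(value) FROM readings
total = engine.register(ContinuousQuery(
    predicate=lambda t: True,
    initial=0,
    step=lambda acc, t: acc + t["value"]))

for v in [50, 120, 130, 90]:
    engine.arrive({"value": v})

print(count_hot.state, total.state)  # prints: 2 390
```

At any point during the stream, `count_hot.state` and `total.state` hold the answers over the data seen so far, with no query ever being re-run from scratch.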
As a result, interactive applications and applications that demand low latency can be written in the familiar and powerful SQL language.
Furthermore, this approach can provide significant performance benefits that result from the ability to optimize all of the queries as a unit—thereby avoiding redundant work—and the intelligent and adaptive placement of query functionality in the network.
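The "optimize all queries as a unit" benefit can be illustrated with a minimal sketch, again with invented names and assuming two queries that happen to share a filter: the shared predicate is evaluated once per tuple and its result feeds both queries, rather than each query re-evaluating it.

```python
# Toy illustration of multi-query optimization: two standing queries
# share one filter, so the predicate runs once per tuple, not once per
# query. (A sketch under assumed names, not any real system's planner.)

def make_shared_plan(predicate, consumers):
    """Build a per-tuple handler that applies the shared filter once."""
    def on_tuple(t):
        if predicate(t):              # evaluated a single time per tuple
            for consume in consumers:
                consume(t)            # each query sees the filtered tuple
    return on_tuple

# Two queries over the same filtered substream: a count and a sum.
state = {"count": 0, "sum": 0}
handler = make_shared_plan(
    predicate=lambda t: t["value"] > 100,
    consumers=[
        lambda t: state.update(count=state["count"] + 1),
        lambda t: state.update(sum=state["sum"] + t["value"]),
    ])

for v in [50, 120, 130, 90]:
    handler({"value": v})

print(state)  # prints: {'count': 2, 'sum': 250}
```

The same idea extends to pushing such shared operators out toward the data sources in the network, so that filtering happens near where the data is produced.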
Stream Query Processing is an emerging technology that grew out of research performed largely at universities over the past decade.
Early prototypes spawned numerous startup companies, and recent product announcements by existing enterprise software players have led to an expansion of interest and choices in the market.
Grid computing users have come to appreciate the benefits of SQL processing for post hoc data analysis and manipulation, but have believed these benefits were unavailable for on-line processing.
Stream Query Processing with full support for the SQL language removes this unnecessary barrier to productivity and performance.
As such, it represents the natural adaptation of database technology to the increasingly distributed world described by Metcalfe’s law.
- Michael J. Franklin
Michael J. Franklin is a professor of computer science at the University of California, Berkeley, U.S., and chief technical officer of Truviso, Inc. He was a keynote speaker at last month's IEEE International Symposium on High Performance Distributed Computing.