What’s the best way to count how many of each type of jelly bean are in a jar? Traditional database systems work by emptying the jar of beans, counting them, and then returning them to the jar. Stock image from sxc.hu
Innovation in database systems technology has traditionally been driven by the push and pull between Moore’s law and Shugart’s law, which describe the competing exponential growth in both computing power and the volume of the data over which that computing must be applied.
Increasingly, however, it is Metcalfe’s law that is putting pressure on state-of-the-art data management. Metcalfe’s law describes the network effects that cause networks to continually expand. The practical impact of this law is that data-intensive applications are becoming increasingly distributed. Nowhere is this trend more apparent than in the area of scientific grid computing.
As science has become more data-centric, scientific users have come to rely upon relational database technology and the SQL query language as key tools in their data analysis processes. The inherent benefits of SQL include: productivity due to powerful support for bulk data operations; ease of maintenance and evolution due to declarative programming; and efficiency due to sophisticated optimization techniques developed over several decades. In distributed environments, however, current database technology acts only as an endpoint. Thus, many users still rely on hand-coded solutions for tasks such as filtering, cleaning, and event detection/response in high-volume data streams flowing through networks.

Stream Query Processing turns tradition on its head

An emerging database technology, called Stream Query Processing, has the potential to change all this. With Stream Query Processing, the traditional database arrangement of persistent data waiting for queries to arrive is turned on its head. In a stream processing system, it is the queries that are persistent, and it is the arrival of new data that initiates processing. This inverted structure allows queries to continuously generate incremental answers based on the data that have been seen “so far”.
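To make the inversion concrete, here is a minimal Python sketch of the jelly bean count done both ways. It is an illustration only, not the design of any particular stream query engine: the names `batch_count`, `ContinuousCount`, and `on_arrival` are hypothetical, and a real system would express the standing query declaratively rather than as a hand-written class.

```python
from collections import Counter
from typing import Iterable


def batch_count(jar: Iterable[str]) -> Counter:
    """Traditional style: the data is stored first, then a query scans all of it."""
    return Counter(jar)


class ContinuousCount:
    """Hypothetical standing query: it persists, and each arriving item updates it."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_arrival(self, bean: str) -> dict:
        # Processing is triggered by the new item, not by a query request.
        self.counts[bean] += 1
        # Return a snapshot of the answer based on the data seen so far.
        return dict(self.counts)


if __name__ == "__main__":
    stream = ["red", "green", "red", "yellow", "red"]
    query = ContinuousCount()
    for bean in stream:
        # An incremental answer is available after every arrival.
        print(bean, "->", query.on_arrival(bean))
    # Once the stream has been consumed, both approaches agree.
    assert query.counts == batch_count(stream)
```

The batch function needs the whole jar before it can answer, whereas the standing query never stores the beans at all; it keeps only the running counts and can report them at any point in the stream.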