iSGTW Technology - Weka4WS: distributed data mining using web services


The Waikato Environment for Knowledge Analysis, or WEKA, is software developed at the University of Waikato, New Zealand. It shares its four-letter name with the weka, a flightless brown bird native to New Zealand and about the size of a chicken.

Released in June 2007, Weka4WS is a new tool designed to open the way for worldwide use of data mining services.

Developed at the University of Calabria Grid Computing Lab, Weka4WS extends the open-source Weka toolkit to support distributed data mining in grid environments.

The original Weka provides a large collection of machine learning algorithms, written in Java, for data pre-processing, classification, clustering, association rule mining and visualization, all of which can be invoked through a common graphical user interface.

In Weka, the overall data mining process takes place on a single machine, since the algorithms can be executed only locally. Weka4WS extends Weka to support remote grid execution of the data mining algorithms through web services—hence the 4WS.

In this way, distributed data mining algorithms for classification, clustering and association rules can be concurrently executed on decentralized grid nodes.

To enable remote invocation, all the data mining algorithms provided by the Weka library are exposed as a web service, which can be easily deployed on available grid nodes.

Weka4WS also extends the Weka GUI so that users can invoke the data mining algorithms exposed as web services on remote grid nodes.
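The pattern described above, in which a client sends the same mining task to several grid nodes and collects the results as they complete, can be illustrated with a minimal Java sketch. It is not the Weka4WS API: the `MiningService` interface, the `simulatedNode` helper, the node names and the dataset URL are all hypothetical, and each "remote" node is simulated locally so the example runs with the Java standard library alone. "J48" is used only as an example algorithm label.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GridMiningSketch {

    /** Hypothetical stand-in for a data mining web service on one grid node. */
    interface MiningService {
        String run(String algorithm, String datasetUrl) throws Exception;
    }

    /** Simulated node (assumption): a real client would issue a remote web service call here. */
    static MiningService simulatedNode(String nodeName) {
        return (algorithm, datasetUrl) ->
                algorithm + " finished on " + nodeName + " for " + datasetUrl;
    }

    /** Submits one task per node so the tasks run concurrently, then waits for every result. */
    static Map<String, String> runOnAllNodes(Map<String, MiningService> nodes,
                                             String algorithm,
                                             String datasetUrl) throws Exception {
        // One worker thread per node; at least one so an empty map does not fail.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            // Phase 1: dispatch all tasks without waiting for any of them.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, MiningService> entry : nodes.entrySet()) {
                MiningService service = entry.getValue();
                Callable<String> task = () -> service.run(algorithm, datasetUrl);
                pending.put(entry.getKey(), pool.submit(task));
            }
            // Phase 2: collect results, blocking until each task is done.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, MiningService> nodes = new LinkedHashMap<>();
        nodes.put("node-a", simulatedNode("node-a"));
        nodes.put("node-b", simulatedNode("node-b"));

        Map<String, String> results =
                runOnAllNodes(nodes, "J48", "http://example.org/data.arff");
        for (Map.Entry<String, String> entry : results.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Separating dispatch from collection is what lets the nodes work in parallel: the client only starts waiting once every task has been submitted. In a grid deployment, the body of `run` would be replaced by a call to the web service deployed on the corresponding node.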

The extended Knowledge Flow: Still under development, this component will allow execution of data mining workflows over multiple grid machines.
Image courtesy of Weka4WS

Grid integration

To integrate and interoperate with standard grid environments, Weka4WS is built on the Web Services Resource Framework (WSRF).

In particular, Weka4WS was developed using the WSRF Java library provided by Globus Toolkit 4.

The current version of Weka4WS (1.0, released 7 June 2007) is based on the latest version of Weka (3.4.11, released 1 June 2007) and extends the Weka Explorer component. It runs on *nix platforms and requires Globus Toolkit 4 on both client and server nodes.

The development team is currently working on a new version that will include an extension of the Knowledge Flow component for grid-enabled data mining workflows, as well as support for running the client on any platform (including, for example, Microsoft Windows).

Weka4WS is partially funded by CoreGRID, an EU Network of Excellence on Peer-to-Peer and Grid technologies. Weka4WS is freely downloadable.

- Domenico Talia, University of Calabria, Italy

