Technology - Weka4WS: distributed data mining using web services
Released in June 2007, Weka4WS is a new tool designed to open the way for worldwide use of data mining services.
The original Weka provides a large collection of machine learning algorithms, written in Java, for data pre-processing, classification, clustering, association rules and visualization, which can be invoked through a common Graphical User Interface.
In Weka, the whole data mining process takes place on a single machine, since the algorithms can be executed only locally. Weka4WS extends Weka to support remote grid execution of the data mining algorithms through web services, hence the "4WS" in its name.
In this way, distributed data mining algorithms for classification, clustering and association rules can be concurrently executed on decentralized grid nodes.
To enable remote invocation, all the data mining algorithms provided by the Weka library are exposed as a web service, which can be easily deployed on available grid nodes.
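The idea of running several mining tasks at once on different nodes can be illustrated with a minimal Java sketch. It is not taken from Weka4WS: the class GridDispatchSketch, the MiningTask type, the runOnNode method and the node names are all hypothetical, and the "remote call" is only simulated locally by building a status string. A real client would invoke the web service deployed on each node instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GridDispatchSketch {

    /** Hypothetical description of one mining job bound for one grid node. */
    public static final class MiningTask {
        final String node;
        final String algorithm;

        public MiningTask(String node, String algorithm) {
            this.node = node;
            this.algorithm = algorithm;
        }
    }

    /**
     * Placeholder for the remote web service call (an assumption, not the
     * Weka4WS API). Here it only builds a status string locally.
     */
    public static String runOnNode(MiningTask task) {
        return task.algorithm + " finished on " + task.node;
    }

    /** Submits every task at once and collects the results in submission order. */
    public static List<String> runConcurrently(List<MiningTask> tasks) throws Exception {
        // One worker thread per task, so no task waits for another to finish.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            List<Callable<String>> calls = new ArrayList<>();
            for (MiningTask task : tasks) {
                calls.add(() -> runOnNode(task));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks complete and keeps the input order.
            for (Future<String> future : pool.invokeAll(calls)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<MiningTask> tasks = Arrays.asList(
                new MiningTask("node-a.example.org", "classification"),
                new MiningTask("node-b.example.org", "clustering"),
                new MiningTask("node-c.example.org", "association rules"));
        for (String line : runConcurrently(tasks)) {
            System.out.println(line);
        }
    }
}
```

The client side stays simple in this arrangement: it only decides which task goes to which node and waits for the answers, while the mining work itself would run wherever the service is deployed.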
Weka4WS also extends the Weka GUI so that users can invoke the data mining algorithms exposed as web services on remote grid nodes.
To achieve integration and interoperability with standard grid environments, Weka4WS uses the Web Services Resource Framework (WSRF) as its enabling technology.
In particular, Weka4WS has been developed by using the WSRF Java library provided by Globus Toolkit 4.
The current version of Weka4WS (1.0, released 7 June 2007) is based on the latest version of Weka (3.4.11, released 1 June 2007) and extends the Weka Explorer component. It runs on *nix platforms and requires Globus Toolkit 4 on both client and server nodes.
The development team is currently working on a new version that will include an extension of the Knowledge Flow component for grid-enabled data mining workflows, as well as support for running the client on any platform (including, for example, Microsoft Windows).