Image courtesy of the Washington State Department of Transportation.
The standard system for transferring data online can’t handle the data-intensive needs of today’s larger scientific collaborations, resulting in delays and data loss. However, the GridFTP data transfer system is changing all that.
“There is a great need for bulk data movement in almost all areas of science,” said Raj Kettimuthu, technology coordinator of the GridFTP project at Argonne National Laboratory. “Collaborative science experiments have large volumes of data in their repositories, but the data must be distributed among researchers all across the world for analysis. With GridFTP, researchers can move the bulk data quickly and save a lot of time.”
GridFTP, which builds on the widely used File Transfer Protocol (FTP), transfers data 20 to 30 times faster than FTP, with speeds of up to 200 megabytes per second. It achieves the speedup through parallel data movement: a large dataset is broken into several smaller pieces that transfer simultaneously. The system can also use multiple computers at each end of the transfer, providing multiple levels of parallelism.
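The idea of splitting a dataset into pieces that move at the same time can be illustrated with a short Python sketch. This is not GridFTP code and does not touch the network: it copies an in-memory buffer using several worker threads, one per byte range. The function names `split_ranges` and `parallel_copy` and the default of four streams are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, num_streams):
    """Divide [0, total_size) into contiguous (offset, length) ranges,
    at most one per stream, with sizes differing by at most one byte."""
    base, extra = divmod(total_size, num_streams)
    ranges = []
    offset = 0
    for i in range(num_streams):
        length = base + (1 if i < extra else 0)
        if length:  # skip empty ranges when there are more streams than bytes
            ranges.append((offset, length))
            offset += length
    return ranges


def parallel_copy(source, num_streams=4):
    """Copy `source` (bytes) into a new buffer, one worker per byte range.

    Each worker writes only to its own disjoint slice of the destination,
    standing in for one of several simultaneous data streams.
    """
    dest = bytearray(len(source))

    def copy_range(rng):
        offset, length = rng
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        # list() forces all workers to finish and surfaces any exception
        list(pool.map(copy_range, split_ranges(len(source), num_streams)))
    return bytes(dest)


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    print(parallel_copy(data, num_streams=4) == data)
```

In a real transfer each range would travel over its own connection, which is where the speedup comes from; the in-memory copy here only shows how the pieces are divided and reassembled.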
Another useful feature not available with FTP is that users who only need to transfer a small subset of a large file can select individual parts to send instead of the entire thing.
And any interrupted transfer can be restarted from the point where it stopped instead of from the beginning.
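Both features, sending only part of a file and restarting an interrupted transfer, come down to tracking byte offsets. The following Python sketch shows that bookkeeping on in-memory buffers. It is an illustration under stated assumptions, not GridFTP's protocol or API: the function `transfer_range`, the tiny block size, and the `fail_at` parameter that simulates a dropped connection are all hypothetical.

```python
from typing import Optional


def transfer_range(source: bytes, dest: bytearray, start: int, end: int,
                   block_size: int = 4, fail_at: Optional[int] = None) -> int:
    """Copy source[start:end] into dest block by block.

    Returns the offset reached. If `fail_at` is set, the copy stops
    (simulating an interruption) once that offset has been reached,
    so the caller can later resume from the returned offset.
    """
    offset = start
    while offset < end:
        if fail_at is not None and offset >= fail_at:
            return offset  # simulated interruption
        stop = min(offset + block_size, end)
        dest[offset:stop] = source[offset:stop]
        offset = stop
    return offset


if __name__ == "__main__":
    source = b"0123456789abcdefghij"

    # Partial transfer: move only bytes 10..14 of the file.
    part = bytearray(len(source))
    transfer_range(source, part, 10, 15)
    print(bytes(part[10:15]))

    # Interrupted transfer, then restart from the offset that was reached.
    dest = bytearray(len(source))
    reached = transfer_range(source, dest, 0, len(source), fail_at=8)
    transfer_range(source, dest, reached, len(source))
    print(bytes(dest) == source)
```

The restart works because the first call reports how far it got, so the second call never resends the bytes that already arrived.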
Speed is not GridFTP’s only advantage; Kettimuthu and his team have also improved security and made the system more user-friendly. For example, GridFTP now supports the Secure Shell (SSH) security protocol in addition to Grid Security Infrastructure (GSI). Obtaining certificates and setting up GSI proved difficult for the average user, and SSH is far simpler, said Kettimuthu.
GridFTP has already been widely adopted in the scientific community. The number of transfers the program handles has doubled since last year, reaching five million transfers a day and still growing.
Developers are currently working on several upgrades to draw in even more users, including a user-friendly graphical interface. The new release of GridFTP, which is expected to come online later this year, will install itself so that users no longer have to figure out the installation on their own. It will also make initiating transfers easier by allowing researchers to simply drag and drop files from a remote host to their local machine or between two remote hosts.
GridFTP developers are also working to create a hosted data transfer service so that in the future users won’t have to install, run, and monitor anything on their own systems. Users would access the hosted service through a web site and specify the files they need transferred, and the service would monitor the transfer and send the user an email notification when it is complete.
“Now the focus is on making GridFTP really simple for the average user,” Kettimuthu said. “The hosted service would allow the transfer to run without the user having to babysit it, making things much easier.”
—Amelia Williamson, for iSGTW