Moving your data – It's not always pretty
Moving Day
8. bbFTP
Although bbFTP [23] sounds like it's related to BBCP, it's really not. BBCP was developed at SLAC [24], and bbFTP was developed at IN2P3 [25]. bbFTP is something like FTP, but it uses its own transfer protocol optimized for large files (larger than 2GB). Like BBCP, it works with multiple streams and has compression and some security features. One version is even firewall and NAT friendly [26].
However, bbFTP does not appear to have a way to retain file attributes such as ownership, mode, timestamps, and xattr data.
It is simply a tool designed for high file transfer rates (and that's not a bad thing – it just might not be the best option for data migration).
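If a tool like bbFTP moves only file contents, one workaround is to re-apply the attributes in a second pass. The Python sketch below is hypothetical (it is not part of bbFTP, and `copy_attributes` is an illustrative name). It assumes Linux, because of the xattr calls, plus read access to the original file and write access to the copy. It copies xattrs, ownership, permission bits, and access/modification times, and reports anything it could not apply, such as ownership when not running as root.

```python
import errno
import os
import stat
import tempfile


def copy_attributes(src: str, dst: str) -> list[str]:
    """Re-apply xattrs, ownership, mode, and timestamps from src to dst.

    Hypothetical helper. Returns the names of attributes it could not apply.
    Linux-oriented: os.listxattr/getxattr/setxattr are not on every platform.
    """
    skipped = []
    st = os.stat(src)

    # 1. Extended attributes, set while dst is still writable.
    try:
        for name in os.listxattr(src):
            os.setxattr(dst, name, os.getxattr(src, name))
    except OSError as exc:
        # ENOTSUP: filesystem has no xattr support;
        # EPERM/EACCES: some namespaces (e.g. trusted.*) need privileges.
        if exc.errno not in (errno.ENOTSUP, errno.EPERM, errno.EACCES):
            raise
        skipped.append("xattrs")

    # 2. Ownership; giving a file to another user normally requires root.
    try:
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError:
        skipped.append("ownership")

    # 3. Permission bits, after chown (chown can clear setuid/setgid bits).
    os.chmod(dst, stat.S_IMODE(st.st_mode))

    # 4. Access and modification times last (ctime cannot be set this way).
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
    return skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.dat")
        dst = os.path.join(tmp, "migrated.dat")
        for path in (src, dst):
            with open(path, "wb") as f:
                f.write(b"payload")
        os.chmod(src, 0o600)
        print("could not apply:", copy_attributes(src, dst))
        print(oct(stat.S_IMODE(os.stat(dst).st_mode)))  # 0o600
```

The order is deliberate in this sketch: xattrs go first while the copy is still writable, chown comes before chmod, and timestamps go last. The inode change time (ctime) cannot be set with `os.utime`, so it will always differ after a migration.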
9. GridFTP
Probably the most popular FTP toolkit for transferring data files between hosts is GridFTP [27], which is part of the Globus Toolkit and is designed for transferring data over a WAN. Recall that the Globus Toolkit is designed for computing grids, which can comprise systems located far from one another.
GridFTP was designed to be a standard way to move data across grids. Additionally, GridFTP has some unique features that work well for moving data:
- Security – Uses the Grid Security Infrastructure (GSI) to provide security and authentication.
- Parallel and striped transfer – Improves performance by using multiple simultaneous TCP streams to transfer data.
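- Partial file transfer – Allows transfer of just a portion (a byte range) of a file rather than the whole file; standard FTP can only restart a transfer at an offset and continue to the end of the file.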
- Fault tolerance and restart – Allows interrupted data flows to be restarted, even automatically.
- Automatic TCP optimization – Adjusts the network window and buffer sizes to improve performance, reliability, or both.
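To make the parallel and partial-transfer ideas concrete, here is a small local-filesystem sketch in Python. It is an analogy, not GridFTP code: threads stand in for GridFTP's TCP streams, and `parallel_copy` and `copy_range` are hypothetical names. The file is split into byte ranges, and each range is copied independently with `os.pread`/`os.pwrite` (POSIX only), which read and write at explicit offsets so the threads can safely share file descriptors.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def copy_range(src_fd: int, dst_fd: int, offset: int, length: int,
               chunk: int = 1 << 20) -> None:
    """Copy bytes [offset, offset + length) between two open file descriptors."""
    end = offset + length
    while offset < end:
        data = os.pread(src_fd, min(chunk, end - offset), offset)
        if not data:
            raise EOFError(f"source ended early at byte {offset}")
        # pwrite may write fewer bytes than asked; advance only by what landed.
        offset += os.pwrite(dst_fd, data, offset)


def parallel_copy(src: str, dst: str, streams: int = 4) -> None:
    """Copy src to dst by splitting it into `streams` byte ranges."""
    size = os.path.getsize(src)
    part = -(-size // streams)  # ceiling division; 0 for an empty file
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(dst_fd, size)  # set the final length up front
        ranges = [(off, min(part, size - off))
                  for off in range(0, size, part)] if part else []
        with ThreadPoolExecutor(max_workers=streams) as pool:
            jobs = [pool.submit(copy_range, src_fd, dst_fd, off, length)
                    for off, length in ranges]
            for job in jobs:
                job.result()  # re-raise any error from a worker thread
    finally:
        os.close(src_fd)
        os.close(dst_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "big.bin")
        dst = os.path.join(tmp, "copy.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(5 * (1 << 20) + 123))
        parallel_copy(src, dst, streams=4)
        with open(src, "rb") as a, open(dst, "rb") as b:
            print("identical:", a.read() == b.read())  # identical: True
```

Because each range is self-contained, a failed range could in principle be retried on its own instead of restarting the whole file, which is the same idea behind restartable transfers.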
Using GridFTP can be challenging because you need several Globus components installed on both the old storage and the new storage. However, a variant of GridFTP, GridFTP-Lite, replaces GSI with SSH.
This makes things a little easier, because data migration typically takes place within the data center rather than outside it, so GSI's security features might not be needed.
I have not tested GridFTP, so I'm not sure how well it would work for data migration. One concern I have is that if preserving attributes such as ownership, timestamps, and xattr data is important, GridFTP might not be the best tool.
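Whatever tool you end up using, it's worth checking whether attributes actually survived the move. The following Python sketch (hypothetical function names, Linux-oriented because of the xattr calls) walks the old tree and reports files that are missing on the new storage or whose mode, owner, group, modification time, or xattrs differ. Keep in mind that some filesystems store timestamps at coarser granularity, so nanosecond mtime comparisons may flag differences that are only rounding.

```python
import os
import stat
import tempfile


def attribute_snapshot(path: str) -> dict:
    """Collect the attributes a migration is expected to preserve."""
    st = os.stat(path)
    snap = {
        "mode": stat.S_IMODE(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime_ns": st.st_mtime_ns,
    }
    try:
        snap["xattrs"] = {n: os.getxattr(path, n) for n in os.listxattr(path)}
    except OSError:
        snap["xattrs"] = None  # no xattr support, or not readable
    return snap


def compare_trees(src_root: str, dst_root: str) -> list[tuple[str, str]]:
    """Return (relative_path, attribute) pairs that are missing or differ."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                problems.append((rel, "missing"))
                continue
            old, new = attribute_snapshot(src), attribute_snapshot(dst)
            for key in old:
                if old[key] != new[key]:
                    problems.append((rel, key))
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        for root, mode in ((old, 0o640), (new, 0o644)):
            path = os.path.join(root, "data.txt")
            with open(path, "w") as fh:
                fh.write("example")
            os.chmod(path, mode)
        print(compare_trees(old, new))
```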
10. Aspera
Up to this point, I've focused on open source tools for data migration, but I think one commercial tool is worthy of mention. Aspera [28] has some very powerful software that many people report using to transfer data. It uses a transfer protocol called fasp, which runs over UDP rather than TCP, to move data across IP networks. I've talked to people who said they can transfer data faster than wire speed, probably because of data compression. Overall, these people are very impressed with Aspera's performance.
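The "faster than wire speed" observation is easy to reason about if compression really is the reason. For example, on a link that carries about 125 MB/s (roughly 1 Gb/s), data that compresses 3:1 would appear to move at about 375 MB/s. The Python sketch below estimates this with `zlib` on a sample of your own data. It is a hypothetical back-of-the-envelope model, not a description of how fasp works, and it assumes the link, not the CPU doing the compression, is the bottleneck.

```python
import os
import zlib


def effective_rate(wire_mb_s: float, sample: bytes, level: int = 6) -> float:
    """Estimate how fast uncompressed data appears to move over a link.

    Hypothetical model: compressed bytes fill the link at wire_mb_s, and
    compression itself is never the bottleneck.
    """
    ratio = len(sample) / len(zlib.compress(sample, level))
    return wire_mb_s * ratio


if __name__ == "__main__":
    wire = 125.0  # MB/s, roughly a 1 Gb/s link
    text_like = b"2013-01-01T00:00:00,sensor-42,20.5\n" * 50_000
    random_like = os.urandom(len(text_like))
    print(f"text-like:   {effective_rate(wire, text_like):8.1f} MB/s")
    print(f"random-like: {effective_rate(wire, random_like):8.1f} MB/s")
```

Already-compressed or encrypted data looks random, so it gains nothing; zlib's framing overhead even pushes the estimate slightly below the wire rate.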
Aspera's tools also offer synchronization [29]. According to the website, the tools can "preserve file attributes such as permissions, access times, ownership, etc." I don't know whether this includes xattr data as well, but at least they can handle POSIX attributes.
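A synchronization tool has to decide which files need to be sent again. A common, simple approach (rsync's default "quick check," for example) compares file size and modification time. The Python sketch below shows that generic approach; it is not Aspera's algorithm, and `files_to_sync` is a hypothetical name. It also shows why preserving timestamps matters: if a tool doesn't carry mtime across, every file looks changed on the next pass.

```python
import os
import tempfile


def files_to_sync(src_root: str, dst_root: str) -> list[str]:
    """List source files (relative paths) that a sync pass would send.

    Generic size-plus-mtime "quick check"; not Aspera's algorithm.
    """
    pending = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            try:
                d = os.stat(os.path.join(dst_root, rel))
            except FileNotFoundError:
                pending.append(rel)  # not yet at the destination
                continue
            s = os.stat(src)
            if s.st_size != d.st_size or s.st_mtime_ns != d.st_mtime_ns:
                pending.append(rel)  # changed since the last pass
    return sorted(pending)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        with open(os.path.join(src, "new.dat"), "w") as fh:
            fh.write("only on the source side")
        print(files_to_sync(src, dst))  # ['new.dat']
```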