Sometimes on older operating systems, rsync (the first choice for copying files from one filesystem to another) may not be available. In such circumstances, you can use tar. For an initial copy of a large amount of data, this may actually be 2 to 4 times faster because tar skips rsync’s checksum calculations, although rsync would be faster for subsequent delta copies.
timex tar -cf - /src_dir | ( cd /dest_dir ; tar -xpf - )
Add a v to the tar -xpf command if you want to see a scrolling list of files as the files are copied but be aware that this will slow it down. I prefer to leave it out and just periodically ls -al /dest_dir in another terminal to check the files are being written correctly. timex at the front of the command will show you how long it ran for once it completes (may be useful to know).
With no verbose output, if you need confirmation that the command is still running, use ps -fu user_name | grep timex. The originating terminal will not return a command prompt until the copy finishes, unless you backgrounded the process with an & upon execution, or subsequently with CTRL-Z, jobs, then bg job_id. Note that backgrounding the process may hinder your collection of timings, so it is not recommended if you are timing the operation.
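If you want to try the tar pipe before pointing it at real data, here is a minimal sketch that runs it against two throwaway directories. The src and dest variables and the file names are made up for the demo, and timex is left out because it does not exist on every platform. It also changes into the source directory before archiving, which is a small variation on the command above so that the stored paths are relative.

```shell
#!/bin/sh
# Sketch only: src and dest are throwaway directories created for this demo.
set -e
src=$(mktemp -d)
dest=$(mktemp -d)

# Build a small tree, including a hidden file and a subdirectory.
mkdir -p "$src/sub"
echo "hello" > "$src/sub/file.txt"
echo "secret" > "$src/.hidden"

# First tar writes the archive to stdout (-f -); second tar reads it from
# stdin and unpacks it, preserving permissions (-p).
( cd "$src" && tar -cf - . ) | ( cd "$dest" && tar -xpf - )

# List what arrived, hidden files included.
ls -alR "$dest"
```

Using && rather than ; after each cd means that if the directory is missing, tar does not run in whatever directory you happened to be in.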
Another alternative would be to pipe the contents of find . -depth into cpio -p thus using cpio’s passthru mode…
timex find . -depth | cpio -pamVd /destination_dir
Note that this command can appear to take a little while to start, before printing a single dot to the screen per file copied (the capital V option, as opposed to the lowercase v option, which lists every file name).
If you wish to copy data from one block storage device to another, it’d be faster to do it at block level rather than file level. To do this, ensure the filesystems are unmounted, then use the dd command
dd if=/dev/src_device of=/dev/dest_device
Do not use dd on mounted filesystems. You will corrupt the data.
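To get a feel for dd without risking a real device, the following sketch uses two ordinary files as stand-ins for /dev/src_device and /dev/dest_device and then checks the copy with cmp. The file names, the 4 MiB size and the bs value are assumptions picked for illustration, not tuned recommendations.

```shell
#!/bin/sh
# Sketch only: ordinary files stand in for the source and destination devices.
set -e
work=$(mktemp -d)
src_img="$work/src.img"
dest_img="$work/dest.img"

# Create a 4 MiB stand-in "device" filled with random data.
dd if=/dev/urandom of="$src_img" bs=1048576 count=4 2>/dev/null

# Block-level copy. bs sets the block size used for each read and write;
# 1 MiB here is an illustrative choice.
dd if="$src_img" of="$dest_img" bs=1048576 2>/dev/null

# cmp -s exits 0 only if the two files are byte-for-byte identical.
if cmp -s "$src_img" "$dest_img"; then
    echo "copy verified"
else
    echo "copy differs" >&2
    exit 1
fi
```

The same cmp check can be run against two unmounted devices after a real copy, provided they are the same size.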
Overall progress can be monitored throughout the long copy process with df -h in a separate command window. Prepending the cpio pipeline with timex will not yield a useful time for the copy once the command has completed, since timex applies only to the find at the head of the pipe, but cpio is faster than both tar and rsync for initial large copies of data.
To perform a subsequent catch-up copy of new or changed files, simultaneously deleting any files from the destination that no longer exist on the source for a true synchronisation of the two sides, much like a mirror, use…
timex ./rsync -qazu --delete /src_dir/* /dest_dir
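Note this will not include hidden files at the top level of the source, because the shell’s * does not match names beginning with a dot. To include them, drop the * from the source fs, leaving its trailing slash (/src_dir/), and add a trailing slash to the destination fs.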
or to catch up the new contents on the Src side to the Dest side and not delete any files on the Dest side that have been deleted on Src, use
rsync -azu --progress /NFS_Src/* /NFS_Dest
a = archive mode; equals -rlptgoD (recursive, links, permissions, times, group, owner and device files preserved)
z = compress file data during transfer (optional but generally best practice)
u = update; skip any file that is already newer on the destination
--progress in place of v (verbose) or q (quiet). A touch faster and more meaningful than a list of files scrolling up the screen.
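Both rsync commands above give the source as a path ending in /*, which is why hidden files get missed. A quick sketch of what the shell does with that pattern, using a throwaway directory and made-up file names (no rsync needed to see the effect):

```shell
#!/bin/sh
# Sketch only: src is a throwaway directory created for this demo.
set -e
src=$(mktemp -d)
touch "$src/visible.txt" "$src/.hidden"

# The shell expands the * before the copy command ever runs, and by
# default a glob does not match names beginning with a dot.
set -- "$src"/*
glob_count=$#

# Count every entry; ls -A includes dotfiles but leaves out . and ..
all_count=$(ls -A "$src" | wc -l | tr -d ' ')

echo "matched by *: $glob_count, actually present: $all_count"
```

The glob hands over only visible.txt, so .hidden would never reach the copy command. Giving the directory itself with a trailing slash avoids the problem because no glob is involved.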
Thanks Matt,
a very helpful article, I like the graph!
Alan
I try to keep certain posts succinct. It makes the point. Like I mentioned, let’s not forget that although rsync is slowest, it is the best, and can be used in parallel on each of the top level folders in any one filesystem to effectively “speed it up”. Just pay attention to the ownership and permissions of those top level folders after each re-run as I’ve observed them being reset in the past. Nothing a bit of chmod and chown won’t sort though.