I recently witnessed a huge 40% increase in disk space used moving some files from one SAN to another despite the number of files being the same on both the source and destination filesystems/SANs.
The reason this happens is primarily down to block size.
Ordinarily you wouldn’t notice much difference: files come in all sizes, and most are bigger than the filesystem’s block size. Some space is always wasted when the tail of a file only partly fills its last block before the next file starts in a fresh block, but the total amount wasted shouldn’t differ hugely between filesystems.
But what if the files are all roughly the same size, and that size is smaller than the block size, i.e. you’re writing millions of 4KB files into a filesystem that has an 8KB block size? Ouch is the answer, as half of your filesystem capacity will be wasted: each 4KB file still occupies a full 8KB block.
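The arithmetic behind that “ouch” is just rounding each file up to a whole number of blocks. A minimal sketch (the file counts and sizes below are illustrative, not from any real measurement):

```python
import math

def allocated_bytes(file_size: int, block_size: int) -> int:
    """Space a file actually consumes: whole blocks, rounded up."""
    return math.ceil(file_size / block_size) * block_size

# A million 4KB files on a filesystem with an 8KB block size:
n_files = 1_000_000
logical = n_files * 4096                          # data actually written
on_disk = n_files * allocated_bytes(4096, 8192)   # space actually consumed
wasted = on_disk - logical
print(f"wasted: {wasted:,} bytes ({wasted / on_disk:.0%})")  # → 50% wasted
```

The same function also shows why mixed file sizes hide the problem: a 100KB file on 8KB blocks wastes at most one block’s worth, a tiny fraction of its size.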
The answer is to make sure that you know a bit more about the data that’s going to be written to the destination filesystem than just its top-level capacity. A command such as df -h actually reports the number of blocks used across the entire filesystem multiplied by the block size, converted into a more humanly meaningful figure in KB/MB/GB rather than a raw count of blocks occupied.
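One way to get that extra detail on a POSIX system is to compare each file’s logical size against the space it actually occupies, which stat exposes via st_blocks (defined by POSIX in 512-byte units regardless of the filesystem’s real block size). A rough sketch, using a throwaway 4KB file as the example:

```python
import os
import tempfile

def disk_usage(path: str) -> tuple[int, int]:
    """Return (logical_size, allocated_bytes) for one file.

    st_blocks is counted in 512-byte units per POSIX, whatever
    the underlying filesystem's block size happens to be.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Demo: write a 4KB file and see what it really occupies.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 4096)
    tmp = f.name
logical, allocated = disk_usage(tmp)
print(f"logical={logical} allocated={allocated}")
os.unlink(tmp)
```

Summing both numbers over a whole directory tree on the source before a migration tells you how much is data and how much is block-rounding overhead, so you can predict what the same files will cost on a destination with a different block size.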