Every year there’s a technology that really stands out and impresses me more than any other. This year, that technology is Data Domain. In short, it’s like a NAS device, but it performs de-duplication of common segments of incoming data in real time, massively reducing the number of truly unique blocks of data ultimately written to the disks.
When used in conjunction with the Networker backup solution, the overhead of de-duplicating all the incoming backup data is offloaded to a process called DD Boost running on the Networker Storage Node, which receives the data from the client over the network before forwarding the partially de-duplicated data stream on to the Data Domain device. DD Boost on the storage node performs the first 3 of the 5 stages of de-duplication, with the final 2 stages performed in real time (as mentioned above) on the Data Domain device itself.

Data Domain can also be used as a virtual tape library; more on this in my Networker Cheatsheet here: http://www.cyberfella.co.uk/2012/08/28/emc-networker/
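To make the core idea concrete, here’s a minimal sketch of segment-level de-duplication. The real Data Domain / DD Boost pipeline is proprietary, so the fixed 8KiB segment size, the SHA-256 fingerprints and the in-memory store below are all illustrative assumptions; the point is simply that a segment seen before is never written again.

```python
import hashlib

SEGMENT_SIZE = 8 * 1024  # hypothetical segment size, for illustration only


def dedupe_stream(data: bytes, store: dict) -> int:
    """Split the incoming stream into segments and store only unseen ones.

    Returns the number of bytes actually written to the backing store.
    """
    written = 0
    for i in range(0, len(data), SEGMENT_SIZE):
        segment = data[i:i + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in store:
            # Segment never seen before: write it and remember its fingerprint.
            store[fingerprint] = segment
            written += len(segment)
        # Otherwise only a reference to the existing segment is kept.
    return written


store = {}
# Four distinct 8KiB segments: everything is new on the first backup...
backup = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(4))
first = dedupe_stream(backup, store)
# ...but an identical second backup writes nothing at all.
second = dedupe_stream(backup, store)
print(first, second)  # 32768 0
```

In the real product the fingerprinting and filtering stages are split between the storage node (DD Boost) and the Data Domain device itself, which is why so little of the duplicate data ever crosses the final hop to disk.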
The overhead of all this de-duplication turns out to be insignificant too, since it is ultimately less than the overhead of backing up all the data received from the client in the first place. Indexing of the backup data occurs at the Networker Backup and Recover Server, usually a separate server to the Storage Node, which is blissfully unaware that the data passing through it on its way to the “volume” is being de-duplicated.
You may think that all this makes for fast backups but slow recoveries, given that the de-duplicated data has to be reconstituted, but in tests, recoveries are very fast too, possibly due to the reduced number of disk seeks needed to read the data back. It’s damn impressive anyhow. Here’s a real example of just how impressive, over 120 days of backing up many terabytes of data daily…
Yes, you read it right. It makes for a huge 96.9% reduction in the data ultimately being written to disk, as it turns 43TB of data into just 1TB on disk. Like I said, that’s impressive, and it should make you realise how expensive and inefficient it is to carry on storing many copies of the same static data on your expensive SANs, especially as you can use a Data Domain for NFS and CIFS shares too.
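As a quick sanity check on those figures, vendors quote de-duplication savings either as a ratio or as a percentage reduction; with the rounded 43TB-in / 1TB-stored numbers above the arithmetic works out like this (the dashboard’s 96.9% will be computed from the unrounded byte counts, hence the small difference):

```python
ingested_tb = 43.0  # total data received from clients (rounded figure above)
stored_tb = 1.0     # data actually written to disk (rounded figure above)

ratio = ingested_tb / stored_tb                     # de-dupe ratio, e.g. 43:1
reduction = (1 - stored_tb / ingested_tb) * 100     # percentage reduction

print(f"{ratio:.0f}:1 de-dupe ratio, {reduction:.1f}% reduction")
# 43:1 de-dupe ratio, 97.7% reduction
```

Either way you slice it, only a few percent of the ingested data ever reaches the disks.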