Desktop distros have wonderful graphical disk-space analysis programs such as Baobab, KDirStat, QDirStat, xdiskusage, duc and JDiskReport, and with your desktop distro being connected to the internet, even if you don’t already have them installed, installing them from your repositories is easy. You can quickly drill down with these treemapping programs and find the culprit that’s filling your disk up.
In the datacentre, things are never so easy. You have no internet access and no local repository configured; even if you did, you’d have no change control to install anything on a live system, and even if you did, there’d be no GUI to view it with. All you have is a production problem, a stressed-out ops manager and a flashing cursor winking at you. Oh, and native tools.
Sure, you can use the find command to go looking for files over a certain size,
find ./ -type f -size +1000000M -exec ls -al {} \;
removing a zero and re-running as required until it starts finding something, but you’ll fight with find’s syntax for 15 minutes trying to get it to work, only to be unconvinced by the results. As good as find is, it’s not exactly easy to put together a command that does something that should be simple.
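If you do want to persevere with find, something like the following is a more workable starting point. It’s just a sketch, assuming GNU find (which supports the G size suffix): -xdev keeps it on the current filesystem, and the stderr redirect hides the permission-denied noise:
find / -xdev -type f -size +1G -exec ls -lh {} \; 2>/dev/null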
Here is a much simpler solution. Just use du. In particular…
du -h --max-depth=1
This will summarise the size of the top-level subdirectories underneath your present working directory. You then cd into the biggest one, run it again and repeat, digging down until you arrive at the biggest offender on the disk. In my case that was a 32GB MySQL database in /var/lib/mysql/zabbix.
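A handy variation, assuming GNU coreutils (whose sort understands human-readable sizes via -h), is to pipe the output through sort so the biggest directories float to the top; -x stops du crossing into other filesystems:
du -xh --max-depth=1 2>/dev/null | sort -rh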
So there you go. Have a play with it and you’ll see what I mean. It’s my favourite way of finding out what’s eating all my disk space.
Using QDirStat on headless servers
We live in strange times, where despite the best efforts of the likes of Edward Snowden to open our eyes to the fact that we’re being monitored at any and every opportunity by the intelligence community, we’re still hell-bent on moving our enterprise computing into huge corporate cloud data centres that the CIA and NSA have back doors into. If you think “That’s OK, I have nothing to hide.” then great. How ’bout you hand me your phone and let me go and have a good look around it? Oh, that’s not OK? Well make your mind up, will you? You think you’re gonna be as successful as Google and Amazon if you use their cloud services? Whose cloud service do you think they use? That’s right, their own. So your Cloud is their On Prem. I know, I’m such a cynic.
For those who are tasked with monitoring disk space consumption on cloud servers, containers and other headless machines, there’s a neat little script called qdirstat-cache-writer that generates a cache file you can then open in QDirStat on your workstation for analysis.
I’ve summarised its use below, assuming you’ll understand what each command is doing.
ssh myserver
cd /usr/local/bin
sudo wget https://github.com/shundhammer/qdirstat/raw/master/scripts/qdirstat-cache-writer
sudo chmod +x qdirstat-cache-writer
sudo qdirstat-cache-writer / ~/myserver-root.cache.gz
sudo chown "$USER" ~/myserver-root.cache.gz
exit
scp "myserver:~/*.cache.gz" ~/
sudo apt-get install qdirstat
qdirstat --cache ~/myserver-root.cache.gz
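If you look after a whole fleet, the same idea scales with a simple loop. This is only a sketch: the hostnames are made up, and it assumes qdirstat-cache-writer is already installed on each host as above and that your user can sudo non-interactively there:
for h in web1 web2 db1; do
  ssh "$h" 'sudo /usr/local/bin/qdirstat-cache-writer / /tmp/root.cache.gz && sudo chmod 644 /tmp/root.cache.gz'
  scp "$h:/tmp/root.cache.gz" ~/"$h"-root.cache.gz
done
qdirstat --cache ~/web1-root.cache.gz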
I’d like to issue a special thanks to Mike Schlegel in the comments section below for dragging me kicking and screaming into the 21st Century. I guess there’s still some of us out there who are clever enough to be working with Linux but stupid enough that we didn’t buy Bitcoin at $10 back in 2012 when I started this blog.
Time to get into the 21st century now that it’s already 20 years old:
https://github.com/shundhammer/qdirstat/blob/master/doc/QDirStat-for-Servers.md
Well done for pointing that out, Mike.