07. October 2013 2 min read
Linux CPU load - htop load averages
I will present some nice tools that can help you identify and locate the source of congestion and high CPU load. I will also explain the htop load averages: why they are there, what you can do with them, and what I think are the best limits to base your reactions on.
First of all, you should monitor your server or computer load and, even more importantly, you should log it. Logs are usually the best weapon for eliminating bugs, as well as the most reliable tool you have for locating and plugging leaks, be they memory leaks or runaway CPU load. Such problems often do not show up in the few minutes you spend watching your computer, but they may well show up in the load averages in htop (or top). The "top" family of programs (top, htop, iotop) is a great set of tools for determining CPU, disk and memory load on your computer or server. Even the `uptime` command includes the CPU load averages: the first number is the average load over the last 1 minute, the second over the last 5 minutes and the third over the last 15 minutes. The same applies to the htop numbers.
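As a minimal sketch of the logging advice above, the following Python snippet reads the same three numbers that `uptime` and htop show and appends them to a file. The function names and the `load.log` path are my own placeholders, not part of any tool; `os.getloadavg()` is the standard library call and is only available on Unix-like systems.

```python
import os
import time


def format_load_line(load, timestamp):
    """Format the 1, 5 and 15 minute load averages as one log line."""
    one, five, fifteen = load
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return f"{stamp} load1={one:.2f} load5={five:.2f} load15={fifteen:.2f}"


def log_load(path="load.log"):
    """Append the current load averages to a log file.

    The default path is a hypothetical example; pick your own location.
    """
    line = format_load_line(os.getloadavg(), time.time())
    with open(path, "a") as fh:
        fh.write(line + "\n")
    return line


if __name__ == "__main__":
    # Run this from cron (for example once a minute) to build up a history.
    print(log_load())
```

Running it periodically gives you a history you can look back on when a problem only shows up hours later.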
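Now, these numbers should be read relative to the number of cores: a load of 1.0 corresponds to ONE fully occupied core. On an old computer with a single processor, a sustained 1.0 means the CPU is constantly busy, and you will basically have trouble running anything else. With more cores, the saturation point scales with the core count, so on a 4-core machine full load is 4.0. Note that the load average is not capped at that value: it climbs above the number of cores when more work is queued than the CPU can handle.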
Reasonable limits for the 5 or 15 minute numbers are 70% load (so 0.7 * number of cores). This should be the first indication that something is going on on your server and that you had better start investigating. An email or some other alert should be sent out once you approach 1.0 * number of cores, so that you can step in and save your computer, with a reboot if necessary.
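The two limits above can be sketched as a small check. This is only an illustration under my own assumptions: the names `WARN_FACTOR`, `ALERT_FACTOR` and `classify_load` are made up for this example, and what you do on "warn" or "alert" (for instance sending the email) is left to you.

```python
import os

WARN_FACTOR = 0.7   # start investigating at 70% of the core count
ALERT_FACTOR = 1.0  # send an alert at 100% of the core count


def classify_load(load, cores):
    """Return "ok", "warn" or "alert" for a load average and a core count."""
    if load >= ALERT_FACTOR * cores:
        return "alert"
    if load >= WARN_FACTOR * cores:
        return "warn"
    return "ok"


if __name__ == "__main__":
    cores = os.cpu_count() or 1          # fall back to 1 if unknown
    _, five, fifteen = os.getloadavg()   # Unix-like systems only
    # Use the 5 and 15 minute averages, as suggested above.
    print("5 min: ", five, classify_load(five, cores))
    print("15 min:", fifteen, classify_load(fifteen, cores))
```

For example, on a 4-core machine a 5 minute load of 3.0 is above 0.7 * 4 = 2.8, so it would be reported as "warn", while 4.0 or more would be "alert".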