

Over the last few weeks I’ve been looking for a decent monitoring system to watch the 40-odd servers at work. Anyone with even a small foothold in the open source world has heard of Nagios. Nagios is a very open-ended, flexible system, but it can also be a behemoth to configure. Not to mention that it primarily depends on SNMP (yes, it supports other methods as well), and there are a ton of plugins to choose from but hardly any clear documentation, as everyone has *their* way of running Nagios.
So enter Zabbix. It is another open source product, and the company that develops it also offers support contracts. It’s easy to get support via the forums on the Zabbix website (Nagios does *not* have official support forums, but links to unofficial ones), and the main product manager, Alexei, is very easy to contact. Not to mention that the product is stable, and once you get the interface down (which is both a blessing and a curse, considering how completely the “configuration” section is segregated from the “monitoring” section) it’s very easy to configure and set up. On the client side I can use either A. SNMP or B. an agentd that runs on the client machine. I have chosen B for all our Linux machines (Fedora and Debian). It’s easy to set up and only requires two ports to be opened on the firewall (which makes it easier to get through the corporate red tape).
It does everything I need: monitor logs, check processes (like httpd/apache, (x)inetd, mysqld, oracle, jboss), monitor critical files for any alterations (via checksum), monitor network bandwidth (out and in), load averages (1m, 5m, 15m), memory usage (swap & physical) and uptime. The agent adds hardly any load to the machine, nor is it a memory hog.
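To make the checksum idea concrete, here is a minimal standalone sketch of detecting file alterations by comparing a checksum against a stored baseline. It is only an illustration, not how the Zabbix agent implements it; the function names `file_checksum` and `has_changed` and the choice of SHA-256 are my own assumptions.

```python
import hashlib


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path` (hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, baseline):
    """True when the file's current checksum differs from the stored baseline."""
    return file_checksum(path) != baseline
```

In practice you would record the baseline once for each critical file and raise an alert whenever `has_changed` returns True.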
Currently it’s monitoring our internal machines (3 development, 1 proxy and itself) and it does a mighty fine job. Oh, and the load on the Zabbix server machine? Monitoring 5 machines produces a load of 0.05 – 0.10. This is after upgrading the Debian box from the 2.4.27 kernel to 2.6.17… when it was running the 2.4.27 kernel its load was 0.8 – 1.0. The thing that generates the most load is mysqld (MySQL 5.0.24), which isn’t too surprising since it handles approximately 30,000 queries every hour for the 5 machines currently being monitored.
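For a sense of scale, the query figure above works out as follows. This is a rough back-of-the-envelope sketch; the projection to 40 hosts assumes the query count grows linearly with the number of machines, which is my assumption and not something I have measured.

```python
QUERIES_PER_HOUR = 30_000   # approximate figure quoted above
MONITORED_HOSTS = 5         # machines currently being monitored

queries_per_second = QUERIES_PER_HOUR / 3600
queries_per_host_per_hour = QUERIES_PER_HOUR / MONITORED_HOSTS


def projected_queries_per_hour(hosts):
    """Project the hourly query count, ASSUMING it scales linearly with host count."""
    return queries_per_host_per_hour * hosts


print(f"{queries_per_second:.1f} queries/sec overall")
print(f"{queries_per_host_per_hour:.0f} queries/hour per host")
print(f"{projected_queries_per_hour(40):.0f} queries/hour projected for 40 hosts")
```

That is roughly 8 queries per second today, which fits with mysqld being the busiest process on an otherwise idle box.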















[...] So about a month ago I wrote about Zabbix. Well, it has been a month, and after getting the corporate firewall team in Hightstown, NJ to open up specific ports between the two data centers we got Zabbix rolled out to our production environments. So far so good: if I stop the Apache process for even 30 seconds I get paged on my phone. We have severity levels set up so that, for example, if a partition is 80% full it’ll email us as opposed to sending SMS messages to our phones. However, if a ping fails 3 times in a row then it goes nuts and hits everyone with an SMS. [...]
[...] Zabbix now since beginning of October 2006. I get quite a few hits from Google off of it including from places that raise my curiosity. In any [...]