Before I started working at a real company, I knew nothing about device monitoring. When I arrived, what impressed me most was how much the team cared about knowing, at all times, the status of every device we manage. For that reason the most “popular” monitoring tools, Nagios, Cacti and SmokePing, were already installed and properly configured. They complement each other, and Nagios in particular becomes far more powerful than it is out of the box once you add third-party plugins.

To avoid continuously switching between these three tools, we started a project to introduce a fairly new monitoring tool called OpenNMS: our original goal was to use a single piece of software to monitor the whole infrastructure. For the past six months I have focused entirely on learning and customising it, and I would like to share my experience…

Is SNMP service enabled?

The first step for anyone new to the “world” of NMS (network management systems) is to learn the SNMP protocol, or at least to get an idea of what kind of protocol it is and how to check that the devices connected to your network answer it correctly. The debugging part is the most important, from my point of view, because once you know that SNMP queries are answered properly, you can be confident that OpenNMS (and the other tools too) will be able to monitor your systems. So the first thing to do is enable the SNMP service on your devices (firewalls, switches, servers, routers, sensors, etc.) and make sure that “snmpwalking” works. snmpwalk is a command you can run from Linux or Windows servers after installing the Net-SNMP package. OpenNMS sends the same kind of SNMP queries to managed devices to collect answers about status and “performance”.

#snmpwalk -v 2c -c community xxx.xxx.xxx.xxx

This is a typical snmpwalk command that can be launched from a shell or from the Windows command line. As other websites and the man pages explain, the -v argument selects the SNMP version spoken by the host you want to walk, and the -c argument is the community name (like a password), which the monitoring server has to know in advance.

What if snmpwalk doesn’t work? “No Response from xxx.xxx.xxx.xxx”

  1. The SNMP service is not enabled on the device (check the snmpd.conf file before anything else)
  2. The community name isn’t correct (check it on the device and set it correctly)
  3. The version argument (-v) isn’t correct. Launch the command with -v 1 if it does not work with -v 2c
  4. A firewall is blocking your SNMP packets
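The version check from point 3 is easy to automate. Below is a minimal Python sketch, assuming the Net-SNMP snmpwalk binary is on the PATH. The function names build_snmpwalk and probe, and the run parameter, are my own invention for illustration. It walks only the system subtree (OID 1.3.6.1.2.1.1) to keep the output short, tries v2c first and then falls back to v1.

```python
import subprocess

SYSTEM_SUBTREE = "1.3.6.1.2.1.1"  # the standard MIB-2 "system" group


def build_snmpwalk(host, community, version):
    """Return the argument list for a short snmpwalk of the system subtree."""
    return ["snmpwalk", "-v", version, "-c", community, host, SYSTEM_SUBTREE]


def probe(host, community, run=subprocess.run):
    """Try SNMP v2c, then v1. Return the version that answered, or None.

    `run` is a hypothetical hook (defaults to subprocess.run) so the
    function can be exercised without a real device.
    """
    for version in ("2c", "1"):
        try:
            result = run(build_snmpwalk(host, community, version),
                         capture_output=True, text=True, timeout=30)
        except subprocess.TimeoutExpired:
            continue  # no answer in time: try the next version
        if result.returncode == 0 and result.stdout.strip():
            return version
    return None
```

Calling probe("192.0.2.10", "public") (a documentation address and an example community) returns "2c", "1" or None. A None result does not tell you which of the four causes above is to blame; it only rules out a simple version mismatch.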

Of course, the last issue is the worst one. You’ll have to find out whether your SNMP requests actually reach the host you want to monitor. This issue has cost me a lot of time, and troubleshooting it is not easy. The biggest problem is that before you can understand what is wrong and how to fix it, you need to learn how your network is designed. That is interesting and will lead you to discover “all devices”, but it takes a big effort. Once everything is configured properly you’ll get answers like the one on the left: if you can’t understand what they mean, just think of every line as a piece of information that OpenNMS can use to do its job. To be honest, sometimes you’ll have to teach OpenNMS what some of those strings mean, but that is something you’ll do later.
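To make the “every line is a piece of information” idea concrete, here is a small Python sketch that splits snmpwalk output into OID, type and value. It assumes the default Net-SNMP output style (name = TYPE: value). The sample lines are made up for illustration, and parse_walk is a hypothetical helper name. Lines that do not fit the pattern are simply skipped.

```python
import re

# Default Net-SNMP style: "<oid> = <TYPE>: <value>"
LINE = re.compile(r"^(?P<oid>\S+) = (?P<type>[\w-]+): (?P<value>.*)$")

# Made-up sample output, for illustration only.
SAMPLE = """\
SNMPv2-MIB::sysDescr.0 = STRING: Example switch firmware 1.0
SNMPv2-MIB::sysName.0 = STRING: sw1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (12345) 0:02:03.45
"""


def parse_walk(text):
    """Return a list of (oid, type, value) tuples, one per recognised line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # skip lines without a "TYPE: value" part
            rows.append((match["oid"], match["type"], match["value"]))
    return rows


for oid, kind, value in parse_walk(SAMPLE):
    print(f"{oid:40} {kind:10} {value}")
```

Each tuple is one object the device exposes: the OID names it, the type says how to read it, and the value is what a monitoring tool would store or graph.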

OpenNMS is running. Monitor the world.

After this introduction to something as fundamental as the SNMP service, you’re ready to start seriously monitoring whatever you want. Assuming OpenNMS is installed correctly and running smoothly (if not, there are many guides that can help you, like the official one), you’ll have a Web UI that lets you manage all the features the software provides. You can create users, notifications, scheduled outages and many other things, but since describing them all would take tons of pages, the first thing I would like to offer is a suggestion about how to add new hosts.

  1. Avoid the “Add Node” link on the home page of the Web UI
  2. Use only the Manage Provisioning Requisitions feature (Admin->Manage Provisioning Requisitions->YourNameRequisition)

This suggestion comes from several mistakes I made before I figured out how OpenNMS works. Done this way, you can be sure that OpenNMS will add the node, but please be patient: it is not immediate. OpenNMS schedules the import, and within a few minutes (5 to 15) you will be able to monitor the UP/DOWN status over ICMP and all the other services you defined. To summarise: edit the Provisioning Requisition (via the Web UI or by editing the XML file), then press Synchronise at the upper level (look at the picture below). At first I made the mistake of changing the provisioning file every 2 or 3 minutes because OpenNMS did not seem to react to my changes and my “synchronisations”. When you start using OpenNMS you have to be patient: take your time and relax, it will get there.

If you’re running the software on Debian, remember that the provisioning file is located at

/etc/opennms/imports/nameyouwant.xml

All the nodes that OpenNMS has to monitor are listed here. As you can see, OpenNMS distinguishes between nodes in the database and nodes defined in the requisition. Take a look at the picture below: if there are more nodes defined than nodes in the database, the new nodes will be imported within a few minutes.
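If you prefer to generate the XML file rather than click through the Web UI, the following Python sketch builds a requisition with the standard library. The element and attribute names (model-import, node, interface, monitored-service) reflect the requisition format as I understand it, so compare the result with a file OpenNMS itself has written before relying on it. The helper name build_requisition, the node labels and the addresses are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by requisition files (assumption: verify against your own file).
NS = "http://xmlns.opennms.org/xsd/config/model-import"


def build_requisition(foreign_source, nodes):
    """Build requisition XML from a list of (label, ip, services) tuples."""
    ET.register_namespace("", NS)  # serialise with a default xmlns
    root = ET.Element(f"{{{NS}}}model-import",
                      {"foreign-source": foreign_source})
    for label, ip, services in nodes:
        # The label doubles as foreign-id here; any unique string would do.
        node = ET.SubElement(root, f"{{{NS}}}node",
                             {"node-label": label, "foreign-id": label})
        iface = ET.SubElement(node, f"{{{NS}}}interface",
                              {"ip-addr": ip, "snmp-primary": "P"})
        for service in services:
            ET.SubElement(iface, f"{{{NS}}}monitored-service",
                          {"service-name": service})
    return ET.tostring(root, encoding="unicode")


print(build_requisition("nameyouwant",
                        [("switch-01", "192.0.2.10", ["ICMP", "SNMP"])]))
```

Writing the returned string to the file does not import anything by itself; you still have to synchronise the requisition and wait for the scheduled import.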

This advice saves you the time of working out why the “Add Node” button is so useless and why the Provisioning Requisition XML file doesn’t show a node added through that home page button.

Delete what is not needed

You’ll realise that monitoring devices costs a lot in terms of bandwidth and also CPU, RAM and disk I/O: our monitoring server is a dedicated machine with a decent amount of memory, processors and disk throughput. For that reason, when a device no longer needs to be monitored, delete it!

A well-tested procedure is to go to Admin->Delete Nodes and search for the devices in question. If you don’t need to preserve the RRD data (the performance data collected in the past), check both “Delete ?” and “Data ?”; otherwise check only the first option.

Important:

Once you’ve done that, also delete the node from the Provisioning Requisition file: if you don’t, every time you synchronise to import new devices, OpenNMS will re-import the old nodes you thought you had removed.
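If you edit the XML file by hand, this clean-up step can be scripted. The Python sketch below removes every node with a given label from requisition XML. It assumes the same element names and namespace as a typical requisition file (check yours), and remove_node is a hypothetical helper name. It works on a string, so reading and writing the real file is left to you.

```python
import xml.etree.ElementTree as ET

# Namespace used by requisition files (assumption: verify against your own file).
NS = "http://xmlns.opennms.org/xsd/config/model-import"


def remove_node(xml_text, node_label):
    """Return (new_xml, removed_count) with matching <node> elements dropped."""
    ET.register_namespace("", NS)  # keep the default xmlns on output
    root = ET.fromstring(xml_text)
    removed = 0
    # findall returns a list, so removing while looping is safe.
    for node in root.findall(f"{{{NS}}}node"):
        if node.get("node-label") == node_label:
            root.remove(node)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

A removed count of 0 means the label was not found, which is worth checking before you synchronise again. Keep a backup of the original file, since this sketch does not preserve comments or the original formatting.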