When monitoring Windows systems with open source monitoring software such as Nagios, Naemon or Icinga, one of the most widely used solutions is NSClient++ (nscp for short). NSClient++ has been around for a long time: the first public version, NSClient++ 0.0.1 (RC5), was released in 2005 (initial SourceForge release), and it has since become the de facto standard for monitoring Windows hosts with open source monitoring tools. But in recent years the monitoring ecosystem has changed and it's time to adjust.
"Why? What happened?" you might ask. That is what this article is about.
NSClient++, short for Nagios Service (Check) Client, with the ++ hinting at its programming language C++, was initially developed as a Nagios agent for remote check execution. The Nagios toolset, including NSCA (Nagios Service Check Acceptor, for submitting passive check results to the Nagios core server) and NRPE (Nagios Remote Plugin Executor, for accepting active check requests from the Nagios core server and sending back the results), was initially developed only for Unix and Linux operating systems. NSClient++ bridged the gap and made it possible to run service checks on Windows hosts, too.
One of the most common ways to integrate Windows hosts is to use NRPE: the NRPE client (the check_nrpe plugin on the monitoring server) talks to the NRPE server (which is part of the NSClient++ installation on the Windows host) and tells the server which check command should be executed. NSClient++ then runs the check internally (e.g. check_cpu) and returns the result as the response.
This has worked fine for many years. However, an NRPE project that first appeared dead and then evolved has caused several problems with NSClient++. Let's dig deeper into these problems.
One of the major problems of NRPE was that the project was stuck at version 2.15 for a long time. NRPE 2.15 was released in September 2013, with no indication that a newer version would ever follow. NRPE was believed dead, and even NSClient++ referred to NRPE as legacy and insecure.
But then in 2016, NRPE re-emerged with new life and with a new project maintainer (John Frickson). NRPE was released as version 3.0.0 and a lot of changes happened between 2.15 and 3.0.0.
From the official NRPE 3.0.0 release post:
Not only was security addressed, for which NRPE had been heavily criticized, but another major problem was tackled as well: the "payload size". The payload is the size of the response from the NRPE server (NSClient++ on Windows), also known as the packet buffer length or packet size (https://support.nagios.com/kb/article/nrpe-packet-size-explained-518.html). Before NRPE 3.0, the payload size was statically defined as 1024 bytes. Responses holding a lot of data (including performance data) were too big to handle, and the check_nrpe plugin would just show "CHECK_NRPE: Receive underflow - only 1026 bytes received (4 expected)" or something similar. With the -P / --payload-size parameter added in NRPE and the option to set a higher payload size in NSClient++ using the "payload length" configuration parameter, large responses and lists (e.g. listing all services) can be handled.
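To make the fixed payload size more tangible, the following Python sketch models an NRPE v2 packet as a small header followed by a fixed-size text buffer. The field layout and the two trailing padding bytes are assumptions based on the commonly described 1036-byte v2 packet; verify them against the NRPE source before relying on them.

```python
import struct


def v2_packet_format(payload_size=1024):
    """Build a struct format string for an (assumed) NRPE v2 packet layout."""
    # version (int16), packet type (int16), CRC32 (uint32), result code (int16),
    # fixed-size text buffer, then two padding bytes (assumed C struct padding).
    return f"!hhIh{payload_size}s2x"


def v2_packet_size(payload_size=1024):
    """Total number of bytes on the wire for the given payload size."""
    return struct.calcsize(v2_packet_format(payload_size))
```

Under these assumptions, the buffer is the only part that grows with a larger payload size, which is why both sides need to agree on the same value.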
Not long after NRPE 3.0.0 was released, development of NRPE picked up speed and more releases came out. In 2019, NRPE 4.0.0 was released and added support for TLS 1.3 along with further fixes and enhancements. However, in January 2020 a "deprecation notice" was added to the NRPE repository, indicating that the development of NRPE is now officially over.
Only security fixes would be implemented in potential future releases. This basically means: NRPE is dead - again.
As mentioned above, the changes in NRPE 3.0.0 also introduced a larger DH key to improve encryption security (general rule of thumb: anything below 2048 bits is considered weak). NSClient++ still uses a 512-bit key (nrpe_dh_512.pem) for NRPE communication encryption. It can be found in the security directory of the NSClient++ installation path (usually C:\Program Files\NSClient++\security\nrpe_dh_512.pem).
This now causes communication problems when trying to establish a secure connection between the updated client (check_nrpe) and the server (NSClient++):
$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com
CHECK_NRPE: (ssl_err != 5) Error - Could not complete SSL handshake with monitoring: 1
To handle this, a new 2048-bit DH key should be created and configured in NSClient++. See the article on how to handle the NRPE to NSClient++ communication error "could not complete SSL handshake".
In addition to the encryption changes, NRPE 3.x and 4.x both use the newer NRPE v3 protocol in the background. However, NSClient++ never implemented NRPE v3 and only accepts the NRPE v2 protocol (named legacy in the NSClient++ documentation). Luckily the newer NRPE versions support both the NRPE v3 and v2 protocols. When running check_nrpe 3.x or 4.x against NSClient++ without additional parameters, the newer NRPE v3 protocol is tried first. The check plugin then detects that the server response uses NRPE v2 (Invalid packet version received from server) and automatically falls back to the legacy v2 protocol:
$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com
CHECK_NRPE: Invalid packet version received from server.
I (0.5.2.35 2018-01-28) seem to be doing fine...
To force the check_nrpe plugin to use the legacy protocol from the beginning, a new parameter -2 (introduced in NRPE 3.0.0) can be used:
$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com -2
I (0.5.2.35 2018-01-28) seem to be doing fine...
This way the warning "Invalid packet version received from server" no longer appears in the first line of the output. That matters because the central monitoring user interface usually shows only the first line of a monitoring plugin's output.
NRPE has one major problem: it has a default payload size of 1024 bytes. Responses from NSClient++ that exceed this limit result in an error. A good way to see this is to run the check_wmi check command and list all installed services on the target Windows host:
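The effect on the monitoring UI can be illustrated with a small Python sketch. It assumes, as described above, that the UI displays only the first line of the plugin output; the function name ui_line is hypothetical and the two sample strings are the outputs shown above.

```python
def ui_line(plugin_output):
    """Return the first non-empty line, i.e. what a typical UI would display."""
    for line in plugin_output.splitlines():
        if line.strip():
            return line.strip()
    return ""


# Output without -2: the fallback warning ends up on the first line.
WITH_FALLBACK = (
    "CHECK_NRPE: Invalid packet version received from server.\n"
    "I (0.5.2.35 2018-01-28) seem to be doing fine..."
)

# Output with -2: only the actual check result is returned.
FORCED_V2 = "I (0.5.2.35 2018-01-28) seem to be doing fine..."
```

With the automatic fallback, ui_line(WITH_FALLBACK) yields the warning instead of the check result, whereas ui_line(FORCED_V2) yields the result itself.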
$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com -2 -c check_wmi -a 'query=select Name from Win32_Service'
CHECK_NRPE: Invalid packet type received from server.
Instead of receiving a truncated response, the client (check_nrpe) gets an immediate error: "Invalid packet type received from server".
Note: This is not the same error message as "invalid packet version"!
To handle this, the new payload size parameter (-P / --payload-size) can be used. It is important to understand that this parameter only works in conjunction with the -2 parameter, i.e. with the legacy NRPE v2 protocol. NRPE v3 does support a payload size of up to 64KB (see NRPE 3.0.0 changelog above) but, as mentioned, NSClient++ does not support NRPE v3.
The payload size needs to be defined on both sides: on the client side in the check_nrpe command, and on the server side as the payload length in nsclient.ini within the NRPE/server block:
[/settings/NRPE/server]
; Extended Payload
payload length=4096
In this case the server (NSClient++) is configured and restarted with a new payload length of 4KB/4096 Bytes. By using this exact same size with the check_nrpe plugin, the big server response can now be handled:
$ /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com --payload-size=4096 -2 -c check_wmi -a "query=select Name from Win32_Service"
AdobeARMservice, AJRouter, ALG, AppIDSvc, Appinfo, AppMgmt, AppReadiness, AppVClient, AppXSvc, AssignedAccessManagerSvc, AudioEndpointBuilder, Audiosrv, autotimesvc, AxInstSV, BDESVC, BFE, BITS, BrokerInfrastructure, Browser, BTAGService, BthAvctpSvc, bthserv, camsvc, CDPSvc, CertPropSvc, ClickToRunSvc, ClipSVC, COMSysApp, CoreMessagingRegistrar, CryptSvc, CscService, DcomLaunch, defragsvc, DeviceAssociationService, DeviceInstall, DevQueryBroker, Dhcp, diagnosticshub.standardcollector.service, diagsvc, DiagTrack, DialogBlockingService, DispBrokerDesktopSvc, DisplayEnhancementService, DmEnrollmentSvc, dmwappushservice, Dnscache, DoSvc, dot3svc, DPS, DsmSvc, DsSvc, DusmSvc, Eaphost, edgeupdate, edgeupdatem, EFS, embeddedmode, EntAppSvc, EPWD, EventLog, EventSystem, Fax, fdPHost, FDResPub, fhsvc, FontCache, FrameServer, GoogleChromeElevationService, gpsvc, GraphicsPerfSvc, gupdate, gupdatem, hidserv, HvHost, icssvc, IKEEXT, InstallService, iphlpsvc, IpxlatCfgSvc, KeyIso, KtmRm, LanmanServer, LanmanWorkstation, lfsvc, LicenseManager, lltdsvc, lmhosts, LSM, LxpSvc, MapsBroker, MicrosoftEdgeElevationService, MixedRealityOpenXRSvc, MozillaMaintenance, mpssvc, MSDTC, MSiSCSI, msiserver, MsKeyboardFilter, NaturalAuthentication, NcaSvc, NcbService, NcdAutoSetup, Netlogon, Netman, netprofm, NetSetupSvc, NetTcpPortSharing, NgcCtnrSvc, NgcSvc, NlaSvc, nscp, nsi, ose, p2pimsvc, p2psvc, PcaSvc, PeerDistSvc, perceptionsimulation, PerfHost, PhoneSvc, pla, PlugPlay, PNRPAutoReg, PNRPsvc, PolicyAgent, Power, PrintNotify, ProfSvc, PushToInstall, QWAVE, RasAuto, RasMan, RemoteAccess, RemoteRegistry, RetailDemo, RmSvc, RpcEptMapper, RpcLocator, RpcSs, SamSs, SCardSvr, ScDeviceEnum, Schedule, SCPolicySvc, SDRSVC, seclogon, SecurityHealthService, SEMgrSvc, SENS, Sense, SensorDataService, SensorService, SensrSvc, SessionEnv, SgrmBroker, SharedAccess, SharedRealitySvc, ShellHWDetection, shpamsvc, smphost, SmsRouter, SNMPTRAP, spectrum, Spooler, sppsvc, SSDPSRV, ssh-agent, SstpSvc, 
StateRepository, stisvc, StorSvc, svsvc, swprv, SysMain, SystemEventsBroker, TabletInputService, TapiSrv, TermService, Themes, TieringEngineService, TimeBrokerSvc, TokenBroker, TracSrvWrapper, TrkWks, TroubleshootingSvc, TrustedInstaller, tzautoupdate, UevAgentService, uhssvc, UmRdpService, upnphost, UserManager, UsoSvc, VacSvc, VaultSvc, vds, VGAuthService, vm3dservice, vmicguestinterface, vmicheartbeat, vmickvpexchange, vmicrdv, vmicshutdown, vmictimesync, vmicvmsession, vmicvss, VMTools, vmvss, VSS, W32Time, WaaSMedicSvc, WalletService, WarpJITSvc, wbengine, WbioSrvc, Wcmsvc, wcncsvc, WdiServiceHost, WdiSystemHost, WdNisSvc, WebClient, Wecsvc, WEPHOSTSVC, wercplsupport, WerSvc, WFDSConMgrSvc, WiaRpc, WinDefend, WinHttpAutoProxySvc, Winmgmt, WinRM, wisvc, WlanSvc, wlidsvc, wlpasvc, WManSvc, wmiApSrv, WMPNetworkSvc, workfolderssvc, WpcMonSvc, WPDBusEnum, WpnService, wscsvc, WSearch, wuauserv, WwanSvc, XblAuthManager, XblGameSave, XboxGipSvc, XboxNetApiSvc, AarSvc_6f2b5, BcastDVRUserService_6f2b5, BluetoothUserService_6f2b5, CaptureService_6f2b5, cbdhsvc_6f2b5, CDPUserSvc_6f2b5, ConsentUxUserSvc_6f2b5, CredentialEnrollmentManagerUserSvc_6f2b5, DeviceAssociationBrokerSvc_6f2b5, DevicePickerUserSvc_6f2b5, DevicesFlowUserSvc_6f2b5, MessagingService_6f2b5, OneSyncSvc_6f2b5, PimIndexMaintenanceSvc_6f2b5, PrintWorkflowUserSvc_6f2b5, UdkUserSvc_6f2b5, UnistoreSvc_6f2b5, UserDataSvc_6f2b5, WpnUserService_6f2b5
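Because the payload size has to match on both sides, it can help to derive the client parameter from the server configuration. The following Python sketch reads "payload length" from an nsclient.ini snippet and builds a matching check_nrpe command line. The function names are hypothetical, and real nsclient.ini files may contain constructs that configparser does not accept, so treat this as an illustration only.

```python
import configparser

# Same snippet as shown above.
NSCLIENT_INI = """
[/settings/NRPE/server]
; Extended Payload
payload length=4096
"""


def read_payload_length(ini_text, default=1024):
    """Return the configured NRPE payload length, or the NRPE default."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.getint("/settings/NRPE/server", "payload length", fallback=default)


def build_check_nrpe_argv(host, command, arguments, payload_size,
                          plugin="/usr/lib/nagios/plugins/check_nrpe"):
    """Build a check_nrpe command line using the legacy v2 protocol (-2)."""
    argv = [plugin, "-H", host, "-2"]
    if payload_size != 1024:
        # Only pass the parameter when it differs from the default size.
        argv.append(f"--payload-size={payload_size}")
    argv += ["-c", command, "-a", *arguments]
    return argv
```

The resulting list can be passed to subprocess.run() without going through a shell, which avoids quoting problems with the WMI query string.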
This raises the question of why NRPE v3 was never implemented in NSClient++. NSClient++ implemented its own REST API and clearly focused development on that API instead of chasing the NRPE developers. The API can be started by enabling the "WEBServer" module in nsclient.ini:
; Modules
[/modules]
CheckExternalScripts = 1
CheckHelpers = 1
CheckNSCP = 1
CheckDisk = 1
CheckSystem = 1
CheckWMI = 1
NSClientServer = 1
CheckEventLog = 1
NSCAClient = 1
NRPEServer = 1
CheckLogFile = 1
SimpleFileWriter = 1
SimpleCache = 1
WEBServer = 1
Similar to the NRPE server settings section, there is also a WEB server settings section:
# Section for WEB (WEBServer.dll) (check_WEB) protocol options.
[/settings/WEB/server]
allowed hosts=127.0.0.1,192.168.15.0/24
cache allowed hosts=true
certificate=${certificate-path}/certificate.pem
port=8443
threads=10
This config snippet allows requests from localhost (127.0.0.1) and from the range 192.168.15.0/24 to access the NSClient++ web server/API. By default the web server listens on port 8443. A password can be defined in nsclient.ini - either in the [/settings/default] or in the [/settings/WEB/server] section.
Note: If the password is defined in the [/settings/default] section of nsclient.ini, the password is applied to all check types (using NSClient++, NRPE, API).
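To reason about which monitoring servers are covered by an "allowed hosts" value, the list can be evaluated with Python's ipaddress module. This is only a sketch of how such an allow list could be interpreted, not NSClient++'s actual implementation. It handles plain IP addresses and CIDR ranges only, and the function name is hypothetical.

```python
import ipaddress


def is_allowed(client_ip, allowed_hosts):
    """Check whether client_ip matches any entry of a comma-separated allow list."""
    client = ipaddress.ip_address(client_ip)
    for entry in allowed_hosts.split(","):
        entry = entry.strip()
        # A single address such as 127.0.0.1 is treated as a /32 network.
        if entry and client in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```

For the value from the snippet above, is_allowed("192.168.15.20", "127.0.0.1,192.168.15.0/24") is true, while an address from 192.168.16.0/24 is rejected.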
This module also enables a password protected user interface which can be used to check the current system usage and the configuration of NSClient++ but also to manually execute checks. This is a great helper for troubleshooting!
When switching to the "Queries" tab, a list of predefined checks can be selected.
Do they look familiar? Actually they should. These are the exact same (internal) NSClient++ commands which can be executed with NRPE. After a check command is selected, the check can be executed in the "Run" tab. Here the result of a simple "check_drivesize" check:
One of the tricky things with NSClient++ has always been finding the exact syntax for setting filters and thresholds. But the input field is a great helper here, as it automatically shows the additional arguments available for the selected check:
The check commands can therefore be tested directly by applying attributes to the checks. Here the warning threshold was manually lowered to 50% disk usage (default is 80%):
Of course these checks should now be executed via NSClient++'s API and not via the user interface. Basically the URL for the checks is the same as when visiting the user interface, followed by /query/check_command?arguments. Special characters (such as space or percent sign) need to be URL-encoded. Meaning: A space becomes %20 and a percent becomes %25.
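The URL encoding does not have to be done by hand. As a minimal Python sketch, the query URL can be assembled with urllib.parse; the function name build_query_url is hypothetical. Note that quote() also encodes the ">" character as %3E, which is a valid encoding of the same character.

```python
from urllib.parse import quote, urlencode


def build_query_url(host, command, arguments, port=8443):
    """Build an NSClient++ API query URL with percent-encoded arguments."""
    # quote_via=quote encodes a space as %20 (the default quote_plus would use "+").
    query = urlencode(arguments, quote_via=quote)
    return f"https://{host}:{port}/query/{command}?{query}"
```

Calling build_query_url("windowshost.example.com", "check_drivesize", {"warning": "used>50%"}) produces a URL ending in ?warning=used%3E50%25.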
The following curl command does the same check as manually executed in the UI before, with a manual warning threshold at 50% disk usage. Note that the password to access the web server/api (defined in nsclient.ini in the default section) is submitted as additional HTTP header:
# curl -k -s -H 'password: 1234' 'https://windowshost.example.com:8443/query/check_drivesize?warning=used>50%25' | python -m json.tool
{
    "header": {
        "source_id": ""
    },
    "payload": [
        {
            "command": "check_drivesize",
            "lines": [
                {
                    "message": "WARNING C:\\: 31.287GB/58.918GB used",
                    "perf": [
                        {
                            "alias": "C:\\ used",
                            "float_value": {
                                "critical": 53.02614784240723,
                                "maximum": 58.91794204711914,
                                "minimum": 0.0,
                                "unit": "GB",
                                "value": 31.28728485107422,
                                "warning": 29.45897102355957
                            }
                        },
                        {
                            "alias": "C:\\ used %",
                            "float_value": {
                                "critical": 90.0,
                                "maximum": 100.0,
                                "minimum": 0.0,
                                "unit": "%",
                                "value": 53.0,
                                "warning": 50.0
                            }
                        },
                        {
                            "alias": "D:\\ used",
                            "float_value": {
                                "critical": 0.0,
                                "maximum": 0.0,
                                "minimum": 0.0,
                                "unit": "B",
                                "value": 0.0,
                                "warning": 0.0
                            }
                        }
                    ]
                }
            ],
            "result": "WARNING"
        }
    ]
}
This shows the same response as seen in the user interface, but in JSON format, which is easy to parse. But does that mean the wheel needs to be reinvented and a monitoring plugin similar to check_nrpe needs to be written first? Luckily not, as there are already two existing monitoring plugins which do the job!
The monitoring plugin check_nscp_api is part of Icinga 2 but can also be manually compiled from the source code. Users who have installed the icinga2 packages should be able to find the plugin check_nscp_api in the default monitoring plugins path (/usr/lib/nagios/plugins). The plugin is installed through the icinga2-bin package:
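To illustrate what such a plugin has to do with the JSON response, here is a minimal Python sketch that maps the API response to a plugin-style exit code and message. The sample is a trimmed copy of the response above (the perf array is left out), the function name summarize is hypothetical, and the status-to-exit-code mapping follows the usual monitoring plugin convention. The existing plugins may work differently internally.

```python
import json

# Usual monitoring plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Trimmed copy of the API response shown above (perf data omitted).
SAMPLE = r'''
{"header": {"source_id": ""},
 "payload": [{"command": "check_drivesize",
              "lines": [{"message": "WARNING C:\\: 31.287GB/58.918GB used"}],
              "result": "WARNING"}]}
'''


def summarize(response_text):
    """Return (exit_code, message) for an NSClient++ API query response."""
    data = json.loads(response_text)
    exit_code = 0
    messages = []
    for entry in data.get("payload", []):
        # Unknown or missing result strings are treated as UNKNOWN (3).
        code = EXIT_CODES.get(entry.get("result"), 3)
        exit_code = max(exit_code, code)
        for line in entry.get("lines", []):
            messages.append(line.get("message", ""))
    return exit_code, "; ".join(messages)
```

For the sample, summarize(SAMPLE) returns exit code 1 together with the WARNING message for drive C:.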
root@icinga2:~# dpkg -S /usr/lib/nagios/plugins/check_nscp_api
icinga2-bin: /usr/lib/nagios/plugins/check_nscp_api
An alternative would be the check_nsc_web monitoring plugin.
Using the information from the NSClient++ user interface and API, the same check (check_drivesize) is used again with the same attributes (warning=used>50%). Using the plugin is even easier, as the arguments are passed as a plain string and therefore do not require any URL encoding:
# /usr/lib/nagios/plugins/check_nscp_api -H windowshost.example.com -P 8443 --password 1234 -q check_drivesize -a "warning=used>50%"
check_drivesize WARNING C:\: 31.288GB/58.918GB used | 'C:\ used'=31.288002GB;29.458971;53.026148;0;58.917942 'C:\ used %'=53%;50;90;0;100 'D:\ used'=0B;0;0;0;0
Great! The plugin correctly returns the WARNING status for drive C: as disk usage is above 50% - and performance data is also shown.
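The performance data after the pipe character follows the usual label=value[unit];warn;crit;min;max pattern. As a hedged sketch, it can be split into its parts with a regular expression in Python. The expression below only covers quoted labels as they appear in the output above, not every variant of the plugin output format, and the function name is hypothetical.

```python
import re

# Plugin output copied from the example above.
SAMPLE_OUTPUT = (
    r"check_drivesize WARNING C:\: 31.288GB/58.918GB used | "
    r"'C:\ used'=31.288002GB;29.458971;53.026148;0;58.917942 "
    r"'C:\ used %'=53%;50;90;0;100 'D:\ used'=0B;0;0;0;0"
)

PERF_RE = re.compile(
    r"'(?P<label>[^']+)'=(?P<value>-?[0-9.]+)(?P<unit>[^;\s]*)"
    r";(?P<warn>[^;\s]*);(?P<crit>[^;\s]*);(?P<min>[^;\s]*);(?P<max>[^;\s]*)"
)


def to_float(text):
    """Convert a threshold field to float; empty fields become None."""
    return float(text) if text else None


def parse_perfdata(plugin_output):
    """Return a list of dicts, one per performance data entry."""
    _, _, perf = plugin_output.partition("|")
    entries = []
    for match in PERF_RE.finditer(perf):
        entries.append({
            "label": match["label"],
            "value": float(match["value"]),
            "unit": match["unit"],
            "warn": to_float(match["warn"]),
            "crit": to_float(match["crit"]),
            "min": to_float(match["min"]),
            "max": to_float(match["max"]),
        })
    return entries
```

Applied to the sample output, this yields three entries, one each for "C:\ used", "C:\ used %" and "D:\ used".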
As written above, one of the NRPE v2 problems is the response size from the server (payload length/size). In the example above, listing the services using the check_wmi command did not work without manually increasing the NRPE payload. What about the API? Are there any limitations?
# /usr/lib/nagios/plugins/check_nscp_api -H windowshost.example.com -P 8443 --password 1234 -q check_wmi -a "query=select Name from Win32_Service"
check_wmi AdobeARMservice, AJRouter, ALG, AppIDSvc, Appinfo, AppMgmt, AppReadiness, AppVClient, AppXSvc, AssignedAccessManagerSvc, AudioEndpointBuilder, Audiosrv, autotimesvc, AxInstSV, BDESVC, BFE, BITS, BrokerInfrastructure, Browser, BTAGService, BthAvctpSvc, bthserv, camsvc, CDPSvc, CertPropSvc, ClickToRunSvc, ClipSVC, COMSysApp, CoreMessagingRegistrar, CryptSvc, CscService, DcomLaunch, defragsvc, DeviceAssociationService, DeviceInstall, DevQueryBroker, Dhcp, diagnosticshub.standardcollector.service, diagsvc, DiagTrack, DialogBlockingService, DispBrokerDesktopSvc, DisplayEnhancementService, DmEnrollmentSvc, dmwappushservice, Dnscache, DoSvc, dot3svc, DPS, DsmSvc, DsSvc, DusmSvc, Eaphost, edgeupdate, edgeupdatem, EFS, embeddedmode, EntAppSvc, EPWD, EventLog, EventSystem, Fax, fdPHost, FDResPub, fhsvc, FontCache, FrameServer, GoogleChromeElevationService, gpsvc, GraphicsPerfSvc, gupdate, gupdatem, hidserv, HvHost, icssvc, IKEEXT, InstallService, iphlpsvc, IpxlatCfgSvc, KeyIso, KtmRm, LanmanServer, LanmanWorkstation, lfsvc, LicenseManager, lltdsvc, lmhosts, LSM, LxpSvc, MapsBroker, MicrosoftEdgeElevationService, MixedRealityOpenXRSvc, MozillaMaintenance, mpssvc, MSDTC, MSiSCSI, msiserver, MsKeyboardFilter, NaturalAuthentication, NcaSvc, NcbService, NcdAutoSetup, Netlogon, Netman, netprofm, NetSetupSvc, NetTcpPortSharing, NgcCtnrSvc, NgcSvc, NlaSvc, nscp, nsi, ose, p2pimsvc, p2psvc, PcaSvc, PeerDistSvc, perceptionsimulation, PerfHost, PhoneSvc, pla, PlugPlay, PNRPAutoReg, PNRPsvc, PolicyAgent, Power, PrintNotify, ProfSvc, PushToInstall, QWAVE, RasAuto, RasMan, RemoteAccess, RemoteRegistry, RetailDemo, RmSvc, RpcEptMapper, RpcLocator, RpcSs, SamSs, SCardSvr, ScDeviceEnum, Schedule, SCPolicySvc, SDRSVC, seclogon, SecurityHealthService, SEMgrSvc, SENS, Sense, SensorDataService, SensorService, SensrSvc, SessionEnv, SgrmBroker, SharedAccess, SharedRealitySvc, ShellHWDetection, shpamsvc, smphost, SmsRouter, SNMPTRAP, spectrum, Spooler, sppsvc, SSDPSRV, ssh-agent, SstpSvc, 
StateRepository, stisvc, StorSvc, svsvc, swprv, SysMain, SystemEventsBroker, TabletInputService, TapiSrv, TermService, Themes, TieringEngineService, TimeBrokerSvc, TokenBroker, TracSrvWrapper, TrkWks, TroubleshootingSvc, TrustedInstaller, tzautoupdate, UevAgentService, uhssvc, UmRdpService, upnphost, UserManager, UsoSvc, VacSvc, VaultSvc, vds, VGAuthService, vm3dservice, vmicguestinterface, vmicheartbeat, vmickvpexchange, vmicrdv, vmicshutdown, vmictimesync, vmicvmsession, vmicvss, VMTools, vmvss, VSS, W32Time, WaaSMedicSvc, WalletService, WarpJITSvc, wbengine, WbioSrvc, Wcmsvc, wcncsvc, WdiServiceHost, WdiSystemHost, WdNisSvc, WebClient, Wecsvc, WEPHOSTSVC, wercplsupport, WerSvc, WFDSConMgrSvc, WiaRpc, WinDefend, WinHttpAutoProxySvc, Winmgmt, WinRM, wisvc, WlanSvc, wlidsvc, wlpasvc, WManSvc, wmiApSrv, WMPNetworkSvc, workfolderssvc, WpcMonSvc, WPDBusEnum, WpnService, wscsvc, WSearch, wuauserv, WwanSvc, XblAuthManager, XblGameSave, XboxGipSvc, XboxNetApiSvc, AarSvc_6f2b5, BcastDVRUserService_6f2b5, BluetoothUserService_6f2b5, CaptureService_6f2b5, cbdhsvc_6f2b5, CDPUserSvc_6f2b5, ConsentUxUserSvc_6f2b5, CredentialEnrollmentManagerUserSvc_6f2b5, DeviceAssociationBrokerSvc_6f2b5, DevicePickerUserSvc_6f2b5, DevicesFlowUserSvc_6f2b5, MessagingService_6f2b5, OneSyncSvc_6f2b5, PimIndexMaintenanceSvc_6f2b5, PrintWorkflowUserSvc_6f2b5, UdkUserSvc_6f2b5, UnistoreSvc_6f2b5, UserDataSvc_6f2b5, WpnUserService_6f2b5 |
By using the API, the answer from NSClient++ comes via HTTP, which does not have such limits (technically speaking there are certain limits, such as request header size limits, but they don't apply in this scenario).
And what about speed? Which check returns a result faster?
root@icinga2:~# time /usr/lib/nagios/plugins/check_nrpe -H windowshost.example.com --payload-size=4096 -2 -c check_wmi -a "query=select Name from Win32_Service"
AdobeARMservice, AJRouter, [...]
real 0m0.079s
user 0m0.011s
sys 0m0.013s
root@icinga2:~# time /usr/lib/nagios/plugins/check_nscp_api -H windowshost.example.com -P 8443 --password 1234 -q check_wmi -a "query=select Name from Win32_Service"
check_wmi AdobeARMservice, AJRouter, [...] |
real 0m0.060s
user 0m0.025s
sys 0m0.010s
The time comparison was run a couple of times in a row. Although both check methods show almost the same response time, the API method was slightly faster in 90% of all checks. Using the NSClient++ API is therefore definitely a worthy NRPE replacement!
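To repeat such a comparison in a more systematic way, the commands can be timed in a loop. The following Python sketch measures wall-clock time for repeated runs of an arbitrary command and reports the median. The function names are hypothetical, and NOOP is just a harmless stand-in command to be replaced by the check_nrpe and check_nscp_api command lines shown above.

```python
import statistics
import subprocess
import sys
import time

# Harmless stand-in command; replace with the real check command line.
NOOP = [sys.executable, "-c", "pass"]


def time_command(argv, runs=10):
    """Run argv several times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit code is not treated as an error.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def median_seconds(argv, runs=10):
    """Median duration in seconds, less sensitive to outliers than the mean."""
    return statistics.median(time_command(argv, runs))
```

Comparing median_seconds() for both command lines over a larger number of runs gives a more robust picture than a handful of manual time invocations.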