Over the past decade, many applications have moved from on-premises installations to the cloud (software as a service). Salesforce is one such example. Although I don't understand the hype around it (purely technically speaking), a lot of companies have migrated their customer database/CRM systems to Salesforce.
There's a problem though, again technically speaking: what about monitoring? On-premises software is usually (at least when I'm in charge) covered by a monitoring system. But how does that work with a cloud service such as Salesforce?
For the past few months I had a simple check in place that monitored the login page of the relevant Salesforce instance. However, this didn't reveal problems inside the instance. The login form pretty much always showed up, indicating no problems whatsoever.
Yesterday I came across the Salesforce Status API. This API is supposed to return the status of the Salesforce services, including specific instances. Let's try that with curl:
$ curl https://api.status.salesforce.com/v1/instances/CS110/status
{"key":"CS110","location":"EMEA","environment":"sandbox","releaseVersion":"Summer '19 Patch 12.1","releaseNumber":"220.12.1","status":"OK","isActive":true,"Services":[{"key":"coreService","order":1,"isCore":true},{"key":"liveAgent","order":20,"isCore":false},{"key":"search","order":5,"isCore":false},{"key":"analytics","order":10,"isCore":false},{"key":"CPQandBilling","order":100,"isCore":false}],"Products":[{"key":"Service_Cloud","order":10},{"key":"Sales_Cloud","order":1},{"key":"LiveAgent_Omni-Channel","order":20},{"key":"CPQ_and_Billing","order":100},{"key":"Financial_Services_Cloud","order":80},{"key":"Lightning_Platform","order":30},{"key":"Einstein_Analytics","order":60},{"key":"Health_Cloud","order":90},{"key":"Community_Cloud","order":40}],"Incidents":[],"Maintenances":[{"id":45691,"message":{"maintenanceType":"release","availability":"fullyAvailable","eventStatus":"confirmed"},"externalId":"a3GB0000000TmzFMAS","name":"QTC Summer '19 Major Release (220) - R2a","plannedStartTime":"2019-06-15T06:30:00.000Z","plannedEndTime":"2019-06-15T08:30:00.000Z","additionalInformation":"All orgs running CPQ versions 208.x, 210.x. 212.x. 
214.x 216.x and 218.x and on the listed instances will be upgraded to CPQ Summer '19 (220.x) package.\r\nThe Billing package, if installed, will also be upgraded to Billing Summer'19 (220.x) package along with the relevant payment gateways.\r\nThe Advanced Approvals package, if installed, will also be upgraded to the latest version.","isCore":false,"affectsAll":false,"createdAt":"2019-04-18T18:19:34.076Z","updatedAt":"2019-06-14T06:19:49.076Z","MaintenanceImpacts":[],"MaintenanceEvents":[{"id":23633,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-04T06:36:55.217Z","updatedAt":"2019-06-04T06:36:55.229Z"},{"id":19559,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-04-18T18:19:34.107Z","updatedAt":"2019-04-18T18:19:34.107Z"}],"instanceKeys":["CS110"],"serviceKeys":["CPQandBilling"]},{"id":29551,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB00000008fLZMAY","name":"Summer '19 Major Release","plannedStartTime":"2019-06-14T23:00:00.000Z","plannedEndTime":"2019-06-14T23:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-02-01T00:18:01.602Z","updatedAt":"2019-06-13T22:49:53.385Z","MaintenanceImpacts":[{"id":4015,"startTime":"2019-06-14T23:00:09.000Z","endTime":"2019-06-14T23:00:48.000Z","type":"deployingRelease","severity":"maintenance","createdAt":"2019-06-14T23:00:13.364Z","updatedAt":"2019-06-14T23:00:50.478Z","startTimeCreatedAt":"2019-06-14T23:00:13.365Z","startTimeModifiedAt":null,"endTimeCreatedAt":"2019-06-14T23:00:50.478Z","endTimeModifiedAt":null}],"MaintenanceEvents":[{"id":24845,"type":"majorReleaseFeaturesEnabled","message":"The upgrade activities are now complete and all major release features are available.","createdAt":"2019-06-15T01:11:41.156Z","updatedAt":"2019-06-15T01:11:41.156Z"},{"id":24798,"type":"majorReleaseReleaseIsLive","message":"The release is now live. 
The instance should be generally available as we continue to perform upgrade activities including feature enablement, which typically completes within six hours and no later than 24 hours.","createdAt":"2019-06-14T23:00:55.852Z","updatedAt":"2019-06-14T23:00:55.852Z"},{"id":24779,"type":"majorRelease10MinutesToZdt","message":"The release is about to begin.","createdAt":"2019-06-14T22:50:02.815Z","updatedAt":"2019-06-14T22:50:02.815Z"},{"id":23440,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-03T23:06:13.568Z","updatedAt":"2019-06-03T23:06:13.578Z"},{"id":20279,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-04-18T18:36:39.859Z","updatedAt":"2019-04-18T18:36:39.872Z"}],"instanceKeys":["CS110"],"serviceKeys":["coreService"]},{"id":46456,"message":{"maintenanceType":"release","availability":"fullyAvailable","eventStatus":"confirmed"},"externalId":"a3GB0000000TrPVMA0","name":"Health Cloud - Summer '19 R2ARelease","plannedStartTime":"2019-06-14T23:00:00.000Z","plannedEndTime":"2019-06-15T06:00:00.000Z","additionalInformation":null,"isCore":false,"affectsAll":false,"createdAt":"2019-05-21T18:49:21.731Z","updatedAt":"2019-06-13T22:50:04.319Z","MaintenanceImpacts":[],"MaintenanceEvents":[{"id":23470,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-03T23:06:38.031Z","updatedAt":"2019-06-03T23:06:38.042Z"},{"id":22861,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-05-21T18:49:21.751Z","updatedAt":"2019-05-21T18:49:21.751Z"}],"instanceKeys":["CS110"],"serviceKeys":[]},{"id":46475,"message":{"maintenanceType":"release","availability":"fullyAvailable","eventStatus":"confirmed"},"externalId":"a3GB0000000TrPQMA0","name":"Financial Services Cloud - Summer '19 
R2ARelease","plannedStartTime":"2019-06-14T23:00:00.000Z","plannedEndTime":"2019-06-15T06:00:00.000Z","additionalInformation":null,"isCore":false,"affectsAll":false,"createdAt":"2019-05-21T18:49:28.532Z","updatedAt":"2019-06-13T22:49:57.831Z","MaintenanceImpacts":[],"MaintenanceEvents":[{"id":23452,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-03T23:06:30.701Z","updatedAt":"2019-06-03T23:06:30.711Z"},{"id":22880,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-05-21T18:49:28.550Z","updatedAt":"2019-05-21T18:49:28.550Z"}],"instanceKeys":["CS110"],"serviceKeys":[]},{"id":29658,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB0000000L3NHMA0","name":"Spring '20 Major Release","plannedStartTime":"2020-02-15T00:00:00.000Z","plannedEndTime":"2020-02-15T00:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-02-02T00:48:04.036Z","updatedAt":"2019-07-05T09:34:02.110Z","MaintenanceImpacts":[],"MaintenanceEvents":[],"instanceKeys":["CS110"],"serviceKeys":["coreService"]},{"id":29552,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB0000000CnFrMAK","name":"Winter '20 Major Release","plannedStartTime":"2019-10-11T23:00:00.000Z","plannedEndTime":"2019-10-11T23:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-02-01T00:18:01.897Z","updatedAt":"2019-07-05T09:41:32.902Z","MaintenanceImpacts":[],"MaintenanceEvents":[],"instanceKeys":["CS110"],"serviceKeys":["coreService"]},{"id":46843,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB0000000L58fMAC","name":"Summer '20Major 
Release","plannedStartTime":"2020-06-12T23:00:00.000Z","plannedEndTime":"2020-06-12T23:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-05-30T14:19:44.427Z","updatedAt":"2019-07-05T09:35:22.852Z","MaintenanceImpacts":[],"MaintenanceEvents":[],"instanceKeys":["CS110"],"serviceKeys":["coreService"]}],"Tags":[]}
Yes, that's a lot of information in JSON format. The most important piece is the top-level "status" key, which can be extracted using a JSON parser (here jshon):
$ curl -s https://api.status.salesforce.com/v1/instances/CS110/status | jshon -e status
"OK"
Using the monitoring plugin check_http, we can now check whether the response contains the string "status":"OK":
$ /usr/lib/nagios/plugins/check_http -I api.status.salesforce.com -H api.status.salesforce.com -u /v1/instances/CS110/status -S --sni -s '"status":"OK"'
HTTP OK: HTTP/1.1 200 OK - 7681 bytes in 0.713 second response time |time=0.713329s;;;0.000000 size=7681B;;;0
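A plain string match like this depends on the exact bytes of the response: if the API ever emitted "status": "OK" with a space after the colon, check_http would no longer find the string. Parsing the JSON avoids that. The following is a rough Python sketch of what the decision logic of such a plugin could look like, not something taken from this setup. The exit codes 0, 2 and 3 are the usual Nagios plugin return codes for OK, CRITICAL and UNKNOWN. The function names fetch_body and evaluate are made up for illustration, and "DEGRADED" is an invented placeholder for a non-OK value, since the output above only shows "OK". As with the check_http variant, anything other than "OK" is treated as critical.

```python
import json
from urllib.request import urlopen

# URL pattern taken from the curl call above.
STATUS_URL = "https://api.status.salesforce.com/v1/instances/{instance}/status"

# Standard Nagios plugin return codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3
LABELS = {OK: "OK", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def fetch_body(instance, timeout=10):
    """Download the raw response for one instance (hypothetical helper)."""
    with urlopen(STATUS_URL.format(instance=instance), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def evaluate(payload):
    """Map a Status API response body to (exit_code, plugin_output)."""
    try:
        data = json.loads(payload)
        status = data["status"]
    except (ValueError, KeyError, TypeError):
        # Broken JSON or no top-level "status": we cannot tell, so UNKNOWN.
        return UNKNOWN, "SALESFORCE UNKNOWN - no status found in response"
    code = OK if status == "OK" else CRITICAL
    instance = data.get("key", "?")
    return code, f"SALESFORCE {LABELS[code]} - {instance} reports status {status}"


# Offline demonstration. "DEGRADED" is an invented value, for illustration only.
for body in ('{"key":"CS110", "status": "OK"}',
             '{"key":"CS110","status":"DEGRADED"}',
             "<html>gateway timeout</html>"):
    code, output = evaluate(body)
    print(code, output)
```

A real plugin would call evaluate(fetch_body("CS110")), print the output line and hand the code to sys.exit(), so that the monitoring core picks up the state.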
In Icinga 1.x, Nagios, Naemon and Shinken you would typically define a service whose check command is based on the check_http plugin. The command definition may need some preparation to support all the required parameters. Here's an example:
# check_http_api command definition with more arguments
define command{
    command_name check_http_api
    command_line $USER1$/check_http -H $ARG1$ -S --sni -u $ARG2$ -s $ARG3$
}
And the service example, using a dummy host "externalchecks":
# Salesforce Sales Cloud Instance CS110
define service{
    use generic-service
    host_name externalchecks
    service_description HTTP Salesforce Sales Cloud Instance CS110
    check_command check_http_api!api.status.salesforce.com!/v1/instances/CS110/status!'"status":"OK"'
}
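To see why the expected string is wrapped in single quotes in the check_command, it helps to look at what ends up on the command line. The Python sketch below is a simplified model of the macro expansion, not Nagios code: it splits check_command on "!" and fills in $ARG1$ to $ARG3$, ignoring details such as escaped exclamation marks. The value of $USER1$ is an assumption, taken from the plugin path used in the manual call earlier, and the function name expand is made up.

```python
import shlex

# Assumption: $USER1$ points to the plugin directory used in the manual call.
USER1 = "/usr/lib/nagios/plugins"

# Copied from the command and service definitions above.
COMMAND_LINE = "$USER1$/check_http -H $ARG1$ -S --sni -u $ARG2$ -s $ARG3$"
CHECK_COMMAND = (
    "check_http_api!api.status.salesforce.com"
    "!/v1/instances/CS110/status!'\"status\":\"OK\"'"
)


def expand(command_line, check_command, user1=USER1):
    """Fill $USER1$ and $ARGn$ from a '!'-separated check_command (simplified)."""
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    for number, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{number}$", value)
    return expanded


shell_line = expand(COMMAND_LINE, CHECK_COMMAND)
print(shell_line)
# What check_http receives as the argument of -s once the shell has
# removed the single quotes:
print(shlex.split(shell_line)[-1])
```

The single quotes survive the expansion and are only removed by the shell, so the double quotes reach check_http as part of the search string.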
In Icinga 2 you can use the "http" command as is; however, you have to escape the double quotes in the expected string:
# check Salesforce CS110
object Service "HTTP Salesforce Sales Cloud Instance CS110" {
  import "generic-service"
  host_name = "externalchecks"
  check_command = "http"
  vars.http_address = "api.status.salesforce.com"
  vars.http_vhost = "api.status.salesforce.com"
  vars.http_uri = "/v1/instances/CS110/status"
  vars.http_string = "\"status\":\"OK\""
  vars.http_ssl = true
  vars.http_sni = true
}
A monitoring system, whatever software you use, can of course only work correctly if it receives correct information. The same applies here. As there is no way around it, we "have to trust" the output of the API. That's the (monitoring) dilemma with cloud software.
I guess we'll see after a couple of months whether the user experience and the monitoring alerts match.
On the 1st of every month, SLA reports are generated based on the values coming from our monitoring. The Marketing Cloud instance showed a significant drop in availability in July 2019.
I first suspected an error in our monitoring, but the status API did indeed not return the value "OK" for the JSON key "status".
The reason for this very long downtime could also be an announced maintenance window that neither I nor our Icinga 2 monitoring had been informed about. Nevertheless, it's good to see that the Salesforce status API seems to correctly indicate failures or performance degradation.
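The status response shown earlier also carries a "Maintenances" list with "plannedStartTime" and "plannedEndTime" for each entry, so a check could in principle tell an announced window from an unexpected outage. Here is a minimal Python sketch of that idea. The sample data is trimmed from the CS110 output earlier in this post. The function names parse_ts and planned_maintenances are made up, and whether a non-OK status actually coincides with these windows is something I have not verified.

```python
from datetime import datetime, timezone

# Two entries trimmed from the "Maintenances" list of the CS110 response.
SAMPLE = {
    "key": "CS110",
    "Maintenances": [
        {"name": "Summer '19 Major Release",
         "plannedStartTime": "2019-06-14T23:00:00.000Z",
         "plannedEndTime": "2019-06-14T23:05:00.000Z"},
        {"name": "Winter '20 Major Release",
         "plannedStartTime": "2019-10-11T23:00:00.000Z",
         "plannedEndTime": "2019-10-11T23:05:00.000Z"},
    ],
}


def parse_ts(value):
    """Parse timestamps of the form 2019-06-14T23:00:00.000Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def planned_maintenances(data, at):
    """Return the names of maintenances whose planned window contains `at`."""
    names = []
    for entry in data.get("Maintenances", []):
        start = entry.get("plannedStartTime")
        end = entry.get("plannedEndTime")
        if not start or not end:
            continue  # skip entries without a complete window
        if parse_ts(start) <= at <= parse_ts(end):
            names.append(entry["name"])
    return names


during = datetime(2019, 6, 14, 23, 2, tzinfo=timezone.utc)
outside = datetime(2019, 7, 1, 12, 0, tzinfo=timezone.utc)
print(planned_maintenances(SAMPLE, during))
print(planned_maintenances(SAMPLE, outside))
```

A check could use a non-empty result to downgrade an alert to a warning or to annotate the SLA report, instead of counting the window as unplanned downtime.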