How to monitor Salesforce Cloud and integrate it into your monitoring software


Published on July 5th 2019 - last updated on August 2nd 2019 - Listed in Monitoring Internet


In the past decade, many applications have moved from on-premise installations to the cloud (software as a service, SaaS). Salesforce is one such example. Although I don't understand the hype around it (purely technically speaking), a lot of companies have migrated their customer databases and CRM systems to Salesforce.

There's a problem, though (again, technically speaking): what about monitoring? On-premise software is usually (at least when I'm in charge) covered by a monitoring system. But how does that work with a cloud service such as Salesforce?

For the past few months, I used a simple check that only monitored the login page of the relevant Salesforce instance. However, this didn't reveal problems inside the instance: the login form almost always showed up, indicating no problems whatsoever.

Hurray, there's a status API

Yesterday I came across the Salesforce Status API. This API is supposed to return the status of the Salesforce services, including specific instances. Let's try that with curl:

$ curl https://api.status.salesforce.com/v1/instances/CS110/status
{"key":"CS110","location":"EMEA","environment":"sandbox","releaseVersion":"Summer '19 Patch 12.1","releaseNumber":"220.12.1","status":"OK","isActive":true,"Services":[{"key":"coreService","order":1,"isCore":true},{"key":"liveAgent","order":20,"isCore":false},{"key":"search","order":5,"isCore":false},{"key":"analytics","order":10,"isCore":false},{"key":"CPQandBilling","order":100,"isCore":false}],"Products":[{"key":"Service_Cloud","order":10},{"key":"Sales_Cloud","order":1},{"key":"LiveAgent_Omni-Channel","order":20},{"key":"CPQ_and_Billing","order":100},{"key":"Financial_Services_Cloud","order":80},{"key":"Lightning_Platform","order":30},{"key":"Einstein_Analytics","order":60},{"key":"Health_Cloud","order":90},{"key":"Community_Cloud","order":40}],"Incidents":[],"Maintenances":[{"id":45691,"message":{"maintenanceType":"release","availability":"fullyAvailable","eventStatus":"confirmed"},"externalId":"a3GB0000000TmzFMAS","name":"QTC Summer '19 Major Release (220) - R2a","plannedStartTime":"2019-06-15T06:30:00.000Z","plannedEndTime":"2019-06-15T08:30:00.000Z","additionalInformation":"All orgs running CPQ versions 208.x, 210.x. 212.x. 214.x 216.x and 218.x and on the listed instances will be upgraded to CPQ Summer '19 (220.x) package.\r\nThe Billing package, if installed, will also be upgraded to Billing Summer'19 (220.x) package along with the relevant payment gateways.\r\nThe Advanced Approvals package, if installed, will also be upgraded to the latest version.","isCore":false,"affectsAll":false,"createdAt":"2019-04-18T18:19:34.076Z","updatedAt":"2019-06-14T06:19:49.076Z","MaintenanceImpacts":[],"MaintenanceEvents":[{"id":23633,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-04T06:36:55.217Z","updatedAt":"2019-06-04T06:36:55.229Z"},{"id":19559,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-04-18T18:19:34.107Z","updatedAt":"2019-04-18T18:19:34.107Z"}],"instanceKeys":["CS110"],"serviceKeys":["CPQandBilling"]},{"id":29551,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB00000008fLZMAY","name":"Summer '19 Major Release","plannedStartTime":"2019-06-14T23:00:00.000Z","plannedEndTime":"2019-06-14T23:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-02-01T00:18:01.602Z","updatedAt":"2019-06-13T22:49:53.385Z","MaintenanceImpacts":[{"id":4015,"startTime":"2019-06-14T23:00:09.000Z","endTime":"2019-06-14T23:00:48.000Z","type":"deployingRelease","severity":"maintenance","createdAt":"2019-06-14T23:00:13.364Z","updatedAt":"2019-06-14T23:00:50.478Z","startTimeCreatedAt":"2019-06-14T23:00:13.365Z","startTimeModifiedAt":null,"endTimeCreatedAt":"2019-06-14T23:00:50.478Z","endTimeModifiedAt":null}],"MaintenanceEvents":[{"id":24845,"type":"majorReleaseFeaturesEnabled","message":"The upgrade activities are now complete and all major release features are available.","createdAt":"2019-06-15T01:11:41.156Z","updatedAt":"2019-06-15T01:11:41.156Z"},{"id":24798,"type":"majorReleaseReleaseIsLive","message":"The release is now live. The instance should be generally available as we continue to perform upgrade activities including feature enablement, which typically completes within six hours and no later than 24 hours.","createdAt":"2019-06-14T23:00:55.852Z","updatedAt":"2019-06-14T23:00:55.852Z"},{"id":24779,"type":"majorRelease10MinutesToZdt","message":"The release is about to begin.","createdAt":"2019-06-14T22:50:02.815Z","updatedAt":"2019-06-14T22:50:02.815Z"},{"id":23440,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-03T23:06:13.568Z","updatedAt":"2019-06-03T23:06:13.578Z"},{"id":20279,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-04-18T18:36:39.859Z","updatedAt":"2019-04-18T18:36:39.872Z"}],"instanceKeys":["CS110"],"serviceKeys":["coreService"]},{"id":46456,"message":{"maintenanceType":"release","availability":"fullyAvailable","eventStatus":"confirmed"},"externalId":"a3GB0000000TrPVMA0","name":"Health Cloud - Summer '19 R2ARelease","plannedStartTime":"2019-06-14T23:00:00.000Z","plannedEndTime":"2019-06-15T06:00:00.000Z","additionalInformation":null,"isCore":false,"affectsAll":false,"createdAt":"2019-05-21T18:49:21.731Z","updatedAt":"2019-06-13T22:50:04.319Z","MaintenanceImpacts":[],"MaintenanceEvents":[{"id":23470,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-03T23:06:38.031Z","updatedAt":"2019-06-03T23:06:38.042Z"},{"id":22861,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-05-21T18:49:21.751Z","updatedAt":"2019-05-21T18:49:21.751Z"}],"instanceKeys":["CS110"],"serviceKeys":[]},{"id":46475,"message":{"maintenanceType":"release","availability":"fullyAvailable","eventStatus":"confirmed"},"externalId":"a3GB0000000TrPQMA0","name":"Financial Services Cloud - Summer '19 R2ARelease","plannedStartTime":"2019-06-14T23:00:00.000Z","plannedEndTime":"2019-06-15T06:00:00.000Z","additionalInformation":null,"isCore":false,"affectsAll":false,"createdAt":"2019-05-21T18:49:28.532Z","updatedAt":"2019-06-13T22:49:57.831Z","MaintenanceImpacts":[],"MaintenanceEvents":[{"id":23452,"type":"reminder","message":"This maintenance will happen in 10 days.","createdAt":"2019-06-03T23:06:30.701Z","updatedAt":"2019-06-03T23:06:30.711Z"},{"id":22880,"type":"scheduled","message":"This maintenance is scheduled.","createdAt":"2019-05-21T18:49:28.550Z","updatedAt":"2019-05-21T18:49:28.550Z"}],"instanceKeys":["CS110"],"serviceKeys":[]},{"id":29658,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB0000000L3NHMA0","name":"Spring '20 Major Release","plannedStartTime":"2020-02-15T00:00:00.000Z","plannedEndTime":"2020-02-15T00:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-02-02T00:48:04.036Z","updatedAt":"2019-07-05T09:34:02.110Z","MaintenanceImpacts":[],"MaintenanceEvents":[],"instanceKeys":["CS110"],"serviceKeys":["coreService"]},{"id":29552,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB0000000CnFrMAK","name":"Winter '20 Major Release","plannedStartTime":"2019-10-11T23:00:00.000Z","plannedEndTime":"2019-10-11T23:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-02-01T00:18:01.897Z","updatedAt":"2019-07-05T09:41:32.902Z","MaintenanceImpacts":[],"MaintenanceEvents":[],"instanceKeys":["CS110"],"serviceKeys":["coreService"]},{"id":46843,"message":{"maintenanceType":"release","availability":"unavailable","eventStatus":"confirmed"},"externalId":"a3GB0000000L58fMAC","name":"Summer '20Major Release","plannedStartTime":"2020-06-12T23:00:00.000Z","plannedEndTime":"2020-06-12T23:05:00.000Z","additionalInformation":null,"isCore":true,"affectsAll":true,"createdAt":"2019-05-30T14:19:44.427Z","updatedAt":"2019-07-05T09:35:22.852Z","MaintenanceImpacts":[],"MaintenanceEvents":[],"instanceKeys":["CS110"],"serviceKeys":["coreService"]}],"Tags":[]}

Yes, that's a lot of information in JSON format. The most important field is the "status" key, which can be extracted with a JSON parser (here jshon):

$ curl -s https://api.status.salesforce.com/v1/instances/CS110/status | jshon -e status
"OK"

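If jshon isn't available, or you want to post-process the response in a script, the same field can be read with Python's standard library. This is a minimal sketch assuming the response keeps the structure shown above (top-level "status", "Incidents" and "Maintenances" keys); `fetch_instance_status` and `summarize` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint pattern taken from the curl example; {key} is the instance key, e.g. CS110.
STATUS_URL = "https://api.status.salesforce.com/v1/instances/{key}/status"


def fetch_instance_status(instance_key: str, timeout: float = 10.0) -> dict:
    """Download and decode the status document for one instance."""
    url = STATUS_URL.format(key=instance_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(payload: dict) -> str:
    """Return a one-line summary: instance key, top-level status and list sizes."""
    return "{key}: status={status}, incidents={inc}, maintenances={mnt}".format(
        key=payload.get("key", "?"),
        status=payload.get("status", "UNKNOWN"),
        inc=len(payload.get("Incidents", [])),
        mnt=len(payload.get("Maintenances", [])),
    )
```

Calling `summarize(fetch_instance_status("CS110"))` needs network access. Applied to the CS110 response shown above, it would report `status=OK`, zero incidents and seven maintenances.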
Using the monitoring plugin check_http, we can now check whether the string "status":"OK" appears in the response:

$ /usr/lib/nagios/plugins/check_http -I api.status.salesforce.com -H api.status.salesforce.com -u /v1/instances/CS110/status -S --sni -s '"status":"OK"'
HTTP OK: HTTP/1.1 200 OK - 7681 bytes in 0.713 second response time |time=0.713329s;;;0.000000 size=7681B;;;0
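Matching a string with check_http -s works, but it searches the whole response body rather than the top-level "status" key specifically. If you'd rather evaluate the parsed JSON, a small custom plugin is an option. The following is a sketch under stated assumptions, not an official plugin: it follows the usual plugin exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN), treats every status value other than "OK" as critical (just like the string match above), and the names `evaluate` and `check_instance` are hypothetical.

```python
import json
import urllib.request

# Exit codes from the Nagios / Monitoring Plugins convention.
OK, CRITICAL, UNKNOWN = 0, 2, 3
STATUS_URL = "https://api.status.salesforce.com/v1/instances/{key}/status"


def evaluate(payload) -> tuple:
    """Map a decoded status document to (exit_code, plugin_output)."""
    if not isinstance(payload, dict) or "status" not in payload:
        return UNKNOWN, "SALESFORCE UNKNOWN - response has no top-level 'status' key"
    key = payload.get("key", "?")
    status = payload["status"]
    if status == "OK":
        return OK, f"SALESFORCE OK - instance {key} status is OK"
    # Like check_http -s, anything other than "OK" is treated as CRITICAL here.
    return CRITICAL, f"SALESFORCE CRITICAL - instance {key} status is {status}"


def check_instance(instance_key: str, timeout: float = 10.0) -> tuple:
    """Fetch and evaluate one instance; network and JSON errors map to UNKNOWN."""
    url = STATUS_URL.format(key=instance_key)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError) as exc:
        # URLError, HTTPError and timeouts are OSError subclasses; bad JSON raises ValueError.
        return UNKNOWN, f"SALESFORCE UNKNOWN - cannot read {url}: {exc}"
    return evaluate(payload)
```

A wrapper script would call `check_instance("CS110")`, print the message and exit with the returned code (for example via `sys.exit(code)`), so that any Nagios-compatible core can run it the same way it runs check_http.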

Integration as a monitoring service

Icinga 1.x, Nagios, Naemon and Shinken

In Icinga 1.x, Nagios, Naemon and Shinken, you would typically define a service that uses a command based on the check_http plugin. The command definition may need some preparation to accept all the required parameters. Here's an example:

# check_http_api command definition with more arguments
define command{
  command_name check_http_api
  command_line $USER1$/check_http -H $ARG1$ -S --sni -u $ARG2$ -s $ARG3$
}

And the service example, using a dummy host "externalchecks":

# Salesforce Sales Cloud Instance CS110
define service{
  use generic-service
  host_name externalchecks
  service_description HTTP Salesforce Sales Cloud Instance CS110
  check_command check_http_api!api.status.salesforce.com!/v1/instances/CS110/status!'"status":"OK"'
}
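To see what the monitoring core actually executes for this service, it helps to expand the macros by hand: the check_command is split on "!", and the pieces after the command name replace $ARG1$, $ARG2$ and $ARG3$ in the command_line. The Python sketch below mimics that substitution in a simplified way; `expand_check_command` is a hypothetical helper, not part of any monitoring core, and the $USER1$ value is assumed to be the plugin directory used in the manual check_http call above.

```python
def expand_check_command(command_line: str, check_command: str, user_macros: dict) -> str:
    """Expand $USERn$ and $ARGn$ macros in a command_line for one service check.

    Simplified sketch: arguments are split on every "!", so escaped "\\!"
    sequences inside arguments are not handled.
    """
    # The first "!"-separated field is the command name; the rest become $ARG1$, $ARG2$, ...
    _command_name, *args = check_command.split("!")
    expanded = command_line
    for macro, value in user_macros.items():
        expanded = expanded.replace(f"${macro}$", value)
    for position, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{position}$", value)
    return expanded


COMMAND_LINE = "$USER1$/check_http -H $ARG1$ -S --sni -u $ARG2$ -s $ARG3$"
CHECK_COMMAND = "!".join([
    "check_http_api",
    "api.status.salesforce.com",
    "/v1/instances/CS110/status",
    "'\"status\":\"OK\"'",
])
# Assumption: $USER1$ points to the plugin directory used earlier in this article.
print(expand_check_command(COMMAND_LINE, CHECK_COMMAND, {"USER1": "/usr/lib/nagios/plugins"}))
# /usr/lib/nagios/plugins/check_http -H api.status.salesforce.com -S --sni -u /v1/instances/CS110/status -s '"status":"OK"'
```

The result matches the manual check_http call above, apart from the -I option, which this command definition does not pass.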

Icinga 2

In Icinga 2, you can use the "http" command from the Icinga Template Library as is; however, you have to escape the double quotes in the expected string:

# check Salesforce CS110
object Service "HTTP Salesforce Sales Cloud Instance CS110" {
  import "generic-service"
  host_name = "externalchecks"
  check_command = "http"
  vars.http_address = "api.status.salesforce.com"
  vars.http_vhost = "api.status.salesforce.com"
  vars.http_uri = "/v1/instances/CS110/status"
  vars.http_string = "\"status\":\"OK\""
  vars.http_ssl = true
  vars.http_sni = true
}
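The escaping rule itself is simple: inside an Icinga 2 string literal, a double quote is written as \" and a backslash as \\. If you monitor more than one instance, generating the Service objects avoids getting the escaping wrong by hand. A minimal Python sketch; the helper names are hypothetical, and which instance keys you pass in is up to you.

```python
EXPECTED_STRING = '"status":"OK"'


def icinga2_string(value: str) -> str:
    """Quote a value as an Icinga 2 string literal (escapes backslashes and double quotes)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def service_object(instance_key: str, host: str = "externalchecks") -> str:
    """Render an Icinga 2 Service object equivalent to the hand-written CS110 example."""
    name = icinga2_string(f"HTTP Salesforce Sales Cloud Instance {instance_key}")
    uri = icinga2_string(f"/v1/instances/{instance_key}/status")
    lines = [
        f"object Service {name} {{",
        '  import "generic-service"',
        f"  host_name = {icinga2_string(host)}",
        '  check_command = "http"',
        '  vars.http_address = "api.status.salesforce.com"',
        '  vars.http_vhost = "api.status.salesforce.com"',
        f"  vars.http_uri = {uri}",
        f"  vars.http_string = {icinga2_string(EXPECTED_STRING)}",
        "  vars.http_ssl = true",
        "  vars.http_sni = true",
        "}",
    ]
    return "\n".join(lines) + "\n"


print(service_object("CS110"), end="")
```

For CS110 this prints the same object as the hand-written definition (without the comment line); writing the output to a file under your Icinga 2 configuration directory is left to your deployment process.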

Is the API honest?

Whatever software you use, a monitoring system can of course only work correctly if it receives correct information. The same applies here: as there is no way around it, we "have to trust" the output of the API. That's the (monitoring) dilemma with cloud software.

I guess we'll see after a couple of months whether the user experience and the monitoring alerts match.

Update August 2nd 2019: Yes, the API is pretty honest!

On the 1st of every month, SLA reports are generated from our monitoring data. The Marketing Cloud instance showed a significant drop in July 2019:

Marketing Cloud SLA graph
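An SLA percentage like the one in this graph is essentially the share of the reporting period a service spent in the OK state. The exact rules of our reporting tool aren't described here, so the following Python sketch is only a simplified, time-weighted version: it assumes a sorted list of (timestamp, state) changes, counts time before the first entry as not OK, and does not exclude maintenance windows. The July history in the example is hypothetical and not taken from our real data.

```python
from datetime import datetime


def availability_percent(state_changes, period_start, period_end):
    """Return the share of [period_start, period_end) spent in state "OK", in percent.

    state_changes is a chronologically sorted list of (timestamp, state) tuples;
    each state is assumed to last until the next change or the end of the period.
    """
    total_seconds = (period_end - period_start).total_seconds()
    ok_seconds = 0.0
    for index, (changed_at, state) in enumerate(state_changes):
        if index + 1 < len(state_changes):
            next_change = state_changes[index + 1][0]
        else:
            next_change = period_end
        # Clip each interval to the reporting period before adding it up.
        start = max(changed_at, period_start)
        end = min(next_change, period_end)
        if state == "OK" and end > start:
            ok_seconds += (end - start).total_seconds()
    return 100.0 * ok_seconds / total_seconds


# Hypothetical July 2019 history with a single 12-hour CRITICAL period.
july = [
    (datetime(2019, 7, 1), "OK"),
    (datetime(2019, 7, 10, 8), "CRITICAL"),
    (datetime(2019, 7, 10, 20), "OK"),
]
print(round(availability_percent(july, datetime(2019, 7, 1), datetime(2019, 8, 1)), 2))
# 98.39
```

Even a single 12-hour outage in a 744-hour month pushes availability below 99%, which is why long outages stand out so clearly in monthly SLA graphs.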

At first I suspected an error in our monitoring, but the status API had indeed not returned the value "OK" for the JSON key "status":

Marketing Cloud Status API Alerts

This very long downtime could also have been caused by an announced maintenance window that neither I nor our Icinga 2 monitoring had been informed about. Nevertheless, it's good to see that the Salesforce status API seems to correctly indicate failures or performance degradation.
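Whether an outage coincided with announced maintenance can be checked against the "Maintenances" list in the status response, which carries "plannedStartTime" and "plannedEndTime" fields (see the CS110 output above). A minimal Python sketch, assuming the instance's response has the same structure and that the relevant maintenance is still listed (the CS110 output fetched in July still included June maintenances). The outage times are hypothetical and would normally come from your monitoring's alert history.

```python
from datetime import datetime, timezone


def parse_api_time(value: str) -> datetime:
    """Parse timestamps such as "2019-06-14T23:00:00.000Z" as UTC datetimes."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def overlapping_maintenances(payload: dict, outage_start: datetime, outage_end: datetime) -> list:
    """Return names of planned maintenances whose window overlaps the outage."""
    names = []
    for maintenance in payload.get("Maintenances", []):
        start = parse_api_time(maintenance["plannedStartTime"])
        end = parse_api_time(maintenance["plannedEndTime"])
        # Two intervals overlap if each one starts before the other one ends.
        if start < outage_end and outage_start < end:
            names.append(maintenance.get("name", "?"))
    return names


# Excerpt from the CS110 response above, plus a hypothetical outage window (UTC).
sample = {"Maintenances": [{
    "name": "Summer '19 Major Release",
    "plannedStartTime": "2019-06-14T23:00:00.000Z",
    "plannedEndTime": "2019-06-14T23:05:00.000Z",
}]}
outage = (datetime(2019, 6, 14, 23, 2, tzinfo=timezone.utc),
          datetime(2019, 6, 14, 23, 30, tzinfo=timezone.utc))
print(overlapping_maintenances(sample, *outage))
# ["Summer '19 Major Release"]
```

An empty result would suggest that the outage was not covered by a planned window, at least not one still visible in the API response.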

