Incorrect alerts are being issued for FME Cloud instances
Incident Report for Safe Software
Postmortem

We’d like to apologize to all FME Cloud customers who were affected by incorrect alerts for their instances.

On March 2nd, 2023, Safe Software noticed that incorrect disk space and memory usage alerts were being triggered for FME Cloud instances. Internal investigation showed that the composite disk space data was missing and the memory data was incorrect. We contacted Librato, our metrics and alerting service provider, and updated the Safe Software status page to show degraded performance for FME Cloud Dashboards.
Overnight, the service appeared to recover, so we moved the status to Monitoring while we waited for a response or confirmation from Librato.

On March 8th, Librato reported that one of their internal services responsible for metrics was occasionally failing to keep up with real-time traffic, and as a result, a subset of metrics was affected. Librato has since fixed the issue.

Safe Software has not observed any incorrect alerts since March 3rd. Together with Librato's confirmation, this gives us confidence that the issue has been resolved.
The status of FME Cloud Dashboards has returned to Operational, and the incident has been marked as resolved on status.safe.com.

Posted Mar 08, 2023 - 13:58 PST

Resolved
This incident has been resolved.
From our service provider: "The summary of the problem is that one of the internal services responsible for the metrics portion of the product was occasionally failing to keep up with realtime traffic, and as a result a subset of metrics were being impacted.
This should be fixed now."
Posted Mar 06, 2023 - 09:11 PST
Monitoring
Monitoring and alerting appear to have recovered. We have not yet determined the root cause and are waiting for a resolution from our service provider before resolving this incident.
Posted Mar 03, 2023 - 15:27 PST
Update
There are still issues with FME Cloud metrics and alerting. Our service provider is investigating, but we do not yet have an ETA for a fix.
Posted Mar 02, 2023 - 17:34 PST
Identified
FME Cloud is experiencing issues with the service it uses for alerting. Disk and memory alerts are being triggered incorrectly. Please disregard these alerts.

We are currently waiting for assistance from our metrics and alerting service provider.
Posted Mar 02, 2023 - 10:18 PST
This incident affected: FME Flow Hosted (FME Flow Hosted Dashboard/API, FME Flow Hosted Instances).