Possible Issue affecting availability of session data in all regions

Incident Report for Learnosity

Postmortem

Between 19:00 UTC and 19:39 UTC, Learnosity support and infrastructure teams observed a moderate frequency of connectivity issues within our US data center, AWS US-EAST-1 (Virginia). Similar issues were observed on other SaaS platforms using AWS during the same period, namely Datadog and Slack.

As a result, a small percentage of save/submit requests during this period resulted in a save:failure event being raised in the Items API. At the current stage of investigation, we believe the incidence rate to be ~1%. These requests would have been retried automatically by the application, and the end user would subsequently have been presented with a dialog to retry the save. As such, the number of users actually affected is likely far lower.
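For readers who want a picture of what automatic retry on a failed save can look like in general, the following is a minimal sketch only. It is not Learnosity's implementation: `SaveFailure`, `save_with_retry`, the attempt count, and the backoff delays are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable


class SaveFailure(Exception):
    """Hypothetical error standing in for a failed save/submit request."""


def save_with_retry(
    save: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `save`, retrying on SaveFailure with exponential backoff.

    All parameters are illustrative assumptions. If every attempt fails,
    the last SaveFailure is re-raised so the caller can prompt the user
    to retry manually.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return save()
        except SaveFailure:
            if attempt == attempts - 1:
                raise  # out of automatic retries; hand over to the user
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In a sketch like this, a transient connectivity failure is absorbed by the automatic retries, and only a request that fails every attempt surfaces to the end user.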

During this period, as designed, Learnosity’s session scoring queue built up a backlog to ensure no data loss while we were experiencing connectivity issues. This queue was fully cleared by 19:55 UTC. While the backlog was in place, 66% of sessions were still successfully processed and available via the Data API in under a second; once the queue had cleared, this returned to 98% of sessions.

We currently have open investigations with AWS. Should Learnosity discover additional information that allows us to become more resilient to connectivity issues such as these, our support team will update this postmortem with further shareable information and actions.

Posted Sep 24, 2020 - 17:15 EDT

Resolved

As of 20:09 UTC, the issue affecting availability of session data has been resolved in the US region.

Learnosity is continuing to investigate with third-party vendors for a complete root cause analysis.

Learnosity Support and Systems Engineering teams will follow up with additional information as it becomes available.

Please reach out if you have any questions or concerns.

Posted Sep 24, 2020 - 16:11 EDT

Update

As of 19:58 UTC, analytics endpoints have been restored to operational performance.

Learnosity Support and Systems Engineering teams are continuing to monitor the issue, and will follow up with an update and resolution as soon as possible.

Posted Sep 24, 2020 - 15:59 EDT

Update

As of 19:48 UTC, we are still investigating this issue.

Session data availability and report loading are improving, and we're investigating whether wider outages in third-party services are a contributing factor.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow up with an update and resolution as soon as possible.

Posted Sep 24, 2020 - 15:53 EDT

Update

As of 19:32 UTC, we are still investigating this issue.

IE and AU regions appear unaffected. We are continuing to investigate the impact on isolated segments of the US region.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow up with an update and resolution as soon as possible.

Posted Sep 24, 2020 - 15:35 EDT

Update

As of 19:28 UTC, we are still investigating possible delays in the availability of session data.

At present this appears to be isolated and not affecting all customers.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow up with an update and resolution as soon as possible.

Posted Sep 24, 2020 - 15:29 EDT

Investigating

As of 19:18 UTC, we are currently investigating possible delays in retrieving post-submission session data affecting all regions.

As far as we can tell right now, assessment delivery is unaffected, and all sessions are being captured.

Learnosity Support and Systems Engineering teams are actively investigating the issue, and will follow up with an update and resolution as soon as possible.

Posted Sep 24, 2020 - 15:24 EDT

This incident affected: AMER || Analytics (Loading and rendering of reports, Availability of session information), APAC || Analytics (Loading and rendering of reports, Availability of session information), and EMEA || Analytics (Loading and rendering of reports, Availability of session information).