Between 19:00 UTC and 19:39 UTC, Learnosity support and infrastructure teams observed a moderate frequency of connectivity issues within our US data center, AWS US-EAST-1 (Virginia). A number of these issues were also observed in other SaaS platforms using AWS during the same period, namely Datadog and Slack.
As a result, a small percentage of save/submit requests during this period raised a save:failure event in the Items API. At the current stage of the investigation, we believe the incidence rate to be ~1%. The application would have retried these requests automatically and then presented a dialog allowing the end user to retry the save. As such, the actual number of affected users is likely to be far lower.
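For readers who want a concrete picture of the retry-then-prompt pattern described above, here is a minimal sketch in Python. It is illustrative only and is not Learnosity's implementation: the function name `save_with_retry`, the `send` callback, the attempt limit, and the return values are all hypothetical, and the sketch assumes the user is prompted only once the automatic attempts are exhausted.

```python
from typing import Callable


def save_with_retry(send: Callable[[], bool], max_attempts: int = 3) -> str:
    """Attempt a save up to max_attempts times (hypothetical helper).

    `send` is an assumed callback that returns True on success, and
    returns False or raises ConnectionError on failure.
    """
    for _ in range(max_attempts):
        try:
            if send():
                return "saved"
        except ConnectionError:
            # Treat a dropped connection like a failed save and try again.
            pass
    # Every automatic attempt failed: hand control to the end user,
    # for example by showing a "retry save" dialog.
    return "prompt_user"
```

With a pattern like this, a transient failure on the first attempt is absorbed by the automatic retry, and only a run of consecutive failures reaches the end user.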
During this period, as designed, Learnosity’s session scoring queue built up a backlog to ensure no data was lost while we were experiencing connectivity issues. This queue was fully cleared by 19:55 UTC. During this time, 66% of sessions were still successfully processed and available via the Data API in under a second. From then on, this returned to 98% of sessions being processed and available via the Data API in under a second.
We currently have open investigations with AWS. Should Learnosity discover additional information that allows us to become more resilient to connectivity issues such as these, our support team will update this post mortem with further shareable information and actions.