Issue affecting scoring delays in US East 1 (VA)
Incident Report for Learnosity
Postmortem

An underlying storage system corruption, linked to a single row in the database table where scoring errors are saved, caused delays in the asynchronous scoring queue. Before the fix, queries against this table could take anywhere from a fraction of a second to several seconds to complete, and the impact was compounded by an unusually high number of session scoring errors submitted at one time.

Because this issue only surfaced when rare scoring errors were logged, it was not immediately detected. Initially, we began increasing the number and size of RDS instances to work through the scoring backlog, but the queue cleared before this process could be completed. This showed us that delays only occurred when errors were persisted, which led to the discovery and deletion of the problem table row.

Additional measures have been put in place to monitor the write speed of errors, as well as session data, and the issue has not resurfaced since then.
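
As an illustration only, write-speed monitoring of this kind can be sketched as a timer wrapped around each write. This is a minimal sketch, not Learnosity's actual monitoring; the names `timed_write`, `slow_writes`, and the 0.5 second threshold are assumptions made for the example.

```python
import time
from contextlib import contextmanager

SLOW_WRITE_SECONDS = 0.5  # hypothetical alert threshold, not a documented value

# (label, elapsed_seconds) for every write that exceeded its threshold
slow_writes = []


@contextmanager
def timed_write(label, threshold=SLOW_WRITE_SECONDS, clock=time.monotonic):
    """Time the wrapped block and record it if it runs longer than threshold."""
    start = clock()
    try:
        yield
    finally:
        # Runs even if the write raises, so failed slow writes are also recorded.
        elapsed = clock() - start
        if elapsed > threshold:
            slow_writes.append((label, elapsed))


if __name__ == "__main__":
    # Stand-in for a real database write; sleeps long enough to be flagged.
    with timed_write("scoring_error_insert", threshold=0.01):
        time.sleep(0.05)
    print(slow_writes)
```

In practice the recorded entries would feed an alerting system rather than an in-memory list, so that a slow error write is noticed even when errors are rare.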

Posted Jan 21, 2022 - 16:00 EST

Resolved
As of 16:53 UTC, we have resolved the issue affecting scoring delays in the US East 1 (VA) region.

Learnosity Support and Systems Engineering teams will follow up with a post mortem once we have completed root cause analysis and finalized any next steps or preventative measures required.

Please reach out if you have any questions or concerns.
Posted Oct 27, 2021 - 12:54 EDT
Update
As of 16:22 UTC, the scoring queue is empty, with sessions scoring too quickly to create a backlog.

Learnosity Support and Systems Engineering teams will continue to monitor for 30 more minutes and then, in the absence of any further delays, mark this issue resolved.
Posted Oct 27, 2021 - 12:23 EDT
Update
As of 16:00 UTC, scoring throughput remains normal. The backlog in the queues is continuing to reduce.

Learnosity Support and Systems Engineering teams are continuing to monitor the issue, and will follow on with an update and resolution as soon as possible.
Posted Oct 27, 2021 - 12:03 EDT
Monitoring
As of 15:46 UTC, scoring throughput has returned to normal. We are still seeing a backlog in some scoring queues but it is rapidly reducing.

Learnosity Support and Systems Engineering teams are continuing to monitor the issue, and will follow on with an update and resolution as soon as possible.
Posted Oct 27, 2021 - 11:49 EDT
Update
As of 15:00 UTC, the Learnosity Systems Engineering team is working on mitigating the delays affecting the availability of newly scored session data in US East 1.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Oct 27, 2021 - 11:07 EDT
Update
As of 14:30 UTC, we are continuing to investigate an uptick in scoring delays that are affecting the US East 1 (VA) region.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Oct 27, 2021 - 10:30 EDT
Investigating
As of 14:00 UTC, we are currently experiencing minor delays in scoring of submitted sessions affecting the US East 1 region (VA).

Learnosity Support and Systems Engineering teams are actively investigating the issue, and will follow on with an update and resolution as soon as possible.
Posted Oct 27, 2021 - 10:11 EDT
This incident affected: AMER || Data Centric (Updating session response scores).