Issue affecting availability of recently submitted session analytics in US-East-1

Incident Report for Learnosity

Postmortem

Affected Systems and Regions

On 19 August 2025, Learnosity experienced degraded performance in our analytics stack, affecting session data availability in US-East-1 for a subset of customers. The Reports API and Data API were affected. No other stacks were affected, and there was no data loss.

Investigation

Monitoring detected a rapid increase in unprocessed and retried messages, along with elevated lock contention in the sessions database. The root cause was traced to a customer implementation issue generating an extraordinarily high number of submissions. This drove excessive retries, magnifying actual traffic volume. 
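The amplification effect of retries can be illustrated with a short calculation. This is a sketch under stated assumptions, not a description of Learnosity's retry policy: the failure probability, the retry limit, and the function name `expected_attempts` are all hypothetical, since the report gives no figures for them.

```python
def expected_attempts(failure_probability: float, max_retries: int) -> float:
    """Expected number of attempts per message (hypothetical model).

    Assumes each attempt fails independently with the same probability and
    that a failed attempt is retried until max_retries is reached.
    Attempt k (0-based) only happens if the previous k attempts all failed.
    """
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    return sum(failure_probability ** k for k in range(max_retries + 1))


def effective_load(submissions_per_sec: float,
                   failure_probability: float,
                   max_retries: int) -> float:
    """Traffic actually seen downstream once retries are counted."""
    return submissions_per_sec * expected_attempts(failure_probability, max_retries)
```

Under this model, a healthy system (failure probability near 0) sees roughly one attempt per message, while a system in which most attempts fail approaches `max_retries + 1` attempts per message, so contention and retries reinforce each other.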

The use of time-ordered v7 UUIDs for session IDs, normally handled without issue, became problematic under this contention. Uniqueness checks on each session ID required more resources and triggered a succession of temporary deadlocks. These deadlocks would usually self-resolve, but the amplified traffic prevented recovery, turning a minor issue into a sustained queue blockage.
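The structural difference between the two ID formats can be sketched as follows. The report does not describe the database internals, so this only shows why v7 values are "time-ordered": IDs created in the same millisecond share a 48-bit prefix and sort next to each other, whereas v4 values are random throughout. The helper `uuid7_sketch` is a hypothetical illustration built from the RFC 9562 bit layout, not Learnosity's or any customer's generator.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version 7 UUID from the RFC 9562 layout (illustrative only).

    Layout, most significant bits first:
    48-bit Unix timestamp in ms | 4-bit version | 12 random bits |
    2-bit variant | 62 random bits.
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp prefix
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


def time_prefix(u):
    """First 48 bits as hex: the millisecond timestamp for a v7 UUID."""
    return u.hex[:12]


# Two v7 IDs from the same millisecond share a prefix and sort adjacently.
same_ms = 1_755_600_000_000                        # arbitrary example timestamp
a, b = uuid7_sketch(same_ms), uuid7_sketch(same_ms)
later = uuid7_sketch(same_ms + 1)

# v4 IDs carry no timestamp; all bits except version and variant are random.
r = uuid.uuid4()
```

Here `time_prefix(a) == time_prefix(b)`, and `later` always sorts after both, so a burst of concurrent inserts lands in a narrow range of key values. A v4 ID such as `r` has no such ordering.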

Resolution

Once the issue was identified, Learnosity moved the customer to a dedicated, isolated sync queue, preventing cross‑tenant impact while we investigated. We applied targeted rate limits for the isolated service to protect the database, and drained the backlog. Where safe, long‑running queries were terminated to free locks and allow forward progress. 
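A targeted rate limit of the kind described above is often implemented as a token bucket. The following is a minimal sketch assuming that technique. The report does not say which mechanism was used, and the class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(capacity)
        self.clock = clock              # injectable so the limiter can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request fits the budget."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A consumer on the isolated queue would call `allow()` before each database write and requeue or delay the message when it returns `False`, which bounds the write rate reaching the database regardless of how fast messages arrive.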

To support faster diagnosis, Learnosity enabled detailed deadlock logging and expanded metrics around message retries, abandonment, and per‑session activity. Learnosity also worked with the customer to adjust implementation settings, reducing combined saves and submits by two orders of magnitude. Session IDs were also switched to v4 UUIDs, which simplified uniqueness checks, further preventing deadlocks.

Immediately after these changes took effect, queues began to recover rapidly, and normal processing resumed. Most sessions for the subset of affected customers saw short delays, while the most significantly delayed session took ~6 hours before final persistence. Throughout, we identified no data loss.

Prevention

To prevent recurrence, we are:

  • Implementing targeted load tests and contention simulations to replicate high-parallelism patterns.
  • Reviewing customer identifier schemes for session IDs and auditing usage across all customers (initial checks confirm none of our Top 50 customers currently use v7 UUIDs).
  • Analyzing adoption of per-tenant fair-use queues (or equivalent fair-share policies) to cap burst throughput from a single tenant and protect shared infrastructure.
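
The fair-share idea in the last item can be sketched as a round-robin scheduler over per-tenant queues. This is an illustration of the general technique only, assuming an in-memory queue. The class `FairShareQueue` and the tenant names are hypothetical, and the report does not state which policy will be adopted.

```python
from collections import OrderedDict, deque


class FairShareQueue:
    """Serve one message per tenant in rotation, so no tenant starves others."""

    def __init__(self):
        # tenant -> pending messages, in rotation order
        self._queues = OrderedDict()

    def put(self, tenant, message):
        # A tenant with no pending work joins the back of the rotation.
        self._queues.setdefault(tenant, deque()).append(message)

    def get(self):
        if not self._queues:
            raise IndexError("queue is empty")
        tenant, pending = next(iter(self._queues.items()))
        message = pending.popleft()
        del self._queues[tenant]
        if pending:
            # Move the tenant to the back so the next tenant is served first.
            self._queues[tenant] = pending
        return tenant, message

    def __len__(self):
        return sum(len(q) for q in self._queues.values())


# A noisy tenant enqueues three messages before a quiet tenant enqueues one.
fair = FairShareQueue()
for msg in ("s1", "s2", "s3"):
    fair.put("noisy", msg)
fair.put("quiet", "q1")
drained = [fair.get() for _ in range(len(fair))]
```

With a single shared FIFO queue, the quiet tenant's message would wait behind all three noisy messages. In the rotation above it is served second.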
Posted Sep 19, 2025 - 15:15 EDT

Resolved

As of 21:30 UTC, we have resolved the issue affecting availability of session data in US-East-1.

Learnosity Support and Systems Engineering teams will follow up with a postmortem once we have completed root cause analysis and finalized any next steps or preventative measures required.

Please reach out if you have any questions or concerns.
Posted Aug 19, 2025 - 17:31 EDT

Monitoring

As of 21:00 UTC, all sessions have been cleared from the scoring backlog queue and all submissions are being processed normally.

Learnosity Support and Systems Engineering teams will monitor this situation for a further 30 minutes before calling it resolved.
Posted Aug 19, 2025 - 17:05 EDT

Update

As of 20:30 UTC, the scoring queue backlog is almost empty and new sessions will soon be processed without delay.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Aug 19, 2025 - 16:35 EDT

Update

As of 19:30 UTC, we are now processing queued sessions rapidly and more than half of the backlog has already cleared.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Aug 19, 2025 - 15:56 EDT

Update

As of 18:30 UTC, initial remediation efforts have tripled the rate of queued session processing and we are continuing to work toward a full resolution.

Access to recently submitted session results via the Data API and Reports API remains the only affected part of the Learnosity ecosystem. New submissions continue to be safely queued for scoring while the degraded performance remains.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Aug 19, 2025 - 14:42 EDT

Identified

(Note: We're correcting cited UTC times to the 24 hr format and will include both forms in this update only.)

As of 5:30pm/17:30 UTC, we've identified a possible contributing cause for the degraded performance in our analytics stack.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Aug 19, 2025 - 13:34 EDT

Update

As of 4:30 UTC, we are continuing to investigate degraded performance in our Data and Reports APIs.

Only availability of recently submitted session data is affected. Historic session data, as well as all other API stacks, remain unaffected. New submissions are persisting correctly with no data loss, and these submissions are being queued for scoring.

Learnosity Support and Systems Engineering teams are continuing to actively investigate the issue, and will follow on with an update and resolution as soon as possible.
Posted Aug 19, 2025 - 12:35 EDT

Investigating

As of 3:40 UTC, we are experiencing degraded performance in our analytics stack affecting the availability of session data in US-East-1 for a subset of customers.

This is affecting the Reports API and Data API. Neither authoring nor assessment stacks are affected, and there is no data loss. Submitted sessions are queueing for processing.

Learnosity Support and Systems Engineering teams are actively investigating the issue, and will follow on with an update and resolution as soon as possible.
Posted Aug 19, 2025 - 12:11 EDT
This incident affected: AMER || Analytics (Availability of session information).