Degraded performance affecting Live Progress Report in us-east-1

Incident Report for Learnosity

Postmortem

Affected Systems and Regions

On 2026-04-16, Learnosity experienced a service degradation impacting the Live Progress report for a small subset of customers in the us-east-1 region. The issue began at approximately 13:12 UTC and was resolved at 14:40 UTC. The total duration of customer impact was approximately 88 minutes.

Investigation

The issue was detected following elevated error rates on the load balancer serving the eventbus service. Investigation determined that an unhealthy condition within the EC2 instances led to elevated CPU utilization and memory pressure within the Auto Scaling Group, resulting in application instability and repeated service restarts. Inconsistent recovery caused traffic to concentrate on a subset of hosts, further elevating error rates. A secondary effect of the instability was increased request pressure on downstream dependencies.

Resolution

Service was restored by stabilizing the affected EC2 instance and restoring consistent application availability across the Auto Scaling Group. Load distribution normalized once all instances returned to a healthy state.

Prevention

Learnosity is implementing the following measures to reduce the risk of recurrence:

Improve service startup and dependency handling to ensure consistent recovery behavior
Review resource thresholds to reduce the likelihood of similar instability
Enhance monitoring to detect and respond more quickly to similar conditions

Posted May 04, 2026 - 13:21 EDT

Resolved

As of 14:30 UTC, the Live Progress report event degradation issue in the us-east-1 region has been resolved. Additional capacity processed the event load and these systems are operating normally.

Users of Learnosity's premium Firehose feature may have experienced brief additional latency during this scaling, but we have seen no evidence that customers were affected. This, too, was resolved once scaling was complete.

Learnosity Support and Systems Engineering teams will follow up with a postmortem once we have completed root cause analysis and finalized any next steps or preventative measures required.

Please reach out if you have any questions or concerns.

Posted Apr 16, 2026 - 11:23 EDT

Investigating

As of 13:30 UTC, we are experiencing slowdowns affecting the Live Progress report in the us-east-1 region.

Atypical use of eventbus may also be affected, such as custom reports/implementations.

Learnosity Support and Systems Engineering teams are actively investigating the issue and will follow up with an update and resolution as soon as possible.

Posted Apr 16, 2026 - 10:08 EDT

This incident affected: AMER || Analytics (Live Progress (Live Activity by User) report).