On 2026-04-16, Learnosity experienced a service degradation impacting the Live Progress report for a small subset of customers in the us-east-1 region. The issue began at approximately 13:12 UTC and was resolved at 14:40 UTC. The total duration of customer impact was approximately 88 minutes.
The issue was detected following elevated error rates on the load balancer serving the eventbus service. Investigation determined that an unhealthy condition within the EC2 instances led to elevated CPU utilization and memory pressure within the Auto Scaling Group, resulting in application instability and repeated service restarts. Inconsistent recovery caused traffic to concentrate on a subset of hosts, further elevating error rates. A secondary effect of the instability was increased request pressure on downstream dependencies.
Service was restored by stabilizing the affected EC2 instances and restoring consistent application availability across the Auto Scaling Group. Load distribution normalized once all instances returned to a healthy state.
Learnosity is implementing the following measures to mitigate the risk of recurrence: