Affected Systems and Regions
On 2025-01-09, Learnosity experienced a brief partial outage affecting our analytics stacks, specifically the Reports API and Data API in the AMER region. The issue began at 13:30 UTC and was resolved at 14:06 UTC, lasting 36 minutes. All other APIs were unaffected, and there was no loss of data.
Investigation
We discovered that a large number of atypical, inefficient Data API queries from customers were taking too long to complete. This prevented other queries from running in a timely manner, creating a backlog. We determined that an additional database index would significantly improve response times for these types of queries.
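To illustrate in general terms why an additional index helps this kind of query, here is a minimal sketch using Python's built-in sqlite3 module. It is not our production schema or database engine: the responses table, its columns, and the index name idx_responses_session are hypothetical and chosen only for the example. It shows the database's query plan changing from a full table scan to an index lookup once the index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row is (id, parent, notused, detail); keep the detail text.
    return " | ".join(row[3] for row in rows)


# Hypothetical table standing in for a large analytics data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (id INTEGER PRIMARY KEY, session_id TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO responses (session_id, score) VALUES (?, ?)",
    [(f"s{i % 1000}", float(i % 7)) for i in range(10000)],
)

sql = "SELECT score FROM responses WHERE session_id = ?"

# Without an index, every row has to be examined to answer the filter.
plan_before = query_plan(conn, sql, ("s42",))

# With an index on the filtered column, matching rows are located directly.
conn.execute("CREATE INDEX idx_responses_session ON responses (session_id)")
plan_after = query_plan(conn, sql, ("s42",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the new index.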
Resolution
Immediately upon discovering this issue, we scaled up the impacted database instances to ease the backlog of requests. We implemented the additional index, and all remaining queries were processed quickly. Affected APIs returned to normal operation, and further monitoring confirmed that the issue was fully resolved.
Prevention
Further testing has confirmed that the new index is working well, and it is now a permanent part of the system. We are also adding new automated monitoring and regression testing to ensure similar requests perform as expected.
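As a rough illustration of the kind of automated check described above, the sketch below flags a window of query latencies whose 95th percentile exceeds a budget. It is an assumption-laden example rather than our actual monitoring: the function names, the 500 ms budget, and the sample latencies are all hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def breaches_budget(latencies_ms, budget_ms, pct=95):
    """Return True when the chosen percentile latency exceeds the budget."""
    return percentile(latencies_ms, pct) > budget_ms


# Hypothetical latency samples in milliseconds.
healthy = [120, 135, 110, 150, 140, 125, 130, 145, 115, 160]
# The same window with a few slow queries of the kind that build a backlog.
backlog = healthy[:7] + [4200, 5100, 6100]

BUDGET_MS = 500  # assumed budget, for illustration only

print("healthy window breaches budget:", breaches_budget(healthy, BUDGET_MS))
print("backlog window breaches budget:", breaches_budget(backlog, BUDGET_MS))
```

A percentile is used instead of an average so that a small number of very slow queries is not hidden by many fast ones.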