Vault login errors

Incident Report for Keeper Status Page

Postmortem

Incident Summary

The Keeper DevOps team was alerted to API errors in the US region. Upon investigation, we identified a high number of database deadlocks affecting performance. To mitigate the immediate impact, we restarted the affected writer instance.

Root Cause

Historically, similar issues were linked to specific queries impacting MySQL 8.x updates on the RDS Aurora platform. In this case, we discovered that a utility designed to monitor deadlocks and capture diagnostic data was inadvertently contributing to the problem. A known bug in the MySQL minor version used in production caused the utility's data-gathering query, which was intended to analyze deadlocks, to trigger additional locking and make the problem worse.
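
As an illustration only, deadlock spikes can be detected from a cumulative counter instead of from queries that inspect lock state. The sketch below is not the utility described above. It assumes the database or its monitoring service exposes a cumulative deadlock count that can be sampled cheaply. The names CounterSample, deadlock_rates, and should_alert, and the threshold value, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CounterSample:
    """One reading of a cumulative deadlock counter (hypothetical data source)."""
    timestamp_s: float    # when the sample was taken, in seconds
    deadlocks_total: int  # cumulative deadlocks reported at that time


def deadlock_rates(samples: Sequence[CounterSample]) -> List[float]:
    """Return deadlocks per second between each pair of consecutive samples."""
    rates = []
    for prev, curr in zip(samples, samples[1:]):
        elapsed = curr.timestamp_s - prev.timestamp_s
        if elapsed <= 0:
            # Skip duplicate or out-of-order samples.
            continue
        delta = curr.deadlocks_total - prev.deadlocks_total
        if delta < 0:
            # The counter went backwards, e.g. after an instance restart.
            # Treat the current value as the count since the reset.
            delta = curr.deadlocks_total
        rates.append(delta / elapsed)
    return rates


def should_alert(samples: Sequence[CounterSample], threshold_per_s: float) -> bool:
    """Return True if any interval exceeds the given deadlock rate."""
    return any(rate > threshold_per_s for rate in deadlock_rates(samples))


if __name__ == "__main__":
    readings = [
        CounterSample(0, 10),
        CounterSample(60, 16),
        CounterSample(120, 136),
    ]
    print(deadlock_rates(readings))
    print(should_alert(readings, threshold_per_s=1.0))
```

Because the check only compares counter values, it adds no queries against lock tables. Whether reading such a counter is itself free of side effects depends on the database version and would need to be verified.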

Resolution & Mitigation

  • We promptly disabled the problematic monitoring utility and applied database parameter optimizations in collaboration with the AWS team to enhance performance under peak load.
  • These changes were fully implemented within 24 hours, and no further excessive row locks have been observed.
  • Our DBA team is actively working with backend engineers to further optimize write operations for improved performance.
  • A backend code update with additional optimizations is scheduled for release the week of March 17.

We appreciate your patience and will continue to monitor and enhance system performance to prevent similar issues in the future.

Posted Mar 10, 2025 - 21:41 PDT

Resolved

Between 5:15 PM and 5:30 PM PDT, there was a high number of Vault login errors.
Posted Mar 10, 2025 - 17:00 PDT