At 9:21 AM PST, we received notifications of API errors coming from our backend systems. The issue resolved itself without any action required at 9:31 AM PST. During that 10-minute period, a database lock on the primary writer instance caused new login sessions to fail.
Recently, Keeper upgraded all production RDS Aurora databases to MySQL 8.x. The updated database engine is sensitive to one particular SQL query, which was causing large table scans to occur. The code change and its QA process were already under way when this issue occurred. The backend code change that addresses this issue passed QA at 12:48 PM PST, and we released the fix to prevent the issue from occurring again (Release Jira ticket REL-4633). We’re monitoring closely and will issue additional updates as necessary.