At 11:08 AM PST, DevOps and customers reported sporadic error messages occurring after login for KeeperPAM customers with connections and tunnels enabled. While vault login itself was not affected, the errors were triggered asynchronously after login due to a communication issue with the KeeperPAM router endpoint.
Further investigation determined that a "scheduler" database within the AWS RDS environment, used for certain PAM scheduling operations, was encountering an unexpected error. The operations team resolved the underlying database issue and restarted the affected services.
The scheduler service that depends on this database was inadvertently impacting connection establishment. To prevent a recurrence, we will implement software changes to ensure that scheduling-related errors cannot affect connection functionality.
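As an illustration only, the kind of isolation described above can be sketched as follows. This is a minimal sketch and not the KeeperPAM implementation: the names `establish_connection`, `open_tunnel`, `register_schedule`, and `SchedulerError` are hypothetical, and the sketch assumes that scheduling can be treated as a best-effort step outside the critical path of opening a connection.

```python
import logging
from typing import Callable

logger = logging.getLogger("pam.connection")


class SchedulerError(Exception):
    """Hypothetical error raised when the scheduler database is unavailable."""


def establish_connection(
    record_id: str,
    open_tunnel: Callable[[str], str],
    register_schedule: Callable[[str], None],
) -> str:
    """Open a connection and treat scheduling as a best-effort side task."""
    # Critical path: a failure here should propagate to the caller.
    session = open_tunnel(record_id)
    try:
        # Non-critical path (assumption): depends on the scheduler database.
        register_schedule(record_id)
    except Exception:
        # Record the failure for alerting, but do not fail the connection.
        logger.exception("scheduler step failed for %s; continuing", record_id)
    return session


def failing_scheduler(record_id: str) -> None:
    """Stand-in for a scheduler whose database is returning errors."""
    raise SchedulerError("scheduler database unavailable")


def healthy_tunnel(record_id: str) -> str:
    """Stand-in for a working connection path."""
    return f"session-for-{record_id}"
```

In this sketch, calling `establish_connection("rec1", healthy_tunnel, failing_scheduler)` still returns a session: the scheduler error is logged rather than raised, so a scheduler database fault would surface through monitoring instead of through failed connections.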
Additional monitoring and alerting have also been implemented for the scheduler databases to help detect such errors earlier.
Full service for connection and tunneling capabilities was restored by 11:40 AM PST.