At 9:05 AM PST, alerts were triggered for issues affecting KeeperPAM connections managed through Keeper’s ECS deployments in the US-EAST region. No changes had recently been made to the application or environment.
Investigation identified a low-level concurrency bug in the Keeper EPM service that caused requests to fail under high concurrent load. These failures destabilized the ECS services supporting KeeperPAM connections.
As a temporary mitigation, we blocked the error condition, restoring KeeperPAM connectivity by approximately 1:00 PM PST.
The engineering team then developed and deployed an updated Keeper Router version to address the underlying issue and prevent EPM agents from triggering server errors. The fix was fully validated by 3:00 PM PST, at which point all KeeperPAM services were stable and operating normally.