On 2nd May 2025, Quickwork’s APIM Gateway in the Mumbai region experienced an 11-minute service disruption, starting at 9:22 AM IST. The issue occurred during a routine automatic upgrade that applied security patches to the host machine.
Root Cause:
During the upgrade, the new APIM Gateway instances were unable to connect to the database because of a restrictive security group configuration. The security group blocked the network access the instances needed to complete service registration. As a result:
1. New instances failed to register with the load balancer.
2. Existing instances were terminated before the new ones became healthy and available.
3. This caused a brief outage of the APIM Gateway.
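The failure sequence above hinges on an ordering rule: an old instance should only be terminated while enough healthy capacity remains. The following is a minimal Python sketch of that gate under simplifying assumptions. The `Instance` model, the `may_terminate` helper, and the `min_available` threshold are hypothetical illustrations, not Quickwork's actual upgrade tooling.

```python
from dataclasses import dataclass


# Hypothetical model of a rolling upgrade; not Quickwork's actual tooling.
@dataclass
class Instance:
    name: str
    version: str
    healthy: bool  # True once registered with the load balancer and passing health checks


def may_terminate(instances: list[Instance], victim: Instance, min_available: int) -> bool:
    """Allow terminating `victim` only if healthy capacity stays >= min_available afterwards."""
    healthy_after = sum(1 for i in instances if i.healthy and i is not victim)
    return healthy_after >= min_available


# Scenario resembling the incident: new instances cannot reach the database,
# so they never become healthy.
fleet = [
    Instance("old-1", "v1", healthy=True),
    Instance("old-2", "v1", healthy=True),
    Instance("new-1", "v2", healthy=False),
    Instance("new-2", "v2", healthy=False),
]

print(may_terminate(fleet, fleet[0], min_available=2))  # False: only old-2 would remain healthy
```

Under this gate the upgrade would stall with the old instances still serving traffic, turning the database connectivity fault into a failed rollout rather than an outage.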
Impact:
1. APIM Gateway was unavailable for 11 minutes.
2. Customer API traffic and service operations in the Mumbai region were briefly affected.
Remediation Actions:
1. Security Group Fix: Corrected the security group rules to allow the required connectivity between the APIM services and the database.
2. Disruption Policy Implemented: Applied a disruption policy that ensures new instances are fully up and healthy before old ones are terminated during upgrades.
3. Preventive Configuration Check: Added validation checks to catch such misconfigurations before future deployments.
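A preventive check of this kind can be as simple as asserting, before a rollout, that the database's security group admits traffic from the gateway's security group on the database port. The sketch below is hypothetical: the `IngressRule` model, the group IDs, and port 5432 (the PostgreSQL default) are assumptions for illustration, since the report does not name the database engine or the actual rules. It also ignores CIDR-based rules and egress restrictions, which a real check would need to cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified inbound rule on the database's security group."""
    protocol: str      # e.g. "tcp"
    from_port: int     # first port in the allowed range
    to_port: int       # last port in the allowed range (inclusive)
    source_group: str  # security group ID permitted as the traffic source


def allows(rules: list[IngressRule], source_group: str, port: int, protocol: str = "tcp") -> bool:
    """Return True if any rule admits `protocol` traffic on `port` from `source_group`."""
    return any(
        r.protocol == protocol
        and r.source_group == source_group
        and r.from_port <= port <= r.to_port
        for r in rules
    )


def validate_db_access(db_rules: list[IngressRule], gateway_group: str, db_port: int) -> None:
    """Fail the deployment early if gateway instances could not reach the database."""
    if not allows(db_rules, gateway_group, db_port):
        raise ValueError(
            f"security group {gateway_group!r} is not allowed to reach the database on port {db_port}"
        )


# Example: run before an upgrade; the group ID and port are placeholders.
db_rules = [IngressRule("tcp", 5432, 5432, "sg-apim-gateway")]
validate_db_access(db_rules, "sg-apim-gateway", 5432)  # passes silently
```

Running a check like this as a deployment gate surfaces the misconfiguration before any existing instance is touched.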