APIM Gateway Increased 5XX

Incident Report for Quickwork

Resolved

Incident Summary:

On 2 May 2025, Quickwork’s APIM Gateway in the Mumbai region experienced an 11-minute service disruption starting at 9:22 AM IST (03:52 UTC). The disruption occurred during a routine automatic upgrade that applies security patches to the host machine.

Root Cause:

During the upgrade, the new APIM Gateway instances could not connect to the database because the security group configuration was too restrictive. It blocked network access that the instances need for service registration. As a result:

1. New instances failed to register with the load balancer.
2. Existing instances were terminated before new ones became healthy and available.
3. This led to a brief outage in APIM Gateway availability.
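
The connectivity failure described above is the kind of problem a simple reachability probe can surface before an instance is expected to take traffic. The following is a minimal sketch only, not Quickwork's actual tooling: the function name `can_reach`, the host `db.internal.example`, and port `5432` are hypothetical placeholders, since the report does not name the database or its endpoint.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    A blocked network path usually surfaces as a timeout or a refused
    connection; both raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical endpoint, used for illustration only.
    db_host, db_port = "db.internal.example", 5432
    if can_reach(db_host, db_port):
        print("database reachable")
    else:
        print("database NOT reachable: check network rules before registering")
```

A probe like this only confirms that a TCP connection can be opened. It does not verify credentials or database health.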

Impact:

1. APIM Gateway was unavailable for 11 minutes.
2. Customer API traffic and service operations in the Mumbai region were briefly affected.

Remediation Actions:

1. Security Group Fix:
Corrected the security group rules to allow necessary connectivity between APIM services and the database.
2. Disruption Policy Implemented:
Applied a disruption policy that ensures new services are fully up and healthy before old ones are terminated during upgrades.
3. Preventive Configuration Check:
Added validation checks to catch security group misconfigurations of this kind in future deployments.
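
The ordering that the disruption policy enforces (new instances healthy first, old instances terminated afterwards) can be sketched as follows. This illustrates the general technique under stated assumptions and is not the policy Quickwork deployed: `wait_until_healthy`, `replace_instances`, and the `is_healthy` and `terminate` callbacks are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Sequence


def wait_until_healthy(check: Callable[[], bool],
                       timeout_s: float,
                       interval_s: float = 5.0) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def replace_instances(old: Sequence[str],
                      new: Sequence[str],
                      is_healthy: Callable[[str], bool],
                      terminate: Callable[[str], None],
                      timeout_s: float = 300.0,
                      interval_s: float = 5.0) -> bool:
    """Terminate old instances only after every new instance is healthy.

    Returns False, without terminating anything, if any new instance
    fails to become healthy in time, so the old instances keep serving.
    """
    for instance in new:
        if not wait_until_healthy(lambda: is_healthy(instance),
                                  timeout_s, interval_s):
            return False
    for instance in old:
        terminate(instance)
    return True
```

With this ordering, a new instance that can never become healthy (for example, because it cannot reach the database) causes the rollout to stop while the existing instances remain in service.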
Posted May 02, 2025 - 03:52 UTC