Quickwork Downtime (Mumbai) - Unscheduled

Incident Report for Quickwork

Resolved

Root Cause Analysis (RCA) Report

Incident Summary: Quickwork experienced 19 minutes of downtime on February 12, 2025, at 10:10 AM IST in the Mumbai cluster region. The incident affected all entry-point components, causing webhooks and APIs to return HTTP 503 error codes.

Root Cause: The issue was triggered by an installation/upgrade process that modified the authentication config maps. The modification introduced a syntax error that made the configuration unacceptable to the underlying authentication service. As a result, the entry points to the cluster became inaccessible.

Impact:

Webhooks & APIs: All entry-point components were affected, leading to 503 error responses.

Polling & CDC Services: These services were unaffected and resumed operations normally once the system recovered.

Resolution Steps:

Cluster Access Restoration (10 minutes): Due to the incorrect authentication configuration, an alternative authentication method was enabled to regain access to the cluster.

Configuration Fix (9 minutes): The incorrect authentication config maps were reverted to their correct syntax, restoring normal operations.

Preventive Measures: To mitigate the risk of similar incidents in the future, Quickwork has implemented the following actions:

Multi-Mode Authentication: Multiple forms of authentication are now enabled by default to ensure cluster accessibility even in case of configuration issues.

Enhanced Validation Checks: Additional validation mechanisms have been introduced to detect and prevent syntactical errors in authentication config maps before deployment.

Improved Rollback Procedures: Streamlined rollback mechanisms have been established to accelerate the restoration process in case of misconfigurations.
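
Illustrative sketch (not Quickwork's actual tooling): the validation and rollback measures above amount to parsing every config map entry before it is applied and keeping the last known-good configuration if any entry fails. The report does not state the config format or the tooling used, so the example below assumes JSON payloads purely for illustration, and the function names validate_config_entries and apply_if_valid are hypothetical.

```python
import json


def validate_config_entries(entries):
    """Return a list of (key, message) pairs for entries that fail to parse.

    `entries` maps a config key to its raw text payload. JSON is an
    assumed format; the incident report does not name the real one.
    """
    errors = []
    for key, payload in entries.items():
        try:
            json.loads(payload)
        except json.JSONDecodeError as exc:
            errors.append(
                (key, f"line {exc.lineno}, column {exc.colno}: {exc.msg}")
            )
    return errors


def apply_if_valid(current, candidate):
    """Return (config_to_use, errors).

    The candidate replaces the current config only when every entry
    parses; otherwise the current (last known-good) config is kept.
    """
    errors = validate_config_entries(candidate)
    if errors:
        return current, errors
    return candidate, []


if __name__ == "__main__":
    known_good = {"auth": '{"mode": "token"}'}
    broken = {"auth": '{"mode": "token"'}  # missing closing brace
    active, problems = apply_if_valid(known_good, broken)
    print("kept known-good config:", active is known_good)
    for key, message in problems:
        print(f"rejected entry {key!r}: {message}")
```

In this sketch a rejected change never reaches the authentication service, so the syntax check and the rollback path are the same code path: the previous configuration simply remains active.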

Conclusion: The issue resulted in a temporary service disruption but was promptly identified and resolved within 19 minutes. Quickwork apologizes for the inconvenience caused and remains committed to enhancing the reliability of our platform.

Date of Report: February 12, 2025
Prepared by: Quickwork Operations Team
Posted Feb 12, 2025 - 04:40 UTC