Webhook Circuit Breaker (WCB)

This article explains how a WCB incident occurs and how it affects the business.

Written By Ops UrbanPiper (Collaborator)

Updated at December 30th, 2021

As you might already be aware, we have retry logic (3 attempts) in place for handling request failures (read timeout, bad request, connection timeout) when making webhook callbacks to 3rd party URLs. To keep our infrastructure stable and to avoid repeated retries against systems whose webhook endpoints are down for a prolonged time or keep returning the same bad request errors, we have introduced a mechanism called the "Webhook Circuit Breaker" (WCB).

When communicating with third-party systems, our system uses a connection timeout of 3 seconds and a read timeout of 5 seconds. Any third party listening to our webhook data is expected to respond within this time window.
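The retry and timeout behaviour can be pictured with a minimal sketch. It is only an illustration and not our actual implementation: the names deliver_with_retries, WebhookDeliveryError and the send callable are hypothetical, and the sketch assumes that "3 attempts" means three tries in total. The timeout values are the ones stated in this article.

```python
CONNECT_TIMEOUT_SECONDS = 3  # connection timeout stated in this article
READ_TIMEOUT_SECONDS = 5     # read timeout stated in this article
MAX_ATTEMPTS = 3             # assumption: three tries in total


class WebhookDeliveryError(Exception):
    """Hypothetical error for a read timeout, bad request or connection timeout."""


def deliver_with_retries(send, payload, max_attempts=MAX_ATTEMPTS):
    """Call send(payload, timeout) until it succeeds or the attempts run out.

    `send` is a hypothetical callable that performs the HTTP call and raises
    WebhookDeliveryError on failure. Returns (delivered, failure_count).
    """
    timeout = (CONNECT_TIMEOUT_SECONDS, READ_TIMEOUT_SECONDS)
    failures = 0
    for _ in range(max_attempts):
        try:
            send(payload, timeout)
            return True, failures
        except WebhookDeliveryError:
            failures += 1  # each failed attempt is a request failure
    return False, failures
```

For a third party, the practical takeaway is that the endpoint should accept the connection within 3 seconds and return a response within 5 seconds, for example by acknowledging the webhook first and doing heavy processing afterwards.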

When a 3rd party system URL returns more than 15 request failures within a minute for a particular business (biz), our system detects the breach and automatically disables the webhook endpoints configured for that biz. At this point, the webhook callback is said to have tripped the circuit breaker. The endpoints that get disabled are those whose URLs belong to the same domain as the failing URL.

When the WCB is tripped, no webhook payloads are pushed to the affected 3rd party URLs. If even a single webhook callback URL (say Rider Status Update) associated with the 3rd party breaches the limit, the other webhook callbacks associated with the same hostname are disabled as well. Our system checks which domain/hostname the failures are coming from, and when the failure limit is breached, it automatically disables all the webhook endpoints associated with that domain.
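The trip condition described above can be sketched as a sliding-window counter keyed by biz and hostname. This is an illustrative sketch only, not our production code: the class name FailureTracker, its method and the example URLs are hypothetical, and the exact windowing used internally is an assumption. Only the numbers (more than 15 failures within a minute, grouped by hostname) come from this article.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse

FAILURE_THRESHOLD = 15  # more than 15 failures trips the breaker
WINDOW_SECONDS = 60     # "within a minute"


class FailureTracker:
    """Hypothetical tracker of recent request failures per (biz, hostname)."""

    def __init__(self):
        self._failures = defaultdict(deque)

    def record_failure(self, biz_id, url, now):
        """Record one failure at time `now` (in seconds).

        Returns True when the breaker should trip for this biz and hostname.
        """
        # Failures are grouped by hostname, so different callback URLs on
        # the same host count towards the same limit.
        key = (biz_id, urlparse(url).hostname)
        window = self._failures[key]
        window.append(now)
        # Drop failures that are a minute old or older.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        return len(window) > FAILURE_THRESHOLD
```

In this sketch, 15 failures on https://pos.example.com/orders followed by one failure on https://pos.example.com/rider-status within the same minute would trip the breaker for the whole pos.example.com host, while a different hostname is counted separately.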

The time for which all webhooks configured for a particular host stay deactivated depends on the number of times the WCB has tripped in the past week for the same host. If there have been 5 or more incidents in the past week, all webhooks are disabled for 3 minutes. If there have been fewer than 5 incidents, the webhooks are disabled for 1 minute.
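The rule above can be written as a small function. This is a sketch under stated assumptions: the function name disable_duration and its arguments are hypothetical, and the article does not say whether the trip that is currently being handled is itself counted among the incidents, so the caller decides which timestamps to pass in.

```python
from datetime import datetime, timedelta


def disable_duration(trip_times, now):
    """Return how long webhooks for a host stay disabled.

    `trip_times` holds the datetimes of WCB incidents for the host
    (hypothetical input); only those from the past 7 days are counted.
    """
    week_ago = now - timedelta(days=7)
    recent = [t for t in trip_times if week_ago <= t <= now]
    if len(recent) >= 5:
        return timedelta(minutes=3)  # 5 or more incidents in the past week
    return timedelta(minutes=1)      # fewer than 5 incidents
```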

Orders that were not relayed while the webhooks were deactivated are pushed all at once when the webhooks are re-enabled. If the third-party system is still not responding when we re-push the data, the WCB will be tripped again.

We send an email notification whenever the WCB is tripped. Please check with the concerned ACM/OM/Integration PoC to get it configured for the business you onboard to our platform, and share the list of email IDs that should receive the notifications. You can also add your email IDs in the Contact Emails field on Quint's Business Profile page to receive the email notifications.

The email alert will provide you with the details of:

  • Short information about the WCB
  • Which business was impacted
  • Which URLs were disabled
  • Number of times the WCB tripped in the last 7 days
  • Status code

The email alert notification will have the subject: Webhooks disabled: {{biz_name}}

Note: When the webhooks are re-enabled, our system tries to push the unsent orders all at once to the third-party URLs. In this event, whatever state the order is in within the UrbanPiper system at that time is the state that will be present in the Order Relay payload under the state/order_state attribute.

For example, if at the time of the retried order push the state of the order in our system is already Acknowledged because of a status change made from an external tool (say, satellite), then the state/order_state attribute in the Order Placed payload will carry the Acknowledged status instead of Placed. Make sure that you do not expect only a Placed state value to be passed in that attribute.
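On the receiving side, this could be handled along the lines of the sketch below. It is a hypothetical example, not a prescribed integration: the function name and the action labels are invented for illustration, and it takes only the state value, so reading that value out of the payload's state/order_state attribute is left to your own parsing code.

```python
def plan_for_relayed_order(state):
    """Decide how to process a relayed order based on its current state.

    `state` is the value read from the state/order_state attribute.
    The returned action labels are hypothetical.
    """
    if not state:
        raise ValueError("relayed order has no state value")
    actions = ["create_order"]
    if state != "Placed":
        # The order moved on while the webhooks were disabled, so create it
        # and then bring it to the state reported in the payload instead of
        # rejecting it for not being Placed.
        actions.append("set_state:" + state)
    return actions
```

With this approach, an order relayed as Acknowledged is still created on your side and then moved to Acknowledged, rather than being dropped.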

If you have any questions, please reach out to us.
