Explanation about load balance endpoints and endpoint suspension

In this post i will provide explanation about load balance endpoints and endpoint suspension. If we have 2 endpoints in load balanced manner then it will behave as below.
If both endpoints are in working condition.
Requests with route to endpoint_1 and then next request will go to endpoint_2. Likewise it will repeat.
If endpoint_1 failed to serve requests.
If endpoint_1 failed to serve requests when load balanced endpoints used it will detect endpoint_1 failure and then route request to endpoint_2. You can see details in following log. It says it detect one endpoint failure and endpoint suspended for 30 seconds.
[2017-05-30 23:08:26,152] WARN - ConnectCallback Connection refused or failed for : /172.17.0.1:8081
[2017-05-30 23:08:26,153] WARN - EndpointContext Endpoint : admin--CalculatorAPI_APIproductionEndpoint_0_0 will be marked SUSPENDED as it failed
[2017-05-30 23:08:26,154] WARN - EndpointContext Suspending endpoint : admin--CalculatorAPI_APIproductionEndpoint_0_0 - last suspend duration was : 30000ms and current suspend duration is : 30000ms - Next retry after : Tue May 30 23:08:56 IST 2017
[2017-05-30 23:08:26,154] WARN - LoadbalanceEndpoint Endpoint [admin--CalculatorAPI_APIproductionEndpoint_0] Detect a Failure in a child endpoint : Endpoint [admin--CalculatorAPI_APIproductionEndpoint_0_0]
After you see above log in output it will not route requests to endpoint_1 for 30 seconds(30000ms). Then if you send request after 30 seconds it will again route to endpoint_1 and since its not available request will go to endpoint_2. Same cycle repeats until endpoint_1 available to serve requests.
If both endpoint_1 and endpoint_2 failed.
Then it will go to endpoint_1 and once failure detect it will go to endpoint_2. Then once it realized all endpoint belong to that load balanced endpoint it will not accept further requests and send error message. It will not go into loop and it will go through all endpoints onetime and stop processing request(by sending proper error). Please see below logs.
Detect first endpoint_1 failure
[2017-05-30 23:41:58,643] WARN - ConnectCallback Connection refused or failed for : /172.17.0.1:8081
[2017-05-30 23:41:58,646] WARN - EndpointContext Endpoint : admin--CalculatorAPI_APIproductionEndpoint_0_0 will be marked SUSPENDED as it failed
[2017-05-30 23:41:58,648] WARN - EndpointContext Suspending endpoint : admin--CalculatorAPI_APIproductionEndpoint_0_0 - last suspend duration was : 70000ms and current suspend duration is : 70000ms - Next retry after : Tue May 30 23:43:08 IST 2017
[2017-05-30 23:41:58,648] WARN - LoadbalanceEndpoint Endpoint [admin--CalculatorAPI_APIproductionEndpoint_0] Detect a Failure in a child endpoint : Endpoint [admin--CalculatorAPI_APIproductionEndpoint_0_0]

 
Detect endpoint_2 failure
[2017-05-30 23:41:58,651] WARN - ConnectCallback Connection refused or failed for : /172.17.0.1:8080
[2017-05-30 23:41:58,654] WARN - EndpointContext Endpoint : admin--CalculatorAPI_APIproductionEndpoint_0_1 will be marked SUSPENDED as it failed
[2017-05-30 23:41:58,656] WARN - EndpointContext Suspending endpoint : admin--CalculatorAPI_APIproductionEndpoint_0_1 - current suspend duration is : 30000ms - Next retry after : Tue May 30 23:42:28 IST 2017
[2017-05-30 23:41:58,657] WARN - LoadbalanceEndpoint Endpoint [admin--CalculatorAPI_APIproductionEndpoint_0] Detect a Failure in a child endpoint : Endpoint [admin--CalculatorAPI_APIproductionEndpoint_0_1]

Once it realized both load balanced endpoint failed it will print error saying no child endpoints to process requests.
[2017-05-30 23:41:58,657] WARN - LoadbalanceEndpoint Loadbalance endpoint : admin--CalculatorAPI_APIproductionEndpoint_0 - no ready child endpoints
[2017-05-30 23:41:58,667] INFO - LogMediator STATUS = Executing default 'fault' sequence, ERROR_CODE = 101503, ERROR_MESSAGE = Error connecting to the back end

When endpoint suspension happens, it will work as follows.
For the suspend duration after the first failure, this equation does not applies. Also when endpoint changed from active to suspend state, suspension duration will be exactly initial duration.

This equation only applies when endpoint already in suspended state and suspension duration expired.

next suspension time period = Min (Initial suspension duration * Progression Factor , Max suspend time).

2 comments:

  1. Hey Sanjeewa,The above content was quite insightful.
    I am using wso2 in load balancing mode and is facing a peculiar issue.When I configure my API in my primary server the same API is not getting reflected at my secondary server.Both my servers are using postgres DB.If you have come across similar issues any suggestion would be great help.

    PS:A work around for this issue is I manually have to put the API xml in my secondary server.

    ReplyDelete
  2. Hi Jecob,
    In order to replicate API across multiple servers we will need to configure something to do deployment synchronization. These articles may help you to understand concept and implementation details.

    https://docs.wso2.com/display/AM210/Configuring+rsync+for+Deployment+Synchronization
    https://docs.wso2.com/display/AM210/Configuring+an+Active-Active+Deployment

    Thanks,
    sanjeewa.

    ReplyDelete

Empowering the Future of API Management: Unveiling the Journey of WSO2 API Platform for Kubernetes (APK) Project and the Anticipated Alpha Release

  Introduction In the ever-evolving realm of API management, our journey embarked on the APK project eight months ago, and now, with great a...