Open Access
Service-Level Objective-Aware Load-Adaptive Timeout: Balancing Failure Rate and Latency in Microservices Communication
Author(s) - Hiroki Hanada, Keisuke Ishibashi
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3596118
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Microservices architectures enable scalable and modular application design but introduce reliability challenges due to their distributed nature. Timeout configurations are critical for maintaining system reliability, as they directly impact latency and failure rate Service-Level Objective (SLO) compliance. However, current timeout settings are often based on best practices rather than systematic optimization, as determining the optimal timeout is challenging. The difficulty arises from the need to balance SLO constraints while adapting to dynamically changing load and system capacity, making static configurations inherently suboptimal. To address this, this study proposes a load-adaptive timeout mechanism that dynamically adjusts timeout values to optimize reliability across different load conditions. Under normal load, the method minimizes latency while maintaining failure rate SLO compliance. Under overload, where meeting both objectives becomes infeasible, it prioritizes failure rate reduction while ensuring latency SLO compliance. By allocating the initial portion of the timeout duration to transmission delay during downstream overload and failure, the method naturally exhibits load shedding and circuit-breaking behavior, preventing the bottleneck service from being overwhelmed. The proposed method was implemented as an open-source Go library and evaluated using the Online Boutique benchmark under various load conditions. Results show that it reduces average and tail latencies by 40% and 55%, respectively, under normal load and short-lived overload. Under prolonged overload, it minimizes failure rates, reducing deviations from the failure rate SLO by 18%. These findings demonstrate the effectiveness of adaptive timeout control in maintaining microservices reliability while dynamically responding to changing system conditions.
