Abstract
It can be difficult to attribute network link errors that occur during a software rollout to either the old or the new version of the software. Reliable software rollout is important, yet network errors are costly to diagnose, and rolling out a new software version takes substantial time. This disclosure describes techniques to automatically diagnose network errors in a fleet of computers, to attribute each error to the old or the new software version, and to decide whether to continue a software rollout or to abort it, e.g., by rolling back the new software version. Data is collected from software stacks, error logs, network topology daemons, and monitoring tools. A graph of affected machines is constructed based on link error propagation, software versions, error locations, error timing, etc. Rollout or rollback decisions are made based on the structure of the graph, e.g., on the dominant sub-graph whose sum of link errors exceeds a threshold.
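The abstract does not spell out a concrete graph algorithm. The Python sketch below shows one possible reading under stated assumptions: nodes are machines, edges are links that reported errors (weighted by error count), candidate sub-graphs are connected components, the "dominant" sub-graph is the one with the largest error sum, and blame goes to whichever software version runs on the majority of its machines. `LinkError`, `decide_rollout`, and the `"old"`/`"new"` version labels are hypothetical names for illustration only; error timing and location, which the disclosure also mentions, are omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkError:
    """One observed link error record (hypothetical schema)."""
    machine_a: str
    machine_b: str
    count: int


def build_error_graph(errors):
    """Undirected adjacency map: machine -> {neighbor: summed error count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for e in errors:
        graph[e.machine_a][e.machine_b] += e.count
        graph[e.machine_b][e.machine_a] += e.count
    return graph


def connected_components(graph):
    """Yield sets of machines joined by at least one erroring link."""
    seen = set()
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        yield component


def component_error_sum(graph, component):
    """Total link errors in a component; each edge is stored twice, so halve."""
    return sum(graph[u][v] for u in component for v in graph[u]) // 2


def decide_rollout(errors, versions, threshold):
    """Return (decision, blamed_version) from the dominant error sub-graph.

    Assumption: the dominant sub-graph is the connected component with the
    largest error sum, and it is blamed on the majority version of its machines.
    """
    graph = build_error_graph(errors)
    components = list(connected_components(graph))
    if not components:
        return "continue", None
    dominant = max(components, key=lambda c: component_error_sum(graph, c))
    if component_error_sum(graph, dominant) <= threshold:
        return "continue", None  # errors do not exceed the threshold
    # Machines missing from `versions` are counted as not running the new version.
    new_count = sum(1 for m in dominant if versions.get(m) == "new")
    # Ties are blamed on the new version, a conservative choice that favors rollback.
    culprit = "new" if 2 * new_count >= len(dominant) else "old"
    return ("rollback" if culprit == "new" else "continue"), culprit


if __name__ == "__main__":
    versions = {"m1": "new", "m2": "new", "m3": "old", "m4": "old", "m5": "old"}
    errors = [
        LinkError("m1", "m2", 40),
        LinkError("m2", "m3", 15),
        LinkError("m4", "m5", 5),
    ]
    print(decide_rollout(errors, versions, threshold=30))  # ('rollback', 'new')
```

Grouping errors into connected components keeps the attribution local: a burst of errors concentrated among upgraded machines points at the new version, whereas an equally large cluster among machines still on the old version suggests the rollout is not the cause and can continue.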
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
NA, "Automatic Resolution of Network Link Errors That Occur During Software Rollout", Technical Disclosure Commons, (June 18, 2025)
https://www.tdcommons.org/dpubs_series/8247