In large networks, for example those comprising hundreds of thousands to millions of network devices, packet drops somewhere in the network are fairly frequent. Packet drops can result in data loss, and analyzing and recovering from them is currently a time-consuming process. This disclosure describes techniques that incorporate packet-drop intelligence into network devices, so that debugging and root-cause analysis of packet drops become thorough, efficient, and real-time. A debugging engine is incorporated into a network device such as a switch and runs as part of the switch operating system. The engine leverages the device hardware to collect counters, trap network flows, gather debugging data, inject debug packets, etc. The techniques enable efficient and economical debugging of network issues such as connectivity problems, control-plane protocol time-outs, memory errors, hardware failures, and congestion drops.
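As a rough illustration of the counter-collection idea, the following Python sketch compares two snapshots of per-port drop counters and ranks the counters that grew between them. It is only a sketch under stated assumptions: the counter names, the `CounterSnapshot` type, and the ranking heuristic are hypothetical and are not taken from the disclosure, and a real engine would read these values from the switch hardware rather than from in-memory objects.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CounterSnapshot:
    """Hypothetical per-port drop counters (names are assumptions)."""
    congestion_drops: int
    crc_errors: int
    acl_denies: int
    ttl_expired: int


def counter_deltas(before: CounterSnapshot, after: CounterSnapshot) -> dict:
    """Return how much each counter changed between two snapshots."""
    return {
        f.name: getattr(after, f.name) - getattr(before, f.name)
        for f in fields(before)
    }


def likely_drop_causes(before: CounterSnapshot, after: CounterSnapshot) -> list:
    """List the counters that increased, largest increase first."""
    grew = {name: d for name, d in counter_deltas(before, after).items() if d > 0}
    return sorted(grew, key=grew.get, reverse=True)


if __name__ == "__main__":
    earlier = CounterSnapshot(congestion_drops=10, crc_errors=0, acl_denies=5, ttl_expired=1)
    later = CounterSnapshot(congestion_drops=110, crc_errors=0, acl_denies=7, ttl_expired=1)
    print(likely_drop_causes(earlier, later))
```

Ranking by raw counter growth is a deliberately simple heuristic; it only narrows down where to look and does not by itself establish a root cause.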
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
N/A, "Fast Packet-drop Analysis and Network Self-Recovery", Technical Disclosure Commons, (September 17, 2020)