A microburst in nature is a localized column of sinking air (downdraft) within a thunderstorm, usually no more than 2.5 miles in diameter and often much smaller. Microbursts can cause extensive damage at the surface and, in some instances, can be life-threatening.

In computer networks, a microburst is a short burst of traffic, typically lasting only milliseconds, that saturates the link (Ethernet, Gigabit, 10 Gigabit, etc.). Microbursts are a serious network concern, because even brief saturation means some users are blocked for as long as it lasts. The de facto industry standard for measuring network utilization is bits per second (bps), averaged over a second or longer, so microbursts often go undetected: they are averaged out over that interval. In most cases, network monitoring systems don't alert on the saturation because it does not persist for a full second. The end-user impact ranges from nothing at all, if enough of the excess traffic can be buffered, to performance bottlenecks caused by lower throughput or, worse, dropped connections.
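To make the averaging effect concrete, here is a minimal Python sketch. It assumes you already have the number of bits carried in each 1 ms interval on one direction of a 10 Gbps link. The traffic values are hypothetical and are not taken from the capture discussed below.

```python
# Sketch: a one-second average hides millisecond-scale saturation.
# Assumption: each sample is the number of bits carried on ONE direction of a
# 10 Gbps link during one 1 ms interval; the sample data below is hypothetical.

LINK_CAPACITY_BPS = 10_000_000_000                 # 10 Gbps per direction
LINE_RATE_BITS_PER_MS = LINK_CAPACITY_BPS // 1000  # 10,000,000 bits per ms


def summarize(bits_per_ms):
    """Return (average rate over the window, rate of the busiest 1 ms), both in bps."""
    avg_bps = sum(bits_per_ms) * 1000 / len(bits_per_ms)
    peak_bps = max(bits_per_ms) * 1000
    return avg_bps, peak_bps


# One second of hypothetical traffic: a steady 2 Mbit per ms (2 Gbps) ...
samples = [2_000_000] * 1000
# ... plus a 20 ms burst at full line rate.
for ms in range(500, 520):
    samples[ms] = LINE_RATE_BITS_PER_MS

avg_bps, peak_bps = summarize(samples)
print(f"1 s average: {avg_bps / 1e9:.2f} Gbps")  # 1 s average: 2.16 Gbps
print(f"1 ms peak:   {peak_bps / 1e9:.2f} Gbps")  # 1 ms peak:   10.00 Gbps
```

A monitor that only sees the 2.16 Gbps average has no reason to alert, even though the link was completely full for 20 ms.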

Identifying a microburst requires two things. First, the traffic on the link must be measured precisely, with timestamps at microsecond granularity. Second, it must be visualized at millisecond granularity or finer. Here's a real-world example of how to identify a microburst.

In this example, the measurement point is a TAP inserted into the 10 Gbps link of a data center connection. We captured 45 seconds of network traffic using a Savvius Omnipliance TL. Omnipeek's Expert system immediately alerts on irregularities at OSI layers 2 through 7. These alerts can be sorted by any of the available columns, such as count or layer. In this case we sort by count and see TCP retransmissions, “Non Responsive” peer alerts, slow acknowledgements, and more.

Picture 1: Omnipeek Expert system with flows categorized by protocols/applications and Expert events sorted by number of occurrences.

Picture 2: Graph of overall utilization at one second granularity along with top applications.

When network utilization is graphed in the usual bps, as in Picture 2, the maximum full-duplex peak is 2.54 Gbps. That is nothing to worry about on a 10 Gbps link with a full-duplex capacity of 20 Gbps (10 Gbps each for send and receive).

One thing we notice in the Compass Expert Events summary is the fairly large number of events related to slow network performance, especially for a 45-second capture. Compass can graph the occurrence of Expert events over time. Doing so shows that the Expert event curve closely follows the shape of the overall network utilization curve:

Picture 3: Omnipeek's Compass feature can graph the occurrence of Expert events over time.
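Compass shows the resemblance visually. Suppose you export per-second counts of Expert events and per-second utilization. One simple way to put a number on the resemblance is the Pearson correlation coefficient. This is our own illustrative choice, not something Omnipeek computes in this example, and the export format is assumed. A value close to 1 means the two series rise and fall together. That supports a causal link but does not prove one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and spreads, left unnormalized (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical per-second values (not from the capture above).
utilization_gbps = [1.1, 2.5, 1.3, 2.4, 1.0, 2.2]
expert_events = [3, 11, 4, 10, 2, 9]
print(f"r = {pearson(utilization_gbps, expert_events):.2f}")
```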

Since the number of slow network events is large, let us go back to the utilization graph and investigate the spikes more closely. Drilling down to millisecond granularity, we see multiple spikes of up to 9.845 Mbit per millisecond. Converted to a per-second rate (multiplied by 1,000), this is 9.845 Gbps. If that traffic flows in one direction, it essentially fills our 10 Gbps link.

Picture 4: Network utilization in millisecond granularity with multiple spikes close to 10 Mbit per millisecond.
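A millisecond view such as Picture 4 can be thought of as per-packet timestamps summed into 1 ms buckets. Each bucket is then scaled by 1,000 to express it as a per-second rate. The Python sketch below illustrates that idea. It assumes a hypothetical list of (timestamp in microseconds, frame length in bytes) records and does not describe how Omnipeek builds its graphs. It counts frame bytes only, so Ethernet preamble and inter-frame gap are ignored.

```python
from collections import Counter


def bits_per_bucket(packets, bucket_us=1000):
    """Sum frame bits into fixed-width time buckets (default 1 ms = 1,000 us).

    packets: iterable of (timestamp_us, frame_length_bytes) tuples (hypothetical format).
    Only frame bytes are counted; preamble and inter-frame gap are ignored.
    """
    packets = list(packets)
    if not packets:
        return []
    start = min(ts for ts, _ in packets)
    totals = Counter()
    for ts, length in packets:
        totals[(ts - start) // bucket_us] += length * 8
    # Include empty buckets so gaps in traffic show up as 0 bits.
    return [totals[i] for i in range(max(totals) + 1)]


def bucket_rate_gbps(bits, bucket_us=1000):
    """Scale the bits counted in one bucket to a per-second rate in Gbps."""
    buckets_per_second = 1_000_000 / bucket_us
    return bits * buckets_per_second / 1e9


# Hypothetical capture excerpt: (timestamp in microseconds, frame length in bytes)
packets = [(0, 1500), (200, 1500), (1500, 64), (3100, 1500)]
series = bits_per_bucket(packets)
print(series)  # [24000, 512, 0, 12000]
print(f"peak: {bucket_rate_gbps(max(series)):.3f} Gbps")
```

With 1 ms buckets, a bucket holding 9,845,000 bits (9.845 Mbit) gives `bucket_rate_gbps(9_845_000)`, about 9.845 Gbps. This is the same multiply-by-1,000 conversion described in the text.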

Interestingly, in Picture 4 the top protocol has changed to CIFS. So what happened?

Picture 5: The usual utilization with TCP traffic is purple; CIFS spikes are brown.

Normal TCP traffic runs at up to 6 Mbit per millisecond, and CIFS spikes add up to another 6 Mbit per millisecond. Together they push the offered load to as much as 12 Mbit per millisecond. That exceeds the 10 Mbit per millisecond (10 Gbps) that one direction of a 10 Gbps link can carry. At that point, the switches can no longer buffer the excess traffic until the burst is over, so packets are dropped. The drops in turn cause the TCP retransmissions shown in the Expert events.
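The mechanism can be illustrated with a deliberately simplified model of a single switch egress port. The port is a first-in, first-out queue that drains at most 10 Mbit per millisecond (10 Gbps) and tail-drops whatever exceeds its buffer. The 1 Mbit buffer size and the traffic pattern are hypothetical. Real switches differ in buffer size, buffer sharing between ports, and drop policy.

```python
def simulate_queue(arrivals_mbit, capacity_mbit_per_ms=10.0, buffer_mbit=1.0):
    """Return (sent, dropped) totals in Mbit for a simple tail-drop FIFO queue.

    Simplification: in each 1 ms step, arrivals are added first, then the link
    transmits up to its capacity, then anything beyond the buffer is dropped.
    """
    queued = 0.0
    sent = 0.0
    dropped = 0.0
    for arriving in arrivals_mbit:
        queued += arriving
        # Transmit as much as the link allows this millisecond.
        out = min(queued, capacity_mbit_per_ms)
        sent += out
        queued -= out
        # Whatever no longer fits in the buffer is tail-dropped.
        if queued > buffer_mbit:
            dropped += queued - buffer_mbit
            queued = buffer_mbit
    return sent, dropped


# Hypothetical pattern: 6 Mbit/ms of TCP, plus 6 Mbit/ms of CIFS for 4 ms.
arrivals = [6, 6, 6, 12, 12, 12, 12, 6, 6, 6]
sent, dropped = simulate_queue(arrivals)
print(f"sent {sent} Mbit, dropped {dropped} Mbit")  # sent 77.0 Mbit, dropped 7.0 Mbit
```

The same 1 Mbit buffer would absorb a single 11 Mbit millisecond without loss (`simulate_queue([11, 5])` drops nothing). It is the excess sustained over several milliseconds that overflows the buffer and leads to retransmissions.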

Savvius Omnipeek provides a very intuitive and cost-efficient way to verify whether microbursts occur in your network, and when, where, and how network performance is suffering. To start a free 30-day trial of Omnipeek today, visit us here.

Written by: Matthias Lichtenegger