Queueing in data networks

Modern data networks route or switch relatively small ‘packets’ of data across shared links that along with the switching nodes, form the wider data network.

One of the roles of the switches is to receive packets on one link, and send them onwards on the most appropriate link. Since links may be at different speeds, and many links may source packets to be sent on any link, there exists a mechanism in the switch to store packets pending transmission, in the simplest case it is a first come first served link queue.

The function of the queue then is to hold packets until they can be sent on the link, and to offer them in first come first served order. That raises two important questions:

  • how long will packets be delayed;
  • how many slots does the queue need.

Queuing theory gives us a method of estimating these quantities.

Lets make some assumptions about the traffic:

  • service requests arrive randomly in time; and
  • service time is exponentially distributed with an average time of 1.

screenshot-19_10_16-08_25_32

Above is a plot of normalised average response time (service time + queue wait) vs resource (link) utilisation (pu means per unit). It can be seen that when the link utilisation is 0.5pu (50% busy), that response time is 2pu (ie twice the average service time), twice that needed to send an average packet at very low utilisation. Response time rapidly degrades:

  • at 70% link busy, response time is 3.3 times packet transmission time; and
  • at 90% link busy, response time is 10 times packet transmission time.

To ensure packets are not discarded, the queue need sufficient slots to hold packets even in most peak bursts. Whilst at 70% link utilisation, the average queue size is 2.3 slots (3.3-1), a larger queue size accommodates bursts better. Discarded packets can severely affect performance, not only are they likely to be resent after some delay and network overhead, they can break a higher level protocol unit in simple systems and waste the link capacity and other links used to send the rest of that protocol unit.

clip-233Above is a plot of the response times of 500 pings to an IP router. The same test to the prior router is clustered very tightly around 8ms, and a huge blowout in service time on around 20% of the pings is apparent between the two routers, almost certainly due to the connecting link running at excessive utilisation (ie, it is not big enough).

There are several mechanisms by which excessive link utilisation rapidly spoils network performance. Whilst it might seem wasteful to have resource at less than 100% utilisation, it is essential to good performance.

The bottom line is that shared links in such networks cannot be run above about 70% average utilisation without serious degradation in performance.