In this article we show how to monitor network health from the client's perspective using our AreWeDown tool. We then disrupt the client's communication with the server using a ping flood, and restore it using traffic shaping.
Let's start out with a healthy network:
| 2005-08-06 08:13:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:14:12 | are@10.10.10.11 | 100 |
| 2005-08-06 08:14:12 | are@10.10.10.11 | 101 |
| 2005-08-06 08:14:20 | are@10.50.100.190 | 100 |
| 2005-08-06 08:14:20 | are@10.50.100.190 | 101 |
| 2005-08-06 08:14:42 | are@10.10.10.11 | 100 |
| 2005-08-06 08:14:42 | are@10.10.10.11 | 101 |
| 2005-08-06 08:14:50 | are@10.50.100.190 | 100 |
| 2005-08-06 08:14:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:15:12 | are@10.10.10.11 | 100 |
| 2005-08-06 08:15:12 | are@10.10.10.11 | 101 |
| 2005-08-06 08:15:20 | are@10.50.100.190 | 100 |
| 2005-08-06 08:15:20 | are@10.50.100.190 | 101 |
|
See this article for information on the utility we are using to test with. In short, it measures network health from the client's perspective by timing two consecutive TCP requests. The 101 entry is sent right after the 100 entry, so on a healthy network the two timestamps should be nearly identical.
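As a rough illustration of how these rows can be read, the following Python sketch pairs each 100 entry with the next 101 entry from the same client and reports the gap in seconds. It is not part of AreWeDown: the function names `parse_rows` and `probe_gaps` are hypothetical, and the three-column row format is assumed from the output shown above.

```python
from datetime import datetime


def parse_rows(text):
    """Parse '| timestamp | client | code |' rows into (datetime, client, code) tuples."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # skip blank lines, lone separators, anything not three columns
        stamp, client, code = cells
        rows.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), client, int(code)))
    return rows


def probe_gaps(rows):
    """Pair each 100 entry with the next 101 from the same client.

    Returns a list of (client, start_time, gap_seconds). A 101 entry with no
    pending 100 (for example, at the start of a capture) is ignored.
    """
    pending = {}  # client -> timestamp of the most recent unmatched 100 entry
    gaps = []
    for stamp, client, code in rows:
        if code == 100:
            pending[client] = stamp
        elif code == 101 and client in pending:
            start = pending.pop(client)
            gaps.append((client, start, (stamp - start).total_seconds()))
    return gaps


# The first rows of the healthy capture above.
sample = """
| 2005-08-06 08:13:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:14:12 | are@10.10.10.11 | 100 |
| 2005-08-06 08:14:12 | are@10.10.10.11 | 101 |
| 2005-08-06 08:14:20 | are@10.50.100.190 | 100 |
| 2005-08-06 08:14:20 | are@10.50.100.190 | 101 |
"""

for client, start, gap in probe_gaps(parse_rows(sample)):
    print(f"{start:%H:%M:%S} {client} gap={gap:.0f}s")
```

A gap of zero seconds for every pair corresponds to the healthy state; any gap of a second or more is worth a closer look.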
A ping looks like this:
[root@srv-1 usr-1]# ping 10.10.10.11
PING 10.10.10.11 (10.10.10.11) 56(84) bytes of data.
64 bytes from 10.10.10.11: icmp_seq=0 ttl=127 time=18.6 ms
64 bytes from 10.10.10.11: icmp_seq=1 ttl=127 time=18.7 ms
64 bytes from 10.10.10.11: icmp_seq=2 ttl=127 time=18.6 ms
64 bytes from 10.10.10.11: icmp_seq=3 ttl=127 time=18.7 ms
64 bytes from 10.10.10.11: icmp_seq=4 ttl=127 time=18.6 ms
--- 10.10.10.11 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 18.623/18.697/18.762/0.053 ms, pipe 2
[root@srv-1 usr-1]#
|
Our router stats look like this:
router#show interfaces Async 5
Async5 is up, line protocol is up
Hardware is Async Serial
Internet address is 10.10.10.10/24
MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation PPP, loopback not set
Keepalive not set
DTR is pulsed for 5 seconds on reset
LCP Open
Open: IPCP
Last input 00:00:09, output 00:00:09, output hang never
Last clearing of "show interface" counters 00:29:33
Input queue: 1/75/0 (size/max/drops); Total output drops: 0
Queueing strategy: weighted fair
Output queue: 0/1000/64/0 (size/max total/threshold/drops)
Conversations 0/1/16 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
615 packets input, 45313 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
1 input errors, 1 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
565 packets output, 35812 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
0 carrier transitions
router#
|
This setup runs a workstation via PPP through the Aux port of a Cisco 1720 router in our lab. More information on this configuration is available in this article.
Now, let's kill the network connection with a ping flood:
[root@srv-1 usr-1]# ping -f -s 1000 10.10.10.11
PING 10.10.10.11 (10.10.10.11) 1000(1028) bytes of data.
................................................................................
|
You can see the ping round-trip times climb:
64 bytes from 10.10.10.11: icmp_seq=3 ttl=127 time=18.6 ms
64 bytes from 10.10.10.11: icmp_seq=4 ttl=127 time=18.6 ms
64 bytes from 10.10.10.11: icmp_seq=5 ttl=127 time=4882 ms
64 bytes from 10.10.10.11: icmp_seq=34 ttl=127 time=6474 ms
64 bytes from 10.10.10.11: icmp_seq=47 ttl=127 time=6697 ms
64 bytes from 10.10.10.11: icmp_seq=53 ttl=127 time=6787 ms
64 bytes from 10.10.10.11: icmp_seq=68 ttl=127 time=7011 ms
64 bytes from 10.10.10.11: icmp_seq=69 ttl=127 time=6935 ms
64 bytes from 10.10.10.11: icmp_seq=87 ttl=127 time=7327 ms
64 bytes from 10.10.10.11: icmp_seq=88 ttl=127 time=7252 ms
|
We are starting to see some delays between 100 and 101 on the AreWeDown tool:
| 2005-08-06 08:25:17 | are@10.10.10.11 | 100 |
| 2005-08-06 08:25:18 | are@10.10.10.11 | 101 |
| 2005-08-06 08:25:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:25:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:25:49 | are@10.10.10.11 | 100 |
| 2005-08-06 08:25:50 | are@10.50.100.190 | 100 |
| 2005-08-06 08:25:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:25:51 | are@10.10.10.11 | 101 |
|
Our output queue is at the drop threshold on the router:
router#show interfaces Async 5
Async5 is up, line protocol is up
Hardware is Async Serial
Internet address is 10.10.10.10/24
MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
reliability 255/255, txload 141/255, rxload 140/255
Encapsulation PPP, loopback not set
Keepalive not set
DTR is pulsed for 5 seconds on reset
LCP Open
Open: IPCP
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 00:35:37
Input queue: 1/75/0 (size/max/drops); Total output drops: 7997
Queueing strategy: weighted fair
Output queue: 64/1000/64/7997 (size/max total/threshold/drops)
Conversations 1/2/16 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
5 minute input rate 49000 bits/sec, 11 packets/sec
5 minute output rate 50000 bits/sec, 11 packets/sec
2347 packets input, 1714050 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
3 input errors, 2 CRC, 0 frame, 1 overrun, 0 ignored, 0 abort
2324 packets output, 1734040 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
0 carrier transitions
router#
|
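The values worth watching in this output are the load fractions (out of 255) and the output queue line, whose four numbers are labeled size/max total/threshold/drops. A minimal Python sketch, assuming the `show interfaces` text has already been captured into a string by some means, pulls them out. The `summarize` helper and its field names are hypothetical.

```python
import re

# Patterns match the line layout shown in the router output above.
QUEUE_RE = re.compile(r"Output queue:\s*(\d+)/(\d+)/(\d+)/(\d+)")
LOAD_RE = re.compile(r"txload (\d+)/255, rxload (\d+)/255")


def summarize(show_output):
    """Extract queue depth, drop count, and load percentages from captured text."""
    queue = QUEUE_RE.search(show_output)
    load = LOAD_RE.search(show_output)
    if queue is None or load is None:
        raise ValueError("expected 'Output queue' and 'txload/rxload' lines")
    size, _max_total, threshold, drops = (int(n) for n in queue.groups())
    tx, rx = (int(n) for n in load.groups())
    return {
        "queue_size": size,
        "threshold": threshold,
        "drops": drops,
        "at_threshold": size >= threshold,   # queue is full enough to start dropping
        "tx_load_pct": round(100 * tx / 255, 1),
        "rx_load_pct": round(100 * rx / 255, 1),
    }


# Two lines copied from the capture taken during the flood.
captured = """
reliability 255/255, txload 141/255, rxload 140/255
Output queue: 64/1000/64/7997 (size/max total/threshold/drops)
"""

print(summarize(captured))
```

Comparing the `drops` value between two captures taken a known time apart gives a drop rate, which is easier to reason about than the raw counter.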
Things are getting worse:
| 2005-08-06 08:27:26 | are@10.10.10.11 | 100 |
| 2005-08-06 08:27:29 | are@10.10.10.11 | 101 |
| 2005-08-06 08:27:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:27:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:27:58 | are@10.10.10.11 | 100 |
| 2005-08-06 08:28:02 | are@10.10.10.11 | 101 |
|
There are now four seconds between 100 and 101. Our ping:
64 bytes from 10.10.10.11: icmp_seq=209 ttl=127 time=8998 ms
64 bytes from 10.10.10.11: icmp_seq=215 ttl=127 time=9254 ms
64 bytes from 10.10.10.11: icmp_seq=217 ttl=127 time=9186 ms
64 bytes from 10.10.10.11: icmp_seq=262 ttl=127 time=9989 ms
|
We are now unresponsive:
| 2005-08-06 08:29:36 | are@10.10.10.11 | 100 |
| 2005-08-06 08:29:41 | are@10.10.10.11 | 101 |
| 2005-08-06 08:29:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:29:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:30:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:30:21 | are@10.50.100.190 | 101 |
|
We should see an entry from each client every 30 seconds, but .11 has stopped sending requests. Our ping response times are climbing as well:
64 bytes from 10.10.10.11: icmp_seq=385 ttl=127 time=11831 ms
64 bytes from 10.10.10.11: icmp_seq=386 ttl=127 time=11756 ms
64 bytes from 10.10.10.11: icmp_seq=387 ttl=127 time=11680 ms
64 bytes from 10.10.10.11: icmp_seq=388 ttl=127 time=11604 ms
64 bytes from 10.10.10.11: icmp_seq=406 ttl=127 time=12122 ms
|
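The 30-second reporting interval suggests a simple check for a client that has gone quiet. The Python sketch below is one possible approach, not a feature of AreWeDown: the `silent_clients` function, the cutoff of two missed intervals, and the `now` value in the example are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def silent_clients(entries, now, interval=timedelta(seconds=30), missed=2):
    """Return clients whose last entry is older than `missed` reporting intervals.

    `entries` is an iterable of (timestamp, client) pairs in any order.
    """
    last_seen = {}
    for stamp, client in entries:
        if client not in last_seen or stamp > last_seen[client]:
            last_seen[client] = stamp
    limit = interval * missed
    return sorted(c for c, seen in last_seen.items() if now - seen > limit)


def ts(clock):
    """Build a timestamp on the capture date from an HH:MM:SS string."""
    return datetime.strptime("2005-08-06 " + clock, "%Y-%m-%d %H:%M:%S")


# Timestamps taken from the AreWeDown capture above.
entries = [
    (ts("08:29:36"), "are@10.10.10.11"),
    (ts("08:29:41"), "are@10.10.10.11"),
    (ts("08:29:51"), "are@10.50.100.190"),
    (ts("08:30:21"), "are@10.50.100.190"),
]

# Hypothetical moment at which the check runs.
print(silent_clients(entries, now=ts("08:30:51")))
```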
Our router stats:
Async5 is up, line protocol is up
Hardware is Async Serial
Internet address is 10.10.10.10/24
MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
reliability 255/255, txload 252/255, rxload 255/255
Encapsulation PPP, loopback not set
Keepalive not set
DTR is pulsed for 5 seconds on reset
LCP Open
Open: IPCP
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 00:40:27
Input queue: 1/75/0 (size/max/drops); Total output drops: 23735
Queueing strategy: weighted fair
Output queue: 64/1000/64/23735 (size/max total/threshold/drops)
Conversations 1/2/16 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
5 minute input rate 88000 bits/sec, 10 packets/sec
5 minute output rate 91000 bits/sec, 11 packets/sec
5632 packets input, 4974777 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
3 input errors, 2 CRC, 0 frame, 1 overrun, 0 ignored, 0 abort
5652 packets output, 5042433 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
0 carrier transitions
router#
|
Our output queue is still at the threshold and we are dropping a lot of packets. One fix would be to block ICMP, but we will assume that ICMP must remain allowed. Another fix is to apply traffic shaping:
router#conf term
Enter configuration commands, one per line. End with CNTL/Z.
router(config)#int Async 5
router(config-if)#traffic-shape rate 80000
router(config-if)#exit
router(config)#exit
router#
|
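To give a feel for what a shaping rate does, here is a generic token-bucket sketch in Python. It is a textbook model, not a description of how IOS implements `traffic-shape`. The 16000-bit burst size and the 10 ms send interval are assumptions, while the 80000 bit/s rate and the 1028-byte packet size come from the commands above.

```python
class TokenBucket:
    """Generic token bucket: tokens are bits, refilled at a fixed rate."""

    def __init__(self, rate_bps, burst_bits):
        self.rate_bps = rate_bps      # refill rate in bits per second
        self.burst_bits = burst_bits  # bucket capacity in bits (assumed value)
        self.tokens = burst_bits      # start with a full bucket
        self.last = 0.0               # time of the previous call, in seconds

    def allow(self, now, packet_bits):
        """Return True if the packet conforms and may be sent immediately.

        A shaper would hold a non-conforming packet in a queue until enough
        tokens accumulate; this sketch only reports the decision.
        """
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


PACKET_BITS = 1028 * 8  # size of one flood packet, from the ping output

bucket = TokenBucket(rate_bps=80000, burst_bits=16000)

# Offer one flood packet every 10 ms for one second and count what conforms.
offered = 100
sent = sum(bucket.allow(i * 0.01, PACKET_BITS) for i in range(offered))
print(f"offered {offered} packets, {sent} conformed to the shaping rate")
```

The point of the model is that the flood can offer far more than the configured rate, but only a bounded amount leaves the bucket in any given second.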
Our output queue is back down:
router#show interfaces Async 5
Async5 is up, line protocol is up
Hardware is Async Serial
Internet address is 10.10.10.10/24
MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
reliability 255/255, txload 18/255, rxload 255/255
Encapsulation PPP, loopback not set
Keepalive not set
DTR is pulsed for 5 seconds on reset
LCP Open
Open: IPCP
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters 00:43:08
Input queue: 1/75/0 (size/max/drops); Total output drops: 32551
Queueing strategy: weighted fair
Output queue: 0/1000/64/30046 (size/max total/threshold/drops)
Conversations 0/2/16 (active/max active/max total)
Reserved Conversations 0/0 (allocated/max allocated)
5 minute input rate 89000 bits/sec, 11 packets/sec
5 minute output rate 86000 bits/sec, 10 packets/sec
7439 packets input, 6773226 bytes, 0 no buffer
Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
11 input errors, 10 CRC, 0 frame, 1 overrun, 0 ignored, 0 abort
7475 packets output, 6851326 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 output buffer failures, 0 output buffers swapped out
0 carrier transitions
router#
|
Our client can now talk again:
| 2005-08-06 08:33:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:33:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:33:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:34:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:34:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:34:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:34:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:35:08 | are@10.10.10.11 | 100 |
| 2005-08-06 08:35:10 | are@10.10.10.11 | 101 |
| 2005-08-06 08:35:20 | are@10.10.10.11 | 100 |
| 2005-08-06 08:35:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:35:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:35:21 | are@10.10.10.11 | 101 |
| 2005-08-06 08:35:43 | are@10.10.10.11 | 100 |
| 2005-08-06 08:35:43 | are@10.10.10.11 | 101 |
|
Now, the ping times from the flood machine are still high:
64 bytes from 10.10.10.11: icmp_seq=538 ttl=127 time=14117 ms
64 bytes from 10.10.10.11: icmp_seq=557 ttl=127 time=14413 ms
64 bytes from 10.10.10.11: icmp_seq=558 ttl=127 time=14338 ms
64 bytes from 10.10.10.11: icmp_seq=559 ttl=127 time=18114 ms
|
But from another machine they are much lower:
[root@pippi ~]# ping 10.10.10.11
PING 10.10.10.11 (10.10.10.11) 56(84) bytes of data.
64 bytes from 10.10.10.11: icmp_seq=0 ttl=54 time=80.3 ms
64 bytes from 10.10.10.11: icmp_seq=1 ttl=54 time=81.5 ms
64 bytes from 10.10.10.11: icmp_seq=2 ttl=54 time=81.5 ms
64 bytes from 10.10.10.11: icmp_seq=3 ttl=54 time=82.8 ms
|
If we then ping flood from that same host, most of its packets are eventually dropped:
64 bytes from 10.10.10.11: icmp_seq=2 ttl=54 time=774 ms
64 bytes from 10.10.10.11: icmp_seq=8 ttl=54 time=763 ms
--- 10.10.10.11 ping statistics ---
27 packets transmitted, 2 received, 92% loss, time 26204ms
rtt min/avg/max/mdev = 763.927/769.305/774.684/5.449 ms
[root@mondo root]#
|
This is as it should be.
For more information on Cisco traffic shaping, see this article.