Network Troubleshooting

Overview

The problem that inspired creation of this page turned out to be caused by the cms02 Ethernet cable being plugged into the wrong NIC. How that happened following the power cut over the weekend remains unknown, but the apache logs suggest it happened at about 10:26 on Monday morning.

RESOLVED: caused by accidental miscabling

Lessons learned regarding the troubleshooting sequence:

  1. First priority: check the apache and other logs to piece together the sequence of events (it's kinda uncomfortable doing this on the console in the server room, but it's necessary)
  2. Check early which NIC the cable is attached to, and label the correct one

Sequence of events

The reason it took a while to work this out is that the other NIC worked partially (IPv6 traffic appeared to be moving).

Apache access log:

01/Aug/2014:22:20:41 +0800   last apache access before power outage on Friday evening
04/Aug/2014:08:11:07 +0800   first apache access after power was restored and the system rebooted on Monday morning
04/Aug/2014:10:25:28 +0800   last apache access at ~10:25 Monday

Login and reboot history (last):

blyth    tty1                          Tue Aug  5 11:14 - 11:16  (00:01)
reboot   system boot  2.6.9-78.0.22.EL Mon Aug  4 11:46         (2+00:01)
root     tty1                          Mon Aug  4 11:23 - 11:36  (00:12)
reboot   system boot  2.6.9-78.0.22.EL Mon Aug  4 11:23         (2+00:24)
root     tty1                          Mon Aug  4 11:21 - down   (00:00)
blyth    tty1                          Mon Aug  4 11:12 - 11:12  (00:00)
reboot   system boot  2.6.9-78.0.22.EL Mon Aug  4 08:10         (03:11)
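Finding the outage in the access log can be done mechanically rather than by scrolling on the console. A minimal Python sketch, assuming the standard Apache common/combined log format where every line carries a `[dd/Mon/yyyy:HH:MM:SS +zone]` timestamp: it reports the longest silence between consecutive requests. The sample lines and the 192.0.2.1 client address are hypothetical, built only from the three timestamps listed above.

```python
import re
from datetime import datetime

# Apache common/combined log format puts the timestamp in square brackets,
# e.g. [01/Aug/2014:22:20:41 +0800]
TS_RE = re.compile(r"\[([^\]]+)\]")
TS_FMT = "%d/%b/%Y:%H:%M:%S %z"


def access_times(lines):
    """Yield the parsed timestamp of every log line that has one."""
    for line in lines:
        m = TS_RE.search(line)
        if m:
            yield datetime.strptime(m.group(1), TS_FMT)


def largest_gap(lines):
    """Return (start, end, gap) for the longest silence between requests."""
    times = sorted(access_times(lines))
    if len(times) < 2:
        return None
    return max(
        ((a, b, b - a) for a, b in zip(times, times[1:])),
        key=lambda t: t[2],
    )


if __name__ == "__main__":
    # Hypothetical access-log lines using the timestamps quoted above.
    sample = [
        '192.0.2.1 - - [01/Aug/2014:22:20:41 +0800] "GET / HTTP/1.1" 200 512',
        '192.0.2.1 - - [04/Aug/2014:08:11:07 +0800] "GET / HTTP/1.1" 200 512',
        '192.0.2.1 - - [04/Aug/2014:10:25:28 +0800] "GET / HTTP/1.1" 200 512',
    ]
    start, end, gap = largest_gap(sample)
    print(start, "->", end, "silent for", gap)
```

On a real server the input would be the lines of the access log file; the longest gap should bracket the power outage, and a later unexplained silence is a hint about when something else changed.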

Pinging

  1. Pinging 127.0.0.1 (loopback) tests that your TCP/IP stack is working.
  2. Pinging your own IP address (e.g., 192.168.1.100) tests that the interface is configured with that address. On Linux this traffic is delivered locally, so success does not prove that the NIC or cable is working.
  3. Pinging your gateway tests whether your LAN is working.
    • as root: netstat -nr to find out the gateway
  4. Pinging an external address tests whether routing and your internet connection are working.
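The checklist above is a ladder: the first rung that fails points at the layer to investigate. A small Python sketch of that logic, assuming you have already run the pings (e.g. with `ping -c 3`) and recorded which targets replied; the helper names, the suspect descriptions, and 192.0.2.1 as a stand-in external host are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Step = Tuple[str, str, str]  # (step name, target address, what a failure suggests)


def ping_ladder(own_ip: str, gateway: str, external: str) -> List[Step]:
    """The four checks from the list above, innermost first."""
    return [
        ("loopback", "127.0.0.1", "local TCP/IP stack"),
        ("own address", own_ip, "interface configuration"),
        ("gateway", gateway, "LAN side: cable, switch port, wrong NIC"),
        ("external", external, "routing or the upstream connection"),
    ]


def first_failure(ladder: List[Step], replied: Dict[str, bool]) -> Optional[str]:
    """Report the first step whose target did not reply (missing counts as no reply)."""
    for name, addr, suspect in ladder:
        if not replied.get(addr, False):
            return f"{name} ({addr}) failed: suspect {suspect}"
    return None


if __name__ == "__main__":
    # Hypothetical results resembling the miscabled cms02: local checks
    # pass, the gateway does not answer.
    ladder = ping_ladder("192.168.1.100", "140.112.101.254", "192.0.2.1")
    replied = {"127.0.0.1": True, "192.168.1.100": True,
               "140.112.101.254": False, "192.0.2.1": False}
    print(first_failure(ladder, replied))
```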

Troubleshooting Tools

ifconfig

  1. The ifconfig command without any arguments lists all the active interfaces on your system.
    • Interfaces that are shut down will not appear (ifconfig -a lists them too).
    • Interfaces that are activated but have no link will still appear.
    • Look for UP (administratively enabled) and RUNNING (link detected) in the flags of the ifconfig output.
    • An interface with a 169.254.x.x address usually signifies a failure to get an address from the DHCP server.
  2. Bring interfaces up/down with ifconfig eth0 up and ifconfig eth0 down.
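On newer Linux kernels the same up/link distinction is exposed in sysfs, which is easy to script. A hedged Python sketch, assuming `/sys/class/net/<iface>/operstate` and `/sys/class/net/<iface>/carrier` are present (they may not exist on a kernel as old as the 2.6.9 seen in the `last` output above); `link_status` is a hypothetical helper name.

```python
from pathlib import Path


def link_status(iface: str, sysfs: str = "/sys/class/net") -> dict:
    """Read operstate and carrier for iface from sysfs.

    carrier reads "1" when a link is detected and "0" when not; reading it
    on an interface that is administratively down fails, so it is then
    reported as None, as is anything missing on older kernels.
    """
    base = Path(sysfs) / iface
    status = {"iface": iface, "operstate": None, "carrier": None}
    try:
        status["operstate"] = (base / "operstate").read_text().strip()
    except OSError:
        pass
    try:
        status["carrier"] = (base / "carrier").read_text().strip() == "1"
    except OSError:
        pass
    return status


def all_links(sysfs: str = "/sys/class/net") -> list:
    """link_status for every interface listed in sysfs."""
    return [link_status(p.name, sysfs) for p in sorted(Path(sysfs).iterdir())]


if __name__ == "__main__":
    for s in all_links():
        print(s)
```

An interface showing operstate "down" with carrier False while UP is set would match the cms02 situation: configured, but nothing plugged into it.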

ethtool

[root@cms01 blyth]# which ethtool
/sbin/ethtool

[root@cms01 blyth]# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 24
    Transceiver: internal
    Auto-negotiation: on
    Current message level: 0x00000001 (1)
    Link detected: yes
[root@cms01 blyth]# ethtool eth1
Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: Unknown! (65535)
    Duplex: Unknown! (255)
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
    Link detected: no
[root@cms01 blyth]# ethtool -S eth0
NIC statistics:
     tx_deferred: 0
     tx_multiple_collisions: 0
     rx_bad_ssd: 26
[root@cms01 blyth]# ethtool -S eth1
NIC statistics:
     rx_packets: 0
     tx_packets: 0
     rx_bytes: 0
     tx_bytes: 0
     ...             # many other stats, all 0
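The decisive lines in the ethtool output above are `Link detected` plus `Speed`/`Duplex` (the `Unknown!` values on eth1 go with having no link). A minimal Python sketch that turns `ethtool <iface>` output into a dictionary, assuming the `Key: value` layout shown above, where continuation lines (extra link modes) carry no colon; the function names are hypothetical.

```python
def parse_ethtool(text: str) -> dict:
    """Parse 'Key: value' lines from `ethtool <iface>` into a dict.

    Continuation lines without a colon are appended to the previous value.
    """
    info, last = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Settings for"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            last = key.strip()
            info[last] = value.strip()
        elif last:
            info[last] += " " + line
    return info


def link_ok(info: dict) -> bool:
    """True only if ethtool reported 'Link detected: yes'."""
    return info.get("Link detected") == "yes"


if __name__ == "__main__":
    eth1 = """Settings for eth1:
        Speed: Unknown! (65535)
        Duplex: Unknown! (255)
        Link detected: no"""
    info = parse_ethtool(eth1)
    print(info["Speed"], link_ok(info))
```

In practice the text would come from running `ethtool eth0` (as root) via `subprocess`, one call per interface.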

mii-tool

  1. On cms02, this appeared to reveal an inet6 interface that succeeds in coming up (?)
[root@cms01 blyth]# mii-tool -v
eth0: negotiated 100baseTx-FD, link ok
  product info: vendor 00:10:5a, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
eth1: no link
  product info: vendor 00:50:43, model 2 rev 3
  basic mode:   autonegotiation enabled
  basic status: no link
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
[root@cms01 blyth]#

arp

The arp command lets you view and manage the Address Resolution Protocol (ARP) cache.

Just as DNS translates between host names and IP addresses, ARP translates between IP addresses (Layer 3) and MAC addresses (Layer 2). When a host attempts to communicate with another host on the same subnet, it must first know the destination host’s MAC address. If there is no entry for the destination IP address in the sending host’s ARP cache, ARP sends out a broadcast (to all hosts in the subnet) asking the host with the target IP address to send back its MAC address. These IP-to-MAC mappings build up in the ARP cache, which the arp command lets you view and modify.
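The lookup described above can be modelled in a few lines. This is a toy Python model, not a real ARP implementation: `broadcast` is a hypothetical callable standing in for the "who has this IP?" broadcast, returning the MAC from a reply or None when nobody answers. It shows why a host whose cable is on the wrong NIC ends up with an empty cache: no broadcast is ever answered, so nothing is stored.

```python
from typing import Callable, Dict, Optional


class ArpCache:
    """Toy model of ARP resolution with a cache (illustration only)."""

    def __init__(self, broadcast: Callable[[str], Optional[str]]):
        # broadcast(ip) stands in for the subnet-wide ARP request; it
        # returns the MAC address from the reply, or None if no reply.
        self._broadcast = broadcast
        self.entries: Dict[str, str] = {}

    def resolve(self, ip: str) -> Optional[str]:
        if ip in self.entries:        # cache hit: no traffic needed
            return self.entries[ip]
        mac = self._broadcast(ip)     # cache miss: ask the whole subnet
        if mac is not None:
            self.entries[ip] = mac    # remember the IP-to-MAC mapping
        return mac


if __name__ == "__main__":
    # Hypothetical subnet where only the gateway answers.
    def fake_broadcast(ip):
        return {"140.112.101.254": "00:11:22:33:44:55"}.get(ip)

    cache = ArpCache(fake_broadcast)
    print(cache.resolve("140.112.101.254"), cache.entries)
```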

The IP-to-MAC address cache was empty on cms02 and held just the gateway on cms01:

[root@cms01 blyth]# arp -a

dmesg

The boot output from miscabled cms02 had:

ADDRCONF(NETDEV_UP): eth0: link is not ready
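Messages of this form come from the kernel's IPv6 address-configuration code; when a link appears later the kernel logs a matching `ADDRCONF(NETDEV_CHANGE): ... link becomes ready`. A hedged Python sketch that scans `dmesg` text and keeps the last reported state per interface, assuming those two message forms (newer kernels prefix them with `IPv6: `, which the search still matches); `link_readiness` is a hypothetical name.

```python
import re

# e.g. "ADDRCONF(NETDEV_UP): eth0: link is not ready"
#      "ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready"
LINK_RE = re.compile(
    r"ADDRCONF\(NETDEV_\w+\): (\S+): link (is not ready|becomes ready)")


def link_readiness(dmesg_text: str) -> dict:
    """Return iface -> True/False according to the last ADDRCONF message."""
    state = {}
    for m in LINK_RE.finditer(dmesg_text):
        state[m.group(1)] = m.group(2) == "becomes ready"
    return state


if __name__ == "__main__":
    boot = "ADDRCONF(NETDEV_UP): eth0: link is not ready\n"
    print(link_readiness(boot))
```

An interface that only ever shows "link is not ready" after boot, as eth0 did on the miscabled cms02, never saw a carrier.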

netstat

Activity on the interfaces:

[root@cms01 blyth]# netstat -i
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0 14542429      0      0      1  3798504      0      0      0 BMRU
eth1       1500   0        0      0      0      0        0      0      0      0 BMU
lo        16436   0     1468      0      0      0     1468      0      0      0 LRU
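The Flg column tells the story: eth0 is BMRU while eth1 is only BMU, i.e. up but not RUNNING (no link). A minimal Python sketch that flags such interfaces from `netstat -i` text, assuming the net-tools layout shown above (interface name, then a numeric MTU, with the flags as the last column); only the flag letters that appear on this page are decoded, and the helper names are hypothetical.

```python
# Flag letters seen in the netstat -i output above; any other letters are
# passed through undecoded.
FLAG_NAMES = {"B": "BROADCAST", "L": "LOOPBACK", "M": "MULTICAST",
              "R": "RUNNING", "U": "UP"}


def decode(flags: str) -> list:
    return [FLAG_NAMES.get(c, c) for c in flags]


def up_without_link(netstat_i: str) -> list:
    """Names of interfaces flagged U (up) but not R (running)."""
    names = []
    for line in netstat_i.splitlines():
        fields = line.split()
        # data rows: interface name followed by a numeric MTU
        if len(fields) >= 3 and fields[1].isdigit():
            flags = fields[-1]
            if "U" in flags and "R" not in flags:
                names.append(fields[0])
    return names


if __name__ == "__main__":
    row = "eth1 1500 0 0 0 0 0 0 0 0 0 BMU"
    print(up_without_link(row), decode("BMU"))
```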

Find the gateway:

[root@cms01 blyth]# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
140.112.101.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
192.168.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth1
0.0.0.0         140.112.101.254 0.0.0.0         UG        0 0          0 eth0
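The default route is the row with destination 0.0.0.0 and the G (gateway) flag; here it leaves via eth0, so a cable moved to eth1 would leave the host with no working path to the gateway. A minimal Python sketch that picks out that row from `netstat -nr` text, assuming the eight-column layout shown above; `default_gateway` is a hypothetical helper.

```python
def default_gateway(netstat_nr: str):
    """Return (gateway, iface) for the 0.0.0.0 route flagged G, or None."""
    for line in netstat_nr.splitlines():
        fields = line.split()
        # Destination Gateway Genmask Flags MSS Window irtt Iface
        if len(fields) >= 8 and fields[0] == "0.0.0.0" and "G" in fields[3]:
            return fields[1], fields[-1]
    return None


if __name__ == "__main__":
    row = "0.0.0.0 140.112.101.254 0.0.0.0 UG 0 0 0 eth0"
    print(default_gateway(row))
```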

Pinging the gateway works on cms01, but gave Destination Unreachable on the miscabled cms02:

[root@cms01 blyth]# ping 140.112.101.254
PING 140.112.101.254 (140.112.101.254) 56(84) bytes of data.
64 bytes from 140.112.101.254: icmp_seq=0 ttl=255 time=0.657 ms
64 bytes from 140.112.101.254: icmp_seq=1 ttl=255 time=5.24 ms
64 bytes from 140.112.101.254: icmp_seq=2 ttl=255 time=0.593 ms

--- 140.112.101.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.593/2.164/5.243/2.177 ms, pipe 2
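When pinging from a script, the summary line is the easiest thing to test: 0% loss as above, or 100% loss in the Destination Unreachable case. A small Python sketch, assuming the Linux iputils summary format shown above (the optional `packets ` before `received` also covers the BSD/macOS wording); `ping_summary` is a hypothetical helper.

```python
import re

# e.g. "3 packets transmitted, 3 received, 0% packet loss, time 2000ms"
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
    r".*?([\d.]+)% packet loss")


def ping_summary(output: str):
    """Return sent/received counts and loss percentage, or None if absent."""
    m = SUMMARY_RE.search(output)
    if not m:
        return None
    return {"sent": int(m.group(1)),
            "received": int(m.group(2)),
            "loss_pct": float(m.group(3))}


if __name__ == "__main__":
    line = "3 packets transmitted, 3 received, 0% packet loss, time 2000ms"
    print(ping_summary(line))
```

The output would typically come from `subprocess.run(["ping", "-c", "3", gateway], capture_output=True, text=True).stdout`.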

Checking configuration

[root@cms01 blyth]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
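ifcfg files are shell-style `KEY=value` lists. On Red Hat-style systems an ifcfg file may pin the interface to a particular card with an `HWADDR` line, so comparing it with the MAC that ifconfig reports is one way to confirm which physical port is which. A hedged Python sketch; the helper names and the sample file contents (including the MAC) are hypothetical, not the real cms01/cms02 configuration.

```python
import shlex


def parse_ifcfg(text: str) -> dict:
    """Parse a shell-style KEY=value file such as ifcfg-eth0."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips optional single/double quotes around the value
        parts = shlex.split(value)
        conf[key.strip()] = parts[0] if parts else ""
    return conf


def hwaddr_matches(conf: dict, actual_mac: str) -> bool:
    """True if the file has no HWADDR, or its HWADDR equals actual_mac."""
    want = conf.get("HWADDR")
    return want is None or want.lower() == actual_mac.lower()


if __name__ == "__main__":
    # Hypothetical ifcfg-eth0 contents.
    sample = 'DEVICE=eth0\nBOOTPROTO=static\nHWADDR="00:11:22:33:44:55"\nONBOOT=yes\n'
    conf = parse_ifcfg(sample)
    print(conf, hwaddr_matches(conf, "00:11:22:33:44:55"))
```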

Stop/start network service

service network restart

cms02 inet6 red herring

Trying to connect to cms02 via its inet6 interface:

[root@cms01 blyth]# ping6 -I eth1 fe80::207:e9ff:fe13:ea50/64
unknown host

[root@cms01 blyth]# ping6 -I eth1 fe80::207:e9ff:fe13:ea50
connect: Network is unreachable
[root@cms01 blyth]# ping6 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=0 ttl=64 time=0.062 ms
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.019 ms


[root@cms01 blyth]# route -A inet6
Kernel IPv6 routing table
Destination                                 Next Hop                                Flags Metric Ref    Use Iface
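The link-local address fe80::207:e9ff:fe13:ea50 has the ff:fe pattern of a modified EUI-64 interface identifier, i.e. it appears to be autoconfigured from a MAC address. Decoding it recovers that MAC (00:07:e9:13:ea:50), which could then be compared against the MAC shown by ifconfig for each NIC to identify which port was live. A minimal Python sketch using the standard `ipaddress` module; `mac_from_link_local` is a hypothetical helper.

```python
import ipaddress


def mac_from_link_local(addr: str) -> str:
    """Recover the MAC embedded in a modified EUI-64 IPv6 address.

    The interface identifier is MAC[0:3] + ff:fe + MAC[3:6], with the
    universal/local bit (0x02) of the first byte flipped.
    """
    # drop any %zone suffix or /prefix length before parsing
    addr = addr.split("%")[0].split("/")[0]
    iid = ipaddress.IPv6Address(addr).packed[8:]
    if iid[3:5] != b"\xff\xfe":
        raise ValueError(f"{addr} does not look EUI-64 derived")
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)


if __name__ == "__main__":
    print(mac_from_link_local("fe80::207:e9ff:fe13:ea50"))
```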

Checklist

If you can ping both the loopback address and your own IP address but not hosts in the local subnet, try clearing out the ARP cache so that it gets repopulated. This can be done with the arp utility from the command line. First display the cache entries with arp -a (arp -g is an equivalent form on Windows). Delete entries with arp -d <IP address>. For the full list of options, see the arp man page.
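On Linux the kernel also exposes the ARP cache as `/proc/net/arp` (the same entries `arp -n` shows), which makes it easy to list the entries and prepare the matching `arp -d` commands. A hedged Python sketch that only builds the commands and does not run them (deleting entries needs root); it assumes the usual six-column layout (IP address, HW type, Flags, HW address, Mask, Device) where flag bit 0x2 marks a completed entry. The helper names and sample addresses are hypothetical.

```python
def arp_entries(proc_net_arp: str) -> list:
    """Parse /proc/net/arp text into (ip, mac, device, complete) tuples."""
    entries = []
    for line in proc_net_arp.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hwtype, flags, mac, _mask, dev = fields[:6]
        entries.append((ip, mac, dev, (int(flags, 16) & 0x2) != 0))
    return entries


def flush_commands(entries) -> list:
    """Build (but do not run) one arp -d command per cached entry."""
    return [f"arp -d {ip}" for ip, _mac, _dev, _complete in entries]


if __name__ == "__main__":
    with open("/proc/net/arp") as f:
        for cmd in flush_commands(arp_entries(f.read())):
            print(cmd)
```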

Check Config

[blyth@cms01 network-scripts]$ cat /etc/sysconfig/network
[blyth@cms01 network-scripts]$ cat ifcfg-eth0

The global directive NETWORKING_IPV6 is required in the /etc/sysconfig/network configuration file to globally enable IPv6 static, DHCP, or autoconf configuration. Refer to Section D.1.13, “/etc/sysconfig/network”.

[blyth@cms01 network-scripts]$ rpm -qf /etc/sysconfig/network
file /etc/sysconfig/network is not owned by any package