Networking on Red Hat Enterprise Linux

The following post is an attempt at covering Linux network configuration end-to-end to a "bit better than reasonable" level.  The brevity of the post is by design, since it is the sort of post that mostly serves as a reference or quick lookup guide to remind me, and others, of the name of that file, or that command that does…

As much as I love UNIX and Linux, since everything is a command or a file, the downside is that a certain amount of knowledge is required up front (largely alleviated by Google these days), and the command line is not that intuitive, even with the help of man pages.

Sometimes you just need to look something up that you know you’ve done before, but it was a few months ago or a year or two ago and you just need that post to point you back in the right direction.

 

You can configure a NIC on the fly with

ifconfig eth0 ip-address netmask subnet-mask

The permanent configuration, read at boot time or whenever /etc/init.d/network restart is run, is held in /etc/sysconfig/network-scripts/ifcfg-eth0 and so on.

If you need to write a config file from scratch, use this as a template/guide

DEVICE=eth0
BOOTPROTO=static
IPADDR=ip-address
NETMASK=subnet-mask
HWADDR=pre-populated-MAC-address
ONBOOT=yes
USERCTL=no
MTU=1500
TYPE=Ethernet
ETHTOOL_OPTS=""
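For reference, a filled-in version of the above template, with purely illustrative values:

```
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0
HWADDR=00:11:22:33:44:55
ONBOOT=yes
USERCTL=no
MTU=1500
TYPE=Ethernet
```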

When you’re done, restart networking

/etc/init.d/network restart

and check they all come up.  If not, recheck the ifcfg-eth files in /etc/sysconfig/network-scripts, paying attention to the ONBOOT=yes line.

To test which of your physical NICs corresponds to which Linux OS network device, disconnect a cable and use

ethtool eth0

paying attention to the bottom line, which reads "Link detected: yes" or "Link detected: no"

If there is a PCI NIC in the system, RHEL may assign its ports eth0 and eth1, taking priority over the embedded NICs on the system board.  This is generally not expected behaviour if you're new to it.

Check all network configurations with

ifconfig -a | less

Check the DNS addresses are populated in /etc/resolv.conf and perform an nslookup to verify network connectivity, as ping packets are often dropped by firewalls.

Setting a default gateway

You can configure a default gateway in /etc/sysconfig/network

e.g. Add the line

GATEWAY=<ip-of-default-router>
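A minimal /etc/sysconfig/network might then look like this (hostname and gateway address are illustrative only):

```
NETWORKING=yes
HOSTNAME=server01.example.com
GATEWAY=192.168.1.254
```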

Speed and Duplex setting can be viewed using

ethtool eth1

and

dmesg | grep -i duplex

or using mii-tool

Display all active TCP connections along with the PID and name of the process using each port

netstat -atp

Display routing table in numeric form

netstat -rn

Display all netstat statistics

netstat -as

List open files that are network related

lsof -i

MAC Address to Device listing

arp -v

Look for connected interfaces ("Link detected: yes")

ethtool eth0

Display run levels where networking starts

chkconfig network --list

Display network status

/etc/init.d/network status   or   /sbin/service network status

Display all network device configuration

ifconfig -a

Useful files where networking configuration is stored

/etc/hosts       -will override other forms of name resolution configured in /etc/nsswitch.conf

/etc/resolv.conf       -contains the IP addresses of DNS servers used for name resolution in TCP/IP networks.

/etc/nsswitch.conf       -controls the order that names are resolved to IP addresses, i.e. files, nis, dns

/etc/sysconfig/network-scripts/ifcfg-eth0
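The resolution order line in /etc/nsswitch.conf typically reads:

```
hosts:      files dns
```

i.e. consult /etc/hosts first, then fall back to DNS.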

Display interfaces and metrics

netstat -i

Create an SSH tunnel from port 2381 (HP System Management Homepage) on a remote host to a local port (use 1025 and up)

ssh -f username@ip_address -L 1025:ip_address:2381 -N

i.e. browsing to http://localhost:1025 is the same as http://remotehost:2381

Troubleshooting a NIC

Below is an example of a busy backup network interface on a backup server.  Note how it's dropping packets and logging errors.

eth4      Link encap:Ethernet  HWaddr 10:1F:74:8B:8F:8X

          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

          RX packets:22053199483 errors:40041 dropped:18775 overruns:46 frame:0

          TX packets:8811133044 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:31314447740529 (28.4 TiB)  TX bytes:6356693939792 (5.7 TiB)

          Memory:fbec0000-fbee0000
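To put those counters in perspective, the error and drop rates can be worked out as a percentage of received packets.  A quick sketch using the RX figures above:

```shell
# RX counters taken from the ifconfig output above
rx_packets=22053199483
rx_errors=40041
rx_dropped=18775

# Express errors and drops as a percentage of all received packets
rates=$(awk -v p="$rx_packets" -v e="$rx_errors" -v d="$rx_dropped" \
  'BEGIN { printf "errors: %.5f%% dropped: %.5f%%", 100*e/p, 100*d/p }')
echo "$rates"
```

Both figures come out at tiny fractions of a percent, which supports the "busy rather than faulty" reading of this interface.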

 

Possible Causes of Ethernet Errors

Collisions: Occur when the NIC detects itself and another host on the LAN attempting to transmit at the same time. Collisions are expected as a normal part of Ethernet operation and are typically below 0.1% of all frames sent. Higher rates are likely to be caused by faulty NICs or poorly terminated cables.

Single Collisions: The Ethernet frame went through after only one collision

Multiple Collisions: The NIC had to attempt multiple times before successfully sending the frame due to collisions.

CRC Errors: Frames were sent but were corrupted in transit. The presence of CRC errors without many collisions is usually an indication of electrical noise. Make sure that you are using the correct type of cable, that the cabling is undamaged and that the connectors are securely fastened.

Frame Errors: An incorrect CRC and a non-integer number of bytes are received. This is usually the result of collisions or a bad Ethernet device.

FIFO and Overrun Errors: The number of times that the NIC was unable to hand data to its memory buffers because the data rate exceeded the capabilities of the hardware. This is usually a sign of excessive traffic.

Length Errors: The received frame length was less than or exceeded the Ethernet standard. This is most frequently due to incompatible duplex settings.

Carrier Errors: Caused by the NIC losing its link connection to the hub or switch. Check for faulty cabling or faulty interfaces on the NIC and networking equipment.

 


Booting an in-band VirtualCentre Server VM from the ESXi console

If your VirtualCentre Server is itself a VM, then it'll be running on an ESXi host.  In the event that the ESXi host is restarted without vMotioning the VirtualCentre Server off first (such as when the management network is irrecoverably unresponsive), then depending on your environment, you may not be able to get a remote connection to the VM after the host has restarted.  In this scenario, you'd need to be able to boot the VM from the unsupported console.  This is how to do it.

Connect to the iLO or equivalent management interface of the ESXi host, send an Alt-F1 and type unsupported followed by the root password to obtain a prompt on the unsupported console.

Identify the VMs resident on the host

vim-cmd vmsvc/getallvms

Identify the current power state of the VM running VirtualCentre

vim-cmd vmsvc/power.getstate ##           where ## is the number of the VM identified above

Power on the VM

vim-cmd vmsvc/power.on ##                       where ## is the number of the VM identified above
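If you'd rather not eyeball the listing, the number can be pulled out with awk.  The sketch below runs against a canned sample of getallvms output (the VM names, ids and column layout here are assumptions; on a real host you would pipe vim-cmd vmsvc/getallvms straight into the awk):

```shell
# Sample `vim-cmd vmsvc/getallvms` output (names and ids are made up)
getallvms='Vmid   Name        File
12     vcenter01   [datastore1] vcenter01/vcenter01.vmx
14     web01       [datastore1] web01/web01.vmx'

# Pull out the Vmid of the VM whose Name column matches
vmid=$(printf '%s\n' "$getallvms" | awk '$2 == "vcenter01" {print $1}')
echo "$vmid"

# On the host you would then run: vim-cmd vmsvc/power.on "$vmid"
```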


ESXi hosts keep dropping out of vCenter cluster

If the ESXi hosts in your cluster keep going into a Not Responding or Disconnected state, then the following things should be immediately checked – the DNS addresses on the ESXi hosts, the hosts files on the ESXi hosts, the Managed IP Address setting in vCenter, and the hosts file on the vCenter Server.

Check that the DNS addresses on each ESXi host's management interface point to the DNS servers containing the A records of the ESXi hosts and the vCenter Server itself.  Check they're not pointing to the wrong DNS servers: log onto the unsupported console and perform an nslookup of the vCenter Server to verify.


Since name resolution here depends on an external service, its resilience should be bolstered with hosts files at either end.
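For example, entries along these lines in /etc/hosts on each ESXi host and on the vCenter Server (names and addresses are illustrative only):

```
192.168.10.11   esxi01.example.com    esxi01
192.168.10.12   esxi02.example.com    esxi02
192.168.10.50   vcenter.example.com   vcenter
```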

From the vSphere Client, log in to vCenter Server
Navigate to Administration > vCenter Server Settings > Runtime Settings and review the Managed IP Address setting.
Verify that the address is correct (use ipconfig to discover the correct IP address for the vCenter management (v)LAN NIC).  Check all octets for correctness; be aware that they may not match those of the ESXi hosts, so check the design document or consult the infrastructure architect.

Correct the entry and click OK to save your changes and close the dialog.
Restart the ESXi host(s) if they've locked up.  If not, connect them back in to vCenter.

If you need to restart a host, and the vCenter Server is running on a VM on it, then you should try to connect to the vCenter Server over RDP first and shut it down cleanly.

Note: Once the ESXi host has restarted, you can power on the in-band VC VM using the instructions here… http://www.cyberfella.co.uk/2012/05/01/booting-vm-from-console/

Disable HA on the Cluster.

Put the ESXi host into Maintenance Mode, moving off any powered-on or powered-off VMs
Remove host from all DvSwitches in Home, Inventory, Networking
Remove the host from Cluster
Add host back into Cluster to push out new management agents and config containing corrected ipaddress of vCenter.
Take out of Maintenance mode.

Re-add host to DvSwitches.

Repeat for all ESXi hosts that were unstable.

It should now all stabilise (allow a 90 second wait).  If so, re-enable HA on the cluster.  If not, use the following knowledge base article from VMware to troubleshoot other potential areas, such as the firewall between the ESXi hosts and vCenter Server (if present).

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003409

A few days after this was written, I noticed hosts rarely disconnecting, but it was still happening.  Adding a hosts file to each ESXi host, so that they can all resolve each other's names (and that of the vCenter Server) with or without DNS services being available, and clicking "Reconfigure for VMware HA" on each host from within vCenter, seems to have regained some stability.

The most immediate place to look for problems is the Summary tab for each host in vCenter.  The trouble is that this usually gives very little away, describing a symptom rather than possible reasons for it.  The best place to look is in the logs: not the messages log from the black and yellow console, but the vCenter and HA logs.  Log onto the unsupported console on the ESXi hosts and tail the logs below.

/var/log/vmware/vpx/vpxa.log     Shows "Agent can't send heartbeat: No route to host" errors.

/var/log/vmware/aam/vmware_hostname.log       Shows date and timestamped informational events such as "Node hostname has started receiving heartbeats from node hostname" for intra-ESXi host communications.

It's worth noting that AAM is a Legato heartbeat technology and is massively dependent on DNS being right.

Four days after writing this, the hosts once again began to enter a not responding state, followed by a disconnected state.

I have always suspected that cluster heartbeats are falling foul of log files being shipped to remote syslog servers.  In ESXi, there is a lot of logging going on; some of it, such as the entries in /var/log/vmware/hostd.log, is also replicated to /var/log/messages, effectively doubling the amount of logging, which then all has to be replicated to (in my case) two remote syslog servers.  This amounts to a pretty continuous stream of data travelling over the bonded physical NICs that ultimately handle all traffic for not only the management network, but also vMotion.  What alerted me to the suspicion that this could be the cause of my problems was slow vMotion when migrating guests between hosts.  Also, when running ESXi on Cisco UCS with SNMP monitoring enabled, there is a lot of informational logging activity for hardware that is healthy (status Green).

Whilst my preference would be to split the bonded NICs (no loss of redundancy on Cisco UCS provided the vNICs are set to failover at the UCS level), separating management and vMotion traffic, I have massively reduced the amount of logging generated by making the following edit in

/etc/vmware/hostd/config.xml

outputToSyslog=false

This stops the duplication of hostd log entries being written to /var/log/messages.  You may be able to make similar changes to other agents, to make further reductions – I don’t know.  It’s worth noting that if you make this change, you’ll need to issue the following command to restart hostd.

/etc/init.d/hostd restart
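For reference, the setting is an XML element inside config.xml.  On the builds I've seen it sits within the <log> section, roughly like this (surrounding elements trimmed; exact placement may vary between ESXi versions, so treat this as a sketch):

```
<config>
  <log>
    <!-- stop hostd log entries being duplicated into /var/log/messages -->
    <outputToSyslog>false</outputToSyslog>
  </log>
</config>
```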

Another change I made was to create a new HA-enabled cluster in VirtualCentre and, after migrating all guests off each ESXi host, place each host into Maintenance Mode and move it to the new cluster.  Upon taking the ESXi hosts out of Maintenance Mode, some re-enabled/re-deployed their HA agents successfully; some did not.  For those that didn't, a restart of the management agents from the local console was sufficient to make it work (Reconfigure for HA).  The problem with ESXi is that after a reboot the logs are cleared, so if a host has lost its management network connection and its local console has seized up, then you can't read the logs (unless you're using a remote syslog server).  These hosts' management agents were obviously dying, which will ultimately take down the management network too if you leave it long enough, yet there's no warning that this is going on until the host goes into a Not Responding state, visible in VirtualCentre.

Since making this change, the ESXi hosts have not lost contact with the vCenter Server at all (i.e. terminated their management agent daemons), or had their management networks grind to a halt, in over a week.  Based upon my observations to date with this issue, I'm claiming this as a success and am very relieved to have got to the bottom of it.


New PC time. Meet the Lenovo IdeaCentre Q180.

Despite being a massive fan of the Acer Aspire Revo 3600 that I bought a few years ago, my little £149.99 nettop is just a touch slow these days and could probably do with being upgraded.

My instant reaction was to go for the £174.97 Acer Aspire Revo 3700, and it would be a great choice too; however, its 1.8GHz CPU would appear to have been trumped slightly by the 2.13GHz CPU in the £179.99 offering from Lenovo, the IdeaCentre Q180.

Yes, you read those prices right.  I use Linux (free) and keep the price of my hardware as low as possible.  Any compromises on performance will be offset by operating system choice and subsequent tuning, although I'm not expecting to have to do an awful lot of that given its "whopping" 2GB RAM, 2.13GHz CPU and ATI Radeon graphics chip (full spec given below).  Watch this space.

Continued here..

Xubuntu 64 bit vs Crunchbang 64 bit

Processor:      Intel Atom D2700 Dual Core 2.13GHz, 1MB L2 Cache

Memory:         2GB DDR3 1066MHz SODIMM

Hard Drive:     320GB SATA

Optical Drive:  None

Software:       Operating System: DOS

Display:        Monitor not included

Graphics:       ATI Radeon HD 6450, 512MB

Networking:     LAN: 10/100/1000 Gigabit Ethernet; WLAN: 802.11b/g/n

Interfaces:     3 x USB 2.0, 2 x USB 3.0, 1 x HDMI, 1 x SPDIF

Expansion:      7-in-1 Card Reader

Warranty:       1 Year Manufacturer Warranty


Adding a persistent static route

Whether your Windows or Linux server has multiple NICs connecting it to multiple VLANs and/or networks, sometimes you'll need to configure a static route so that the server knows which interface to use to reach the syslog server, NTP server etc., if traffic is to use a route other than the default gateway.

Solaris

vi /etc/gateways    (if this file doesn’t exist read on…)

Underneath>>    net 192.168.0.1 gateway 192.168.0.254 metric 1 passive

Add the following>>       net 10.0.0.0 gateway 10.8.2.65 metric 1 active

Note:  The default gateway is set in the /etc/defaultrouter file

If the /etc/gateways file doesn’t exist, then static routes may have been added “the old way”

cd /etc/rc2.d

ls | grep static

You may see a startup file called Snnnstatic_routes.  Inside this script will be non-persistent static routes added using lines that read something like /usr/sbin/route add 10.0.0.0/24 10.8.2.65 1.  Append your routes to the 'start' section, not forgetting to add a corresponding route delete command in the 'stop' section.
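The start/stop handling in such a script typically looks like the sketch below.  Here route is stubbed out with echo so the sketch is safe to run anywhere; the real Snnnstatic_routes script calls /usr/sbin/route directly, and the route arguments are just the illustrative ones from above:

```shell
# Stub so the sketch can run without touching the routing table
route() { echo "route $*"; }

static_routes() {
  case "$1" in
    start) route add 10.0.0.0/24 10.8.2.65 1 ;;
    stop)  route delete 10.0.0.0/24 10.8.2.65 ;;
  esac
}

static_routes start
```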

Red Hat Linux

echo '10.0.0.0/24 via 10.8.2.65' >> /etc/sysconfig/network-scripts/route-eth0

service network restart

route -n to view the current routing table

Windows

route -p ADD 10.0.0.0 MASK 255.255.255.0 10.8.2.65

netstat -rn to view the routing table

Excellent examples here: http://www.thegeekstuff.com/2012/04/route-examples/


Enabling SSH on VMWare ESXi hosts

Log on to the local console, hit Alt + F1, type unsupported and hit ENTER to connect to the console.

Enter the password when prompted.

vi /etc/inetd.conf

Remove the # at the beginning of the #ssh line to uncomment the ssh service.

:wq! to write the changes and quit the vi editor.

Identify the inetd process using ps | grep inetd

Restart the inetd service with kill -HUP pid

clear, exit, Alt + F2 to log out of the unsupported console.  Esc to log out the local management console.

You’ll notice a warning appear in vSphere client stating that the remote administration console has been enabled.  This is considered a security risk, but it is possible to suppress the warning if you wish to leave it open (not recommended).

Before quitting the console, type

esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1

to disable the warning, or

esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 0

to re-enable it (recommended if you disable the ssh console again).


Management Logs on Cisco UCS Blades

As of firmware 1.4.1 (old now), the Management Logs tab was renamed SEL Logs.

If you're running VMware ESXi on a Cisco UCS B200 blade, then you may notice a hardware event trigger in vCenter Server, with a fault of System Board 0 SEL_FULLNESS.

This occurs when the UCS Management Log for a given blade breaches its own monitoring threshold of 90% full.

To clear it, Log into UCS Manager, Equipment tab, Servers, Server n, SEL Logs tab, and Backup or Clear the log.

Don't forget to at least take a look at the log to make sure it hasn't filled due to real, unresolved hardware problems.  The SEL Log records absolutely everything that goes on, to the extent of even logging LEDs as they turn on and off on the equipment, so these logs fill quite quickly.


Renaming a vSwitch in VMWare ESXi

Your vSwitches visible in vSphere Client are allocated names, e.g. vSwitch0, vSwitch1 and so on.  To create dvSwitches (Distributed vSwitches), you need to point vSphere Client at a VirtualCenter Server, not directly at an ESX host, in order to access the enterprise features enabled therein.


Going back to plain old vSwitches though, the names need to match across hosts if you have VMotion VMKernel ports contained inside them; if they don't, VMotion won't work.

You soon realise that you can't rename a vSwitch from within vSphere Client either – oh no!  Deleting and recreating it may be a problem too if there are VMs living inside an internal Virtual Machines network that cannot be VMotioned away to another host.

The good news is that you can fix this scenario using the “unsupported” console on the ESX host.

At the ESX Console, log in and hit Alt-F1 then type unsupported and hit Enter.  You won’t see the word “unsupported” appear as you type it but upon hitting Enter, you’ll be prompted for the root password.  Type it in and hit Enter.

You'll be presented with a Linuxesque command prompt.  If you don't do vi, go find someone who does or you're about to break stuff.

cd /etc/vmware

vi esx.conf

Search for "name" using Esc, /name, Enter and keep hitting n (next) until you find the incorrectly named vSwitch.  Change the word by hitting Esc, then cw, followed by the correct name, followed by Esc.

/net/vswitch/child[0001]/name = "vSwitch4"

If you're happy the name has been changed correctly in esx.conf, hit Esc, type :wq! and hit Enter to write the changes back to disk and quit vi.
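If you'd rather not drive vi, the same substitution can be scripted with sed.  The sketch below demonstrates it against a sample line; on the host you would back up /etc/vmware/esx.conf first and run the sed against the file itself (the vSwitch names here are just the ones from the example above):

```shell
# Sample esx.conf line in the format shown above
line='/net/vswitch/child[0001]/name = "vSwitch4"'

# Rename vSwitch4 to vSwitch1
renamed=$(printf '%s\n' "$line" | sed 's/"vSwitch4"/"vSwitch1"/')
echo "$renamed"
```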

Back at the Linux prompt, type clear to clear the screen, and type exit and hit Enter to log out of the console.

Alt-F2 will close the “Unsupported Console” returning you back to the black and yellow ESX Console.

Esc to log out, then finally F11 to restart the host.

When the ESX host restarts, you can reconnect using vSphere Client and the vSwitch will now have the correct name.
