Booting an in-band VirtualCenter Server VM from the ESXi console

If your VirtualCenter Server is itself a VM, then it’ll be running on an ESXi host.  If that ESXi host is restarted without vMotioning the VirtualCenter Server off it first (such as when the management network is irrecoverably unresponsive), then depending on your environment, you may not be able to get a remote connection to the VM after the host has restarted.  In this scenario, you’d need to boot the VM from the unsupported console.  This is how to do it.

Connect to the iLO or equivalent management interface of the ESXi host, send an Alt-F1 and type unsupported followed by the root password to obtain a prompt on the unsupported console.

Identify the VMs resident on the host

vim-cmd vmsvc/getallvms

Identify the current power state of the VM running VirtualCenter

vim-cmd vmsvc/power.getstate ##           where ## is the Vmid of the VM identified above

Power on the VM

vim-cmd vmsvc/power.on ##                       where ## is the Vmid of the VM identified above
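
The three steps above can be strung together in a small script.  This is only a rough sketch: it assumes the getallvms output has the Vmid in the first column and the VM name in the second (a name containing spaces would need more careful parsing), and that power.getstate reports the text “Powered off”.  The VM name vcenter01 and the helper name vmid_for_name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Vmid of the VM whose name matches $1.
# Reads "vim-cmd vmsvc/getallvms" style output on stdin and assumes the
# Vmid is column 1 and the (space-free) VM name is column 2.
vmid_for_name() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1; exit }'
}

# Only touch anything if vim-cmd is actually present (i.e. we are on ESXi).
if command -v vim-cmd >/dev/null 2>&1; then
    id=$(vim-cmd vmsvc/getallvms | vmid_for_name "vcenter01")   # assumed VM name
    if [ -n "$id" ]; then
        # Assumption: a powered-off VM is reported as "Powered off".
        if vim-cmd vmsvc/power.getstate "$id" | grep -q "Powered off"; then
            vim-cmd vmsvc/power.on "$id"
        fi
    fi
fi
```

Checking the power state first avoids asking the host to power on a VM that is already running.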


ESXi hosts keep dropping out of vCenter cluster

If the ESXi hosts in your cluster keep going into a Not Responding or Disconnected state, then the following should be checked immediately: the DNS addresses on the ESXi hosts, the hosts files on the ESXi hosts, the Managed IP Address setting in vCenter, and the hosts file on the vCenter Server.

Check that both DNS addresses on each ESXi host’s management interface point to the DNS servers holding the A records for the ESXi hosts and the vCenter Server itself.  Make sure they’re not pointing at the wrong DNS servers, then log onto the unsupported console and perform an nslookup of the vCenter Server to confirm.


Because name resolution depends on an external service, its resilience should be bolstered with hosts files at either end.
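
A quick way to confirm that a hosts file really does cover every machine is to check it name by name.  The following is a minimal sketch rather than anything VMware provides: missing_from_hosts is a hypothetical helper, and the host names in the usage line are placeholders for your own.

```shell
#!/bin/sh
# Hypothetical helper: print each name from $2 onwards that has no entry in
# the hosts-format file given as $1.  Comments are stripped and the IP
# address column is ignored, so only host names and aliases are compared.
missing_from_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            { sub(/#.*/, "") }                                   # drop comments
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # names/aliases
            END { exit !found }                                  # 0 when found
        ' "$hosts_file" || echo "$name"
    done
}
```

For example, missing_from_hosts /etc/hosts vcenter01 esxi01 esxi02 prints nothing when all three names are present, and otherwise lists the ones that still need adding.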

From the vSphere Client, log in to vCenter Server
Navigate to Administration > vCenter Server Settings > Runtime Settings and review the Managed IP Address setting.
Verify that the address is correct (use ipconfig to discover the correct IP address for the vCenter management (v)LAN NIC).  Check all octets for correctness, and be aware that the octets may not match those of the ESXi hosts, so check the design document or consult the infrastructure architect.

Correct the entry and click OK to save your changes and close the dialog.
Restart the ESXi host(s) if locked up.  If not, connect them back in to vCenter.

If you need to restart a host and the vCenter Server is running in a VM on it, then you should try to connect to the vCenter Server over RDP first and shut it down.

Note: Once the ESXi host has restarted, you can power on the in-band vCenter VM using the instructions here… http://www.cyberfella.co.uk/2012/05/01/booting-vm-from-console/

Disable HA on the Cluster.

Put ESXi host into Maintenance Mode, moving off any powered on or powered off VMs
Remove host from all DvSwitches in Home, Inventory, Networking
Remove the host from Cluster
Add host back into Cluster to push out new management agents and config containing the corrected IP address of vCenter.
Take out of Maintenance mode.

Re-add host to DvSwitches.

Repeat for all ESXi hosts that were unstable.

It should now all stabilise (90 second wait).  If so, re-enable HA on the cluster.  If not, use the following knowledge base article from VMware to troubleshoot other potential areas such as the firewall between the ESXi hosts and vCenter Server (if present).

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003409

A few days after this was written, I noticed the hosts were disconnecting less often, but it was still happening.  Adding a hosts file to each ESXi host so that they can all resolve each other’s names, as well as that of the vCenter Server, with or without DNS services being available, and clicking “Reconfigure for VMware HA” on each host from within vCenter, seems to have regained some stability.

The most immediate place to look for problems is the summary tab for each host in vCenter.  The trouble is that this usually gives very little away, describing a symptom rather than possible reasons for it.  The best place to look is in the logs: not the messages log from the black and yellow console, but the vCenter agent and HA logs.  Log onto the unsupported console on the ESXi hosts and tail the logs below.

/var/log/vmware/vpx/vpxa.log     Shows Agent can’t send heartbeat, No route to host. errors.

/var/log/vmware/aam/vmware_hostname.log       Shows date and timestamp and “Node hostname has started receiving heartbeats from node hostname”  informational events for intra-esxi host communications.
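
When a log is too busy to read by eye, counting the tell-tale lines gives a quick feel for how bad things are.  This is a small sketch only: count_heartbeat_errors is a hypothetical helper, and the two search strings are simply the error texts quoted above, so adjust them to whatever your own logs actually say.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in log file $1 that mention either
# of the heartbeat-related errors quoted above.  A line containing both
# strings is counted once.
count_heartbeat_errors() {
    grep -c -e "send heartbeat" -e "No route to host" "$1"
}
```

On a host you would point it at the agent log, for example count_heartbeat_errors /var/log/vmware/vpx/vpxa.log, and run it again a few minutes later to see whether the count is still climbing.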

It’s worth noting that aam is a Legato heartbeat technology and is massively dependent on DNS being right.

Four days after writing this, the hosts once again began to enter a not responding state, followed by a disconnected state.

I have always suspected that cluster heartbeats are falling foul of log files being shipped to remote syslog servers.  In ESXi, there is a lot of logging going on.  Some of it, such as the entries in /var/log/vmware/hostd.log, is also replicated to /var/log/messages, effectively doubling the amount of logging, which then all has to be replicated to (in my case) two remote syslog servers.  This amounts to a pretty continuous stream of data travelling over the bonded physical NICs that ultimately handle all traffic for not only the management network, but also vMotion.  What alerted me to the suspicion that this could be the cause of my problems was slow vMotion when migrating guests between hosts.  Also, when running ESXi on Cisco UCS with SNMP monitoring enabled, there is a lot of informational logging activity for hardware that is healthy (status Green).

Whilst my preference would be to split the bonded NICs (no loss of redundancy on Cisco UCS provided the vNICs are set to fail over at the UCS level), separating management and vMotion traffic, I have massively reduced the amount of logs being generated by making the following edit in

/etc/vmware/hostd/config.xml

outputToSyslog=false

This stops the duplication of hostd log entries being written to /var/log/messages.  You may be able to make similar changes to other agents, to make further reductions – I don’t know.  It’s worth noting that if you make this change, you’ll need to issue the following command to restart hostd.

/etc/init.d/hostd restart
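
If you’d rather not make the change by hand in vi, it can be scripted.  The sketch below rests on an assumption worth checking against your own file first: that the setting is stored in config.xml as an XML element reading <outputToSyslog>true</outputToSyslog>.  The helper name disable_hostd_syslog is made up, and it keeps a .bak copy so the edit can be undone.

```shell
#!/bin/sh
# Hypothetical helper: switch hostd's syslog duplication off in config file $1.
# Assumption: the setting appears as <outputToSyslog>true</outputToSyslog>.
# The original file is kept alongside as "$1.bak".
disable_hostd_syslog() {
    cfg=$1
    cp "$cfg" "$cfg.bak" || return 1
    sed 's|<outputToSyslog>true</outputToSyslog>|<outputToSyslog>false</outputToSyslog>|' \
        "$cfg.bak" > "$cfg"
}
```

On the host this would be run as disable_hostd_syslog /etc/vmware/hostd/config.xml, followed by the hostd restart shown above.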

Another change I made was to create a new HA enabled cluster in VirtualCenter and, after migrating all guests off each ESXi host, place each host into maintenance mode and move it to the new cluster.  Upon taking the ESXi hosts out of maintenance mode, some re-enabled/re-deployed HA agents successfully, some did not.  For those that didn’t, a restart of the management agents from the local console was sufficient to make it work (Reconfigure for HA).  The problem with ESXi is that the logs are cleared on reboot, so if a host has lost its management network connection and its local console has seized up, then you can’t read the logs (unless you’re using a remote syslog server).  These hosts’ management agents were obviously dying, which will ultimately take down the management network too if you leave it long enough, yet there’s no warning that this is going on until the host goes into a Not Responding state, visible in VirtualCenter.

Since making this change, in over a week the ESXi hosts have not lost contact with the vCenter Server at all (by terminating their management agent daemons), nor had their management networks grind to a halt.  Based upon my observations to date with this issue, I’m claiming this as a success and am very relieved to have got to the bottom of it.


New PC time. Meet the Lenovo Ideacenter Q180.

Despite being a massive fan of the Acer Aspire Revo 3600 that I bought a few years ago, my little £149.99 nettop is just a touch slow these days and could probably do with being upgraded.

My instant reaction was to go for the £174.97 Acer Aspire Revo 3700, and it would be a great choice too.  However, its 1.8GHz CPU would appear to have been trumped slightly by the 2.13GHz CPU in the £179.99 offering from Lenovo, the IdeaCenter Q180.

Yes, you read those prices right.  I use Linux (free) and keep the price of my hardware as low as possible.  Any compromises on performance will be offset by operating system choice and subsequent tuning, although I’m not expecting to have to do an awful lot of that given its “whopping” 2GB RAM, 2.13GHz CPU and ATI Radeon graphics chip (full spec given below).  Watch this space.

Continued here..

Xubuntu 64 bit vs Crunchbang 64 bit

Processor: Intel Atom D2700 Dual Core 2.13GHz, 1MB L2 Cache

Memory: 2GB DDR3 1066MHz SODIMM

Hard Drive: 320GB SATA

Optical Drive: None

Software: Operating system: DOS

Display: Monitor Not Included

Graphics: ATI Radeon HD 6450, 512MB

Networking: LAN: 10/100/1000 Gigabit Fast Ethernet, WLAN: 802.11b/g/n

Interfaces: 3 x USB 2.0, 2 x USB 3.0, 1 x HDMI, 1 x SPDIF

Expansion: 7 in 1 Card Reader

Warranty: 1 Year Manufacturer Warranty


Adding a persistent static route

When a Windows or Linux server has multiple NICs connecting it to multiple VLANs and/or networks, you’ll sometimes need to configure a static route so that the server knows which interface to use to reach the syslog server, NTP server etc. when that traffic should take a route other than the default gateway.

Solaris

vi /etc/gateways    (if this file doesn’t exist read on…)

Underneath>>    net 192.168.0.1 gateway 192.168.0.254 metric 1 passive

Add the following>>       net 10.0.0.0 gateway 10.8.2.65 metric 1 active

Note:  The default gateway is set in the /etc/defaultrouter file

If the /etc/gateways file doesn’t exist, then static routes may have been added “the old way”

cd /etc/rc2.d

ls | grep static

You may see a startup file called Snnnstatic_routes.  Inside this script will be non-persistent static routes added using lines that read something like /usr/sbin/route add 10.0.0.0/24 10.8.2.65 1.  Append your routes to the ‘start’ section, not forgetting to add a corresponding route delete command in the ‘stop’ section.

Red Hat Linux

echo '10.0.0.0/24 via 10.8.2.65' >> /etc/sysconfig/network-scripts/route-eth0

service network restart

route -n to view the current routing table
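
One thing to watch with the echo above is that running it twice leaves a duplicate line in the route file.  A minimal sketch of a guard against that follows; add_route_once is a hypothetical helper name, and the route and file are just the example values used above.

```shell
#!/bin/sh
# Hypothetical helper: append route line $1 to route file $2 only if that
# exact line is not already present, so the command is safe to repeat.
add_route_once() {
    route_line=$1
    route_file=$2
    touch "$route_file"                      # make sure the file exists
    grep -qxF "$route_line" "$route_file" || echo "$route_line" >> "$route_file"
}
```

For the example above that would be add_route_once '10.0.0.0/24 via 10.8.2.65' /etc/sysconfig/network-scripts/route-eth0, followed by the network restart as before.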

Windows

route -p ADD 10.0.0.0 MASK 255.255.255.0 10.8.2.65

netstat -rn to view the routing table

Excellent examples here: http://www.thegeekstuff.com/2012/04/route-examples/


Enabling SSH on VMWare ESXi hosts

At the local console, press Alt + F1, type unsupported and hit ENTER to connect to the unsupported console.

Enter the password when prompted.

vi /etc/inetd.conf

Remove the # at the beginning of the #ssh line to uncomment the ssh service.

:wq! to write the changes and quit the vi editor.

Identify the inetd process using ps | grep inetd

Restart the inetd service with kill -HUP pid (where pid is the process ID found above)

clear, exit, Alt + F2 to log out of the unsupported console.  Esc to log out of the local management console.
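
If you don’t fancy vi, the uncommenting can be done with sed instead.  The following is a sketch under the assumption that the disabled entry starts with #ssh at the very beginning of the line; note that it uncomments every such line if there is more than one.  The helper name enable_ssh_line is made up, and a .bak copy is kept.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the ssh service line(s) in inetd config $1.
# Assumption: disabled entries begin with "#ssh" at the start of the line.
# The original file is kept alongside as "$1.bak".
enable_ssh_line() {
    conf=$1
    cp "$conf" "$conf.bak" || return 1
    sed 's/^# *ssh/ssh/' "$conf.bak" > "$conf"
}
```

On the host that would be enable_ssh_line /etc/inetd.conf, after which inetd still needs the kill -HUP described above to pick up the change.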

You’ll notice a warning appear in vSphere client stating that the remote administration console has been enabled.  This is considered a security risk, but it is possible to suppress the warning if you wish to leave it open (not recommended).

Before quitting the console, type

esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 1

to disable the warning, or

esxcli system settings advanced set -o /UserVars/SuppressShellWarning -i 0

to re-enable it (recommended if you disable the ssh console again).


Management Logs on Cisco UCS Blades

As of firmware 1.4.1 (old now), the Management Logs tab was renamed SEL Logs.

If you’re running VMWare ESXi on a Cisco M200 blade, then you may notice a hardware event trigger in vCenter Server, with a fault of System Board 0 SEL_FULLNESS.

This occurs when the UCS Management Log for a given blade breaches its own monitoring threshold of 90% full.

To clear it, log into UCS Manager, Equipment tab, Servers, Server n, SEL Logs tab, and Backup or Clear the log.

Don’t forget to at least take a look at the log to make sure it hasn’t filled due to real, unresolved hardware problems.  The SEL Log logs absolutely everything that goes on, to the extent of even logging LEDs as they turn on and off on the equipment, so these logs fill quite quickly.


Renaming a vSwitch in VMWare ESXi

Your vSwitches visible in vSphere Client are allocated names, e.g. vSwitch0, vSwitch1 and so on.  In order to create dvSwitches (Distributed vSwitches), you need to point vSphere Client at a VirtualCenter Server, not directly at an ESX host, to access the enterprise features enabled therein.


Going back to plain old vSwitches though, the names need to match across hosts if you have VMotion VMKernel ports contained inside them, and if they don’t then VMotion won’t work.

You soon realise that you can’t rename a vSwitch from within vSphere Client either.  Oh no!  Deleting it and recreating it may be a problem too if there are VMs living inside an internal Virtual Machines network that cannot be VMotioned away to another host.

The good news is that you can fix this scenario using the “unsupported” console on the ESX host.

At the ESX Console, log in and hit Alt-F1 then type unsupported and hit Enter.  You won’t see the word “unsupported” appear as you type it but upon hitting Enter, you’ll be prompted for the root password.  Type it in and hit Enter.

You’ll be presented with a Linuxesque command prompt.  If you don’t do vi, go find someone who does or you’re about to break stuff.

cd /etc/vmware

vi esx.conf

Search for “name” using Esc, /name, Enter and keep hitting n (next) until you find the incorrectly named vSwitch.  Change the word by hitting Esc, cw followed by the correct name, followed by Esc.

/net/vswitch/child[0001]/name = "vSwitch4"

If you’re happy the name has been changed correctly in esx.conf, hit Esc, :wq! and hit Enter to write the changes back to disk and quit vi.

Back at the Linux prompt, type clear to clear the screen, and type exit and hit Enter to log out of the console.

Alt-F2 will close the “Unsupported Console” returning you back to the black and yellow ESX Console.

Esc to log out, then finally F11 to restart the host.

When the ESX host restarts, you can reconnect using vSphere Client and the vSwitch will now have the correct name.
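
For anyone nervous of vi, the same edit to esx.conf can be made with sed.  This is a sketch, not a supported procedure: rename_vswitch is a hypothetical helper, it assumes the name is stored on a line of the form /net/vswitch/child[nnnn]/name = "vSwitchN" as shown above, and it expects plain alphanumeric switch names.  A .bak copy of the file is kept in case you need to back out.

```shell
#!/bin/sh
# Hypothetical helper: in config file $1, rename the vSwitch called $2 to $3.
# Only lines of the form  /net/vswitch/child[nnnn]/name = "OLD"  are changed.
# Assumes plain alphanumeric names; the original is kept as "$1.bak".
rename_vswitch() {
    conf=$1
    old=$2
    new=$3
    cp "$conf" "$conf.bak" || return 1
    sed "s|\(/net/vswitch/child\[[0-9]*]/name = \)\"$old\"|\1\"$new\"|" \
        "$conf.bak" > "$conf"
}
```

With made-up names, rename_vswitch /etc/vmware/esx.conf vSwitch4 vSwitch1 would do the rename, and the host still needs restarting afterwards as described above.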
