Sunday, September 30, 2012

Basic Steps to Learn VMware ESXi Troubleshooting


Whenever an issue arises on an ESX host, people often rush to the Service Console and start typing commands to figure out what is wrong. Of course, some of those commands are not available with ESXi, and you might not even have access to the ESXi console. Many will resort to the vCLI/vMA or even PowerCLI, and that works perfectly fine. The vCLI/vMA in particular is geared towards those who have experience with ESX command-line troubleshooting. You will have all "esxcfg-*" commands at your disposal and of course resxtop, which together will cover 95% of the cases where command-line details are required.
I want to stress that ESXi was not built for console access, although we do provide access to the console and it works fine. The idea behind ESXi is to have a lean hypervisor that is managed from the outside rather than the inside, and VMware has provided multiple tools to do so. First and foremost, of course, are vCenter Server and the vSphere Client. Many problems can be solved simply by using the vSphere Client connected to a host directly or through vCenter Server. The first KB article in the list below is a good example of how vCenter can be used to troubleshoot an inaccessible virtual machine. Something people tend to forget is that the vSphere Client can also be used to read log files; there is no need to open a console session for that, as shown in the steps below and explained in the second article in the list:
    1. Open a browser and enter the URL http://<vCenter Server>, where <vCenter Server> is the IP address or fully qualified domain name of the vCenter Server.
    2. Provide administrative credentials when prompted.
    3. Click the Browse datastores in the vCenter inventory link.
    4. Navigate the webpages until you reach the datacenter, datastore, and folder that contain the log file you need.
    5. Click the link to the appropriate log file, and open it with your preferred editor.
    On the topic of log files: for those who have never worked with ESXi, the locations are slightly different from what you are used to:
    • The VMkernel, vmkwarning, and hostd logs are located at /var/log/messages
    • The Host Management service (hostd = Host daemon) log is located at /var/log/vmware/hostd.log
    • The vCenter Agent log is located at /var/log/vmware/vpx/vpxa.log
    • The System boot log is located at /var/log/sysboot.log
    • The Automatic Availability Manager (AAM) logs are located at /var/log/vmware/aam/vmware_-xxx.log
    Note that /var/log/messages is a combination of all logs out there except for the HA log. You will need to open that one separately when troubleshooting HA-related issues. Also note that, unfortunately, the HA log files are not part of the syslog mechanism either. Knowing the log files and the type of information you can get from each is key when troubleshooting. I encourage everyone to get familiar with them when you have the time, because under pressure you don't want to find yourself fiddling around in the wrong location or log file while four managers and your director look over your shoulder asking whether you have fixed it yet.
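As a minimal sketch of a first pass through one of these logs from the ESXi shell, the function below prints the most recent lines that mention an error or a warning. The function name recent_problems, the search pattern and the default of 20 lines are illustrative assumptions, not part of any VMware tooling.

```shell
# Hypothetical helper: show the last N lines of a log file that mention
# "error" or "warning" (case-insensitive).
# Usage: recent_problems <logfile> [count]
recent_problems() {
    logfile="$1"
    count="${2:-20}"
    grep -i -E 'error|warning' "$logfile" | tail -n "$count"
}

# Example on an ESXi host:
#   recent_problems /var/log/messages
#   recent_problems /var/log/vmware/hostd.log 50
```

It only narrows down where to start reading; the surrounding lines in the log usually carry the context you need.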
    After you have dived into the log files, make sure you check the Knowledge Base. Our Knowledge Base has an excellent set of articles which can be used to troubleshoot very specific issues, or at least point you in the right direction. I have listed some of the most common issues and most used KB articles below, including the article number for your convenience:
      1. Restart the management agents on an ESXi host (1003490)
      2. Determining why a single virtual machine is inaccessible (1018834)
      3. Determining why a virtual machine was powered off or restarted (1019064)
      4. Determining why multiple virtual machines are inaccessible (1019000)
      5. Troubleshooting virtual machine network connection issues (1003893)
      6. Interpreting virtual machine monitor and executable failures (1019471)
      7. Determining why a virtual machine does not respond to user interaction at the console (1017926)
      8. Using Tech Support Mode in ESXi 4.1 (1017910)
      9. Determining why a VMware ESXi host is inaccessible (1019082)
      10. Determining why a VMware ESXi host was powered off or restarted (1019238)
      11. Determining why a VMware ESXi host does not respond to user interaction (1017135)
      12. Enabling serial-line logging for an ESXi host (1003900)
      13. Using performance collection tools to gather data for fault analysis (1006797)
      14. Using hardware NMI facilities to troubleshoot unresponsive hosts (1014767)
      15. Interpreting a VMware ESX host purple diagnostic screen (1004250)
      16. Troubleshooting VMware High Availability (HA) (1001596)
        In some cases, however, it might be required or desirable to log in to Technical Support Mode (yes, this is fully supported) and work directly from the ESXi shell. As many of you know, the ESXi shell also contains all esxcfg-* commands, the invaluable esxcli command and of course some shell commands that are required for troubleshooting. Some of those commands are obvious, others less so. I have listed several below to make things easier.
        The one many complained about in the past, but which actually is available, is vmkping. vmkping can be used to do basic network troubleshooting, but also, for instance, to validate whether jumbo frames can be used by simply adding the size of the packet:
        vmkping -s 9000 <destination IP>
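A note on the packet size: the value passed with -s is the ICMP payload, so on a 9000-byte MTU the largest payload that fits in a single IPv4 packet is 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes. Adding -d, which sets the don't-fragment bit, should make the ping fail instead of being quietly fragmented when jumbo frames are not enabled end to end. A minimal sketch of that arithmetic follows; the function name jumbo_payload and the address 192.0.2.10 are illustrative placeholders.

```shell
# Largest ICMP payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
jumbo_payload() {
    mtu="$1"
    echo $((mtu - 28))
}

# Example on an ESXi host (192.0.2.10 is a placeholder address):
#   vmkping -d -s "$(jumbo_payload 9000)" 192.0.2.10
```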
        One that many bumped into in the ESXi 4.0 time frame was the lack of a mount command. This mount command was actually available, but as part of busybox:
        /usr/bin/busybox mount
        In 4.1, though, the "mount" command has been linked to busybox itself, enabling you to just use "mount". The same applies to, for instance, fdisk. fdisk enables you to validate the partition setup. It has helped me many times in the past to validate that partitions were still marked as "VMFS" when someone accidentally presented VMFS volumes to Windows machines, which immediately resignatured the disks. Again, under 4.0 fdisk is not available as a binary but is available through "busybox", and in 4.1 it is available as a link. (Most of these links are located in /usr/sbin.)
        /usr/bin/busybox fdisk -l
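To spot the VMFS partitions quickly in a long listing, the output can be filtered. This is only a sketch: the function name vmfs_partitions is hypothetical, and it assumes that fdisk prints the word "VMFS" in the partition type column, so check it against the unfiltered output on your own host first.

```shell
# Hypothetical helper: keep only the partition lines that fdisk labels as VMFS.
# Reads `fdisk -l` output on standard input; exits non-zero if none are found.
vmfs_partitions() {
    grep -i 'VMFS'
}

# Example on an ESXi 4.0 host:
#   /usr/bin/busybox fdisk -l | vmfs_partitions
```

An empty result for a disk that should hold a datastore is the cue to look at the full partition table rather than proof that something is wrong.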
        Another thing that I have done in the past regularly when I needed to evacuate a host is place the host in maintenance mode. With ESXi you can do this as follows:
        vim-cmd hostsvc/maintenance_mode_enter
        And of course you can also exit maintenance mode:
        vim-cmd hostsvc/maintenance_mode_exit
        What about listing all VMs and stopping a specific one?
        vim-cmd vmsvc/getallvms
        vim-cmd vmsvc/power.off <vmid>
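The power-off call (vmsvc/power.off) takes the Vmid that getallvms prints for each VM, so the two commands are typically chained. A minimal sketch of that lookup follows. The function name vmid_for_name and the VM name web01 are placeholders, and it assumes the listing has one header line, the Vmid in the first column and the name in the second, and a VM name without spaces.

```shell
# Hypothetical helper: print the Vmid of the VM whose name matches exactly.
# Reads `vim-cmd vmsvc/getallvms` output on standard input. Assumes one header
# line, Vmid in column 1 and the VM name (without spaces) in column 2.
vmid_for_name() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Example on an ESXi host ("web01" is a placeholder VM name):
#   vmid=$(vim-cmd vmsvc/getallvms | vmid_for_name web01)
#   vim-cmd vmsvc/power.off "$vmid"
```

Check that the lookup returned exactly one Vmid before powering anything off.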
        These are just examples to show the power of vim-cmd. Many try to avoid using it, but really it is not overly complex and it gets the job done fairly simply. It can sometimes be difficult to figure out the syntax, but then again, if you can't figure it out, someone else probably has: google it.
        Something that I was asked about this week which can also come in handy when troubleshooting memory issues is the following command which will give you the memory utilization of the hypervisor components:
        vdf -ph
        These commands are just a couple of examples of what is possible within the ESXi shell. Although we generally recommend avoiding the ESXi shell (via remote or local Tech Support Mode) and prefer that you use the alternatives we offer, it will work fine. In general, troubleshooting hasn't changed much, thanks to the full support of the "Technical Support Mode" feature, the remote command-line utilities (vCLI or the vMA) and of course vCenter or the vSphere Client.

        Monday, September 24, 2012


        VCP5 - Upgrade a vNetwork Distributed Switch

        There are three versions of the vNetwork Distributed Switch available: 4.0, 4.1, and 5.0. Each version provides new functionality, but also limits interoperability with older versions.

        Upgrade Distributed Switch

        1. In vSphere, browse to Networking
        2. Select the switch and on the Summary tab, click Upgrade
        3. Select the upgrade version and click Next
        4. Confirm no hosts report as incompatible, click Next
        5. Click Finish
        Version Compatibility / Features
        Version   Features                                                        Compatibility
        4.0       N/A                                                             ESX 4.0 and later
        4.1       Load-Based Teaming, Network I/O Control                         ESX 4.1 and later
        5.0       User-defined network resource pools, NetFlow, Port Mirroring    ESX 5.0 and later
        Step-1
        1) How to add an ESX host to the distributed switch without downtime.
        2) While upgrading the distributed switch, is there any impact on the ESX hosts?

        What is Fibre Channel over IP (FCIP or FC/IP)


        Fibre Channel over IP (FCIP or FC/IP, also known as Fibre Channel tunneling or storage tunneling) is an Internet Protocol (IP)-based storage networking technology developed by the Internet Engineering Task Force (IETF). FCIP mechanisms enable the transmission of Fibre Channel (FC) information by tunneling data between storage area network (SAN) facilities over IP networks; this capacity facilitates data sharing over a geographically distributed enterprise. One of two main approaches to storage data transmission over IP networks, FCIP is among the key technologies expected to help bring about rapid development of the storage area network market by increasing the capabilities and performance of storage data transmission.

        FCIP Versus iSCSI

        The other method, iSCSI, generates SCSI codes from user requests and encapsulates the data into IP packets for transmission over an Ethernet connection. Intended to link geographically distributed SANs, FCIP can only be used in conjunction with Fibre Channel technology; in comparison, iSCSI can run over existing Ethernet networks. SAN connectivity, through methods such as FCIP and iSCSI, offers benefits over the traditional point-to-point connections of earlier data storage systems, such as higher performance, availability, and fault tolerance. A number of vendors, including Cisco, Nortel, and Lucent, have introduced FCIP-based products (such as switches and routers). A hybrid technology called Internet Fibre Channel Protocol (iFCP) is an adaptation of FCIP that is used to move Fibre Channel data over IP networks using the iSCSI protocols.
