Wednesday, November 14, 2012

Run Hardware Diagnostic tests

Most servers ship with a hardware diagnostics CD, although some hardware vendors instead install the diagnostics in a hidden utility partition on the hard drive.
Note: If you are not experienced with computers or have any concerns, please contact your hardware vendor.
You can diagnose hardware-related problems on your server by booting from the diagnostic CD or by choosing Diagnostics from the boot device list.
These diagnostic tools allow you to:
  • Check the hardware configuration and verify that it is functioning correctly.
  • Test individual hardware components.
  • Diagnose hardware-related problems.
  • Obtain a complete hardware configuration.
When testing, if a component failure is detected, make note of any error code(s) and contact the hardware vendor.

Check your memory

Note: This process requires downtime on your ESX/ESXi host for up to 48 hours. In most cases, the diagnostic utility from your hardware vendor, as mentioned above, is sufficient for testing your hardware. VMware does not endorse or recommend any particular third-party utility. However, third-party options are available to test your memory.
To test your memory:
  1. Download memtest86+ from http://www.memtest.org/.
  2. Extract the ISO image from the .gz or .zip archive.
  3. Burn the image to CD.
  4. Boot your ESX/ESXi host from the CD.
  5. Let memtest86+ run. It goes through each memory bank and checks for errors.

    Note: If memtest86+ does not run on your hardware, contact your vendor for their memory test utility. 
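
If you prefer to script step 2 for a .gz download, the extraction can be sketched in Python as follows. This is a minimal sketch using only the standard library; the function name extract_gz and the file names in the usage note are hypothetical and not part of the memtest86+ distribution.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz(archive: Path, target: Path) -> Path:
    """Decompress a single-file .gz archive to `target` and return `target`."""
    # Stream the data in chunks so a large image is never held fully in memory.
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, extract_gz(Path("memtest.iso.gz"), Path("memtest.iso")) would write the ISO image next to the archive, assuming the download was saved under that (hypothetical) name.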

Ensure your server conforms to Non-Uniform Memory Access (NUMA) rules and regulations

Notes:
  • If you are not experienced with computers or have any concerns, please contact your hardware vendor.
  • Problems related to NUMA usually occur following a RAM upgrade or after an ESX/ESXi Server host installation.
You might see an error such as the following:

The BIOS reports that NUMA node 1 has no memory. This problem is either caused by a bad BIOS or a very unbalanced distribution of memory modules. 
NUMA is a system where each processor has separate memory. The separate memory helps to avoid a performance hit when several processors attempt to address the same memory.
The main requirement is that a similar amount of memory is installed beside each processor. If the amount of memory installed beside each processor is not similar, the configuration is unbalanced and you might experience performance problems. For more information, see ESX Server Memory Management on Systems with AMD Opteron Processors (1570).

More information on NUMA is also available in the Resource Management Guide.
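
The balance requirement can be illustrated with a short Python check. This is a sketch under stated assumptions: the per-node memory sizes are typed in by hand (for example, from your BIOS or vendor documentation), the function name numa_balance_report is hypothetical, and the 10% tolerance is an arbitrary illustration, not a VMware threshold.

```python
def numa_balance_report(node_mem_mb, tolerance=0.10):
    """Return (balanced, empty_nodes) for per-node memory sizes in MB.

    `tolerance` is an assumed, illustrative limit on how far the smallest
    node may fall below the largest one; it is not a vendor-defined value.
    """
    if not node_mem_mb:
        raise ValueError("at least one NUMA node is required")
    # Nodes with no memory correspond to the BIOS error quoted above.
    empty_nodes = [i for i, mb in enumerate(node_mem_mb) if mb == 0]
    largest = max(node_mem_mb)
    smallest = min(node_mem_mb)
    balanced = (
        not empty_nodes
        and largest > 0
        and (largest - smallest) / largest <= tolerance
    )
    return balanced, empty_nodes
```

With 16384 MB beside each of two processors the check reports a balanced layout, while [32768, 0] reports node 1 as empty, which matches the situation described by the error message above.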

Run the VMware CPU Identification Utility

To ensure that your CPUs are detected as expected, use the VMware CPU Identification Utility, which you can download from VMware Shared Utilities. The tool helps you confirm that the ESX host detects and reports your CPUs correctly.
After you download the VMware CPU Identification Utility, use the cpuid.iso image to create a bootable CD that aids in processor and feature identification. The tool displays Family/Model/Stepping information for the detected CPUs, and hexadecimal values for the CPU registers that identify specific CPU features. It then interprets the hexadecimal register values to indicate whether the CPUs support features such as 64-bit, SSE3, and NX/XD.
The following is sample output:
Reporting CPUID for 2 logical CPUs...
All CPUs are identical
Family: 0f Model: 04 Stepping: 1
ID1ECX ID1EDX ID81ECX ID81EDX
0x0000641d 0xbfebfbff 0000000000 0x20100000
Vendor : Intel
Processor Cores : 1
Brand String : " Intel(R) Xeon(TM) CPU 2.80GHz"
SSE Support : SSE1, SSE2, SSE3
Supports NX / XD : Yes
Supports CMPXCHG16B : Yes
Hyperthreading : Yes
Supports 64-bit Longmode : Yes
Supports 64-bit VMware : No
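
To show how hexadecimal register values map to the Yes/No lines, the following Python sketch decodes three of the registers from the sample output. The bit positions are the standard x86 CPUID feature bits (leaf 1 ECX/EDX and extended leaf 0x80000001 EDX); the function and dictionary names are hypothetical, and this is not the utility's actual implementation. The "Supports 64-bit VMware" line is not decoded here because it evidently depends on checks beyond these bits.

```python
# (register, bit) pairs taken from the x86 CPUID feature-flag layout.
FEATURE_BITS = {
    "SSE1": ("id1edx", 25),
    "SSE2": ("id1edx", 26),
    "SSE3": ("id1ecx", 0),
    "CMPXCHG16B": ("id1ecx", 13),
    "Hyperthreading": ("id1edx", 28),  # HTT flag; the CPU can report multiple logical processors
    "NX / XD": ("id81edx", 20),
    "64-bit Longmode": ("id81edx", 29),
}


def decode_cpuid(id1ecx, id1edx, id81edx):
    """Map raw CPUID register values to a {feature name: bool} dictionary."""
    regs = {"id1ecx": id1ecx, "id1edx": id1edx, "id81edx": id81edx}
    return {
        name: bool((regs[reg] >> bit) & 1)
        for name, (reg, bit) in FEATURE_BITS.items()
    }


# Register values copied from the sample output above.
features = decode_cpuid(0x0000641D, 0xBFEBFBFF, 0x20100000)
for name, supported in features.items():
    print(f"{name} : {'Yes' if supported else 'No'}")
```

For the sample registers, every feature in the table decodes to Yes, which agrees with the corresponding lines of the sample output.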

Additional Information

In addition, make sure that your host meets the minimum system requirements for your version of ESX/ESXi. For more information, see Minimum requirements for installing ESX/ESXi (1003661). For more information about decoding machine check exceptions, see Decoding Machine Check Exception (MCE) output after a purple screen error (1005184).
