Thursday, January 31, 2013

Simulating Patch/Driver update without rebooting ESXi Host

I was reading Mostafa Khalil's Storage book and came across a fantastic option of the esxcli command that lets you simulate a patch update or any other async driver update, so you can see whether the update would succeed or fail because of dependencies on other packages.

You can first go through an installation dry run to verify whether there are any problems; it also tells you if there is any dependency that needs to be removed.

For example, here I am using the bnx2x driver for a 10 Gb Broadcom NIC, and I have copied the zip file to the /vmfs/volumes/bcm directory:
esxcli software vib install -d /vmfs/volumes/bcm/BNX2X.zip --dry-run

This is the verbose version of the above command, with the -d option spelled out as --depot:
esxcli software vib install --depot /vmfs/volumes/bcm/BNX2X.zip --dry-run

Once the command has finished running, you will see that the output contains an entry "Reboot Required: false", which means that at this point a reboot of the ESXi host is NOT required. This was only a simulation, though; when you actually apply the patch/driver update you may need to reboot the host so that certain services are restarted or certain driver modules are loaded.
You can run /sbin/services.sh restart (or just services.sh restart if /sbin is in your PATH), which restarts all the services on that host.
This can only be run from the ESXi Shell, either via SSH or locally on the ESXi host through the DCUI, iLO, DRAC, RSA, KVM, etc.

The dry run makes it possible to plan the maintenance window accordingly without affecting the production environment at all.
It also lets you know beforehand whether any other update or dependency is needed, so that you can put it on the host before applying the actual driver/patch update.
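
If you want to script this check, the result can be picked apart with a few lines of Python. This is only a minimal sketch, not VMware tooling: the names parse_install_result and reboot_required and the sample text are my own assumptions, and only the "Reboot Required: false" entry is taken from the output described above. It assumes each field is printed on its own line as "Key: value".

```python
def parse_install_result(text):
    """Collect 'Key: value' lines from the command output into a dict.

    Assumption: every field is printed on its own line as 'Key: value'.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip blank lines and lines without a colon, such as a heading
            result[key.strip()] = value.strip()
    return result


def reboot_required(result):
    """Return True/False from the 'Reboot Required' field, or None if it is absent."""
    value = result.get("Reboot Required")
    if value is None:
        return None
    return value.lower() == "true"


# Hypothetical sample text; only the 'Reboot Required: false' line comes from the post.
SAMPLE = """
Installation Result
   Reboot Required: false
"""

parsed = parse_install_result(SAMPLE)
print("Reboot needed:", reboot_required(parsed))
```

You would feed it the text captured from the --dry-run command and decide from the returned value whether the maintenance window needs to include a reboot.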

Generally this command is used as one of the advanced options during the installation/upgrade of the ESX host using the kickstart script (ks.cfg). You can find out more about it in the online documentation for vSphere here.

Hope you will find this information useful.

Care and share.


Wednesday, January 30, 2013

Quiz on vSphere

Hi Folks,

This post is to get your input on an issue I was discussing with a colleague.

It is also a way to see how VMware technologies can be used to suggest a solution to a particular problem.

To answer the question or offer suggestions, leave a comment. One week from today the "Winner" will be announced here on this post.

So here you go... read the question carefully before you come up with an answer, as you can use any supported method with the different technologies of vSphere.


How can I present 16 vNICs inside a Virtual Machine on vSphere 5.1?



Your time starts now. :-)

Thanks and share!!

Saturday, January 26, 2013

Disable/Enable ABC for NFS Troubleshooting on NetApp

I was reading about an NFS troubleshooting issue on NetApp and came across a NetApp KB about disabling flow control on the array.

The related newer setting is described in RFC 3465 as Appropriate Byte Counting (ABC).

This setting controls TCP congestion control with Appropriate Byte Counting on the array.



As per the Storage Best Practices document (TR-3749.PDF), page 25, flow control means:


 FLOW CONTROL
Flow control is a low-level process for managing the rate of data transmission between two nodes to prevent a fast sender from overrunning a slow receiver. Flow control can be configured on ESX/ESXi servers, FAS storage arrays, and network switches. For modern network equipment, especially 10GbE equipment, NetApp recommends turning off flow control and allowing congestion management to be performed higher in the network stack. For older equipment, typically GbE with smaller buffers and weaker buffer management, NetApp recommends configuring the endpoints, ESX servers, and NetApp arrays with the flow control set to "send."


Further reading on congestion management with flow control (TR-3802.pdf, page 23) mentions other technologies, such as TCP windowing, increased switch buffering, and end-to-end QoS, which may reduce the need for simple flow control throughout the network. I think the ABC setting from the RFC mentioned above can be added to that list. You can read the RFC to get a better idea of exactly how it works in the TCP stack.
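
To give a rough feel for what ABC changes, here is a toy Python model of the congestion window update rules from RFC 3465. It is a sketch of the RFC's rules only, not NetApp's implementation: the class name AbcSender, the SMSS value of 1460 bytes and the default limit of 2 segments are my own choices for illustration.

```python
SMSS = 1460  # sender maximum segment size in bytes (assumed, typical for Ethernet)


class AbcSender:
    """Toy model of congestion window growth with ABC (RFC 3465)."""

    def __init__(self, cwnd, ssthresh, limit_segments=2):
        self.cwnd = cwnd                      # congestion window, in bytes
        self.ssthresh = ssthresh              # slow start threshold, in bytes
        self.limit = limit_segments * SMSS    # the RFC's limit L, in bytes
        self.bytes_acked = 0                  # counter used in congestion avoidance

    def on_ack(self, newly_acked):
        """Update cwnd for an ACK covering `newly_acked` previously unacked bytes."""
        if self.cwnd < self.ssthresh:
            # Slow start: grow by the number of bytes acknowledged, capped at L.
            self.cwnd += min(newly_acked, self.limit)
        else:
            # Congestion avoidance: add one SMSS per cwnd worth of acknowledged bytes.
            self.bytes_acked += newly_acked
            if self.bytes_acked >= self.cwnd:
                self.bytes_acked -= self.cwnd
                self.cwnd += SMSS
        return self.cwnd


sender = AbcSender(cwnd=2 * SMSS, ssthresh=8 * SMSS)
# One delayed ACK that covers two full segments grows the window by two segments.
print(sender.on_ack(2 * SMSS) // SMSS, "segments")
```

The point of counting bytes instead of ACKs is that a receiver using delayed ACKs no longer slows down the sender's window growth; the limit keeps a single large ACK from triggering a big burst.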

For ONTAP 8.1, NetApp has released a guide that documents the following option:


ip.tcp.abc.enable

(Enables/disables the use of Appropriate Byte Counting in TCP Congestion Control following RFC 3465. Valid values for this option are on or off. The default value for this option is on.)



If you need further information on how to set this up on the affected array, I would suggest getting in touch with the vendor.

This may or may not improve the overall situation, but it's worth a try.


Please share if you can.

Friday, January 25, 2013

How to verify VMFS Heartbeat Region Corruption

Recently I was discussing an issue where messages in the logs report a corrupt heartbeat (HB) at a given offset instead of the data expected at that location.

Upon research I found out that a heartbeat region corruption had occurred on the VMFS volume, due to a power outage or some other reason.

You will see the following messages in the logs:


cpu2:2816)WARNING: HBX: 533: Volume 4f7217c1-3589352d-625d-001b22248a7a ("1TB-LAB") may be damaged on disk. Corrupt heartbeat detected at offset 3751936: [HB state 0 offset 0 gen 0 stampUS 0 uuid 00000000-00000000-00

cpu5:2980)WARNING: HBX: 533: Volume 4f7217c1-3589352d-625d-001b22248a7a ("1TB-LAB") may be damaged on disk. Corrupt heartbeat detected at offset 3751936: [HB state 0 offset 0 gen 0 stampUS 0 uuid 00000000-00000000-00


Now on your ESXi 5.x host, run the following command on the ESXi console using SSH or the DCUI:

hexdump -s 22626304 -n 2048 -C /vmfs/devices/disks/

The output will be similar to this:

hexdump -s 22626304 -n 2048 -C /vmfs/devices/disks/

01594000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594200  01 ef cd ab 00 42 39 00  00 00 00 00 de 00 00 00  |.....B9.........|
01594210  00 00 00 00 cd 31 82 11  00 00 00 00 00 00 00 00  |.....1..........|
01594220  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
01594230  0e 00 00 00 36 00 00 00  00 00 00 00 00 00 00 00  |....6...........|
01594240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594400  01 ef cd ab 00 44 39 00  00 00 00 00 00 00 00 00  |.....D9.........|
01594410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594600  01 ef cd ab 00 46 39 00  00 00 00 00 00 00 00 00  |.....F9.........|
01594610  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594800

If you look at the first line of each of the 4 sections above (shown in bold red in the original post), you can see that the 1st section, at 01594000, contains only zeros, while the other three start with 01 ef cd ab.

After fixing the heartbeat region, running the same command again shows the values as follows:


hexdump -s 22626304 -n 2048 -C /vmfs/devices/disks/

01594000  01 ef cd ab 00 40 39 00  00 00 00 00 00 00 00 00  |.....@9.........|
01594010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594200  01 ef cd ab 00 42 39 00  00 00 00 00 de 00 00 00  |.....B9.........|
01594210  00 00 00 00 cd 31 82 11  00 00 00 00 00 00 00 00  |.....1..........|
01594220  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
01594230  0e 00 00 00 36 00 00 00  00 00 00 00 00 00 00 00  |....6...........|
01594240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594400  01 ef cd ab 00 44 39 00  00 00 00 00 00 00 00 00  |.....D9.........|
01594410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594600  01 ef cd ab 00 46 39 00  00 00 00 00 00 00 00 00  |.....F9.........|
01594610  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594800

As you can see in the above output, all 4 sections now start with hex values (01 ef cd ab), which is a sign of a healthy VMFS heartbeat region. If viewing a file for any VM on the affected datastore gave an error before, try again; you should now be able to view it.
To fix the issue itself, I would suggest contacting VMware Support if you are not dealing with a lab/test environment or are not sure how to fix the corruption. This information is just to give you an idea of how to verify whether there is indeed corruption on the VMFS volume.
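
The comparison of the two dumps can also be done with a small Python sketch. Everything here is my own illustration and not part of the KB: the names parse_hexdump and check_slots are made up, the 512-byte spacing and the 01 ef cd ab marker are simply read off the dumps above, and reading bytes 4 to 7 of each section as a little-endian integer is my interpretation. For what it is worth, 22626304 is 0x01594000, and in the healthy dump the bytes 00 40 39 00 decode to 3751936, the same offset the log messages report.

```python
import struct

MAGIC = bytes.fromhex("01efcdab")  # first 4 bytes of each healthy section in the dumps
SLOT_SIZE = 512                    # spacing between the sections in the dumps


def parse_hexdump(text):
    """Rebuild the bytes from `hexdump -C` lines, expanding '*' repeats."""
    start, data, last_row, squeezed = None, bytearray(), None, False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "*":
            squeezed = True  # the previous row repeats up to the next offset
            continue
        fields = line.split("|")[0].split()
        offset = int(fields[0], 16)
        row = bytes(int(b, 16) for b in fields[1:])
        if start is None:
            start = offset
        if squeezed:
            while start + len(data) < offset:
                data += last_row
            squeezed = False
        data += row
        if row:
            last_row = row
    return start, bytes(data)


def check_slots(text):
    """Return (offset, starts_with_magic, bytes 4..7 as little-endian int) per section."""
    start, data = parse_hexdump(text)
    report = []
    for pos in range(0, len(data), SLOT_SIZE):
        slot = data[pos:pos + SLOT_SIZE]
        (field,) = struct.unpack_from("<I", slot, 4)
        report.append((start + pos, slot[:4] == MAGIC, field))
    return report


# The dump lines of the corrupted region from this post (without the command line).
CORRUPT = """
01594000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594200  01 ef cd ab 00 42 39 00  00 00 00 00 de 00 00 00  |.....B9.........|
01594210  00 00 00 00 cd 31 82 11  00 00 00 00 00 00 00 00  |.....1..........|
01594220  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
01594230  0e 00 00 00 36 00 00 00  00 00 00 00 00 00 00 00  |....6...........|
01594240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594400  01 ef cd ab 00 44 39 00  00 00 00 00 00 00 00 00  |.....D9.........|
01594410  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594600  01 ef cd ab 00 46 39 00  00 00 00 00 00 00 00 00  |.....F9.........|
01594610  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
01594800
"""

for offset, ok, field in check_slots(CORRUPT):
    print(hex(offset), "ok" if ok else "MISSING MARKER", field)
```

Only pass the dump lines to it, not the echoed hexdump command. A section flagged as missing the marker is the one to mention when you open the case with support.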

For more information, refer to KB 1012036 and to the blog post by @VMwareStorage on VOMA, which talks about metadata corruption.


Hope you find this information useful, if yes then do share.

Thursday, January 3, 2013

SiteSurvey 2.5.3 for FT - Not supported on vSphere 5.1

I was researching NX/XD (KB 1993), EVC (KB 1003212) and FT, and came across a page stating that the SiteSurvey 2.5.3 utility is not supported on vSphere 5.1.

VMware SiteSurvey 2.5.3 is a plugin for the vSphere Client which analyzes the ESX/ESXi hosts managed by vCenter Server and reports on whether the configuration of both software and hardware is suitable for use with the VMware Fault Tolerance (FT) feature.

You will notice the following message on the Main Site Survey Page.



As you can see, it is supported on all versions prior to vSphere 5.1.

Another important utility is the cpuid tool, which checks the CPU configuration for compatibility and feature comparison. If you don't have it, I would recommend downloading it, as you can use it any time to check the CPU information.

You can download the CPUID zip file from here.

If you are not familiar with FT, you can read about its features here.

Microsoft has their own HAV (Hardware Assisted Virtualization) detection tool which you can download from here.

Like everyone else, I am also waiting for SMP FT, which would solve many issues in the area of BC/DR (Business Continuity and Disaster Recovery), help businesses achieve 6 9's (99.9999%) of SLA and prompt a rethink of the RTO/RPO combination. It would make a huge difference to the availability of vCenter Server, which is the core component for all other dependent products such as vCloud, vShield, vCenter Orchestrator, etc.

Hopefully it will come out, as I heard that a demo was shown at some conference in 2012, but I am not sure what exactly it will support. If anyone witnessed the demo, please leave a comment here so that readers can benefit from the information discussed during it.
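
To put the "6 9's" availability figure in perspective, here is a quick back-of-the-envelope calculation in Python. The function name downtime_per_year is just something I made up for this sketch, and it assumes a 365-day year.

```python
def downtime_per_year(nines, days_per_year=365):
    """Allowed downtime in seconds per year for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 6 nines -> 0.000001 (99.9999% available)
    return days_per_year * 24 * 3600 * unavailability


for n in (3, 4, 5, 6):
    print(n, "nines:", round(downtime_per_year(n), 1), "seconds per year")
```

Six nines leaves only about half a minute of downtime in a whole year, which is why a feature that keeps a secondary copy of a VM in lockstep is so attractive here.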


Sharing is caring.

Please RT the Tweet and help spread the word.

Enjoy !!

Wednesday, January 2, 2013

Upgrade Videos of vSphere, vCloud, vCenter, VDS, vShield

Hi,


If you want to go through the upgrade process for various VMware products, there is documentation available for them, but one of VMware's VCDXs, Matthew Meyer (@mattdmeyer), has produced a whole list of videos covering the upgrade process and setup of various products, as follows.


VMware vCloud Director 1.5 to 5.1 Upgrade

http://youtu.be/J6vfnvOS_D8



VMware vShield Manager 1.5 to 5.1 Upgrade

http://youtu.be/qrx8tQxNPIo



VMware vCenter Single Sign-On Installation
http://youtu.be/c1QWNbea3lo



VMware vCenter Server 5.0 to 5.1 Upgrade

http://youtu.be/0ypcu5JYWyo



VMware ESXi 5.0 to 5.1 Upgrade

http://youtu.be/Kvozx6v4m50



vCloud Director 1.5 to 5.1 Agent Upgrade

http://youtu.be/D9CSNOhqfAE



VMware vSphere Distributed Switch (vDS) 5.0 to 5.1 Upgrade

http://youtu.be/MVDAwbVZd0I




VMware VXLAN Installation for vCloud Director 5.1

http://youtu.be/LK7N4o8N7ts



This will give you an idea of exactly what to do while upgrading or installing the related products.

Sharing is caring so RT and spread the word.

Thanks !!



Tuesday, January 1, 2013

Windows 2008 Boot issue after extending the disk on vSphere 5

Recently I came across a strange issue in my lab, where I had assigned a small thick-provisioned VMDK to a W2K8 R2 VM, just for the OS drive.

It ran out of space and I had to extend the VMDK, so I did it from the vSphere Client.

On the next reboot I saw that Windows was not booting properly and was giving me this error:


=============================
File: \Windows\system32\winload.exe


Status: 0xc000000e
=============================

Now I know I still had to extend the partition within the guest OS, but that was the next step and I was not even getting that far.

Searching for the error on the Microsoft site, I found an article on fixing the boot sector.

  • To do that, boot the VM from the Windows OS boot disk
  • Select "Repair your computer"
  • Open the command prompt, and run:

    bootrec.exe /rebuildbcd
    bootrec.exe /fixmbr

    (Select Yes for any prompts)
  • For changes to take effect, reboot the virtual machine   


The VM is up and running after fixing the boot sector, so no issue there. Looking further into why rebuildbcd was needed, it looks like there is some disk signature issue going on, but I was not able to track it down, so I am still doing a deeper dive into it.

Share this please!!