Tuesday, October 6, 2015

Host gets stuck at 68% entering maintenance mode

Another short one.

This happened to me in version 5.5. I put a host in a cluster into maintenance mode and all VMs moved off, but the host stalled at 68% while entering maintenance mode.

KB 1002412 states that this is either a vCenter problem or a time problem. I offer a third solution.

Connect to the host through DRAC and restart the management agents. The host will disconnect and reconnect, and entering maintenance mode will now work.

Friday, May 29, 2015

vRealize Ops licensing with VSOM 5.5

Short but sweet.

I deployed vRealize Operations Manager 6.0 as a Linux appliance and configured it against a 5.5 vCenter whose hosts have VSOM (vSphere with Operations Management) 5.5 licenses. I thought vROps would pick up the licenses from the vCenter, but no.

I then wondered if there was a separate license on the MyVMware portal, but no again. Finally, I found the solution: paste the same VSOM host licenses into the Licensing section of the vROps web interface - that's it.

I just couldn't find anything about it easily on Google - and that's what this blog is about.

Sunday, May 17, 2015

Change the management vmkernel IP and DNS settings from DHCP to static without DCUI

This is in my home lab. I installed vSphere 6.0 on my non-HCL Asus motherboard and left it on DHCP, turned it off for a few days, and then turned it back on to find that its DHCP address had changed.

Since I'm studying for VCAP-DCA and will use this lab more, I figured I might as well set it to static rather than rely on chance and have to check the display to find the IP again. Part of VCAP-DCA is getting very comfortable with the command line, so I decided to do it with esxcli.

Pic from DCUI before doing the change:



Notice in the output of this command that both the address and DNS are being acquired from DHCP for the vmk0 interface:

[root@arielitox:~] esxcli network ip interface ipv4 get
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk0  192.168.1.12  255.255.255.0  192.168.1.255   DHCP              true
vmk1  192.168.1.13  255.255.255.0  192.168.1.255   STATIC           false


One way to find out how to change it is to run the command without parameters and read the help:

[root@arielitox:~] esxcli network ip interface ipv4 set
Error: Missing required parameter -t|--type
       Missing required parameter -i|--interface-name

Usage: esxcli network ip interface ipv4 set [cmd options]

Description:
  set                   Configure IPv4 setting for a given VMkernel network interface.

Cmd options:
  -i|--interface-name=<str>
                        The name of the VMkernel network interface to set IPv4 settings for. This name must be an interface
                        listed in the interface list command. (required)
  -I|--ipv4=<str>       The static IPv4 address for this interface.
  -N|--netmask=<str>    The static IPv4 netmask for this interface.
  -P|--peer-dns         A boolean value to indicate if the system should use the DNS settings published via DHCP for this
                        interface.
  -t|--type=<str>       IPv4 Address type :
                            dhcp: Use DHCP to aquire IPv4 setting for this interface.
                            none: Remove IPv4 settings form this interface.
                            static: Set Static IPv4 information for this interface. Requires --ipv4 and --netmask options.
                         (required)

I'm setting the same IP but static:

[root@arielitox:~] esxcli network ip interface ipv4 set -i vmk0 -I 192.168.1.12 -N 255.255.255.0 -P false -t static

And we can now confirm the results:

[root@arielitox:~] esxcli network ip interface ipv4 get
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk0  192.168.1.12  255.255.255.0  192.168.1.255   STATIC           false
vmk1  192.168.1.13  255.255.255.0  192.168.1.255   STATIC           false

We just told ESXi not to get the DNS settings from DHCP, but what settings will it use then? Let's dig.

[root@arielitox:~] esxcli network ip dns server list
   DNSServers: 192.168.1.1

That is in fact what I had through DHCP. I tested that it survives a reboot; nonetheless, I point this command out so that you also remember to check the DNS settings when you do this. I also tested changing the IP through the client, and there the DNS option is in a separate location and must be changed manually.

So there you go. When I checked the DCUI again, it indeed reported a static address


and it can be checked in the client as well.

Was there a gotcha here? Not sure. Things worked as expected; maybe the -P option is a bit of a gotcha, because if you make this change with the client you could leave DNS relying on DHCP.
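
If you would rather script that check than eyeball the table, here is a minimal Python sketch that flags interfaces still depending on DHCP for either the address or DNS. It assumes the six-column layout shown in the output above, and the names parse_ipv4_table and dhcp_dependent are made up for this illustration; they are not part of esxcli.

```python
# Sketch: parse the text printed by `esxcli network ip interface ipv4 get`
# and report interfaces that still rely on DHCP for address or DNS.
# Assumption: data rows have exactly six whitespace-separated fields,
# as in the output pasted in this post.

SAMPLE = """\
Name  IPv4 Address  IPv4 Netmask   IPv4 Broadcast  Address Type  DHCP DNS
----  ------------  -------------  --------------  ------------  --------
vmk0  192.168.1.12  255.255.255.0  192.168.1.255   DHCP              true
vmk1  192.168.1.13  255.255.255.0  192.168.1.255   STATIC           false
"""


def parse_ipv4_table(text):
    """Return one dict per data row; header and separator rows are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # The header splits into more than six words and the separator
        # row starts with dashes, so both are filtered out here.
        if len(fields) != 6 or fields[0].startswith("-"):
            continue
        name, address, netmask, broadcast, addr_type, dhcp_dns = fields
        rows.append({
            "name": name,
            "address": address,
            "netmask": netmask,
            "broadcast": broadcast,
            "type": addr_type.upper(),
            "dhcp_dns": dhcp_dns.lower() == "true",
        })
    return rows


def dhcp_dependent(rows):
    """Names of interfaces whose address or DNS still comes from DHCP."""
    return [r["name"] for r in rows if r["type"] == "DHCP" or r["dhcp_dns"]]


if __name__ == "__main__":
    print(dhcp_dependent(parse_ipv4_table(SAMPLE)))
```

Feed it the captured command output instead of SAMPLE to check a real host; an empty list means nothing on that host depends on DHCP anymore.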

Sunday, April 19, 2015

Reset iLO password from ESXi

I had some problems resetting the iLO password following these posts (which are excellent and had already saved me from setting the iLO address)

http://www.vmwarearena.com/2013/11/how-to-configure-hp-ilo-from-esxi-host.html
http://www.virtualtothecore.com/en/configure-hp-ilo-directly-on-esxi-server/

I didn't want all of my iLO configuration reset, just the password. I think the problem had to do with differing iLO versions, because I was getting a syntax error when I tried to copy, paste and adjust the method in the first post.

</-- ERROR :      STATUS= 0x0001

     MESSAGE= Error: Line #0: syntax error near "=". -->

Here's a foolproof way:

First, dump the iLO info into a file (this makes sure your end file is compatible with the iLO version :) )

/opt/hp/tools # ./hponcfg -w iloconf.txt

vi the file, use dd to delete everything except the wrapper lines shown below, and then add the mod_USER lines (shown in blue in the original post; your new password goes there)

/opt/hp/tools # vi iloconf.txt
<!-- HPONCFG VERSION = "4.4-0.0" -->
<!-- Generated 4/19/2015 21:59:40 -->
<RIBCL VERSION="2.1">
 <LOGIN USER_LOGIN="Administrator" PASSWORD="password">
  <USER_INFO MODE="write">
   <mod_USER USER_LOGIN="Administrator">
    <password value="your_new_pw_goes_here"/>
   </mod_USER>
  </USER_INFO>
 </LOGIN>
</RIBCL>

After that, run this command, which should complete successfully:

/opt/hp/tools # ./hponcfg -f iloconf.txt

I had no idea what my previous iLO password was, so this works perfectly well to reset the password as long as you can log in to your host.

Don't forget to remove the text files from wherever you saved them after you are done. That's a sensitive password you are leaving in clear text!
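
If you have to do this on more than one host, the file can be generated instead of hand-edited. This is only a sketch under a few assumptions: it mirrors the tag names and casing of the file above, it assumes the iLO parser accepts standard XML escaping for special characters in the password (I have not verified that), and build_password_reset is a name invented for the example. The LOGIN password is left as the same placeholder used in the file above.

```python
# Sketch: build the minimal RIBCL text for a password-only change,
# mirroring the structure of the iloconf.txt shown in this post.
from xml.sax.saxutils import quoteattr


def build_password_reset(user, new_password, login_password="password"):
    """Return RIBCL text that changes only `user`'s password.

    quoteattr() adds the surrounding quotes and escapes characters such
    as & and < so the attribute values stay well-formed XML.
    """
    lines = [
        '<RIBCL VERSION="2.1">',
        " <LOGIN USER_LOGIN={} PASSWORD={}>".format(
            quoteattr(user), quoteattr(login_password)),
        '  <USER_INFO MODE="write">',
        "   <mod_USER USER_LOGIN={}>".format(quoteattr(user)),
        "    <password value={}/>".format(quoteattr(new_password)),
        "   </mod_USER>",
        "  </USER_INFO>",
        " </LOGIN>",
        "</RIBCL>",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_password_reset("Administrator", "your_new_pw_goes_here"))
```

I would still run ./hponcfg -w first and compare the RIBCL VERSION line of the dump with the generated text, since that was the whole point of starting from the dump. The same warning applies here: the output contains the password in clear text.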

Lots of little gotchas in this post, which is why it's here - this blog is meant to save you some time using the time I already wasted. If it helped, let me know on Twitter :)

Monday, March 16, 2015

"no network adapters" in ESXi 6.0 GA (Workstation 10, possibly any case)

Summary: If you don't meet the minimum 4GB of RAM you may get this error.

I used to build a few custom ESXi VMs for my nested hosts, keep them around, and modify them as needed.

The reason is that the default settings are not to my liking. I don't let them have a 40GB disk - 1GB is more than fine for my lab. Likewise, I started my templates with the least amount of RAM; in 5.0 and 5.1, that meant 2GB. 5.5 raised the minimum to 4GB, and told you very clearly about it! (felt like RTFM after that)



This will become important later. 

I naively installed my first ESXi 6.0 GA host with 2GB of RAM, and ran into this:



"Well WTH?" I thought. I googled around but there wasn't anything too clear for "no network adapters" in ESXi 6.0 GA or VMware Workstation. I assumed there had to be a mistake, so I checked the settings. Workstation doesn't let you choose the NIC adapter type in the UI, although you can edit the VMX file.

Checking the VMX files, one for 5.5 and one for 6.0, I could see the e1000 adapter clearly listed in both.



Then I took a step back (RTFM echoed in my head) and went to check the requirements for ESXi 6.0, where I saw 4GB of RAM. I thought, well, let's get that fixed, although there wasn't an error like in 5.5. Lo and behold, the installer now worked and I was able to install without any other problems.
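
Since the 6.0 installer did not complain about memory the way 5.5 did, a quick check of the VMX file before booting the installer would have saved me the detour. Here is a minimal Python sketch; it assumes the usual key = "value" layout of a VMX file and that the memory setting is stored under the memsize key in megabytes. The function names are hypothetical.

```python
# Sketch: read key = "value" pairs from VMX text and check the RAM setting
# against the 4GB minimum mentioned in this post.
import re

MIN_MB = 4096  # 4GB, the minimum stated above for ESXi 5.5 and 6.0


def vmx_settings(text):
    """Return a dict of VMX settings with lower-cased keys."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r'\s*([^#\s=]+)\s*=\s*"(.*)"\s*$', line)
        if match:
            settings[match.group(1).lower()] = match.group(2)
    return settings


def meets_ram_minimum(text, minimum_mb=MIN_MB):
    """True only if memsize is present and at least `minimum_mb`."""
    value = vmx_settings(text).get("memsize")
    return value is not None and value.isdigit() and int(value) >= minimum_mb


SAMPLE_VMX = '''\
memsize = "2048"
ethernet0.virtualDev = "e1000"
'''

if __name__ == "__main__":
    print(meets_ram_minimum(SAMPLE_VMX))
```

The sample mirrors my failing VM: the e1000 adapter is listed, but the memory is below the minimum, so the check returns False.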



So, there's a "gotcha" there, and that's what this blog is about :)

Monday, March 2, 2015

P420m/P320h Micron PCIe SSDs - firmware update is different for retail and Dell branded SSDs

This one came up when I was just starting my deployment with PernixData FVP. Peter Chang, @virtualbacon, was instrumental in getting me out of the confusion, as well as Dan Florence from Micron.

I got my new R620 servers with Micron P420m PCIe drives (Dell part 468-7978). I always make sure I upgrade the firmware on any server I'm deploying, and I noticed the Dell OME bootable CD I created that day didn't include any items for the SSD. I checked the VMware HCL and there were several recommendations depending on ESXi version, so I did want to make sure I was on the latest firmware to avoid some known issues. I figured, no biggie: the drive was probably too new for the CD, and I would need to update it separately.

When I browsed the Dell downloads for this server, I only found a software package for Windows on the Dell website, and it was quite a bit older than what the VMware HCL listed. So I went to the Micron website, browsed the products for SSDs with a PCIe interface, and there was the P420m, in my case a Half Height Half Length (HHHL) card.

I found both a bootable and non-bootable version, saw both used the same firmware, and proceeded to download the latest file. This included firmware, drivers and manuals (at that time, the latest was 145.03.00, released 09/2014).

Fast forward through a lot of details and reading - I got RSSDM installed and working, and was able to poll information using the CLI:

/opt/micron/bin # rssdm -L -d

Drive Id             : 0
Total Size           : 700.15GB
Drive Status         : Drive is in good health
SMARTSupport         : Yes
SMARTEnabled         : Yes
Interrupt Coalescing : D200F
WriteBufferEnabled   : Yes
Power Limit Status   : Not Supported
Est. Life Remaining  : 100%

Listing the detailed drive information is retrieved successfully
CMD_STATUS   : Success
STATUS_CODE  : 0

Copyright (C) 2014 Micron Technology, Inc.

/opt/micron/bin # rssdm -L

Drive Id             : 0
Device Name          : mtip_rssd0
Model No             : Micron P420m-MTFDGAR700MAX
Serial No            : 000000001404whatever
FW-Rev               : B2085108
Total Size           : 700.15GB
Drive Status         : Drive is in good health
PCI Path (B:D.F)     : 41:00.0
Vendor               : Micron
Temp(C)              : 57

Drive information is retrieved successfully
CMD_STATUS   : Success
STATUS_CODE  : 0

Copyright (C) 2014 Micron Technology, Inc.

Note the 08 at the end of the FW-Rev value (B2085108); it becomes important later. I then tried to do a firmware update and got an error:

/opt/micron/bin # rssdm -T B145.03.00.ubi -n 0

Trying to update for drive 0, from current firmware B208.51.08 to B212.05.00.
Are you sure you want to continue(Y|N):Y

Unified Image update for drive 0 will take a few seconds to complete.
Please wait
........

Drive Id     : 0
Unified image update operation failed
CMD_STATUS   : Order of Unified Update : Firmware, UEFI main driver, Option ROM.                                                Firmware Update failed in Unified image download
STATUS_CODE  : 51

Copyright (C) 2014 Micron Technology, Inc.

After some back and forth over the "why", and reaching out to PernixData and Micron through e-mail and Twitter, I finally got the answer. Remember the 08 at the end of the firmware revision? Peter Chang, working with a Micron engineer, was able to confirm this was the source of my problems:

"It appears they are trying to upgrade disti FW on an OEM driver. Xx.xx.08 to xx.xx.00. The .08 is a Dell release.

RSSDM will prevent this operation.  This is likely the cause for the error. "


Aha! Turns out the correct page to get the latest Dell firmware is:

http://www.micron.com/dell

Dan Florence from Micron was later able to confirm:

"Yes, B208.51.08 is the latest firmware for the Dell-branded version of P420M. Dell has their own internal testing and validation program for these drives, with their own unique cadence for releases. They also have their own warranty/service program. For these reasons a Dell drive is disallowed from loading our standard distribution firmware."

This was back in December 2014. Since then, I've seen version 145.03.08 show up in January, which is firmware "B212.05.08" - the Dell tested release of the Micron I was trying to upgrade to in the command excerpts above (see the 08 at the end?). 
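
To make the rule concrete, here is a small Python sketch of the check I now do in my head before downloading anything. It only encodes what the two quotes above say (a revision ending in 08 is a Dell release, one ending in 00 is standard distribution firmware); anything else is reported as unknown. The function names are invented for the example, and RSSDM performs its own validation regardless.

```python
# Sketch: classify a P420m firmware revision by its last two digits,
# based only on the "08 is Dell, 00 is distribution" rule quoted above.

def release_channel(fw_rev):
    """Return 'dell', 'distribution' or 'unknown' for strings such as
    'B2085108' or 'B208.51.08' (dots are ignored)."""
    digits = "".join(ch for ch in fw_rev if ch.isdigit())
    suffix = digits[-2:]
    if suffix == "08":
        return "dell"
    if suffix == "00":
        return "distribution"
    return "unknown"


def same_channel(current, target):
    """True when both revisions are recognised and belong to the same channel."""
    a, b = release_channel(current), release_channel(target)
    return a != "unknown" and a == b


if __name__ == "__main__":
    # The failed attempt from this post: Dell firmware to distribution firmware.
    print(same_channel("B208.51.08", "B212.05.00"))
    # The Dell-tested release that showed up later.
    print(same_channel("B208.51.08", "B212.05.08"))
```

The first call models the update that RSSDM refused, the second the update path that is actually meant for a Dell-branded drive.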

It is important to note the Dell releases are somewhat delayed from Micron's, so when you see a new firmware out for Micron, just wait a bit and you will probably see Dell's. This Dell section of the Micron website does provide a nice little PDF explaining the upgrade process in good detail, including how a reboot and secure erase are needed.


I'm sharing this because "there's a gotcha there" and guess what this blog is about :)


Bonus - Late February 2015 I received an advance support bulletin from PernixData that new firmware is being made available for the P420m/P320h to fix a new issue, so keep an eye out on the respective locations if you are using these SSDs. 

If you are using retail SSDs in VMware and are a PernixData customer, a new firmware (B145.04.00) is available for your testing now. If you have not received such a bulletin, open a case with Pernix to get the related instructions, as they are going out of their way (the issue is not related to FVP) to make sure all their affected customers are notified.

I'd like to finish by saying that so far, these SSDs have behaved very well for me and I am pleased with them. Whenever you are using new technology, you have to be ready to accept the uncharted territory - but you do expect support. The fact that both Micron and PernixData are jointly reaching out with a solution speaks to their engagement and respect for the customer, so I couldn't be happier :)

Saturday, February 28, 2015

Re-joining hosts to a vCenter with distributed switches for data and storage, and a gotcha for iSCSI while VMs are running

Recently I had an outage where we thought we had lost the vCenter database. I joined the hosts to another vCenter, while we worked around the original problem, and I was soon reminded that my hosts were using distributed switching and I had probably lost that distributed switch information. I would probably need to rebuild that vCenter and move them back.

Someone will say, what about your dvSwitch backups? Well, sadly this was running 5.0. Per the KB, I don't have an option to back up dvSwitch configs unless I'm running 5.1 or later. Yet another reason to upgrade, right?

In the end, the DB access was recovered, and I had my old vCenter. We didn't even have to restore, the access was simply there again, but this also applies after a restore. However, since I had moved my hosts to another vCenter, when I joined them back they did not immediately fall back in place.

The hosts knew they were no longer connected to the old vCenter, so they did not reconnect when it was available again. I had to add them manually. However, the vCenter had them grayed out, so I first had to remove the hosts and then add them.

Once I added them, I checked the network settings and the physical interfaces were not where they were supposed to be. I went to the networking tab and confirmed the distributed switches reported no hosts as members.

Adding the hosts was not difficult for my data networks - the wizard asked if I had to migrate any vmkernels and did a good job of pointing out which had to be migrated.

The one gotcha was when moving back the iSCSI distributed switching. I got the following alert:

This one did not appear when doing the data switching. The alert is valid - you are moving a vmkernel with active iSCSI traffic, which is possibly catastrophic. My first thought was "wow - so I'll need to get some downtime on the VMs, and possibly create another host and migrate VMs over, before I can put this back the way it was". After meddling around and thinking, I convinced myself there should be a way of doing this without downtime, since the host by itself had active iSCSI switching and I had had no downtime.

The more I thought about it, the more I convinced myself there should be a way. After trying it again, I noticed the little checkbox that allows you to ignore these errors and continue :)

Now, I only advise you to do this if you are running a similar scenario - in my case, this was:
1) the same vCenter
2) the same hosts it had before
3) the same iSCSI config as it was before
4) and I picked a host with a few non-critical VMs as a testbed.

The test was successful, and I was able to re-incorporate my hosts into the distributed switch with no downtime. Looking back, should I have moved the hosts to another vCenter before I made sure the DB was truly unrecoverable? Maybe not. But I sure like knowing what's the worst that can happen now.

VMware is a great technology in that, most of the time, the particular scenario that has you in a bind has happened before, and there is a workaround already in place. I'm sure I could have googled and found a post like this as well, but on this lucky day I was able to see it by myself. Anyway, hope it helps you. Remember the motto:

I'm sharing this because "there's a gotcha there" and guess what this blog is about :)

Bonus:

A design tip - I've asked peers what they think of using distributed switching for storage. Most people have said "if you have the licensing, and it helps standardization, do it!". This was in a VCAP-DCD study group anyone can join, and I do recommend it, as it's very active and full of information: https://plus.google.com/117774242079675798525/posts/QYycLGQNEHT

In my particular case, where I don't have many hosts, I have opted to stay with standard vSwitches for my new environments. There is a particular reason: we are trying to consolidate many vCenters into a few, and this will help host portability and the movement of VMs from one vCenter to another without downtime.

However, like the above post concludes, distributed switching would be useful if we had several hosts and had to distribute the work among many, since it would save time and prevent mis-configurations. Of course, if it's for data, use it! What else gives you traffic-load based load balancing? :)

Tuesday, February 10, 2015

PernixData FVP: Enabling write caching and its effect on VMware backups

This is oversimplified but I just wanted to post this as it's my explanation to my DBAs. As I keep learning the technology I will update this post.

==========

I do have “write acceleration through cache” capabilities, but they have to be disabled when whole VM backups run, because those backups snapshot the data on the array. When you use a cache for writes, some data can be in the cache and not on the array (yet), and that leads to inconsistent backups, which is a big problem. The good news is that it’s not a problem for backup applications that operate at the VM level.

==========

I'm sharing this because "there's a gotcha there" and guess what this blog is about :)


update: @virtualbacon reminded me this whole operation is mostly automated and pointed me to an excellent post on the topic


"VADP policy automates flush for proxy VMs. For physical use PS automation" (PS being PowerShell).

http://www.virtualtothecore.com/en/veeam-backup-pernixdata-write-back-caching/


update2: found 2 more blogs that talk about this

http://poulpreben.com/veeam-direct-san-backups-and-pernixdata-fvp/ by @poulpreben

http://www.tracinglines.net/fvp-and-vadp-simple-integration/ I couldn't find the twitter handle of the owner

And a copy of my questions with my great Pernix resources via Twitter:

began talks with commvault and they do have a VM that does the snapshots - so that VM I would 1/2
go in powercli and do a "set-prnxaccelerationpolicy -name WHATEVER -vadp" ? 2/2 . thanks for the assist :)

You can do it right in the UI: Advanced->VADP, then add the backup VM. Set backups to hot-add. That's it!

Monday, February 9, 2015

Nvidia, multi-monitor VM and Office can cause "VMware Workstation unrecoverable error (vmui) Exception 0xc0000005 (access violation) has occurred"

There is a whole KB explaining the possible causes.

VMware product unexpectedly fails with an unrecoverable error (1008485)

Remember, all three conditions have to be met:


  1. Nvidia driver (the version doesn't seem to matter; I had the latest WHQL)
  2. Multi-monitor VM (booted as single monitor in my case)
  3. Open any Office application


...and wait a bit. Workstation abnormally ends with the error in the title (in my case it's Workstation 10 on Windows, I will confirm if this happens in Fedora).

The short version of the solution is to disable 3D acceleration on the VM. I powered the VM off first, then went to Edit Settings, Display, made sure the little checkbox that says "Accelerate 3D graphics" was cleared, and turned the machine back on.
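
If you have several VMs to fix, the same change can be made in the VMX file while the VM is powered off. The sketch below is an assumption on my part rather than something taken from the KB or the thread: it assumes the checkbox corresponds to the mks.enable3d entry in the VMX file, and disable_3d is a name made up for the example. Back up the file before trying it.

```python
# Sketch: force mks.enable3d to FALSE in VMX text, adding the line if absent.
# Assumption: "Accelerate 3D graphics" maps to the mks.enable3d VMX entry.
import re

_ENTRY = re.compile(r"^[ \t]*mks\.enable3d[ \t]*=.*$", re.IGNORECASE | re.MULTILINE)
_DISABLED = 'mks.enable3d = "FALSE"'


def disable_3d(vmx_text):
    """Return VMX text with 3D acceleration switched off."""
    if _ENTRY.search(vmx_text):
        # Replace the existing entry in place; other lines are untouched.
        return _ENTRY.sub(_DISABLED, vmx_text)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + _DISABLED + "\n"


if __name__ == "__main__":
    print(disable_3d('memsize = "4096"\nmks.enable3d = "TRUE"\n'))
```

The function works on text only, so reading and writing the actual .vmx file (with the VM powered off) is left to you.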

If you are using Office 2013, you could also disable Office hardware acceleration. If I find where to do that in 2010 I'll update the post.

I found the solution in this VMware community thread: https://communities.vmware.com/thread/455819?start=0&tstart=0 ; I'm sharing this summary because "there's a gotcha there" and guess what this blog is about :)

Friday, February 6, 2015

First post - mission statement and who writes this blog

VMware is easy to pick up but difficult to master. Frequently, when I'm in a pinch, a Google search has saved my butt. I've felt immensely grateful to those people who posted their ideas and solutions and made them freely available on the internet for people like me.

I think honesty is paramount, including honesty to oneself. So, let's start with a mission statement for this blog: 

This VMware-focused blog aspires to document tips and "gotchas" that I find while working in my day to day job. It will include tips related to vendors in the VMware ecosystem that I am either testing or have personally implemented.

Since I have a day job that takes most of my non-family time, I don't expect this blog to be as active as some of the widely known bloggers'. However, I will try to write any gotchas I learn, and anything I find I helped someone else with. 

There will be tips that help people just beginning their VMware journey. I'll also post tips concerning older versions that I still use. There will also be tips that are very specific to a particular vendor or implementation scenario. The posts may seem erratic, but remember my intention is for them to show up in your Google search when you really need them.

Who am I to so mercifully spread this knowledge to the internet (pun intended)? Really, I'm someone like you, at some point in your life. I firmly believe I'm barely starting in my knowledge path; the more I dig, the more I find!

I realize my limitations, so I will gladly and humbly accept collaborations from anyone who is interested in adding to this site. The more the merrier!

I have other small and quite unattended blogs that cover a variety of my interests. I use arielsanchezmora.com to put them all in one place. You can find my information there.