Saturday, February 28, 2015

Re-joining hosts to a vCenter with distributed switches for data and storage, and a gotcha for iSCSI while VMs are running

Recently I had an outage where we thought we had lost the vCenter database. I joined the hosts to another vCenter while we worked around the original problem, and I was soon reminded that my hosts were using distributed switching and that I had probably lost that distributed switch information. I would probably need to rebuild the original vCenter and move the hosts back.

Someone will say, what about your dvSwitch backups? Well, sadly this was running 5.0. According to this KB, I don't have an option to back up dvSwitch configs unless I'm running 5.1 or later. Yet another reason to upgrade, right?

In the end, DB access was recovered and I had my old vCenter back. We didn't even have to restore; access simply came back, but this also applies after a restore. However, since I had moved my hosts to another vCenter, they did not immediately fall back into place when I joined them back.

The hosts knew they were no longer connected to the old vCenter, so they did not reconnect when it was available again. I had to add them manually. However, vCenter showed them grayed out, so I first had to remove the hosts and then add them.

Once I added them, I checked the network settings and the physical interfaces were not where they were supposed to be. I went to the networking tab and confirmed the distributed switches reported no hosts as members.

Adding the hosts was not difficult for my data networks - the wizard asked if I had to migrate any vmkernels and did a good job of pointing out which had to be migrated.

The one gotcha came when moving the iSCSI distributed switch back: the wizard raised an alert.

This one did not appear when doing the data switching. The alert is valid - you are moving a vmkernel with active iSCSI traffic, which is possibly catastrophic. My first thought was "wow - so I'll need to get some downtime on the VMs, and possibly create another host, and migrate VMs over, before I can put this back the way it was". After meddling around and thinking, I convinced myself there should be a way of doing this without downtime, since the host by itself still had active iSCSI switching and I had had no downtime.

The more I thought about it, the more I convinced myself there should be a way. After trying it again, I noticed the little checkbox that allows you to ignore these errors and continue :)

Now, I only advise you to do this if you are running a similar scenario - in my case, this was:
1) the same vCenter
2) the same hosts it had before
3) the same iSCSI config as it was before
4) and I picked a host with a few non critical VMs as a testbed. 
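
The checklist above can be sketched as a small pre-flight check. This is only an illustration in Python: the `RejoinContext` class, its field names, and the function are hypothetical names made up for this post, not part of any VMware tool, and you would have to fill in the answers yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RejoinContext:
    """Hypothetical record of the four conditions from the checklist."""
    same_vcenter: bool            # 1) re-joining the original vCenter
    host_was_member_before: bool  # 2) host belonged to this dvSwitch before
    iscsi_config_unchanged: bool  # 3) iSCSI config is the same as before
    only_noncritical_vms: bool    # 4) test host runs only non-critical VMs


def safe_to_ignore_iscsi_warning(ctx):
    """Return (ok, failed_conditions) for the given context."""
    failed = [name for name, ok in vars(ctx).items() if not ok]
    return (not failed, failed)


# Example: the iSCSI config was changed while the host was away.
ok, failed = safe_to_ignore_iscsi_warning(
    RejoinContext(True, True, False, True)
)
print(ok, failed)
```

If any condition fails, I would not tick the "ignore" checkbox until I understood why.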

The test was successful, and I was able to re-incorporate my hosts into the distributed switch with no downtime. Looking back, should I have moved the hosts to another vCenter before I made sure the DB was truly unrecoverable? Maybe not. But I sure like knowing what's the worst that can happen now.

VMware is a great technology in that, most of the time, the particular scenario that has you in a bind has happened before, and there is a workaround already in place. I'm sure I could have googled and found a post like this as well, but on this lucky day I was able to see it by myself. Anyway, I hope it helps you. Remember the motto:

I'm sharing this because "there's a gotcha there" and guess what this blog is about :)

Bonus:

A design tip - I've asked peers what they think of using distributed switching for storage. Most people have said "if you have the licensing, and it helps standardization, do it!". This was in a VCAP-DCD study group anyone can join, and I do recommend it, as it's very active and full of information: https://plus.google.com/117774242079675798525/posts/QYycLGQNEHT

In my particular case, where I don't have many hosts, I have opted to stay with standard vSwitches for my new environments. There is a particular reason: we are trying to consolidate many vCenters into a few, and this helps host portability and the movement of VMs from one vCenter to another without downtime.

However, as the above post concludes, distributed switching would be useful if we had many hosts and had to distribute the work among them, since it would save time and prevent misconfigurations. Of course, if it's for data, use it! What else gives you load balancing based on traffic load? :)

Tuesday, February 10, 2015

PernixData FVP: Enabling write caching and its effect on VMware backups

This is oversimplified, but I wanted to post it because it's the explanation I gave my DBAs. As I keep learning the technology, I will update this post.

==========

I do have “write acceleration through cache” capabilities, but they have to be disabled while whole-VM backups run, because those backups snapshot the data on the array. When you use a cache for writes, some data can be in the cache and not (yet) on the array, so an array-side snapshot can capture an inconsistent state, which is a big problem. The good news is that it’s not a problem for backup applications that operate at the VM level.

==========
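
To make the explanation above concrete, here is a toy model in Python. It is a sketch under loud assumptions: the array is just a dict, and `WriteBackCache` and `array_snapshot` are hypothetical names for illustration. It says nothing about how FVP is actually implemented; it only shows why a snapshot taken on the array before the cache is flushed misses acknowledged writes.

```python
class WriteBackCache:
    """Toy write-back cache sitting in front of a 'storage array' (a dict)."""

    def __init__(self, array):
        self.array = array
        self.dirty = {}  # writes acknowledged to the VM, not yet on the array

    def write(self, block, data):
        # Write-back: acknowledge immediately, destage to the array later.
        self.dirty[block] = data

    def read(self, block):
        # The VM always sees its latest write, even if it is only in cache.
        return self.dirty.get(block, self.array.get(block))

    def flush(self):
        # Destage every pending write to the array.
        self.array.update(self.dirty)
        self.dirty.clear()


def array_snapshot(array):
    """Array-level snapshot: it only sees what is already on the array."""
    return dict(array)


array = {"blk0": "old"}
cache = WriteBackCache(array)
cache.write("blk0", "new")

stale = array_snapshot(array)       # taken while the write is still cached
cache.flush()
consistent = array_snapshot(array)  # taken after the cache was flushed

print(stale, consistent)
```

The first snapshot still holds the old data even though the VM already reads the new data, which is exactly the inconsistency the backup has to avoid.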

I'm sharing this because "there's a gotcha there" and guess what this blog is about :)


Update: @virtualbacon reminded me that this whole operation is mostly automated and pointed me to an excellent post on the topic:


"VADP policy automates flush for proxy VMs. For physical use PS automation" (PS being PowerShell).

http://www.virtualtothecore.com/en/veeam-backup-pernixdata-write-back-caching/


Update 2: I found two more blogs that talk about this:

http://poulpreben.com/veeam-direct-san-backups-and-pernixdata-fvp/ by @poulpreben

http://www.tracinglines.net/fvp-and-vadp-simple-integration/ I couldn't find the twitter handle of the owner

And a copy of my questions to my great Pernix resources via Twitter:

began talks with commvault and they do have a VM that does the snapshots - so that VM I would 1/2
go in powercli and do a ""set-prnxaccelerationpolicy -name WHATEVER -vadp" ? 2/2 . thanks for the assist :)

You can do it right in the UI: Advanced->VADP, then add the backup VM. Set backups to hot-add. That's it!

Monday, February 9, 2015

Nvidia, multi-monitor VM and Office can cause "VMware Workstation unrecoverable error (vmui) Exception 0xc0000005 (access violation) has occurred"

There is a whole KB explaining the possible causes.

VMware product unexpectedly fails with an unrecoverable error (1008485)

Remember, the 3 conditions have to be met:


  1. An Nvidia driver (the version doesn't seem to matter; I had the latest WHQL)
  2. A multi-monitor VM (booted as single monitor in my case)
  3. Any Office application open


...and wait a bit. Workstation abnormally ends with the error in the title (in my case it's Workstation 10 on Windows, I will confirm if this happens in Fedora).

The short version of the solution is to disable 3D acceleration on the VM. I powered the VM off first, then went to Edit Settings, Display, and cleared the checkbox labeled "Accelerate 3D graphics" before powering the machine back on.
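
If you prefer to script the change, the checkbox is, as far as I know, stored in the VM's .vmx file as the `mks.enable3d` key. Treat that key name as an assumption and verify it against your own .vmx before relying on it. The Python sketch below only rewrites the text of a config; the `disable_3d` function name and the sample contents are made up for illustration. Edit the real file only while the VM is powered off, and keep a backup copy.

```python
def disable_3d(vmx_text):
    """Return vmx_text with 3D acceleration switched off.

    Assumes the setting lives in the 'mks.enable3d' key.
    """
    key = "mks.enable3d"
    out = []
    found = False
    for line in vmx_text.splitlines():
        # Compare only the key name (text before the first '='), ignoring case.
        if line.split("=", 1)[0].strip().lower() == key:
            out.append(f'{key} = "FALSE"')
            found = True
        else:
            out.append(line)
    if not found:
        # Key missing: add it explicitly so the intent is visible in the file.
        out.append(f'{key} = "FALSE"')
    return "\n".join(out) + "\n"


sample = 'displayName = "testvm"\nmks.enable3d = "TRUE"\n'
print(disable_3d(sample))
```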

If you are using Office 2013, you could also disable Office hardware acceleration. If I find where to do that in 2010, I'll update the post.

I found the solution in this VMware community thread: https://communities.vmware.com/thread/455819?start=0&tstart=0. I'm sharing this summary because "there's a gotcha there" and guess what this blog is about :)

Friday, February 6, 2015

First post - mission statement and who writes this blog

VMware is easy to pick up but difficult to master. Frequently, when I'm in a pinch, a Google search has saved my butt. I've felt immensely grateful to those people who posted their ideas and solutions and made them freely available on the internet for people like me.

I think honesty is paramount, including honesty to oneself. So, let's start with a mission statement for this blog: 

This VMware-focused blog aspires to document tips and "gotchas" that I find while working in my day to day job. It will include tips related to vendors in the VMware ecosystem that I am either testing or have personally implemented.

Since I have a day job that takes most of my non-family time, I don't expect this blog to be as active as those of some widely known bloggers. However, I will try to write up any gotchas I learn, and anything I find I helped someone else with.

There will be tips that help people just beginning their VMware journey. I'll also post tips concerning older versions that I still use. There will also be tips that are very specific to a particular vendor or implementation scenario. The posts may seem erratic, but remember my intention is for them to show up in your Google search when you really need them.

Who am I to so mercifully spread this knowledge to the internet (pun intended)? Really, I'm someone like you, at some point in your life. I firmly believe I'm barely starting in my knowledge path; the more I dig, the more I find!

I realize my limitations, so I will gladly and humbly accept collaborations from anyone who is interested in adding to this site. The more the merrier!

I have other small and quite unattended blogs that cover a variety of my interests. I use arielsanchezmora.com to put them all in one place. You can find my information there.