Sunday, November 5, 2017

Disabling vSAN and reclaiming disks when vCenter is not available

Enabling vSAN is very easy through the vSphere GUI. As long as you have unused disk devices and have set up vSAN traffic on a vmk interface, you're basically good to go (although for enterprise deployments it's important to pay attention to the HCL and design documentation).

vSAN in fact exists perfectly well without a vCenter. vCenter is just the easy-to-use UI. However, if vCenter is lost, you will need to resort to esxcli commands to change vSAN settings, since the standalone HTML5 ESXi host interface does not let you perform vSAN operations.

This has happened to me when messing around in the homelab (where my vCenters are short-lived, i.e., I deleted one before I disabled vSAN) and in a case where I bought homelab hardware from a friend and the vCenter didn't make the trip. Since he did provide the root password to the hosts, I was able to SSH into the host and run esxcli commands.

If disks were used by vSAN and that configuration was not properly undone, you will find that those disks cannot be re-used in a new vSAN configuration, or even to create plain datastores. I believe that information is recorded on each disk, meaning even re-installing ESXi would not clear said config.

The "delete partition" commands available through the standalone HTML5 interface will fail because vSAN is still running and is protecting the disks. So, before I re-install my hosts or try to re-use the storage devices, I run the commands below.

I saw other posts that achieve the same thing using fdisk and partedUtil instead of esxcli vsan commands. That would be the brute-force method. Below is a much simpler and safer way that I know works in ESXi 6.5.

There are two phases.

1) The first phase is removing the host from a vSAN cluster. Check if the host believes it's in a vSAN cluster with

esxcli vsan cluster get

Remove the host from said cluster with

esxcli vsan cluster leave

This command can take a while to take effect, so be patient. The host stops collaborating with its cluster, and running the get command again should show that the host is no longer a member of a vSAN cluster.

esxcli vsan cluster get

Virtual SAN Clustering is not enabled on this host

At this point the vsanDatastore datastore will no longer show in the host storage, but we aren't finished!

2) The second phase is clearing vSAN config from the disks so they can be re-used. Check if vSAN "owns" the disks with 

esxcli vsan storage list

From the list, you want to identify the cache disk, typically the best performing SSD, and copy the device name. 

   Device: naa.50026b724712194a
   Display Name: naa.50026b724712194a
   Is SSD: true
   Is Capacity Tier: false
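If you have several disk groups, picking the cache disks out by eye gets tedious. Here is a minimal sketch of how I would filter them, assuming awk is available in the ESXi shell and that the "Device:" line always comes before the "Is Capacity Tier:" line for each disk, as in the excerpt above. The helper name list_cache_devices is my own invention, not part of esxcli.

```shell
# Hypothetical helper (not an esxcli feature): reads the output of
# `esxcli vsan storage list` on stdin and prints the device name of
# every disk that is NOT in the capacity tier, i.e. the cache disks.
list_cache_devices() {
    awk '
        # Remember the most recent device name seen.
        $1 == "Device:" { dev = $2 }
        # "Is Capacity Tier: false" marks the cache disk of a disk group.
        $1 == "Is" && $2 == "Capacity" && $3 == "Tier:" && $4 == "false" { print dev }
    '
}

# Usage on a host (left commented out so that sourcing this file
# does not run anything):
#   esxcli vsan storage list | list_cache_devices
```

It only reads and prints, so it is safe to run before you decide to remove anything.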

The gotcha - you can't perform manual operations on vSAN disks if they were claimed automatically when vSAN was configured. You must run this command to disable that auto-claiming before proceeding (use esxcli vsan storage automode get to check if you need to do this).

esxcli vsan storage automode set --enabled false
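The check-then-disable step can be wrapped in a small function. This is only a sketch: it assumes the get command prints a line of the form "Enabled: true" or "Enabled: false" (check the output on your own build), and disable_automode_if_enabled is a name I made up.

```shell
# Hypothetical helper: turn off vSAN auto-claiming only if it is on.
# Assumption: `esxcli vsan storage automode get` prints "Enabled: true"
# when auto-claiming is active.
disable_automode_if_enabled() {
    if esxcli vsan storage automode get | grep -qi 'Enabled: *true'; then
        esxcli vsan storage automode set --enabled false
        echo "automode disabled"
    else
        echo "automode already off"
    fi
}

# Usage on a host (commented out so nothing runs on sourcing):
#   disable_automode_if_enabled
```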

Now enter this command with the cache device name

esxcli vsan storage remove -s naa.50026b724712194a

The -s means SSD. The command also accepts individual capacity disks with -d. The thing to know is that removing the cache disk removes every disk in its disk group (i.e., it takes care of all the capacity disks too), so it's faster to just remove the cache disk than to go disk by disk.

This will take a while as well, but once it completes you should not see any output when you run esxcli vsan storage list again (provided you removed all disk groups).
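For reference, both phases can be strung together in one function. Treat this as a rough sketch, not something I'd run blind: reclaim_vsan_disks is a made-up name, it assumes awk and grep exist in the ESXi shell, it assumes the "not enabled" message from the cluster get command shown earlier, and it removes every disk group on the host, which destroys any vSAN data on them.

```shell
# Hypothetical wrapper around the commands in this post.
# WARNING: removes ALL vSAN disk groups on the host.
reclaim_vsan_disks() {
    # Phase 1: leave the cluster unless the host already reports
    # that vSAN clustering is not enabled.
    if ! esxcli vsan cluster get 2>&1 | grep -q 'not enabled'; then
        esxcli vsan cluster leave
    fi

    # Phase 2: stop auto-claiming so manual removal is allowed.
    esxcli vsan storage automode set --enabled false

    # Collect the cache disks (disks with "Is Capacity Tier: false").
    cache_devs=$(esxcli vsan storage list | awk '
        $1 == "Device:" { dev = $2 }
        $1 == "Is" && $2 == "Capacity" && $3 == "Tier:" && $4 == "false" { print dev }')

    # Removing a cache disk removes its whole disk group.
    for dev in $cache_devs; do
        echo "removing disk group with cache disk $dev"
        esxcli vsan storage remove -s "$dev"
    done

    # Anything still listed means a disk group was not removed.
    if [ -n "$(esxcli vsan storage list)" ]; then
        echo "vSAN still owns some disks" >&2
        return 1
    fi
}

# Usage on a host (commented out so nothing runs on sourcing):
#   reclaim_vsan_disks
```

I'd still run the commands one by one the first time, since each step can take a while and it's easier to see where something goes wrong.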

I hope this helps anyone learning and playing with vSAN. If you have any corrections or suggestions, please reach out to me on Twitter.

References to create this post (I do recommend them, they explain more):

  • William Lam has a great post detailing how even after disabling vSAN through vCenter, you may need to perform steps to reuse the disks.
  • jmalpadw gave a great answer in the VMTN community forums, which is basically this post's commands; I just added more explanation and example outputs.