Tech: SmartCloud Provisioning

2022, Jul 31    

In what can be considered a flashback to my early IT career and a fun (yet incredibly challenging) project to work on, we arrive at my time with a product known as SCP.

In my early technical career I spent a lot of time working with virtualisation products/tooling, from the early incarnations of Xen and KVM through to Hyper-V and ESX (before ESXi was released). As a large portion of my time was spent reworking demo VMs that needed to run as fast as possible for sales demonstrations, I became quite familiar with the ins and outs of the technologies and what could be achieved.

Fast forward a few years and, while I was working with the different hypervisor technologies on a daily basis, a new product was released called SmartCloud Provisioning (or SCP for short). It leveraged OpenStack under the covers and was designed for incredibly fast provisioning of systems, in some cases 10 seconds or less. Of course the technology at the time was in its infancy, but as a tool for deploying new VMs it had promise.

With every new tool/technology, however, come technical challenges, doubly so when the sales teams get involved. Pitching a tool that promises multi-cloud deployment of systems sounds great, but you actually need to be able to demonstrate it. A remote demo is always a risky endeavour (especially in Eastern Europe with poor internet connections at the time), so you need something you can take with you, ideally not a flight case that weighs the same as a small car.

Part of the hardware provisioning I would do entailed setting up Linux laptops with VMware Workstation so the sales team could run their VMs and demo the software (easy enough). The frustration was always that when a laptop was returned it would need to be wiped and reprovisioned ready for the next user. While this process was automated (thanks to some scripting magic and PXE booting, a story for a different day) it still wasn't ideal. Trying to add a hypervisor into the mix that OpenStack could talk to (without doing a Xen DomU to run your software stack) made it all messy.

After working on this challenge for a while I was graced with a new release of ESXi. While planning upgrades within our environment, I spotted that it had a new experimental feature (available when the underlying CPU supported it): nested virtualisation, which allows a guest VM to run another VM inside of it (think of it like the movie Inception, but things get slower each level deeper you go).
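As a rough illustration of the "when the underlying CPU supported it" part: on a Linux system, hardware virtualisation support shows up as the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`, and a Linux guest only sees that flag when the hypervisor beneath it exposes hardware virtualisation to the VM. The sketch below is not what I used at the time; the function name `hardware_virt_flags` is made up for this example, and it assumes a Linux system with a readable `/proc/cpuinfo`.

```python
def hardware_virt_flags(cpuinfo_text: str) -> set:
    """Return the hardware virtualisation flags found in /proc/cpuinfo-style text."""
    wanted = {"vmx", "svm"}  # vmx = Intel VT-x, svm = AMD-V
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<tabs>: value"; only the "flags" lines matter here.
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= wanted & set(value.split())
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            flags = hardware_virt_flags(fh.read())
    except OSError:
        # Not Linux, or /proc is unavailable: report nothing found.
        flags = set()
    print("hardware virtualisation flags:", sorted(flags) or "none found")
```

Run inside a guest, an empty result suggests the guest cannot host VMs of its own with hardware acceleration, which is exactly the limitation nested virtualisation removes.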

After thinking about the possibilities of this for a while it was time to build a custom ESXi ISO and see if I could get it to install on a ThinkPad. A few late nights later (and a custom Ethernet driver) I had ESXi booted and fully functional on a test laptop.

For the non-technical readers I apologise in advance (as it only gets more technical from here...)

With ESXi now being the primary OS on a max-spec ThinkPad (seriously, I loved the hardware I used to work with), one aspect of the demo immediately became very easy. SCP could be installed as a guest VM that auto-boots and connects to ESXi, allowing provisioning of local instances. All the sales team would need to do is plug their work laptop into the loaner laptop and they could access the UI, the hypervisor, and the newly provisioned systems. This sounds great, but two questions need to be asked at this point:

1) You don't need nested virtualisation to achieve this, why get excited about it?
2) This doesn't sound like a challenge, is this all you did?

Fair questions, to which I have the following answer (combining both)... I didn't plan to run a single hypervisor on the laptop. I planned to run ALL of the supported x86 hypervisors on there and have SCP provision to all of them at the same time, to show the potential of SCP and that there wasn't any trickery involved. Additionally, I planned to have all of this as an auto-deployed stack via PXE, with the ability to revert all of the hypervisors back to a clean state after a demo (to save the reprovisioning effort). Now you (hopefully) see the challenge :-)

So... Nested virtualisation, ESXi, KVM, Hyper-V, Xen, both Linux and Windows images for each (with the same cloud provisioning scripts in), a slim OS running SCP, and a lightweight cross-platform demo stack that shows components on the different hypervisors all talking to each other for good measure. Not your typical Tuesday!

It took a fair amount of work to get things running smoothly; from what I recall, the better part of 5-6 weeks (to this day I still appreciate the confidence/patience my former manager had in me for this project of mine). It also required the fastest laptop I could get my hands on and a remarkably expensive enterprise SSD (these were rare at the time) to be able to handle the disk I/O.

The end result: the ability to demonstrate a multi-cloud provisioning tool on an easily transportable laptop that involved zero trickery and could actually be used. It worked so well that after getting this into an easily deployable form (that the sales teams could use) I went on a European roadshow with one of the lead salesmen, covering the technical aspects of how it worked, how it was done, and how it would be of benefit. I have to admit, I actually felt a sense of pride in this piece of work :-)

The question after all of this (and of most of the projects I create) is why... In truth, since working on the embedded hardware within consumer routers and the Bifferboard I've always felt motivated to use hardware to its maximum potential. Running four different hypervisors on a laptop at the same time, and being able to provision guest systems and then use them, pushed the laptop to its absolute extreme (with a toasty CPU, the 32GB of RAM 100% used, and the SSD getting a serious workout).

That, and sometimes I just like doing the unconventional ;)