agrajag @ 2006-04-13 12:58:00
Most of you are probably aware that Apple recently released Boot Camp, software which allows an Intel Mac owner to easily set up dual booting between Mac OS X and Windows XP (apparently Linux also works fine, though NetBSD has some issues). There has been quite a lot of chatter about this in mass media circles. In geekier circles, another product got some attention: Parallels Workstation. Parallels Workstation is a virtual machine system that allows you to run Windows inside of Mac OS X (only on an Intel Mac), similar to Virtual PC except much faster. This is a pretty exciting technology, so I went ahead and downloaded the Linux version and took a look at it.
The results worried me. What I found looked like a wide open invitation to very serious exploits. Through my job I have gained a fair amount of understanding of the x86 architecture, operating systems in general, and virtualization technologies. At the risk of being overly immodest, I consider myself to be an expert in the combination of these three concepts with respect to security (Note: I am certainly not an expert in any one of them individually). I've been debating whether or not I want to post on this subject for the past few days, but today I saw a New York Times article singing the praises of Parallels' product and decided I should get out my soap box and bore you all to tears with some serious geekery.
Long story short: based on a partial source code analysis of the Linux version of Parallels Workstation, I would not install this product on any system that I really care about (i.e. one that houses sensitive data or from which I access the outside world on a regular basis). It is certainly an interesting product and I may install it on a spare test machine, but I foresee major security concerns and so would keep it off important systems. For those of you who want more information, follow the cut below.
My exploration of Parallels' product was inspired by reading the following blurb on their website:
Parallels Workstation 2.0 is the first desktop virtualization solution to include a lightweight hypervisor, a mature technology originally developed in the 1960s to maximize the power of large mainframes. Hypervisor technology dramatically improves virtual machine stability, security and performance by using a thin layer of software, inserted between the machine’s hardware and the primary operating system, to directly control some of the host machine’s hardware profiles and resources. It not only makes Parallels Workstation-powered virtual machines secure, stable and efficient, but also empowers users to immediately realize the benefits associated with Intel VT hardware virtualization architecture.
This interested me because I have a fair amount of experience with the Xen Hypervisor and am quite familiar with the concept. To understand what a hypervisor is and why it is good, you need to be familiar with how a computer works with a traditional operating system.
The x86 family of processors (and, as far as I can tell, most others) supports, in hardware, a notion of privilege. In the x86 architecture, privilege is specified as a "Ring" in which the processor is currently operating. The rings are indexed from 0 to 3, with 0 being the most privileged and 3 being the least privileged. In most cases (read: basically all widely used operating systems), the operating system runs in ring 0 and its processes run in ring 3. This gives the operating system the ability to do things like talk directly to devices, read any piece of memory it wants, and, well, pretty much anything else you can think of. A process, on the other hand, must ask the operating system to do these sorts of things on its behalf.
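To make the ring idea concrete, here is a toy model in C (the same language as the driver code quoted further down). It is a hypothetical sketch, not how the processor actually implements the check: `struct cpu`, `try_privileged_op` and `do_syscall` are invented names, and real hardware raises a general-protection fault where this model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of x86 privilege rings (hypothetical; real CPUs enforce this in
   hardware). Lower number = more privilege. */
enum ring { RING0 = 0, RING1 = 1, RING2 = 2, RING3 = 3 };

struct cpu {
    enum ring cpl;  /* current privilege level */
};

/* Privileged instructions such as HLT or loading CR3 only succeed at CPL 0;
   elsewhere the real CPU raises a general-protection fault. Here we just
   report success or failure. */
static bool try_privileged_op(const struct cpu *c) {
    assert(c->cpl <= RING3);
    return c->cpl == RING0;
}

/* A system call: enter ring 0, let the kernel do the privileged work on the
   caller's behalf, then drop back to the caller's ring. */
static bool do_syscall(struct cpu *c) {
    enum ring caller = c->cpl;
    c->cpl = RING0;                  /* trap into the kernel */
    bool ok = try_privileged_op(c);  /* kernel acts for the process */
    c->cpl = caller;                 /* return to user mode */
    return ok;
}
```

A process at ring 3 fails when it tries the privileged operation itself, but succeeds when it goes through `do_syscall`, and it ends up back at ring 3 afterwards.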
The idea of a hypervisor is to introduce a small, fast, stable component underneath the operating system. In Xen, the operating system gets pushed up to ring 1, and the hypervisor sits in ring 0. This makes the hypervisor the most trusted piece of software on your system (Note: a piece of software is said to be trusted if it has the potential to violate your security policy; this does not imply that it is trustworthy). On top of the hypervisor you can then run multiple, independent virtual machines, each with its own operating system and processes. The hypervisor is responsible for managing virtual machines: it schedules them similarly to how an operating system schedules processes, it provides memory for them, and it usually offers some sort of inter-VM communication primitives. The hypervisor does not manage devices in any way. This is the aspect of the hypervisor that gives it the potential to be worthy of the trust we place in it. Microsoft reports that something like 90% of all operating-system-level crashes are due to faulty device drivers, Linux device drivers are a complete mess, and I doubt OS X device drivers are much better. There are a lot of different devices one might plug into a computer, and many of them operate poorly or have poorly written drivers; by removing them from the concern of the lowest-level component on your system, you make that component much simpler and more likely to be stable.
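The scheduling part of a hypervisor's job can be sketched as below. This is a hypothetical, deliberately naive round-robin scheduler; `struct hypervisor`, `hv_create_vm` and `hv_schedule` are invented names, and Xen's real scheduler is considerably more sophisticated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: a hypervisor scheduling whole VMs round-robin, much as
   an OS schedules processes. */
#define MAX_VMS 8

struct vm {
    int id;
    int runnable;         /* nonzero if the VM has work to do */
    unsigned long ticks;  /* time slices it has been given */
};

struct hypervisor {
    struct vm vms[MAX_VMS];
    size_t nvms;
    size_t next;          /* round-robin cursor */
};

/* Register a new VM; returns its slot index, or -1 if the table is full. */
static int hv_create_vm(struct hypervisor *hv, int id) {
    if (hv->nvms == MAX_VMS)
        return -1;
    hv->vms[hv->nvms] = (struct vm){ .id = id, .runnable = 1, .ticks = 0 };
    return (int)hv->nvms++;
}

/* Give the next runnable VM one time slice and return its id, or -1 if none
   is runnable. Note what is absent: no device drivers live at this layer. */
static int hv_schedule(struct hypervisor *hv) {
    for (size_t tried = 0; tried < hv->nvms; tried++) {
        assert(hv->next < hv->nvms);
        struct vm *v = &hv->vms[hv->next];
        hv->next = (hv->next + 1) % hv->nvms;
        if (v->runnable) {
            v->ticks++;
            return v->id;
        }
    }
    return -1;
}
```

Nothing in the sketch touches a device, which is exactly the property the paragraph above credits for a hypervisor's potential stability.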
The best part of the hypervisor design is that it is much faster than traditional "hosted" virtualization solutions, where the virtual machine manager runs as a process in a host operating system (like VMware or Virtual PC). The reason for this speed boost can be seen by comparing how a file is opened by a process in each case.
In a non-virtualized operating system, if process A wants to open a file, it makes a "system call", which causes a "context switch" from the process's execution environment (Ring 3) to the operating system's environment (Ring 0). The operating system then looks up the file to be opened (which may involve talking to the hard drive), opens the file, and returns a "descriptor" to the process (causing another context switch back to Ring 3). The process can then use this descriptor to read and/or write the file. The important detail with respect to this description is that context switching is slow (relatively speaking). So remember: under normal operation, every time a file is opened (or closed, or read, or written, etc.) there are 2 context switches.
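The file example maps directly onto POSIX calls. In this sketch, `open`, `read` and `close` are real system calls; the `switches` counter and the `read_file_once` helper are hypothetical bookkeeping that simply add two per call, following the simplified accounting above, and do not measure anything.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* open(), read() and close() are each system calls. Following the post's
   simplified accounting, each one costs two switches: ring 3 -> ring 0 on
   entry and ring 0 -> ring 3 on return. The counter is purely illustrative. */
static ssize_t read_file_once(const char *path, char *buf, size_t cap, int *switches) {
    assert(switches != NULL);
    int fd = open(path, O_RDONLY);   /* syscall 1: kernel looks up the file */
    *switches += 2;
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap);  /* syscall 2: kernel copies data out */
    *switches += 2;
    close(fd);                       /* syscall 3: kernel releases the descriptor */
    *switches += 2;
    return n;
}
```

Reading a file this way costs six switches in the model, two for each of the three calls; a failed `open` stops after two.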
In the "hosted" virtual machine case, if a process in a VM tries to open a file, the sequence of events starts out the same way. The process makes a system call to its OS (the one running in the virtual machine), which causes a virtual context switch. The virtualized OS then tries to talk to its hard drive, but since it isn't running on real hardware, the virtual machine manager traps this attempt and turns it into a system call to the real operating system, which causes a real context switch. The real operating system then does its magic and returns the result (another real context switch) to the virtual machine manager. The virtual machine manager then hands the result to the virtualized operating system, which returns it to its process (another virtual context switch). So this requires the same 2 real context switches plus 2 virtual context switches.
In the hypervisor case, the virtualized OS is able to talk directly to its hard drive, so everything behaves identically to how it does in the non-virtualized case.
This is not 100% accurate in all cases, but it is more or less accurate, and similar patterns of actions occur frequently. Basically, hypervisor-based systems need to make fewer context switches, which means they run faster.
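The tallies from the three scenarios can be written down side by side. This is a sketch of the simplified model in the text only: the counts are the ones given above, `per_file_op` and `op_cost` are invented names, and the per-switch costs passed to `op_cost` are placeholders for the caller to choose, not measurements.

```c
#include <assert.h>

/* Per-file-operation switch counts, exactly as tallied in the simplified
   description above (not measured numbers). */
enum setup { NATIVE, HOSTED, HYPERVISOR };

struct switch_count {
    int real;      /* switches on the physical CPU */
    int emulated;  /* "virtual" switches the VMM emulates for the guest */
};

static struct switch_count per_file_op(enum setup s) {
    switch (s) {
    case NATIVE:     return (struct switch_count){ 2, 0 };
    case HOSTED:     return (struct switch_count){ 2, 2 };
    case HYPERVISOR: return (struct switch_count){ 2, 0 }; /* guest reaches its disk directly */
    }
    assert(!"unknown setup");
    return (struct switch_count){ 0, 0 };
}

/* Weighted cost, with the per-switch costs left as caller-supplied assumptions. */
static double op_cost(enum setup s, double real_cost, double emulated_cost) {
    struct switch_count c = per_file_op(s);
    return c.real * real_cost + c.emulated * emulated_cost;
}
```

Whatever positive costs you plug in, the hosted case comes out more expensive than the other two, and the hypervisor case matches native, which is the whole argument in miniature.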
There are a few downsides to hypervisors. Previously, the problem was that they required modifying the operating system to be aware that it was not running directly on the hardware (specifically, parts of the OS's memory management code and a few other very low-level details needed changing). This is solved by new features in Intel processors known as VT (VT now seems to stand for "Virtualization Technology"; it used to stand for "Vanderpool Technology"), which are able to hide a lot of the hypervisor's voodoo from the operating system. The exact details of this are beyond the scope of this discussion. The other major drawback is that devices have to be shared in some meaningful way. This is often handled by having a "primary" virtual machine that talks to the devices and provides idealized interfaces to those devices that other domains can use. This "primary" machine is not fundamentally different from any other machine running on the platform; it just talks to the devices directly. Typically it also has a managerial role and handles things like creating and destroying VMs. Again, this is not a fundamental difference, just different privileges and installed software.
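The "primary" virtual machine arrangement is often built around a shared request queue between a guest and the primary VM. The sketch below is hypothetical: the ring layout, `guest_submit_read` and `primary_service` are invented for illustration and are only loosely in the spirit of Xen's split drivers. Both sides run in one process here, whereas in a real system the queue would sit in memory shared between two VMs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: guests queue idealized block requests in shared
   memory, and only the primary VM touches the disk. Sizes are made up. */
#define RING_SIZE 4
#define SECTOR    512
#define NSECTORS  16

struct blk_req {
    int guest_id;
    unsigned sector;
    bool done;
    int status;          /* 0 = ok, -1 = bad sector */
    char data[SECTOR];
};

struct shared_ring {
    struct blk_req slots[RING_SIZE];
    unsigned prod;       /* advanced by the guest */
    unsigned cons;       /* advanced by the primary VM */
};

static char real_disk[NSECTORS][SECTOR];  /* stands in for the physical device */

/* Guest side: queue a one-sector read; fails if the ring is full. */
static bool guest_submit_read(struct shared_ring *r, int guest_id, unsigned sector) {
    if (r->prod - r->cons == RING_SIZE)
        return false;
    struct blk_req *q = &r->slots[r->prod % RING_SIZE];
    q->guest_id = guest_id;
    q->sector = sector;
    q->done = false;
    r->prod++;
    return true;
}

/* Primary-VM side: drain pending requests against the real disk, rejecting
   out-of-range sectors rather than trusting the guest. Returns requests handled. */
static unsigned primary_service(struct shared_ring *r) {
    unsigned handled = 0;
    while (r->cons != r->prod) {
        struct blk_req *q = &r->slots[r->cons % RING_SIZE];
        if (q->sector < NSECTORS) {
            memcpy(q->data, real_disk[q->sector], SECTOR);
            q->status = 0;
        } else {
            q->status = -1;
        }
        q->done = true;
        r->cons++;
        handled++;
    }
    assert(r->cons == r->prod);
    return handled;
}
```

The guest only ever sees the idealized "read sector N" interface, while the primary VM is the single place that knows how to drive the hardware.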
Ok, now that we've gone over what a hypervisor is, why hypervisors are cool, and what their shortcomings are, we can get back to Parallels. The thing that struck me about Parallels' claim that their product used a hypervisor was that the install documents made no mention of needing to reboot. With a true hypervisor, you have to reboot, because the thing you are booting isn't your operating system; it is the hypervisor (which then builds a virtual machine with your operating system). I was very confused. So I downloaded the Linux version of Parallels Workstation and took a look at what it was doing.
What I found was that it consists of a user space process and a number of kernel modules (bits of code that can be injected into a running operating system; in OS X these would be called "Kernel Extensions"). The user space process is distributed as a binary (not source), but the kernel modules include their source (because they have to be built to match the kernel you're running). Since the kernel modules are the pieces that will be running in ring 0 anyway, I went poking about in their source. Before too long I found the following code snippet:
Extracted from parallels-2.1.1670-lin/data/drivers/drv_main/ioctls.c
<snip>
if (copy_from_user(&mFunc, arg, sizeof(struct monitor_functions_def_t) * MONFUNC_COUNT))
    break;
/* setup functions pointers */
for (i = 0; i < MONFUNC_COUNT; i++)
    param->iData.MonitorFuncs[i] = (monitor_funct_t)mFunc[i].fId;
/* initialize callbacks */
vmSetExports(param);
/* Monitor open */
if (param->iData.MonitorFuncs[MONFUNC_OPEN]) {
    ret = param->iData.MonitorFuncs[MONFUNC_OPEN](&param->drvInfo, 0, param);
}
</snip>
The above snippet is part of the handler function for the "ioctl" (input/output control, pronounced eye-oct-al) system call (remember, system calls are how unprivileged processes get the privileged kernel to do things on their behalf) for a device file created by the driver. What the code snippet does is copy its arguments in from user space (that's the copy_from_user bit on the first line) and store them in an array (that's the param->iData.MonitorFuncs[i] = ... line). In general this is not a particularly bad thing, except for the fact that the data being copied in are "pointers" (indirect references) to functions in the calling process. These function pointers will presumably later be used as event handlers for a virtual machine. This means that these functions will get called by your real operating system kernel (running in Ring 0). In fact, one of them does get called on the second-to-last line (the one starting with ret = param->).
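The essence of the problem can be reproduced in ordinary user-space C. In this hypothetical model, `model_ioctl_setup` plays the role of the ioctl handler and the `in_kernel_context` flag stands in for running at ring 0; all names are invented, and the model leaves out everything else the real driver does. What it does show is that the caller, not the driver, decides which function runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of the pattern in the Parallels snippet.
   The names are illustrative, not the driver's real definitions. */
typedef int (*monitor_fn)(void *ctx);

#define MODEL_FUNC_OPEN  0
#define MODEL_FUNC_COUNT 4

struct monitor_table {
    monitor_fn fns[MODEL_FUNC_COUNT];
};

static int in_kernel_context;  /* 1 while the "kernel" runs a callback */

/* Stands in for the ioctl handler: copy the caller's table (copy_from_user in
   the real driver) and immediately call one entry with kernel privilege. */
static int model_ioctl_setup(const struct monitor_table *user_arg, void *ctx) {
    struct monitor_table t;
    memcpy(&t, user_arg, sizeof t);
    int ret = 0;
    if (t.fns[MODEL_FUNC_OPEN]) {
        in_kernel_context = 1;              /* real driver: ring 0 */
        ret = t.fns[MODEL_FUNC_OPEN](ctx);  /* caller-chosen code runs here */
        in_kernel_context = 0;
    }
    return ret;
}

/* Whatever process issues the call decides what this function does. */
static int ran_with_kernel_privilege;
static int caller_supplied(void *ctx) {
    (void)ctx;
    ran_with_kernel_privilege = in_kernel_context;
    return 42;
}
```

Passing a table whose first entry is `caller_supplied` makes that function run inside the "kernel" context; swap in any other function and the handler will run that instead.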
As a result, any process that is able to make this system call on the device in question is able to introduce its own arbitrary code into your kernel. Code surreptitiously introduced into the kernel is generally referred to as a rootkit. Now, there are some protections on this; specifically, the caller must be able to produce a special "salt" (like a really long password of random characters) that was generated when the module was loaded. But if someone does figure out this salt, they will have complete, unbridled access to literally everything on your computer (e.g., your bank account number as you type it into your online banking website), and they will be able to alter the way in which the operating system does things like interpret the filesystem (e.g., if they don't want you to see the file "Evil Hacker Toolbox of Doom", you won't, even if you're an administrator). Really, the ability to introduce arbitrary code into the most trusted component of a system is the ultimate prize.
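The salt check amounts to a single gate in front of the ioctl. I haven't shown how Parallels implements it, so the following is a hypothetical sketch: `model_module_load`, `salt_matches` and `gated_ioctl` are invented names, and a real module would draw the salt from the kernel's random number generator rather than from a caller-supplied seed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a load-time secret gating the ioctl; not Parallels' code. */
#define SALT_LEN 32

static uint8_t module_salt[SALT_LEN];

/* A real module would fill this from the kernel's RNG at load time; here the
   caller passes the bytes in so the sketch is deterministic. */
static void model_module_load(const uint8_t seed[SALT_LEN]) {
    memcpy(module_salt, seed, SALT_LEN);
}

/* Compare every byte without an early exit, so timing does not reveal how
   much of a guess was right. */
static int salt_matches(const uint8_t *candidate) {
    uint8_t diff = 0;
    for (size_t i = 0; i < SALT_LEN; i++)
        diff |= (uint8_t)(module_salt[i] ^ candidate[i]);
    return diff == 0;
}

/* The whole defense is this one check: pass it and the function-pointer
   setup (omitted here) goes ahead. */
static int gated_ioctl(const uint8_t *candidate) {
    if (!salt_matches(candidate))
        return -EPERM;
    return 0;
}
```

However carefully the comparison is written, the outcome is binary: one correct salt unlocks the whole mechanism.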
Although I have only looked at part of their Linux version and none of their OS X version, my understanding of their design is that their "hypervisor" straddles the divide between your operating system kernel and user space by passing function references from a user land process to their modules. This design realizes none of the stability-enhancing properties of a true hypervisor design, because it shares an execution context with the primary OS kernel (though it does take advantage of the speed benefits), and I believe it to be inherently insecure because it provides an easily exploitable mechanism by which arbitrary code can be run in a highly trusted execution environment. This is why I do not trust Parallels Workstation.
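For contrast, a common way to expose kernel functionality through an ioctl is to let user space choose an operation by number while the kernel keeps the code itself. This is a hypothetical sketch of that pattern, not anything taken from Parallels or VMware; `safer_ioctl`, `kernel_ops` and the handlers are invented names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical contrast: user space passes only an operation number, and the
   kernel calls a handler from its own read-only table. */
enum { OP_OPEN, OP_CLOSE, OP_COUNT };

static int handle_open(void)  { return 1; }   /* placeholder kernel handlers */
static int handle_close(void) { return 2; }

/* Lives in kernel memory; a caller can select an entry but never supply one. */
static int (*const kernel_ops[OP_COUNT])(void) = { handle_open, handle_close };

static int safer_ioctl(unsigned long user_op) {
    if (user_op >= OP_COUNT)   /* untrusted input: bounds-check before indexing */
        return -EINVAL;
    return kernel_ops[user_op]();
}
```

A bad index is rejected with -EINVAL, and no caller can make the kernel jump to an address of its choosing.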
If I am wrong about my interpretation of Parallels' design, I urge someone more familiar with the inner workings of their software to set me straight. Because Parallels' software is mostly closed source, I believe the only people likely to be qualified to correct me are engineers working for Parallels; however, anyone is welcome to try to dissuade me. I would also like to take the chance to urge Parallels to provide full source code to their product so that others in the community are able to either verify my suspicions or disprove them (or maybe even verify that the problems exist, and correct them).
Update (4/14/06 12:45): I've taken a little look at VMware's free vmplayer software. It does involve a couple of kernel modules (one for networking, and one for the main virtual machine monitor). I have not yet had the opportunity to do anything more than a preliminary poke at the source code of these modules, but my first impression is that they do not seem to behave similarly to Parallels' approach of calling arbitrary code from the kernel's execution context. I suspect, however, that they do enable the guest operating system to run at a higher privilege level in order to interact with its devices more efficiently.
Ok, I'm off the soap box and going to get lunch.