The potential performance impact of Device Guard (HVCI)

Overview

In this blog I want to discuss the Group Policy setting Virtualization Based Protection of Code Integrity and how it can dramatically impact the performance of Windows 10 systems when it is not paired with the latest generations of CPUs, in particular those without Mode Based Execution Control (MBEC) support.

From what we’ve seen, there can be up to a 40% performance impact if your devices do not support MBEC.

Here is the Group Policy in question: Computer Configuration > Administrative Templates > System > Device Guard > Turn On Virtualization Based Security, with the option Virtualization Based Protection of Code Integrity.
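If you prefer to check the setting from a script rather than the Group Policy editor, the ADMX template writes these options into the registry. Here is a minimal sketch that dumps the relevant values; the key path and value names are my assumptions based on the DeviceGuard ADMX template, so verify them against your own build.

```python
# Minimal sketch: dump the registry values behind the Device Guard / VBS policy.
# Assumption: the policy is written under the usual DeviceGuard policy key by
# the ADMX template - verify the path and value names on your own build.
import winreg

POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows\DeviceGuard"
VALUE_NAMES = [
    "EnableVirtualizationBasedSecurity",
    "RequirePlatformSecurityFeatures",
    "HypervisorEnforcedCodeIntegrity",  # the HVCI option discussed in this post
]

def dump_device_guard_policy():
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY)
    except FileNotFoundError:
        print("Policy key not found - the GPO does not appear to be applied.")
        return
    with key:
        for name in VALUE_NAMES:
            try:
                value, _ = winreg.QueryValueEx(key, name)
                print(f"{name} = {value}")
            except FileNotFoundError:
                print(f"{name} = <not set>")

if __name__ == "__main__":
    dump_device_guard_policy()
```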

Performance Impact

So back in 2016 we evaluated all of the relevant Windows 10 security features and justified the implementation of almost all of them, with the exception of UMCI. Although we are advocates of application whitelisting, we felt that it was not enterprise ready from a management point of view.
We did, however, implement HVCI to provide virtualized protection for the KMCI component, to ensure kernel mode drivers are signed, and to ensure memory pages were secured: W or X, but never WX.
It's only one option in a Group Policy, so what could go wrong?
Fast forward to 2018. Everybody is complaining about poor performance. Performance analysis showed various minor issues, but nothing substantial. My PC at home runs much faster, right?! 😊
The eureka moment came when turning off virtualization in the firmware. We now had a fast and responsive machine. In fact, it was approximately 30-40% faster! (Based on a number of user-scenario tests, e.g. file copy, application open, zip extraction, math calculations.)
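For reference, the scenario tests were nothing exotic. A minimal, purely illustrative harness for this kind of before/after comparison might look like the sketch below; the paths and workloads are made up for illustration, not our exact test set.

```python
# Illustrative timing harness for simple user-scenario comparisons.
# Run it once with HVCI on and once with it off, then compare the numbers.
import shutil, time, zipfile

def timed(label, func, repeats=3):
    """Run func() several times and report the average wall-clock time."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    print(f"{label}: {sum(times) / len(times):.2f}s average over {repeats} runs")

def file_copy():
    shutil.copy(r"C:\Temp\testdata.bin", r"C:\Temp\testdata_copy.bin")  # illustrative paths

def zip_extract():
    with zipfile.ZipFile(r"C:\Temp\testdata.zip") as z:                 # illustrative path
        z.extractall(r"C:\Temp\extracted")

def math_loop():
    sum(i * i for i in range(5_000_000))

if __name__ == "__main__":
    timed("File copy", file_copy)
    timed("Zip extraction", zip_extract)
    timed("Math calculations", math_loop)
```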
What could it be? Credential Guard just protects credentials, and KMCI just checks that drivers are signed, right?
Further investigation showed us that HVCI was to blame.
After an escalation to Microsoft, we received a response asking us to check whether we had 7th generation CPUs, together with a screenshot of the new features added by Intel and a comment along the lines of “Apparently you need this feature”. They were referring to MBEC.

What are HVCI and KMCI?

Well here’s a quick overview:
KMCI (Kernel Mode Code Integrity) is on by default and in its native state it ensures that all kernel mode drivers are signed. Those that are not cannot load.
HVCI (Hypervisor-protected Code Integrity) moves this component into the Secure Kernel and uses hypervisor technology to protect it in VTL1. It is loaded early in the boot process and acts as a broker: it ensures code is signed and controls the access rights to the memory pages where the code is stored.
It does this by leveraging SLAT (Second Level Address Translation, implemented by Intel as Extended Page Tables, EPT). It ensures that kernel mode memory pages contain signed code and are marked Executable or Writable, but never both. Any change to the permissions of these memory pages must be authorized by this broker service.
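To make the W-or-X rule concrete, here is a toy model of the broker decision. This is purely conceptual; the real broker lives in the Secure Kernel in VTL1 and works on SLAT entries, not Python objects.

```python
# Toy model of the HVCI "W or X but never WX" rule for kernel-mode pages.
# Purely conceptual - not how the Secure Kernel is actually implemented.
from dataclasses import dataclass

@dataclass
class PageRequest:
    writable: bool
    executable: bool
    signed: bool  # does the code in the page carry a valid signature?

def broker_authorizes(req: PageRequest) -> bool:
    if req.writable and req.executable:
        return False          # W+X is never granted
    if req.executable and not req.signed:
        return False          # only signed code may become executable
    return True

print(broker_authorizes(PageRequest(writable=True,  executable=False, signed=False)))  # True: data page
print(broker_authorizes(PageRequest(writable=False, executable=True,  signed=True)))   # True: signed code
print(broker_authorizes(PageRequest(writable=True,  executable=True,  signed=True)))   # False: WX denied
```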

HVCI also protects the Control Flow Guard (CFG) bitmap from modification.

HVCI also ensures that other trustlets, like Credential Guard, have a valid certificate.

Modern device drivers must also be signed with an EV (Extended Validation) certificate and should support HVCI. In Server 2016 this requirement is more stringent.

Investigation into how…

Now the hunt was on to find out more about this feature, and to understand how it affects performance.
I could not, and have yet to, find any information on this in Microsoft's documentation.
What I did find was a line in the Windows Internals 7th Edition book by Mark Russinovich and Alex Ionescu. It states:

The Secure Kernel relies on the Mode-Based Execution Control (MBEC) feature, if present in hardware, which enhances the SLAT with a user/kernel executable bit, or the hypervisor’s software emulation of this feature, called Restricted User Mode (RUM).

So here is our first evidence of a potential performance issue: if MBEC is not available, the Secure Kernel falls back to software emulation in the form of RUM.
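As a practical aside: if you want to check whether your own platform reports MBEC, the Win32_DeviceGuard WMI class exposes an AvailableSecurityProperties list. The sketch below shells out to PowerShell; the mapping of the value 7 to Mode Based Execution Control is my reading of Microsoft's documentation for recent builds, so treat it as an assumption and verify it for your build.

```python
# Sketch: ask the Win32_DeviceGuard WMI class which security properties the
# platform reports as available. On recent builds a value of 7 reportedly
# corresponds to Mode Based Execution Control (MBEC) - treat that mapping as
# an assumption and check it against the documentation for your build.
import json
import subprocess

CMD = [
    "powershell", "-NoProfile", "-Command",
    "Get-CimInstance -ClassName Win32_DeviceGuard "
    "-Namespace root\\Microsoft\\Windows\\DeviceGuard | "
    "Select-Object -ExpandProperty AvailableSecurityProperties | ConvertTo-Json",
]

result = subprocess.run(CMD, capture_output=True, text=True, check=True)
available = json.loads(result.stdout) if result.stdout.strip() else []
if isinstance(available, int):          # a single value serializes as a scalar
    available = [available]

print("Available security properties:", available)
print("MBEC reported as available:", 7 in available)
```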

The next insight came from Rafal Wojtczuk's VBS security analysis from 2016:

The problem arises when the configuration allows unsigned usermode code and only signed kernelmode code. Intel CPUs use same EPT table regardless whether running VM usermode or kernelmode

This problem was considered in secvisor paper [scv], and although it is not stated explicitly in any official MS document, VBS-enforced KMCI uses very similar approach. The scenario below starts with usermode running unsigned code:

Naturally, in case of unsigned usermode code, such approach results in many additional vmexits, and this impacts performance. In the worst case scenario involving Hyper-V running in VM and the workload consisting of executing nonexistent syscall in a tight loop (from a location in unsigned usermode page), the performance hit was x200. Yes, 20000% slowdown. Again, signed usermode code performance is not impacted (because there are no uEPT->eEPT transitions).

The full document is here:
https://www.blackhat.com/docs/us-16/materials/us-16-Wojtczuk-Analysis-Of-The-Attack-Surface-Of-Windows-10-Virtualization-Based-Security-wp.pdf
or youtube video
https://www.youtube.com/watch?v=_646Gmr_uo0

OK. So what about getting this information from official sources. Microsoft?!
The BlueHat conference from February 2018 seems to be the first time they admitted that there was (and is) a performance issue.
https://www.youtube.com/watch?v=8V0wcqS22vc
Watch from 16:15 to hear about HVCI, how it works, and why MBEC is important, especially for performance reasons!

Here is what imho appears to be the timeline of events for this particular technology.

  1. Microsoft designs Code Integrity with HVCI. Hardware requirements were shared with OEMs (but included no performance information). MBEC did not even exist until 2017, so it is obviously not one of them, nor is it to this day. It is only a requirement for their new 2018 “Secure System” standard.
  2. Performance and security issues were observed in 2015/16 (as per the Black Hat presentation).
  3. Microsoft discusses adding additional attributes to the extended page table (SLAT) with Intel.
  4. In 2017 Intel added this technology to their 7th generation CPUs.
  5. In 2018 Microsoft makes HVCI a component which is enabled by default if MBEC support is present. Kind of an indirect admission, imho, that the tech was not enterprise ready before?

How MBEC Helps

MBEC, essentially an extension of the Extended Page Tables, adds additional attributes to the SLAT entries that allow kernel mode and user mode pages to be differentiated: the kernel-mode execute (KMX) and user-mode execute (UMX) attributes become separate hardware bits.
With these in place, HVCI no longer has the problem of sharing one page table view between kernel and user mode, which caused the VMExits described in Rafal's analysis.
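To illustrate why that matters, here is a toy model of the difference. It is conceptual only (real EPT/SLAT structures are obviously nothing like this); it simply counts the extra view switches that the pre-MBEC scheme forces while unsigned user-mode code is running.

```python
# Toy model: why separate user/kernel execute bits (MBEC) remove the need to
# switch EPT views on every user<->kernel transition. Conceptual only.

def transitions_without_mbec(syscalls: int, user_code_signed: bool) -> int:
    # Without MBEC there is a single execute bit per page, so the hypervisor
    # maintains two EPT views: one where unsigned user pages are executable
    # (used in user mode) and one where only signed kernel pages are
    # executable (used in kernel mode). While unsigned user code runs, every
    # syscall is a user->kernel->user round trip costing two view switches
    # (VMExits). Signed user code never needs the switch.
    return 0 if user_code_signed else 2 * syscalls

def transitions_with_mbec(syscalls: int, user_code_signed: bool) -> int:
    # With MBEC the SLAT entry carries separate user-execute (UMX) and
    # kernel-execute (KMX) bits, so a single EPT view describes both modes
    # and no view switch is needed at the transition.
    return 0

print(transitions_without_mbec(1_000_000, user_code_signed=False))  # 2000000 extra VMExits
print(transitions_with_mbec(1_000_000, user_code_signed=False))     # 0
```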

How to turn it off

If you have the justification to turn it off, then the next question is whether you implemented this feature with a UEFI lock.
If you have no UEFI lock, simply disable the policy and reboot the client.

If you have, then you are in for some pain…

The UEFI lock basically prevents this feature from being turned off by an attacker. It is not possible to simply disable the Group Policy to have HVCI disabled.
The lock updates the BCD configuration and creates one or more EFI variables. If one of these properties is changed, the system detects the other one and adds it back automatically.
The only way to remove the lock that I am aware of is to use the Device Guard Readiness Tool, as it contains a special EFI file to perform this action.
https://www.microsoft.com/en-us/download/details.aspx?id=53337
After running this tool, the user is presented with a screen at the next reboot asking them to confirm that they want this virtualization feature turned off. Pressing F3 turns HVCI off, so physical presence is also required.
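Either way, once the machine has rebooted you can check whether HVCI is actually still running. The sketch below queries the Win32_DeviceGuard class via PowerShell; the convention that a value of 2 in SecurityServicesRunning means HVCI is my reading of the documentation, so verify it for your build.

```python
# Sketch: report whether HVCI (virtualization based protection of code
# integrity) is currently running, by querying Win32_DeviceGuard.
# Assumption: a value of 2 in SecurityServicesRunning indicates HVCI.
import json
import subprocess

CMD = [
    "powershell", "-NoProfile", "-Command",
    "Get-CimInstance -ClassName Win32_DeviceGuard "
    "-Namespace root\\Microsoft\\Windows\\DeviceGuard | "
    "Select-Object -ExpandProperty SecurityServicesRunning | ConvertTo-Json",
]

result = subprocess.run(CMD, capture_output=True, text=True, check=True)
running = json.loads(result.stdout) if result.stdout.strip() else []
if isinstance(running, int):            # a single value serializes as a scalar
    running = [running]

print("Security services running:", running)
print("HVCI still running:", 2 in running)
```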

Lessons learned and various thoughts

Performance testing

I'm a strong supporter of performing a layered performance stack analysis: analysing the impact of each layer of Windows components applied to the operating system.
If this had been done after the design phase, we would have understood the impact from the very beginning. It's always important to try to balance security requirements with user experience.

Microsoft’s strategy and reaction to the changing security landscape

Take a look at the bluehat conference again
https://www.youtube.com/watch?v=8V0wcqS22vc
I am hugely impressed with Microsoft’s strategy and reaction to the changing security landscape. They pushed Intel to extend the EPT table and introduce new attributes!
However, they are working on a lot more:
Pillars of security
System Guard with DRTM (Dynamic Root of Trust Measurement) together with dynamic Attestation
Improved DMA protection
SMM protection

Continual evaluation of the security stack

As Microsoft is continually adapting to the security landscape, companies also need to stay on top of what is evolving and what has changed, and to re-evaluate their current configuration.
In some cases, like HVCI, there is a reliance on hardware. Depending on where a company is in its hardware lifecycle, it may be a few years before it can use the feature, or (depending on security requirements) the hardware refresh may need to be expedited.

The importance of hardware and understanding it

Typically, during the high-level design phase of an operating system migration project, all of the new features are analysed, scrutinized, decided upon, and either implemented or not.

Windows 10 introduced many new security features, most of which naturally required the implementation of other security features, to provide complete “root of trust” protection.

Note: a small gripe, but the documentation was all over the place: in many cases out of date, incorrect (a TPM requirement for Credential Guard?) or lacking detail. It has been getting better lately, however.

Many of these features also require the support of specific hardware features.

Here are a few examples

Q: Does my device support Device Guard? It has VT-x, VT-d and SLAT support in the firmware, so yes?

A: It is critical to confirm this with the vendor. Lenovo, for example, do not support Device Guard on devices using CPUs earlier than Skylake. You risk blue screens if you don't heed their warning.

Furthermore, the Device Guard standard is ever evolving. Although the base requirements may be met, the optional requirements may not, and the list of requirements grows with each feature release. Hardware vendors are having a hard time keeping up but, in Microsoft's defence, they are adapting to a changing security landscape. Some vulnerabilities can be mitigated in software; some can't. Examples of optional requirements include Secure MOR v2, NX protection and SMM protection. There are many more.

Does your hardware support these optional requirements? This should also be evaluated, as there are very good reasons why they were added to the standard.

Also be aware that there is a new “Secure System” standard.

Q: Can I use TPM 2.0 or 1.2?

A: Well, TPM 2.0 is more secure, but technically either will work. However, watch out for TPM lockouts on 1.2; ownership passwords are important to preserve in this case.

Also be aware that Secure Boot (PCR7) attestation support is not part of the TPM 1.2 standard. This can affect your security expectations if you are attempting to seal your BitLocker key to PCR7.
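If you want to see what your BitLocker key is actually sealed to, the built-in manage-bde tool shows the PCR validation profile of the TPM protector. A quick sketch follows (run elevated; the drive letter is an assumption, and the output format varies between builds, so it just prints the relevant lines rather than parsing them strictly):

```python
# Sketch: print the lines of the BitLocker protector listing that mention the
# TPM or the PCR validation profile. Run from an elevated prompt; the drive
# letter below is an assumption.
import subprocess

output = subprocess.run(
    ["manage-bde", "-protectors", "-get", "C:"],
    capture_output=True, text=True, check=True,
).stdout

for line in output.splitlines():
    if "PCR" in line or "TPM" in line:
        print(line.strip())
```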

If TPM 2.0 is running in firmware (i.e. not in a discrete chip), e.g. using Intel PTT technology, be aware that it runs from Intel's Management Engine and that it requires a connection to the internet to (re)provision its EK certificate.

These are just a few examples and there are plenty more.

Microsoft need to improve their transparency

Update the documentation and notify customers of performance impacts, please!!