Recently Lui et al. released a new very interesting paper. This paper will be the topic of this blog post. In my “Rowbuffer side channel attacks“. I speculated that CAT would be interesting as mitigation for cache side channel attacks. It turns out that I was right about that. I did however get some of the details of how CAT works wrong. Yuval Yarom of University of Adelaide was kind enough to correct me and that became an errata. This new paper, that Yuval co-authored, how ever proves that CAT could indeed play an import role in defending against cache attacks. They call their method “CATalyst”. In this blog post I make an ultra short and incomplete summary of how CAT and CATalyst work for some context. If you wish to get into the details of it you really need to read the paper. Though I will be critical of CATalyst in the this blog post, I should make it very clear that I consider CATalyst a big step forward towards defense against cache side channel attacks and that I’d much prefer to have CATalyst than nothing at all – which it’s my impression is the current state of cloud computing.
Ultra short summary of how CAT works
Cache allocation technology or CAT is a feature of some newer Xeon’s. It’s not meant to make it into consumer PC’s. Newer means some Haswell’s has CAT, I’m not sure how common it is in Broadwell or Skylake. CAT was meant as a performance enhancement tools to make it allow a supervisor or hypervisor to allocate more L3 cache to performance critical operations. Essentially the hypervisor or supervisor can assign a “Class of service” (Short COS) to an MSR for an application. There are currently 4 COS’s. Each COS is associated with a bitmask that describes which ways of the cache can be evicted under this service class. Thus any application can have reads/writes and flushes served from any way just as before, however only evict ways that is compatible with the COS. The intended effect is to shield high priority applications from competitive eviction by lower priority applications.
Using CAT to defend against CSC
Since flush can be served from any way in the cache as always, CAT is incapable of defending against flush+reload and flush+flush. However those attacks can be thwarted avoiding sharing memory across trust boundaries. Liu et al. uses this to defend against these attacks.
This makes the most important enemy “Prime + Probe”. Recall prime and probe works by priming a cache set to consist of attacker only data. Then wait for victim activity and finally access the data it had in the cache – if it incurs a cache miss the attacker knows that the victim used the cache set. With CAT able to prohibit evictions in parts of the cache, it is capable of making it impossible to prime the cache and thus make prime + probe impossible.
However it would be bad karma to just deny eviction as a general rule because that’ll come at a huge performance cost. Deny eviction and the cache cannot adapt to the ongoing work and become inefficient. With 4 COS values it’s however possible to partition the cache in to only 4 pieces. That again is a bad idea, because in the cloud where you typically have say 16 to 64 simultaneous VM’s on one computer. It could only make it more difficult for the attacker to get co-location by factor 4, but not thwart anything.
Liu et al. deals with this problem rather elegantly. They divide the cache in two: A secure partition and insecure partition. Since the secure partition is shared among all VM’s prime + probe would still be possible. To deal with this only so much data can be loaded into the secure partition that absolutely everything can be cached. They then make sure that everything in the secure partition is then cached. The size of the secure partition is kept small to minimize the performance penalty. Finally the VM keeps control of the COS and bitmask to prevent any evictions in the secure partition. Thus everything in the secure partition is pinned in the L3 cache and access to memory in the secure region will never miss the L3 cache. The VMM exports API to the VM’s operating systems to load and unload pages into this secure partition for applications in the VM’s to use. To prevent a DoS attack on this each VM can only load it’s share of secure memory.
Implementation is made possible through the fact that a VMM gets called when MSR’s are written. Thus it can reserve a COS for itself for loading pages into the secure partition and prevent VM’s from gaining this COS as well as setting a bitmask that would allow access to this.
The really cool thing about this is that it’s the first suggested mitigation so far that both looks iron clad and is feasible in current hardware without being tied down to other micro-architectural designs such as the complex-address function for cache allocation and at a reasonable performance cost as well.
Another feature of CATalyst is that it will make row buffer side channels quite difficult as reads are served entirely from cache, thus not activating a row in D-ram. It might not be all together impossible to use row buffer side channels in this as secure partition may share rows with insecure memory and r/w memory can cause row activation when write operations trigger write-back operations for reasons of cache coherency. I don’t think this is a real problem though.
Problems with this method
I see three problems with this method however:
1) Modern hardware is required
2) Changes to VM, OS and applications are required
3) The secure partition is small.
The requirement for modern hardware is a short term problem. In time modern hardware will be available in the cloud and thus it’s not really a big issue.
Changes to VMM, OS and applications
For the second problem OS and VMM could probably be easily changed with the current focus on the cloud. In fact I think the OS changes needed could be implemented by 3rd Party drivers and thus easily made available. Applications pose a much bigger issue though. Typically a cloud consumer uses multiple applications each with their own security requirements and they need to be updated to support CATalyst etc. This problem could be overcome, but certainly a bigger issue than the first. To my knowledge, historically security implementations that require changes to a range of applications have not fared well in the market place.
The small partition size
The real issue for me with CATalyst is the secure partition is small. The obvious maximum size is the cache size of the system to be shared by all VM’s (say 20 megabytes). That however would be prohibitively expensive performance wise – which is why the authors are talking about taking only a 10th of that effectively limiting the secure partition size to 4-16 pages per VM. Increasing the size is probably associated with (near) exponential performance cost: Caching the most used cache line brings more than caching the 20th most used cache line.
The authors argue that this isn’t a problem: “A few secure pages are usually sufficient
for protecting the security-sensitive code and data since these are usually small”. I beg to differ. For a single crypto implementation this is obviously true. The problem is that many VM’s will have a number of crypto implementations and operation typically threaded. Remote access technologies may very well use a different implementation than the web shop, that again uses a different one than SSH and so on and so forth. Not to mention custom implementations. But security requirements doesn’t limit themselves to crypto. Oren et al.  Showed us that we can infer mouse movement and implement covert channels with cache side channel attacks. Gruss, Spreitzer & Mangaard showed us that we can spy on keyboard input in some circumstances. Why care about crypto if your password leaks? I dare speculate that using machine learning technology combined with cache side channel attacks on the non-secure partition a great many secrets could be leaked from a co-located victim. After all, which information we each consider sensitive is subjective and cache side channel attacks apply to a very wide range of data.
Even if each of these things fit individually in the secure partition bad programming practices, hangs and crashes are likely to, at some point, allow a VM to DOS itself by filling the secure partition and not release appropriately. Not to mention the overhead that would come from often loading pages into secure memory. Further I wouldn’t trust software developers in general to correctly identify security critical code in the first place.
 Fangfei Liu, Qian Ge, Yuval Yarom, Frank Mckeen, Carlos Rozas, Gernot Heiser, Ruby B. Lee (2016). CATalyst: Defeating Last-Level Cache Side Channel Attacks in Cloud Computing. https://ssrg.nicta.com.au/publications/nictaabstracts/8984.pdf
 Gruss, Spreitzer & Mangard (2015): “Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches”