Thursday, April 2, 2015

Thoughts on H.A.R.E.S, harvard, mining and leverageing

Update 19-6-2015

I didn't explicitly write it in the original blog. But I should've. The methods below would be useful for detecting Shadow walker type root kits too.

Original post starts here:

I finally got a new hard disk and I'm now spending most of my malware hobby time trying to extract features for machine learning from 70000 PE executables and put them into a database. This unfortunately takes a while. But stay with me I'll get back to ML soon enough.

Download frequency tables here


While I'm working on other stuff here is a short note on something I found extraordinary interesting: The H.A.R.E.S. white paper. by Jacob Torrey. The white paper is pretty short, and the slides from his presentation on SyScan (which I did not attend) doesn't bring much more light to the thing.  So forgive me if I'm in accurate. Also I'm writing this on the side while running 70.000 malware executables through my feature grabbing tool and monitoring that everything is ok. So no code this time, just words.

What is H.A.R.E.S and why do I like it and fear it at once

To summarize really shortly:
Mr. Torrey wants to build a DRM style utility in which the protection is that the code segment can be executed but not read or written. To implement this he uses that the memory manager (MMU) in the modern x86 computers keeps separate caches for page tables (TLB's) for data and execution. He then manipulates the execution cache to point to real data, and the data cache to point in "nirvana". This trick is that makes PaX  on linux work or the Shadow walker root kit. To avoid the attacker from reading the code in the original file he proposes to encrypt with AES-128 for which the modern intel processors has instructions for. To protect the key and the page tables he installs everything in a "thin" virtual machine so that everything the host os (That includes the attacker) does is directly with hardware except for decryption and protection of the TLB's. Read his white paper for details  - I find it quite good.

I find this interesting because in many ways it's a step towards a new computer architecture. x86 was originally build on the von Neumann architecture. For this discussion the important part of the von Neumann architecture is that there is no distinction between code and data.  A competing architecture the Harvard architecture distinguish between them. A great many of the bad behavior that is possible in x86 architecture is possible in the von Neumann architecture but not in the Harvard (stack execution, trampoline hooks, self modifying code, many code injection techniques ...). Mr. Torrey's idea effectively imposes this Harvard feature on a von Neumann platform and that is a very interesting thing - at least from a theoretical perspective. The Gruq however noted that though at first it sounds like a blessing for InfoSec, it's really is a curse in disguise. Imagine no way to audit code for maliciousness, no signature, no code heuristics only behavior analysis to identify if something is malicious or not. Mr. Torrey suggest some weaknesses in his methods that might be useable. I'll get back to that in a moment.

I doubt that Mr. Torrey's thing will find much real world application, but it might be an important stepping stone towards an architectural change that is much needed. That alone makes it worth peaking into and not to mention it seemed like a good semi-technical topic for a fun blog post. I have no time to do code on this and since I doubt this will have real world implication I find it ok to stick with speculation.

A detour

The first problem that Mr. Torrey has is that code in real executables are often interspersed with data. Here a string, there a jump table etc. If you've disassembled enough you've seen it. Since not even the code it self can read this data we have a problem. I'd suggest that a solution to this problem is identify as much code as you can in the code section using standard methods (Performance counters based coverage analysis and static coverage analysis). You make a copy of the entire code section, destroys (overwrite with garbage) what you could positively identify as code and uses the data TLB to project it correctly for data instructions. You sacrifice protection for easy of implementation and that always seems to me to be a worthwhile sacrifice.

Attacking H.A.R.E.S

Now back to the weakness's Mr. Torrey suggested. I think the real potential is attack the implementation of the thin VM it self. This at least currently cannot be protected it self and thus reproducing the VM exactly, but leaking the key seems to me to be the most promising way to go about playing attacker. He also suggests hardware attacks (reading physical memory directly) but shipping hardware with an antivirus software is uneconomical to say the least. This leaves me to his third suggested attack vector: Side channel attacks which will be my topic for the rest of this post.
The side channel attacks I'll talk about is tracing the code and seing what happens and using performance counters to gain additional information about the code you cannot read. You could probably also use cache timing for an attack, but I'll not get into that because I don't know enough about modern caches yet and of the three it seems to me to be the least productive. Forgive me again - since my 15 years de facto infosec pause, I've been reading up only 3 months.

The problem of tracing around observing registers and memory to find out what the code looks likes seems daunting at first. With N byte code segment we have 256 to the power of N possible programs. Even if we say we just need to get close, that looks like a terrible large number. Also with the sheer amount of instructions on x86 platforms it seem near impossible to get them all correctly deduced from just looking at their behavior. Think clflush instruction. It get's worse. Imagine the code segment being full of data that essentially are instructions that crashes some hardware in your computer if run in ring0 and causes exceptions if not. Since you cannot really execute the instruction, you cannot use the trace method and the performance counters are unlikely to be of much use to you. Unfortunately such a scenario can easily be imagined - imagine a long string of outb instructions, that'll crash some attached hardware if executed. Game over?
Not quite. First of the problems mentioned above are all real, but they probably won't shut us down totally.  We can probably squeeze most of the data out of the unreadable code segment. Also we should start up be noticing that not all opcodes are equally likely. Some are used more than others. Zombie did a survey a couple of years back and just one opcode made up for 15% of all opcodes in his sample. Only 54 opcodes made up 95% of the opcodes used in his sample, the majority of these 54 opcodes indeed seems to be analyzable. Looking just at frequencies with which bytes occur in calc.exe from windows XP I found almost 10% to be 0's. Zero is "add al, reg8, r8/m8". About half of those zeroes where followed by another zero making up the instruction "add al, [eax]". You can find my tables as an excel file. This hints that the problem of squeezing enough information out of the code segment might not be quite impossible.

The amount of instructions on the x86 platforms is ironically going to help us. Since the instructions are so crammed many opcodes can contains other smaller opcodes. For example mov eax, 90h would 0xb8, 0x90. The 0x90 being a "nop". We can use this to leverage our search.
Let me define some terminology before I start. With "mining" I mean finding out the opcode and/or parameters of said without any prior knowledge of it. With "Leveraging" I mean using known bytes in the code segment to figure out more bytes by using opcodes in embedded in previously mined opcodes.
Obviously since we know nothing we need to mine as a first step. My first step mining would go for the easiest stuff first. So I'm essentially going to setup up a context where I can easily identify if somebody is using my registry values e.g. set EAX to 0xa1a2a3a4, ebx to 0xb1b2b3b4 etc. Including the memory on the stack and then set a thread to execute exactly one instruction for each bytes in the loaded target executable. I do not care where real opcodes meant to be executed start. I'm looking for anything that I can identify. Once we trace we'll either see an except (other than the expected int 1) or we'll get a new context after execution of the instruction. I'll return to execptions later. When talking about a register in the input context I'll use that register with a .0 before it similarly for the output context a .1 will be appended. Thus eip.1 is the output eip. I can for most instructions calculate a length of the instruction by simply doing eip.1-eip.0. This works for all but branching instructions. I'll refer to this imperfect measure as the opcode size (OpcSize) nevertheless. Below is a table of instructions that is particularly easy to mine (and how to mine them)
Identification criteria
0x40-0x4f Inc/dec reg
OpcSize ==1 && reg.0==reg.1 +-1

0x50-0x57 Push reg
OpcSize==1 && esp.0==esp.1+4&& @esp.0==reg

0x58-0x5f pop reg
OpcSize==1 && esp.0==esp.1-4 && reg.1 == @esp.0

0x68 /0x6aPush imm8/32
OpcSize==2/5 && esp.0==esp.1+4
This gives us 1/4 parameters bytes which we can read from the @esp
0xe8 call rel imm32
esp.0==esp.1+4, @esp.1==eip.0+5
We can only partially mine this easily as sometimes it'll cause an exception and not give us a new context
0xC2 retn imm16
Esp.0==esp.1-imm8 && abs(eip.0-eip.1)>16
We can only positively identify this instruction if the branch made was so"long" it couldn't be an instruction and the imm8 value is bigger than 4 ( to differentiate from 0xC3 retN).
Xchng reg, eax, scasb/w/d/, stosb/w/d, pushfd,popfd,push es/cs/ds/ss, int, ...

Mining criteria left as an excersise... sorry..

Say we figured perfectly mined for 30 or so opcodes and identified 1000 bytes. The complexty we are now trying to resolve is only 236 to the power of N-1000. Still a ridiculous number, but notice that our progress is exponential. Finding just one byte reduces the problem a factor of 256!!!
The above mining was the simple mining. Things we could establish by tracing just once per instruction. Tracing more than once can provide even more information. Say cli/sti instructions. Tracing this instruction twice with different preset I-flags will let you see a toggle of this flag and  and OpcSize of 1. Thus we can identify these. We can also mine conditional short jumps with more than one trace. Trace all instructions with all branching-flags off and once on. If once the OpcSize is 2 and once not two we have a conditional short jump. There goes a significant number of bytes again and since we can mine perfectly we can subtract another 14 for the number of possible choices for each opcode.

Once we've mined these we can start leveraging. For this particularly the 1/4 bytes we got from the call rel imm8 or the push imm8/32 or the conditional short jumps are interesting. Imagine that the last byte of the address in the imm was 0x2. We'd then know that the opcode is add al, imm8. Zero out al.0 execute and you'll have the next byte in al.1 Imagine this being 0x3 (opcode add eax, imm32) we can then do the trick again with eax. However not all opcodes can be leveraged because they don't leak more information than the opcode themselves by mining. So we cannot expect a cascade effect, but we will make progress. We can easily leverage add/adc/sbb/xor/sub al/eax, imm8/32 as well as all short jumps opcodes and how about mov reg, imm8/16? We could also look at two bytes leveraging say 0x8d 0x98 would be lea ebx, [eax+imm32] There are more easy leverage to be had.

Harnesing Exceptions
Up to this point I've treated exceptions as something which prevents you for getting information while mining. They can be rather interesting though. Not only does the type of exception tell you something about the type of opcode, it also give gives you information on whether it was a read or write that caused. Imagine leveraging this 0x8b (mov reg32, m32). Now we wish to know the parameters of it. Simply set as much of the memory in the process you're working with to guardpages. Chances are tracing this will now throw an exception. Tracing a couple of times with different register settings if the read address in the exception doesn't change we have a simple IMM32 with indirection . We can also find out what registers change the address and use this information to figure out more about the RM and MOD bits of the ModRM. Which again will allow us to identify the SIB byte (if present). Once this is done you could get lucky and be able to make this instruction not throw an exception and that'll give you the REG bits of ModRM leaving you with the complete instruction. Even if you don't know the instruction you can use this technique to mine for anything that looks like a ModRM byte and figure it and the additional parameters out.

Last words
I've put enough on the table in this direction now. There is plenty more to be harness in terms of trace mining and leveraging. A hint though would be to use frequency maps from old non protected executables to direct your mining operatings towards where they are most likely to give you the highest success rate for your effort. Mining for 0x00, 0x00 in calc.exe in windows XP sounds like a great idea also for other executables. You can mine for imperfect information that won't allow you to with certainty find a given byte, but rule out almost everything making an educated guess easy later. You should also look into info leaks from performance counters. Or hooking system api to get return addresses will allow you to better mine for api calls. Moving yourself into ring 0 might also bring you something. Finally we can make use of bindiff'ing methods if we've managed to get enough information from the above measures we can compare out incomplete functions with those of standard libraries. You can find more about this on the Diaphora webpage included a link to a good article by Thomas Dullien and Rolf Rolles.

Diaphoraweb page:
Torrey, Jacob. White paper:
Torrey, Jacob. SyScan slites:

Zombie's opcode statistics:

No comments:

Post a Comment