Row hammer detection is possible
Introduction
First off, detection isn't fixing, but it's a good step in that direction, and the more I work with this the more confident I grow in my claim that row hammer probably is fixable. Following my original idea sent me deep into writing drivers, and I quickly began to think that modifying my method slightly would bring detection, and that writing this up would give me a lot of confirmation ahead of time for my original idea. I continue to think the original idea is superior, though significantly more technically complicated to write a proof of concept for. And I seriously need a break from doing my day job first and then coming home to a lot of hours of row hammering.
Simplified method
First, what is the modification to my original idea? It's easy: just run the performance counters without setting the interrupt-on-overflow request in the MSR. The interrupt would give me the eip/rip of the offending instruction, which in turn would allow very strong identification of whether something is row hammering or not. Without the interrupt I won't have eip/rip, so I'll essentially just be guessing based on the performance counts over an interval. This gives me two parameters to tune for detection. The first is the interval at which I poll the performance counter, and the second is a "cut off level". By "cut off level" I mean how many performance counts in an interval are no longer normal but hammering. Essentially this means that false positives and false negatives can occur. Of course I also have to decide which performance counter to use. L2 cache misses sound promising, because if the instruction doesn't get through to the real memory, there is no row hammer. On the other hand, the cache is optimized so that normal code rarely triggers the counter.
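The polling scheme can be sketched in a few lines of C. This is only an illustration under assumptions: the readings are taken as already-collected raw counter values (actually reading the counter needs a driver), and the names counter_delta and count_flagged_intervals are hypothetical, not part of any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Events counted between two polls. Unsigned subtraction stays correct
 * even if the 64-bit reading wrapped around once between the polls. */
static uint64_t counter_delta(uint64_t previous, uint64_t current) {
    return current - previous;
}

/* readings[i] is the raw counter value at poll i (one poll per interval).
 * Returns how many intervals exceeded the cut off level.
 * Both the interval length and the cutoff are tuning parameters. */
static size_t count_flagged_intervals(const uint64_t *readings, size_t n,
                                      uint64_t cutoff) {
    assert(readings != NULL || n == 0);
    size_t flagged = 0;
    for (size_t i = 1; i < n; i++) {
        if (counter_delta(readings[i - 1], readings[i]) > cutoff) {
            flagged++;  /* too many misses in this interval: suspicious */
        }
    }
    return flagged;
}
```

A shorter interval reacts faster but sees fewer events per poll, so the cut off level has to be tuned together with it.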
My conclusion up front: yes, it works well enough. That is, I believe the correct identification of row hammering is near perfect!
My method for the proof of concept is simply to use vsperfcmd, which comes with Visual Studio 2013 Community Edition.
Visual Studio 2013 is free and awesome; unfortunately that does not extend to vsperfcmd. It's free and horrible, but with a bit of tweaking it works.
First I had to write a row hammer program for Windows. Since I don't actually want to mess with bits, I just made a row hammer "look-alike" program:
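UCHAR * Buffer = (UCHAR *)malloc(1024 * 1024);
// Pause here so the profiler can be attached before the loop starts
MessageBox(0, L"Hello world", L"Hello world", MB_OK);
__asm {
    pushad
    mov ecx, 1000000      ; loop count
    mov eax, Buffer       ; first address
    mov ebx, eax
    add ebx, 0x1000       ; second address, 0x1000 bytes further on
code1a:
    mov edx, [eax]        ; read both addresses
    mov esi, [ebx]
    clflush [eax]         ; flush both so the next reads go to memory
    clflush [ebx]
    dec ecx
    jnz code1a
    popad
}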
It hammers away like a real row hammer would, but obviously it's just a simulation (but hey, it could still flip bits on you, so be careful). I commented out the clflush instructions to play "normal program".
Now for profiling it. It took a while before I figured out that you should avoid the "launch" parameter of vsperfcmd, because it seems to insist on using instrumentation instead of sampling when this parameter is used, and that is just plain silly. I found this out because without the clflush instructions it complained that it couldn't instrument. Replacing them with 15 nops put vsperfcmd back in its unintended bad business. Ugly. However, "attach" works like a charm, but obviously requires me to have a way of first starting the row hammer program, then attaching, and then starting the actual row hammering code. Hence the message box above (Sleep() produces better results if you wish to batch test).
Here is a command line for vsperfcmd that you can play with. I automated the sampling in a batch file. The output file (c.vsp) can be opened with Visual Studio.
vsperfcmd /Start:sample /output:c.vsp /attach:5688 /Counter:L2Misses,1000,"L2Misses" /shutdown:20
Now the job was just to sample a lot of runs and see what results I would get. I got on average 1367 samples with the clflush instructions and 7 without. In my testing I never got fewer than 1200 samples with clflush and always single-digit results without. This of course isn't "proof" in any meaningful sense, but it's enough evidence that I'll stick my neck out and say it works. There is a lot more work to be done on this, but I'll take a break for now. Though I promise I will blog again on my original idea.
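To put the numbers above in perspective, here is a small arithmetic sketch in C. It assumes the 1000 in the /Counter option is the number of counter events per sample, and the helper names events_from_samples and midpoint_cutoff are hypothetical. Picking the midpoint between the two groups is just one simple choice of cut off level, not something I have tuned.

```c
#include <assert.h>
#include <stdint.h>

/* Counter events represented by a number of samples, given the
 * reload value (assumed here to mean events per sample). */
static uint64_t events_from_samples(uint64_t samples, uint64_t reload) {
    return samples * reload;
}

/* A cut off level halfway between the highest count seen for normal
 * code and the lowest count seen while hammering. */
static uint64_t midpoint_cutoff(uint64_t max_benign, uint64_t min_hammer) {
    assert(max_benign < min_hammer);
    return max_benign + (min_hammer - max_benign) / 2;
}
```

With these assumptions, events_from_samples(1367, 1000) is 1,367,000 L2 misses on average with clflush against 7,000 without, and midpoint_cutoff(9, 1200) gives 604 samples, leaving a wide margin on both sides.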
Wait there is more
Wondering how something like this could be implemented in a real-life scenario, the obvious answer is to send an email to the administrator when row hammering is detected and shut down the computer before the attack could gain root access and do damage. This is fine in most corporate settings, but for home users it's just plain not the user experience you wish for. So I'm thinking that this check would best be implemented in the scheduler of the operating system. It could profile every slice of time it allocates to user mode threads. If it comes up with a "row hammer" incident, it should just take the offending thread out of the scheduling loop so that it cannot do any more damage, and set a signal for a verification program. Chances are that the thread would have been preempted in or near the row hammering core loop, and just reading out the context would give you an eip/rip to work with for some additional analysis. If the additional analysis so chooses, it could just terminate the process, and the system can keep running.