Foreword
This blog post was actually finished a long time ago. I hadn't posted it before because I'm anything but happy with how it turned out. I hope you enjoy it anyway. In case you actually read this: I've added an erratum to the cache side channel post. A friend of mine in the infosec community insists on calling me old, and he is right, too. You are old when you write nostalgia blogs.
Introduction
The idea for this post came shortly after I'd posted the last nostalgia post, which remains one of the most popular on my blog. The trigger was a question by MalwareTech on Twitter. I don't recall the exact quote, but it was along these lines: "Is there a way to figure out if an instruction is privileged from its encoding?". My first thought was that it's a silly question, but thinking about it made me reconsider. The question has lots of merit, and the answer is silly: any reasonable designer would have grouped privileged instructions together, yet x86 doesn't. The reason is that x86 assembler wasn't designed, it evolved. It's layers upon layers of added design, and I've had the mixed pleasure of following much of this evolution first hand. This post, however, is not the story of x86 processors – other people could tell that story better than I could, I'm sure. This post is about me and x86 processors. I'll try to focus on the tech stuff, because I assume you're only interested in me to the extent that my story reflects yours. I hope you enjoy this trip down memory lane.
Setting the stage
In 1978 Intel made a new processor called the 8086. This CPU would've been long forgotten by now if IBM hadn't made an open computer design around it. As a consequence, companies like Nixdorf of Germany, Olivetti of Italy, Commodore (and much later Apple) and many others produced their own computers based on this design, and the 8086 and its successors became the de facto business PC and remain so today. It had an instruction set inherited from older Intel processors like the 8085, to make porting software easier, and a set of 16-bit registers that by now sound awfully familiar: AX, BX, CX, DX, SP, BP, DI, SI, CS, DS, ES and SS. The first four could each be accessed as two 8-bit registers as well.
In the early 1980s – I don't know exactly when – my father bought his first x86 PC. It was a 4.77 MHz 8088 IBM Portable (if you can call anything weighing 14 kilos portable, not to mention that it could not run unplugged). It was soon sold again, because the display did not live up to the standard our old Commodore already had. Consequently it was replaced with a Commodore PC 10 with a NEC V20 processor. It was instruction compatible with the 8088, but ran at 8 MHz and, if I recall correctly, had a better memory bus, which made it a pretty solid computer for the day. At first I didn't care much for the monster. You had to boot the OS from a floppy disc (rather than from ROM), you had to enter the time on each boot too, as no battery was on board to keep the clock running, the games sucked, and I had nobody to swap games with. It took a while for me to get interested in that computer. My father, being a geek, relatively quickly acquired a 10 MB hard drive and a multifunction card, which cured the having-to-enter-the-time problem and the booting from floppy disks. What made the difference to me, however, was a word processor. Once I got used to it, I found that the ease with which I could do my homework would save me time. Having a spell checker in the mid-80s, and a search function to approximate where the commas should go, brought down the time spent on my Danish homework significantly for the same grades (me being a freak, I actually timed it). Another cool thing I just have to mention: my best friend bought a used IBM PC, and it defies belief today, but it actually came with a diagram of all the electronics on the motherboard, so you could repair the beast with a soldering iron should something break.
Hacking did exist in those days, but not for kids like me. Mischief obviously existed, and that usually meant borrowing the library's computer (an Olivetti), which booted into a menu from which you could select a handful of innocuous programs, and which required a password to get back into DOS to do maintenance. A DOS prompt was root, since the early processors in the family – 8086, 8088, V20, 80186, 80286, etc. – came completely without security functions. These processors had no memory protection, all memory was R/W/X, and the entire address space was accessible from anywhere. The in/out instructions always worked, the interrupt table was placed at a fixed location in memory, and so on. Consequently, if you got into DOS, you owned the computer. As a direct consequence of the processor's inability (insufficient memory, too slow, etc.) to do meaningful multitasking, many programs, such as the dominant word processor WordPerfect, allowed you to open a DOS prompt in the foreground to format floppy disks, look for files and so on, and then close the DOS prompt to continue working on your document. I used that countless times to get that black hat feeling of power, back when I was 12 or younger. I don't recall actually doing anything with my power, though. But I wanted to crack games and have a cool nickname in intros, and for that I had to learn assembly.
My first x86 assembly program
My first assembly program would've looked much like this:
.MODEL small
.STACK 100h
.DATA
HelloMessage DB 'Hello, world',13,10,'$'  ; DOS strings are $-terminated
.CODE
.startup
mov ax,@data               ; load the segment of the data section...
mov ds,ax                  ; ...into DS so the string can be addressed
mov ah,9                   ; DOS function 09h: write $-terminated string
mov dx,OFFSET HelloMessage ; DS:DX points to the string
int 21h
mov ah,4ch                 ; DOS function 4Ch: exit process
int 21h
END
When assembled and linked, you'd have a small hello world executable. This code obviously bears only a remote similarity to any assembler you'd write for Windows or another modern operating system today. Back then it was just x86 assembler; with the arrival of the 80386 processor it became "real mode", as opposed to "protected mode". I'll get back to this later. The two things that seem particularly weird in the above assembler are the writing to DS (which today holds a selector into a descriptor table, but back then was a plain segment register – there was nothing to "describe", it simply said which segment of memory to read from) and the use of interrupts.
To understand why segment registers had to be written, the first thing we need to understand is that the early x86 processors came without paging and were 16-bit. Without changing a segment register, we could only access 64 KB – not very much, even by the standards of the time. Hence the changing of segment registers. The physical address was calculated as (segment register << 4) + offset, so 0000:0211 addresses the same memory as 0001:0201. The upside to this scheme was that when loading an executable or allocating memory you'd only lose a maximum of 15 bytes to alignment. This seems like a reasonable compromise considering the address space was 20 bits, i.e. 1 MB – of which only 640 KB was at the user's disposal; the last 384 KB belonged to the BIOS (well, sort of – eventually things like himem.sys partially recaptured memory from this region). I like to think that history repeats itself: the scheme is a bit like AMD64, which does 64-bit addressing within a 48-bit virtual address space, and the reserved 384 KB can in a weird way be thought of as an early version of SMM memory.
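A minimal sketch of the aliasing, writing a byte through one segment:offset pair and reading it back through another:

; physical = (segment << 4) + offset
; 0000:0211 -> (0000h << 4) + 0211h = 00211h
; 0001:0201 -> (0001h << 4) + 0201h = 00211h
mov ax,0
mov ds,ax
mov byte ptr ds:[0211h],42h  ; write through 0000:0211
mov ax,1
mov es,ax
mov al,es:[0201h]            ; reads back 42h through 0001:0201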
Interrupts in those days were not only for hardware and exception servicing as they are today; they were also the API. Though many interrupts were used (0x19 = reboot, 0x20 = exit process), interrupts 0x21 and 0x13 were the ones you had to know. 0x21 was the DOS services – in today's terms, what is exported from kernel32.dll in Windows. The calling convention was all register based (funny how history repeats itself: after more than a decade of stack-based parameters, it's typical to pass much more in registers in x64 calling conventions – actually, until WinXP, int 0x2e was used internally to forward calls from user mode to kernel mode; it was then replaced by the new syscall instruction, but programmers rarely see this nowadays). I never saw any official documentation, and I doubt most people did – remember, with no internet to download stuff from and no Intel or Microsoft websites, documentation was tricky. Instead, unofficial documentation, especially in the form of interrupt lists, was floating around. The most important was Ralf Brown's interrupt list. For interrupt 0x21, the AH register selected the function: 0x9 was Writeln() (that is, write a $-terminated string to the console) and 0x4c was ExitProcess(). Interrupt 0x13 was very important to systems programmers, because this was where the BIOS exported its functionality. If you wanted to read sectors from a disk, for example, the easiest way was to use this interrupt, which was serviced by the BIOS. All boot loaders at the time were full of int 0x13 – and copy protections too…
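From memory, a sketch of the classic BIOS sector read (int 0x13, function 02h) – here reading the first sector of the first floppy into a buffer at ES:BX (the buffer segment is illustrative; 07C0:0000 happens to be where the BIOS loads boot sectors):

mov ax,07C0h  ; buffer segment
mov es,ax
xor bx,bx     ; buffer offset: ES:BX = 07C0:0000
mov ah,02h    ; BIOS function 02h: read sectors
mov al,1      ; read one sector
mov ch,0      ; cylinder 0
mov cl,1      ; sector 1 (sectors are 1-based)
mov dh,0      ; head 0
mov dl,0      ; drive 0 = first floppy
int 13h       ; on return: carry set on error, status in AH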
Another funny thing about early x86 was that lots of stuff was mapped to fixed addresses in memory. At 0000:0000 was the start of the interrupt table, which was just an array of far pointers. An interesting side effect of this was that any program could globally hook all operating system API: an offer that virus authors would not turn down. Even weirder was address B800:0000. It was the console: writing a character there would make it appear in the top left corner of the screen, because the graphics adaptor read the display contents directly from this memory. Consequently, you could also read whatever was on the console from this address. Graphics were rare, so console mode was where computers were actually used in the early x86 days. EGA graphics had a fixed address too – segment A000h.
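In code, the trick looked roughly like this (each character cell in the 80x25 colour text mode is two bytes: the character, then an attribute byte):

mov ax,0B800h
mov ds,ax                ; DS now points at text-mode video memory
mov byte ptr ds:[0],'A'  ; character in the top left corner
mov byte ptr ds:[1],1Fh  ; attribute: bright white on blue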
Writing my first demo
A demo was at the time – 1994 – the "proof" of coding skill. Doing nice motion graphics required pushing the processor pretty far, and writing a good demo in a high-level language usually didn't turn out well. I wrote my first and last demo in real mode assembler. It was a sine scroller – which essentially means that a text scrolls across the screen along a sine curve. To realize this, the first thing I had to do was define a font. Other than the standard ASCII, the operating system and hardware didn't offer anything. This was done in a graphics program, and then the output was converted into a table for each letter by hand. With only 256 colors it was a byte table. This was a lot of work, even though I drew the font only for the letters I needed. The second thing I needed was a sine table. Today you'd just calculate one as you go. Not so back then. The first reason was that early x86s did not come with floating point instructions; to get those you'd actually have to buy a so-called co-processor (typically called 8087, 80387 or so) which you could plug into your motherboard. The second reason was that a table lookup was far more efficient. I generated my sine table in Pascal, which had a library for calculating sine with integer instructions, and then added the table to my demo. Drawing the actual graphics on screen was a challenge of its own. The screen refresh was pretty slow, and you'd see artifacts from it if you didn't synchronize with it. Thus updating the screen was a loop of waiting for the refresh to finish and then a complex memory move operation to the graphics card memory (a fixed address, as seen above). Waiting was achieved by polling a status port with the "in" instruction until a certain bit changed (or so I recall it). My demo sucked, and I distinctly remember that despite being able to code these kinds of things, I had a huge disconnect between what I thought it would look like and what it actually looked like. I knew I was never going to be an artist, and I had already lost my heart to systems programming and computer virus stuff.
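For the record, the status port in question would have been 3DAh on CGA/VGA adapters, where bit 3 is set while the vertical retrace is in progress. A sketch of the classic wait loop:

mov dx,3DAh
wait_end:
in   al,dx
test al,08h
jnz  wait_end    ; if we're mid-retrace, wait for it to end
wait_start:
in   al,dx
test al,08h
jz   wait_start  ; now wait for the next retrace to begin
; safe to blast the new frame into video memory here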
80386
The 80386 was in many ways the game changer for x86. It was introduced in 1985, according to Wikipedia. It was also the first computer I bought – that was 1991 or '92. Imagine a processor architecture staying pretty much unchanged for 7 years – you hardly see that today. The irony is that, despite this, back then a computer lasted 2 or 3 years before it was antiquated – by today's standards no time at all. The 386 was probably the biggest revolution in the entire history of the x86 platform, and for infosec people almost certainly. The most profound change was the move from the previous 16-bit architecture to 32 bits: not only was the address space increased from 20 bits to 32 bits and the registers and operand sizes upgraded too, but a paging system was added as well. This allowed 32-bit assembly language code to look like it does today. Furthermore, the security architecture with rings 0 through 3, which some of us have a love/hate relationship with today, was introduced here. New debugging features were added, in particular the hardware memory breakpoint. The CRx and DRx registers arrived with the 80386, along with the GDT, LDT and IDT in the form we know them (the descriptor tables had technically appeared with the 286's ill-fated 16-bit protected mode, but hardly anyone used it). The real irony was that this processor was so far ahead of the software that it wasn't until Windows 95, a full 10 years later, that an operating system making full use of the opportunities of the 386 became widespread. What was certainly a revolution in hardware became slow evolution in software. DOS extenders that switched between real and protected mode became standard for games (especially DOS4GW from Watcom, but PharLap also had a market share); short of being an operating system, they basically just provided a flat memory layout to C++ programs. There was DR DOS, which had true multitasking but didn't break the 640 KB boundary and thus was pretty much unusable. And we saw the horrible Windows 2.0 and 3.0, with terrible, terrible 16-bit protected mode execution and their NE executable format, which can best be described as a crime against good style.
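A hedged, ring-0-only sketch of arming one of those new hardware breakpoints (the watched label is illustrative): DR0 holds the linear address to watch, and DR7 enables and qualifies it:

mov eax,offset WatchedVar  ; hypothetical variable to watch
mov dr0,eax
mov eax,dr7
or  eax,1                  ; L0: locally enable breakpoint 0
or  eax,000F0000h          ; R/W0=11b: break on read/write, LEN0=11b: 4 bytes
mov dr7,eax
; any access to the watched dword now raises a debug exception (int 1)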
Viruses and infosec in the early 1990s
I had been infected with an interest in computer viruses in the very early 90s, reading about the exploits of Dark Avenger and his anti-virus nemesis, Vesselin Bontchev. Both were heroes of mine. Thus, as soon as I'd written my first program in assembler, I started wanting to write anti-virus software – somehow, writing virus code always seemed immoral to me. There of course isn't much anti-virus code you can write when you've only written "Hello World" before that. That did not scare me, and I quickly wrote up a number of goat programs. For those unfamiliar with goat programs, they are programs meant as scapegoats for viruses to infect. Since computer viruses at the time tried to avoid detection and crashes by carefully selecting which files they would infect, having a large collection of known executables on your computer would increase the attack surface for any virus, and of course allow for detection too. On the assembly side, they were pretty much programs with different junk instructions to fill up the executables – nothing fancy. With time I moved on to writing encryptors and matching decryptors for executables. That was pretty much the same code as you'd use in a virus, but without the moral implications. The first target was the .com file format of DOS. This was the easiest executable format imaginable: it was simply the binary code and data. It was loaded into a free segment at xxxx:0100, where xxxx denotes the free segment the operating system loaded the .com into, and execution started from this address too. The memory below 100h was used for the command line and other startup parameters. This made .com a very easy format to encrypt: basically, I overwrote the first three bytes with a jmp instruction to the end of the file, where a decrypter was appended, as sketched below.
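A minimal sketch of the scheme (labels, length and key are illustrative; a real encryptor would also patch the three saved original bytes back before re-entering):

org 100h                ; .com files load at xxxx:0100
entry:
jmp decrypt             ; E9 xx xx, written over the original first 3 bytes
body:
; ... encrypted program bytes ...
decrypt:                ; appended at the end of the file
mov  si,offset body
mov  cx,BODY_LEN        ; length of the encrypted region
decode:
xor  byte ptr [si],KEY  ; trivial xor "encryption"
inc  si
loop decode
; restore the saved original 3 bytes over 'entry', then
jmp  entry              ; run the now-decrypted program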
However, with a file format this simple, removing a virus or an encryptor was trivial, and thus anti-debugging had to be added. Hooking interrupts 1 and 3 could throw off most debuggers. Running checksums on yourself would detect somebody single-stepping, or using "go to cursor" (which works by inserting an int 3 into the code), and using the pushf instruction to read the trace flag was popular. Finding back-door interfaces in the most common debuggers, and writing up detection code using this new-found information, was a common enterprise of mine. On the other hand, patching the back-door interfaces in your own debugger, allowing you to decrypt the competition, was another essential trick. A real game changer was a debugger called TRW. It emulated each instruction instead of executing it, and was thus immune to all these games we were playing. In response, I started checking instructions for emulation errors – lots of work, but real fun, and in many ways a sign of what was to come in this area. After I'd had my fun with the errors, I reported them back to the author, who by the way was a really nice guy.
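The pushf trick deserves a sketch of its own: single-stepping works by the debugger setting the trace flag, and an honestly-executing CPU will push that flag where the program can read it (the branch target is illustrative):

pushf               ; push FLAGS, trace flag and all
pop  ax
test ah,1           ; TF is bit 8 of FLAGS, i.e. bit 0 of AH
jnz  debugger_found ; hypothetical label: react to being traced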
With the arrival of Windows, new territory was everywhere, and I did lots of work on Windows internals, in the beginning particularly the PE executable format. Despite all things being new, the first public generic PE decryptor (ProcDump32 from 1998) used the same principle that had been used to unpack .com files generically in the DOS days: it set the trace flag and single-stepped everything until the original entry point, then dumped the memory. The anti-anti-debugging was inspired heavily by TRW: a driver would emulate instructions that could endanger the safe tracing, such as pushfd and rdtsc, instead of actually executing them. ProcDump32 could dump running programs too – except those wouldn't actually run afterwards, because the data wasn't correctly restored. It was, however, enough to do reverse engineering on the code in many cases.
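The arming side of that trick is tiny. A sketch of how a tracer turns on single-stepping – shown here for the current thread; a tool like ProcDump32 would set TF in the target's thread context instead:

pushfd                    ; push EFLAGS
or  dword ptr [esp],100h  ; set TF, bit 8
popfd                     ; the CPU now raises int 1 after each instruction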
Towards the end of the first half of my career as an infosec hobbyist, I was introduced to what had been more or less the birth of modern hacking: the stack overflow hack. My interest at the time was Windows internals, but on IRC EFnet you could not avoid being confronted with "0-dayz". Being used to assembler coding, I quickly understood the mechanics of the attack. It was occasionally useful for privilege elevation attacks on Windows NT; for Windows 98, internals freaks like myself had no use for it – with the front door left open, why look for a window to break? I didn't consider privilege elevation to be hacking at the time. Besides, finding buffer overflows was too easy – in the very early days it amounted to looking for strcpy() calls. That was quickly followed by attacks on sprintf-style format strings. Usually, if you could find one of those where the input string came from "user" input, you had a 50% chance of holding a 0-day in your hand. You could report bugs all you wanted and it wouldn't change a thing. My experience was that if you reported a privilege elevation bug to MS engineers, you'd be ignored; if you reported a BSOD that had to be provoked to appear, you'd be thanked; if you reported a BSOD that could happen by accident, it would be fixed – though it would take forever until the fix was distributed. It was all reliability; nobody cared about security between 1996 and late 2000, when I put my infosec career to rest.
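Those mechanics were easy to grasp from an assembler point of view. A sketch of a vulnerable 32-bit stack frame: the locals sit below the saved return address, and strcpy copies upward until it hits a zero byte:

vulnerable:
push ebp
mov  ebp,esp
sub  esp,64       ; char buf[64] lives at [ebp-64]
; strcpy(buf, input) writes from [ebp-64] towards higher addresses;
; bytes 64-67 hit the saved EBP, bytes 68-71 the return address
mov  esp,ebp
pop  ebp
ret               ; now jumps wherever the attacker's bytes point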
My last act before retiring as a malware hobbyist was writing a blog post on the press's favorite malware of the day – Back Orifice. I'd figured out how to detect Back Orifice specifically, and most other remote control malware of the day generically. Nobody cared. I tried for a while to get a job with anti-virus companies, but short of a nice email from Mikko Hyppönen, nothing ever came of it. So I got a job which quickly killed my interest in spending my spare time behind a computer – but my life with x86 was not yet over.
Optimization
Assembler has always played a major role in optimization. When I started out reading about optimization, it was all things we now expect compilers to do, e.g. using shl for multiplication by 2 or 4. The first kind of optimization I actually did was pre-computed tables. At the time it didn't take much calculation for pre-computation to pay off, partly because processors were much slower relative to RAM than they are these days. Also, accessing hardware directly instead of through BIOS/OS channels worked wonders – say, the demo above would not have worked without directly accessing the graphics memory. In the mid/late 90s, a friend of mine was tasked with optimizing an interrupt routine for an industrial barcode scanner which would be the heart of a large sorting machine for logistics. I joined the effort for the fun of it. First we spent a lot of time translating from Pascal to pure assembler, and then more time was spent replacing multiplications with combinations of shl, lea and add instructions. Then we spent hours avoiding push and pop and keeping everything we needed in registers, and eventually got it fast enough for the requirements.
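A sketch of that strength-reduction game – multiplying by a constant without the (then expensive) mul instruction:

lea eax,[eax+eax*4]  ; eax = x*5
shl eax,1            ; eax = x*10, two cheap instructions instead of a mul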
In 2000 I was working on optimizing an MPEG2 decoder. By now compilers had internalized the optimizations above, and it was rarely worth the effort to see if you could do better. The optimization game had moved yet again; the name of the game was now MMX. I realize that SSE was available at this time – but not in the low-end computers, and MMX optimization would do the trick at both ends of the scale.
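The point of MMX, in a sketch: operate on eight pixel bytes at a time instead of one.

movq    mm0,[esi]   ; load 8 source pixels
paddusb mm0,[edi]   ; add 8 pixel pairs at once, saturating at 255
movq    [edi],mm0   ; store 8 results
emms                ; reset FPU state before any floating point is used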
By 2002 it was an MPEG2 transrater I was optimizing. The idea was copying a DVD9 onto a DVD5 without re-encoding. I had a beta version of a Pentium 4, and Intel really wanted me to work on using hyper-threading. But threading is difficult, it brought less than optimizing memory access did, and again there was the real-life market perspective: nobody but a few select people had a hyper-threading computer, and there was little chance that they'd be widespread within a year. Especially using the non-temporal instructions of SSE/SSE2 and aligned memory access made a huge impact. Another thing that made a big impact on my optimization work was that performance counters had become really useful.
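A sketch of the non-temporal idea: streaming stores bypass the cache, so copying megabytes of video doesn't evict your working set (assumes ESI and EDI are 16-byte aligned and ECX counts 16-byte blocks):

copy_block:
movdqa  xmm0,[esi]   ; aligned 16-byte load
movntdq [edi],xmm0   ; non-temporal store, goes around the cache
add     esi,16
add     edi,16
dec     ecx
jnz     copy_block
sfence               ; drain the write-combining buffers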
By 2006 I was back to optimizing video – this time an H.264 codec for massive amounts of video. This time around, optimization had moved to threading: with 8 cores in a computer and the prospect of using multiple computers, splitting tasks in a meaningful way gave much more performance than the 50% you could get from hand-optimizing the big spenders. To be fair, I did write SSE3 routines for the biggest spenders. But assembly language optimization had finally taken the back seat.
"...Yet knowing how way leads on to way,
ReplyDeleteI doubted if I should ever come back..."
Your late 90's work was truly awe-inspiring.
Welcome back.
Thanks for the kind words and for reminding me just how awesome Robert Frost is.
DeleteWell, since it was you that first exposed me to his poetry, i consider it return of a favor. ;)
Delete