This post is about distance measures for executables. It
first gives some general considerations and then develops a primitive distance
measure. It then applies this measure to different versions of two
distinct executables, and measures the distance to some random, unrelated executables
to verify that it works properly.
This is my first real blog post in 15
years. The subject of this post is using machine learning techniques for
malware. I started out far too ambitiously, which meant that this post
falls significantly short of my own expectations. It was the direct reason why I
invented the 24-hour limit. The text gives a short introduction to supervised
machine learning in general and a short presentation of Bayes' theorem in
the context of malware. It goes on to estimate Bayesian probabilities for
identifying an executable packer. You may notice there is a lot of code in the source that isn't related to the text. The accompanying text was sacrificed for lack of time. Maybe I'll return to it some time in the future.
Welcome to my blog. It's been 15 years since I last
had a blog, so bear with me as I get started. Here I hope to write about:
Low-level coding in general,
malware topics in general,
and the use of statistics and machine learning
techniques for malware in particular.
I actually started out writing my first blog entry
before starting the blog, and I fairly quickly figured out that my ambitions
were far larger than what I could ever realize. Therefore I've imposed some limits:
I shall not start writing a blog post that
I estimate will take me more than 24 working hours to write.
Topics should interest me.
Focus on accessibility of my chosen topics rather
than rigor. It's a blog, not science.
To the greatest extent possible, any results
should be transparent and repeatable.
The first is simply a time constraint, because being a
malware hobbyist is just one of my hobbies. I go climbing as much as I can,
play football and I'm one of those weirdos who own a table saw. On the
other hand, I wish to spend my time here on topics which are slightly more
advanced than explaining how to use the standard tools of the infosec trade.
The compromise is that I'll mostly just skim the surface of a topic and that
I'll not write tools, but "proof of concept" type code.
I've been playing with malware and low-level coding
since I first figured out how to add signatures to IBM antivirus back in 1992.
As a hobby, I've written low-level software and done reverse engineering for DOS and
Windows since 1996, including executable packers, unpackers, privilege
elevation hacks, etc. I have a degree in economics with a focus on
econometrics (which is statistics for economics). Since 2000 I've worked
professionally in software development. I’ve worked on everything from
file system drivers for Win9x/NT and copy protection to video compression
codecs and many other interesting things. I'm currently vice president of
engineering at Protect Software GmbH.
email: “anders_fogh” is the first part of my email. The last part is “hotmail.com”