Monday, June 29, 2015

Machine learning with malware part 2: Model selection


Unrelated background information:
I wish I'd gotten a lot further a lot sooner with this. Also, this blog post is only barely up to the standard I set for myself. Compressing so much statistics and machine learning into a blog post of reasonable length while avoiding excessive math is a terribly difficult thing to do. I hope that it lifts the veil a bit on what machine learning is and how it can be leveraged in a malware setting, but if you really wish to get down with it, there can be no two opinions about it: you need a real textbook with all the rigor that requires.


It also took a long time because most of my family conspired to celebrate round birthdays. Climbing season started too, which keeps me away from the computer. Finally, I have this feeling that "Predator" from the Machine Learning Part 1 blog post produced an insufficient number of features. So I spent a significant amount of time embedding my code emulator (from unpacking with primitive emulation) into Predator, at first thinking it would be easy. Emulating on 80000 malformed files, a significant number of which are missing DLLs etc., turned out to be way more time consuming than I imagined.

The blog post is here:
Machine learning with malware part 2: Model selection
