Last year we posted an article about graphical representations of malware, in which we noted that it is possible to automatically identify a malware sample and classify it into a family based on a graph of its internal structure. This representation is built from the relationships between function calls in the executable: together, these relationships form a graph of the executable's internal structure. The graphs are very similar among samples of the same family, or among samples that share the same source code. There are several publications about this technique (Ero Carrera & Gergely Erdélyi [VB2004]), and all of us have heard about the Sabre Security VxClass project, a system that automatically unpacks a binary and classifies it into a family.
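To make the idea concrete, here is a minimal sketch of comparing two call graphs by structure alone. It is not the method from the VB2004 paper or from VxClass: it simply assumes that each function can be summarized by how many functions call it and how many it calls, and that two samples are similar when those summaries overlap. The names `degree_signature` and `signature_similarity`, and the sample graphs, are hypothetical.

```python
from collections import Counter


def degree_signature(call_graph):
    """Summarize a call graph as a multiset of (in_degree, out_degree) pairs.

    call_graph maps each function name to the set of functions it calls.
    Function names are ignored, so renamed or stripped symbols do not matter.
    """
    out_deg = {func: len(callees) for func, callees in call_graph.items()}
    in_deg = Counter()
    for callees in call_graph.values():
        for callee in callees:
            in_deg[callee] += 1
    nodes = set(call_graph) | set(in_deg)
    return Counter((in_deg[n], out_deg.get(n, 0)) for n in nodes)


def signature_similarity(graph_a, graph_b):
    """Jaccard similarity of the two degree multisets, from 0.0 to 1.0."""
    sig_a = degree_signature(graph_a)
    sig_b = degree_signature(graph_b)
    shared = sum((sig_a & sig_b).values())  # multiset intersection
    total = sum((sig_a | sig_b).values())   # multiset union
    return shared / total if total else 1.0


# Hypothetical call graphs: same shape, different function names.
sample = {
    "main": {"decrypt", "connect"},
    "decrypt": {"xor_loop"},
    "connect": set(),
    "xor_loop": set(),
}
variant = {
    "sub_401000": {"sub_401100", "sub_401200"},
    "sub_401100": {"sub_401300"},
    "sub_401200": set(),
    "sub_401300": set(),
}

score = signature_similarity(sample, variant)  # identical structure
```

A degree multiset is a very coarse fingerprint, and unrelated programs can collide on it. It is only meant to show why structure survives renaming and recompilation better than raw bytes do.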
PandaLabs is 'two or three steps ahead' too: we have developed our own system to automatically identify and classify the samples we receive daily. This system only works with unpacked samples, which is why we use it together with our generic unpacker engine. We have made a Flash video [14 MB] to show you how this system works. Basically, the steps are:
- Unpack the sample (the system only works with unpacked binaries)
- Drag and drop it into the client application
- The client application sends it to the graph server
- The server analyzes it with IDA and uses several Python scripts to extract:
  - Graph of function calls
  - Control flow graph (CFG) of functions
  - Entropy
  - CRC32 and custom CRC of functions
- Preselect samples from the database by applying several filters: entropy, compiler, file size,… The samples that pass these filters are then compared with our sample.

The extracted data is used to compare the sample with our entire graph database (so far, we have analyzed and stored 185,000 samples in it).
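The extraction and preselection steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual scripts: the record fields, the thresholds (`max_entropy_delta`, `max_size_ratio`) and the function names are hypothetical, and only the standard CRC32 is shown because the custom CRC is not described here.

```python
import math
import zlib
from collections import Counter
from dataclasses import dataclass


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def function_crc32(function_bytes: bytes) -> int:
    """Standard CRC32 of one function's raw bytes, as an unsigned 32-bit value."""
    return zlib.crc32(function_bytes) & 0xFFFFFFFF


@dataclass
class SampleRecord:
    # Hypothetical layout of one database entry.
    name: str
    entropy: float
    compiler: str
    filesize: int
    function_crcs: frozenset


def preselect(query, database, max_entropy_delta=0.5, max_size_ratio=2.0):
    """Keep only records that are cheap-to-check matches for the query.

    The thresholds are assumed values chosen for illustration.
    """
    candidates = []
    for record in database:
        if record.compiler != query.compiler:
            continue
        if abs(record.entropy - query.entropy) > max_entropy_delta:
            continue
        larger = max(record.filesize, query.filesize)
        smaller = max(1, min(record.filesize, query.filesize))
        if larger / smaller > max_size_ratio:
            continue
        candidates.append(record)
    return candidates


def shared_function_ratio(a, b):
    """Fraction of function CRCs the two samples have in common (Jaccard)."""
    union = a.function_crcs | b.function_crcs
    return len(a.function_crcs & b.function_crcs) / len(union) if union else 0.0
```

The point of preselecting is cost: comparing graphs is expensive, so cheap scalar filters shrink the candidate set before any graph is touched. A survivor of `preselect` could then be ranked with something like `shared_function_ratio` before the full graph comparison.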