Chris Anderson first coined the term "The Long Tail" back in 2003 while explaining an interesting effect that businesses on the Internet were starting to experience. Basically, it is a statistical distribution in which low-demand products collectively sell more than high-demand products. As an Amazon employee put it: "We sold more books today that didn't sell at all yesterday than we sold today of all the books that did sell yesterday." Mr. Anderson explains his death-of-the-blockbuster theory in his 2006 book "The Long Tail: Why the Future of Business is Selling Less of More".

Historically, the media has echoed many of the "Top Virus" charts and In-The-Wild virus lists. It has also become something of a standard among AV companies to publish these statistics regularly. I guess it makes it easier for journalists to publish a quick story that end users can understand. Unfortunately, malware creators today do not want to show up in the Top 10 list. Rather, it seems they have figured out that they can collectively infect many more users by "infecting less with more variants". Or put another way, malware relies on propagating less of much more.

To understand whether this Long Tail effect applies to malware, we took statistics from our online scanner ActiveScan for the same period, November 13th to December 13th, in each year from 2002 through 2006. To normalize the data across the years, we counted only the same malware categories. In other words, we didn't count adware, tracking cookies, PUPs, spyware, exploits and the like. The total unique detections amount to almost 4.1 million trojans, viruses, backdoors, dialers, bots and worms.

First, let's look at the data from a traditional perspective:
– During 2002 the Top 10 out of 5043 samples accounted for 40% of all infections.
– During 2006 the Top 10 out of 22911 samples accounted for 10% of all infections.
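
For readers who want to reproduce this kind of figure on their own data, here is a minimal sketch of how a "Top N share" could be computed. The function name `top_n_share` and the sample counts are made up for illustration; they are not ActiveScan data, and this is not necessarily how our own statistics were produced.

```python
def top_n_share(counts, n=10):
    """Return the fraction of all infections caused by the n most prevalent samples.

    counts maps a sample name to its number of infections (hypothetical input format).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:n]
    return sum(top) / total


# Made-up numbers, for illustration only.
example_counts = {"sample_a": 500, "sample_b": 300, "sample_c": 100, "sample_d": 100}
print(top_n_share(example_counts, n=2))  # the two largest samples out of four
```

A falling Top 10 share from one year to the next, with a growing number of distinct samples, is exactly the pattern the two bullet points above describe.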

 

Next, let's chart them out. The following chart plots the number of infections per sample (Y) against the top infecting samples (X), starting from the Top 1 (Gaobot.DC.worm in 2002 and Dialer.B in 2006) to the Top 10000. We can clearly see how the Long Tail effect applies to malware as well. In 2002 only the Top 1000 samples caused a significant number of infections. During 2006 well over 8000 samples were responsible for the majority of infections.
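
One way to put a number on the length of the tail is to ask how many of the top-ranked samples are needed to cover a given fraction of all infections. The sketch below is an assumption about how that could be measured, with a hypothetical helper `samples_to_cover` and made-up counts rather than our real data.

```python
def samples_to_cover(counts, fraction=0.5):
    """Return the smallest rank k such that the top k samples account for
    at least `fraction` of all infections. Returns 0 for empty input."""
    total = sum(counts)
    target = fraction * total
    running = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target:
            return rank
    return 0


# Made-up distributions with the same total number of infections.
head_heavy = [900, 50, 30, 20]   # one sample dominates
long_tail = [10] * 100           # many samples, each with few infections

print(samples_to_cover(head_heavy))  # few samples needed
print(samples_to_cover(long_tail))   # many samples needed
```

The larger this number is relative to the Top 10, the less a Top 10 list says about what is actually infecting users.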

 

Perception <> Reality
Because of this Long Tail effect, users are not aware of the real situation. Most users will not pay attention to a single W32/Spamta.ES.worm which barely accounts for 100 infections in a given month, and a journalist is even less likely to raise an alert about it. However, the reality is that if we group all current W32/Spamta variants together, there are tens of thousands of user infections.

Based on this reality, we are considering no longer publishing "Top 10" or similar lists, to avoid being part of the confusion, or at least showing the complete data so that users can evaluate the real situation for themselves. Now, if we get rid of the "old school Top 10 alerting scheme", we still need some way of communicating to users what to watch out for. Since we have malware families and sub-families, why not group all infections per sub-family together to give a clearer picture of what is really going on? For example, a spike in Trj/Downloader infections (by any Downloader variant) could be considered an alert situation. Any comments?