Monday, October 23, 2006

Comparing Automated Malware Analysis Services

Wow! It's been nearly two months since I posted anything here, for which I apologize. I've been busy as a bee. In fact, I think I caught the bees slacking off. I should be able to post a bit more regularly now, which ought to help me avoid losing my loyal reader.

This weekend, I happened to catch a computer that was infected with a fairly garden-variety IRC botnet, but this post really isn't about that. During the course of the investigation, I saw that the C&C server had instructed its victim to download and run a certain file. As I frequently do, I used wget to download the same file so I could see what was in it.

Normally, my first stop for malware analysis is the UNIX strings command, but in this case, the binary was packed, so there was very little information available.
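For readers who haven't used it, here is a rough Python sketch of what a strings-style pass does: it pulls out runs of printable ASCII from a binary. This is only an approximation of the real UNIX tool (which has more options and also handles other encodings), and the function name and the default minimum length of 4 are my own choices for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

To try it on a downloaded sample, read the file in binary mode and pass the bytes in, for example extract_strings(open(path, "rb").read()). On a packed binary the output will be short and mostly meaningless, which is exactly the problem described above.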

Now, I'm no programmer. I can do some simple C and some less simple Perl, but to be honest, I don't really have the sk1llz to do my own detailed malware analysis. Fortunately, this isn't much of a problem, because I rely on three excellent websites which can do this for me automatically.

I'm speaking, of course, about the Norman Sandbox, the CWSandbox and Virustotal. Each of these services offers a web-based interface for you to submit malware samples for automated analysis, usually returning the results within two or three minutes. Conceptually, they're all very similar, but they each have a different focus, so I thought a brief comparison might be in order.

Let's start with Virustotal, as it's much simpler than the others. Virustotal simply takes your sample and runs it against a variety of antivirus programs (I counted 26 this morning). The list includes most of the industry heavyweights, with the exception of Symantec. The end result is that you have a nice report that shows what, if anything, was detected in your sample malware. In my experience, it is rare that more than a handful of products properly detect the newest malware I find, so it's very handy to have a service that just runs it through so many products and summarizes the output. The final report is delivered right on the web site, so once you submit your sample, you get a results page immediately, and as the scans progress, it is updated with fresh results.

Where Virustotal simply tells you the name of the detected malware, both the Norman and CW sandboxes provide much more detail about the internal workings of the suspect binary. Using a combination of simulation, API hooking and other techniques, they will actually run the binary in the sandbox environment and look to see what actions it takes.

The Norman analysis (sample) relies on the same sandbox engine they use in their commercial antivirus products, and apparently you can purchase your own copy of the same sandbox system that the free website uses. The report comes back as a text-based email, and typically includes a list of files or registry keys that are created, deleted or modified. It will also usually tell you if the malware automatically restarts at boot time or not. Other features, like URLs opened or processes spawned, are kind of hit-and-miss. Norman failed to detect any of the network communication in the sample I submitted this morning. Overall, the report is a little on the basic side, but it's good for a quick read-through to see if you should be worried.

By far the most informative of the three services is the CWSandbox. Based on work performed by Carsten Willems at the University of Mannheim, CWSandbox is optimized for analyzing binaries which contain botnet software. The report is far more detailed than the other two systems, and includes much better information about files, registry keys and processes used by the malware. In fact, in my test this morning, it was the only one of the three to extract the list of IPs and ports that the binary would attempt to connect to. Unfortunately, the report is in XML format only, which would be good if I were going to process it with a script of some sort, but it does make it a bit harder for the analyst to read directly. Still, since CWSandbox gives the most detail of the three, the extra effort is probably worthwhile.
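To give an idea of what "process it with a script" might look like, here is a minimal Python sketch that pulls host and port pairs out of an XML report. Please note that the element name (connection) and the attribute names (host, port) are made up for illustration. I am not reproducing the actual CWSandbox schema here, so you would need to adjust them to match whatever the real report uses.

```python
import xml.etree.ElementTree as ET


def list_connections(xml_text: str) -> list[tuple[str, int]]:
    """Collect unique (host, port) pairs from a report, in document order.

    Assumes hypothetical <connection host="..." port="..."/> elements;
    the real report format will almost certainly use different names.
    """
    root = ET.fromstring(xml_text)
    seen: list[tuple[str, int]] = []
    for element in root.iter("connection"):
        host = element.get("host")
        port = element.get("port")
        # Skip entries that are missing a field or have a non-numeric port.
        if host and port and port.isdigit():
            pair = (host, int(port))
            if pair not in seen:
                seen.append(pair)
    return seen
```

Even a small script like this turns a wall of XML into a short list you can paste straight into a firewall rule or an incident ticket.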

If I had to pick just "the best" of the three, it would be CWSandbox, because it consistently gives more detailed information about the suspect binaries I upload. Fortunately, I don't have to pick just one, and I routinely use all three to give myself the broadest coverage, and thus the best chance of finding out what the code really does.

Update 2006-10-30 14:08: Anti-spyware company Sunbelt Software also has an automated analyzer, it turns out. They're using the CWSandbox engine, but it delivers the reports in HTML or text-based emails. The HTML reports use very little formatting, but they're a lot easier to read than the XML reports CWSandbox itself returns.

I tried the Sunbelt version against the same binary I used to test the others, and though I found the report to be more useful overall (because it was readable), it didn't seem to include quite as much information about the Internet hosts the binary would attempt to communicate with. Perhaps Sunbelt has an older version of the engine?

In any case, Sunbelt generates my new favorite reports, though I still recommend submitting samples to more than one engine, just to make sure you've got your bases covered.

Update 2007-09-24 13:55: Paperghost over at Vitalsecurity.org posted a link to a new service called ThreatExpert.com. TE is part online sandbox and part malware encyclopedia. Basically, you upload malware samples and you get a nice report. The reports are also available to other users, indexed by their standard malware names (if known).

There are tons of sample reports in the database, and they are by far the best looking and easiest to read of all the similar services. They seem to contain basically the same type of information you'll find in the competing services, such as registry keys modified, files created, processes forked, etc. I've also seen a few really nice extras, such as the inclusion of screen shots of any windows that the code creates. TE also tries to identify the probable country of origin for the file, though I really have no idea how they do this. Strings analysis of the language included in the embedded text, perhaps? Another big plus is that it's the only such service to provide an optional user account that allows you to quickly locate and review reports for any of the samples you submitted. I can see this being very useful.

You'll notice that I used the phrase "seem to" in the last paragraph. For some reason, both of the UPX-packed samples I submitted failed to run properly. I got a report, but the sandbox indicated that the file could not be analyzed. To be fair, both the Norman and the Sunbelt sandboxes choked on it as well, failing to report any malicious activity at all. While this did keep me from generating a good comparative report, I'd have to say that ThreatExpert seems like a promising addition to my list of "go to" services for automated malware analysis.
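Since UPX-packed samples tripped up several of the sandboxes, it can be handy to know whether a file is UPX-packed before you submit it. Here is a quick-and-dirty Python sketch that looks for the markers a stock UPX build normally leaves behind: the section names UPX0 and UPX1, and the UPX! magic string. The function name is my own, and this is only a heuristic. Malware authors often strip or alter these markers, so a negative result doesn't prove the file is unpacked, and a stray match in an ordinary file is possible too.

```python
# Byte sequences that an unmodified UPX build typically leaves in its output.
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")


def looks_upx_packed(data: bytes) -> bool:
    """Heuristic check: True if any common UPX marker appears in the file."""
    return any(marker in data for marker in UPX_MARKERS)
```

If the check comes back positive, it may be worth trying to unpack the sample with UPX itself before submitting it, so the sandboxes get something they can actually run.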