Monday, March 31, 2008

Switching to Sguil: A whole new meaning

Many of you may have wondered why I haven't yet blogged about the recent release of Sguil 0.7.0. Did I forget? No. Am I disappointed with it? Not at all! Am I just lazy? Yes, but that's not why.

The truth is, I've held off blogging about that because there's some even bigger news with the Sguil project!

You probably didn't know this, as we've tried hard to keep it under wraps until it could be formally announced, but the Sguil project has just received an extremely large vote of confidence, in the form of it being acquired lock, stock and barrel by Cisco!

Yes, you read that right! From the press release:

Under terms of the transaction, Cisco has acquired the Sguil™ project and related trademarks, as well as the copyrights held by the five principal members of the Sguil™ team, including project founder Robert "Bamm" Visscher. Cisco will assume control of the open source Sguil™ project including the Sguil.net domain, web site and web site content and the Sguil™ Sourceforge project page. In addition, the Sguil™ team will remain dedicated to the project as Cisco employees, continuing their management of the project on a day-to-day basis.

Really, I didn't blog about Sguil 0.7.0 yet because I didn't want to say anything that could have interfered with this deal.

The great thing about this is that both Cisco and Sguil have made significant investments in Tcl, as it's already found in the OS on many Cisco products. Of course, Sguil is written almost entirely in Tcl, so this should provide for some great synergy going forward. You should start seeing Sguil being pushed out into the carrier-grade Cisco gear by 3Q08, with the rest of the Cisco-branded products following in phases through 4Q09. Linksys-branded gear will be supported too, though there's not an official timetable for that yet.

On a personal note, I would like to congratulate Bamm (AKA "qru"), Sguil's lead developer. He's put a lot of time into this project over the years, and is finally going to reap some rewards:

Although the financial details of the agreement have not been announced, Sguil™ developer Robert Visscher will become the new VP of Cisco Rapid Analysis Products for Security. “This deal means a lot to the Sguil™ project and to me personally,” Visscher explains. “Previously, we had to be content with simply being the best technical solution to enable intrusion analysts to collect and analyze large amounts of data in an extraordinarily efficient manner. But now, we’ll have the additional advantage of the world’s largest manufacturer of networking gear shoving it down their customers’ throats! We will no longer have to concern ourselves with mere technical excellence. Instead, I can worry more about which tropical island to visit next, and which flavor daiquiri to order. You know, the important things.”

I know that many of you will have questions about this major evolution in the Sguil project and our continuing roles as Cisco employees, so please feel free to leave them here as comments, or ask in freenode IRC's #snort-gui channel.

Tuesday, March 25, 2008

Temporarily speed up SANCP insertions in Sguil

It's Monday morning, you're half asleep, you haven't finished your first diet soda yet, and -- oh no! Sguild has been down all weekend! Worse yet, SANCP inserts are backed up, to the tune of 17,000+ files in the queue!

As you know, the Sguil sensors are pretty much independent of the actual Sguil server. They'll happily continue collecting data, even when sguild has been down for a while. When the server comes back up, the sensors will automagically reconnect and send all the queued data. This is by design, of course. You don't want to lose all that data due to a failure of the central server.

Even after an extended outage, most of the data collected by the Sguil sensors poses no real problem. There are relatively few Snort alerts (maybe a few thousand), and probably even fewer PADS events, and these get added to the database in no time. Network session records collected by SANCP, however, can pose a bigger problem.

If you recall, SANCP works by keeping an in-memory list of active network "sessions" (including pseudo-sessions created from UDP and ICMP traffic). By default, it dumps these to a file every minute or so (or more often, on a busy network). The Sguil sensor runs a SANCP agent process that monitors the filesystem for these files, sends them to sguild as they are created, and deletes each one from the sensor once it has been transmitted.

Now here's the problem: there are just so many darned network sessions on a busy network that even a short outage can result in a few hundred files waiting in the queue, especially if you have multiple sensors. Longer outages, though, can be disastrous. Let's say that you have six sensors, and your Sguil server has been down for the weekend (48 hours). How many files is that?

60 files/hour * 48 hours * 6 sensors = 17,280 files

Now, at an average rate of about 5 seconds to insert each file, how many hours would that take to catch up?
17,280 files * 5 seconds/file / (60 * 60) = 24 hours
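If you want to sanity-check that math, here's a quick back-of-the-envelope script for any Bourne-style shell. The sensor count, outage length, and per-file insert time are just the assumptions from the example above, so adjust them to match your own environment:

#!/bin/sh
# Rough estimate of SANCP backlog and catch-up time after an outage.
# Assumptions (from the example above): 1 file/minute/sensor,
# 6 sensors, 48-hour outage, ~5 seconds per file to insert.
FILES=$((60 * 48 * 6))
echo "Queued files: $FILES"                        # 17280
echo "Hours to catch up: $((FILES * 5 / 3600))"    # 24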

That's right! It'd take a full 24 hours to catch up! In the meantime, you're missing a few days of valuable network data (probably the few days you're most likely to want to query on Monday morning), and your MySQL database is spending all its time inserting, which means not only that it's slower to respond to your analyst console, but also slower to process incoming events. In fact, it can easily get caught in a sharp downward spiral, where the incoming data gets even further backed up.

So what can you do about this? Actually, it's quite simple. If you find that you're falling behind while processing your backlog of SANCP records, you can dramatically speed things up by temporarily disabling the indices on your SANCP tables.

First, figure out which days you have to catch up on. If you know your server crashed on Friday the 8th, and it's now Monday the 11th, you probably want to go through all SANCP tables from Friday through Monday.

Second, determine what the table names will be. Remember that Sguil creates one SANCP table per day, per sensor. These are all merged into a single virtual table, but for indexing purposes, ignore that one and concentrate on the individual tables. They will be named something like:
sancp_$SENSORNAME_$DATE

So for example, if you have two sensors named "external" and "internal", you'd have the following tables:
sancp_external_20080208
sancp_internal_20080208

sancp_external_20080209
sancp_internal_20080209

sancp_external_20080210
sancp_internal_20080210

sancp_external_20080211
sancp_internal_20080211


Next, you simply issue the SQL command to disable indexing for each table:
ALTER TABLE sancp_external_20080208 DISABLE KEYS;

MySQL will perform a quick table check before returning to the prompt. This can take a minute for each table, and I personally find it annoying to wait after each one, so I usually just create a text file with all the commands in it, one per line, and run it in batch mode:
mysql -u sguil -p sguildb < DISABLE-KEYS.txt
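If you have more than a handful of sensor/day combinations, you don't even need to write that file by hand. Here's a rough sketch of a shell loop that generates it; the sensor names and dates are the hypothetical ones from this post, so substitute your own:

#!/bin/sh
# Generate one ALTER TABLE statement per sensor, per day.
SENSORS="external internal"
DATES="20080208 20080209 20080210 20080211"
for sensor in $SENSORS; do
    for date in $DATES; do
        echo "ALTER TABLE sancp_${sensor}_${date} DISABLE KEYS;"
    done
done > DISABLE-KEYS.txt
# Then run the whole batch at once:
mysql -u sguil -p sguildb < DISABLE-KEYS.txt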

Based on my experience, insertion speed can go from about 5 seconds per file to about 5 files per second, which is quite significant! At that rate, it would take less than an hour to insert everything:
17,280 files / (5 files/second * 60 * 60) = 0.96 hours

Of course, you have to be extra careful to re-enable indices on all those tables. You can run a similar set of SQL commands to turn indices back on for a table:
ALTER TABLE sancp_external_20080208 ENABLE KEYS;

Again, I usually run this as a batch job.
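If you generated DISABLE-KEYS.txt with a loop like the one sketched above, producing the matching ENABLE batch is a one-liner (file names and credentials are just the assumptions from this example):

# Turn the DISABLE batch into an ENABLE batch, then run it.
sed 's/DISABLE KEYS/ENABLE KEYS/' DISABLE-KEYS.txt > ENABLE-KEYS.txt
mysql -u sguil -p sguildb < ENABLE-KEYS.txt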

The act of disabling and then later re-enabling indices does take a little while, but usually not more than a few minutes each. Even with this overhead, it's still significantly faster to process a big backlog of SANCP files without indices and then reindex once you're all caught up.

Sure wish I didn't need to know this... 8-)

Update 2008-03-25 11:27: After you re-enable keys, you may also need to do a quick db check to make everything sane again:
mysqlcheck -o -a -u sguil -p sguildb

This will recheck all your tables and make sure they're still consistent. I've had a few situations where Sguil has been returning error messages like "ERROR 1030 (HY000): Got error 124 from storage engine" until I did this.

Friday, March 21, 2008

In which I attempt a metaphor

So I was explaining the poisoned search results threat to several people yesterday, and I hit upon a good metaphor to explain why this is particularly serious: it increases the attacker's "shots on goal".

If you know hockey at all (which I don't, but I've been to a few games), you know that the scoreboard typically lists "Shots on Goal" right beside each team's score. Why? Because you can't score if you don't shoot!

The more times you get to try to score, the more likely it is that you will do so, and it's the same with security. Tracking the number of exploit attempts, even if they are unsuccessful, is just like reporting shots on goal.

It happens that poisoned search results are a great way to increase your shots on goal with very little effort, and if the analogy holds, that means this will prove to be an extremely effective strategy for the attackers. I believe current events are proving this to be true.

Of course, now that I have not only made a metaphor linking digital security to real life, but a hockey metaphor at that, I expect that I have invoked The Bejtlich, and he will no doubt be forced to appear shortly and leave an insightful comment.

New ZLob spreads through poisoned search results

You may have seen this technique before, but in the last few days, it seems that the creators of the ZLob trojan have found an effective way to spread their malware: poisoned search results.

In case you're wondering how this works, it goes something like this:


  1. The attackers identify a set of "hot" search terms that users are most likely to be looking for. Popular products, current events, celebrities, scandals, you name it. I don't know for sure where they come up with these terms, but if it were me, I'd get them from Google Trends or some place like that. To really be effective, the attackers need to gather as many of these terms as possible, perhaps several thousand. They need to be updated frequently, too.
  2. The attackers identify an otherwise legitimate website that happens to be vulnerable to some sort of file upload attack.
  3. The attackers create a set of HTML files, one per search term they're targeting. The HTML is crafted to look highly relevant for that term, with what looks to me like snippets of text from other legitimate web pages on that subject. In addition, each of the files links to each of the other files, artificially inflating their number of incoming links in an attempt to fool the search engine into placing them nearer to the top of the result list.
  4. When a user searches on one of the terms, they will see poisoned results interspersed with legitimate ones. If they click on the poisoned link, obfuscated Javascript in the page will redirect them to a site that claims to have a relevant video. It shows a static GIF that looks like the YouTube video interface, but then pops up a dialog telling the user they need a new CODEC to view the clip.
  5. Of course, you know where this is going... The "CODEC" is an EXE file containing the ZLob trojan. SCORE!

It used to be that if you avoided browsing pr0n, gambling sites and similar shady destinations, you were less likely to come into contact with this sort of thing. But now, legitimate users doing regular, everyday searches are being exposed a lot more often. This is kinda scary.

So what can you do to protect your users against this type of attack? On a technical level, not much. You can't really get much done on the Internet without a search engine, so it's going to be up to the search engines themselves to improve their ability to vet the pages they index. For individual users, something like the NoScript Firefox plugin would be effective, but that's difficult to impose on an entire user community.

However, the most effective security is not technical. Get the message out to your users, "There are malicious web pages out there; you're likely to find some of them inside the search engine results; be careful what you click on, and never download things you weren't expecting to download."

Of course, I can't let this go by without at least some sort of NSM advice. Here's a quick Snort rule I wrote to detect these trojan CODEC downloads:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"WATCHLIST Possible ZLob Codec Download"; uricontent:".exe"; nocase; pcre:"/.*codec.*\.exe/smi"; flow:to_server,established; classtype:trojan-activity; sid:10000000; rev:1;)

This looks for HTTP downloads of files that match "*codec*.exe" (case insensitive, of course). A simple file name change would evade this, but it's not too hard to see how to customize it to look for other things. And if your version of Snort is compiled with flexible response support, you can even add "resp:rst_all;" to try to block the download attempts by sending spoofed RST packets, which should provide some extra security.
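For example, here's what that might look like; this is just my rule from above with the resp keyword added and the revision bumped, so renumber it to fit your own local rule set:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"WATCHLIST Possible ZLob Codec Download"; uricontent:".exe"; nocase; pcre:"/.*codec.*\.exe/smi"; flow:to_server,established; resp:rst_all; classtype:trojan-activity; sid:10000000; rev:2;)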