Sunday, November 23, 2008

Time for a change

As most of you probably know, I've operated my own security consulting business for the past few years. What is less widely known is that it was a part time affair, as I kept a full time job working for a US nuclear physics research lab. I was very happy with the way things were going, but sometimes you just get an offer that's too good to refuse...

That's pretty much how I felt when I started talking to Richard Bejtlich about coming over to GE to help them build an enterprise-wide CSIRT. This is an exceptional chance to help create an incident response capability that could have a very real effect on the security posture of one of the largest companies in the world, and who could turn that down? To top it off, I know some of the other team members, and I can honestly say that they're a top-notch group of people. And in my book, interesting work combined with a high powered team of experts equals an opportunity I just had to take.

So starting tomorrow, I'll be an Information Security Incident Handler for GE. I promise this won't turn into a GE blog, though. You'll still see the same quality technical articles you've come to expect. That is, if my new boss will unshackle me from the oars every now and then. Heh.

Monday, November 03, 2008

Detecting outgoing connections from sensitive networks with Bro

As I mentioned in my last post, I've been playing with the Bro IDS. I wanted to take a stab at creating my own policy, just to see what it's like to program for Bro. It turned out to be surprisingly easy.

I started with the following policy statement: There are certain hosts and subnets on my site which should never initiate connections to the Internet. Given that, I want to be notified whenever these forbidden connections happen. To accomplish this, I created restricted-outgoing.bro.

To use this sample, copy it into your $BRO_DIR/site directory, and add "@load restricted-outgoing" to your startup policy (the mybro.bro script if you're following my previous example).



##
## Detect unexpected outgoing traffic from restricted subnets
##
@load restricted-outgoing

Now the code is loaded and running, but it must be configured according to your site's individual list of restricted hosts and subnets. Following the above lines, you can add something like:

redef RestrictedOutgoing::restricted_outgoing_networks = {
    192.168.4.0/24,   # restricted subnet
    10.0.0.0/8,       # restricted subnet
    192.168.9.43/32,  # individual host
};

If you run Bro now, you should start seeing lines like the following in your $BRO_DIR/logs/alarms.log file:

1225743661.858791 UnexpectedOutgoingUDPConnection
x.x.x.x/netbios-ns > y.y.y.y/netbios-ns : Restricted
Outgoing UDP Connection

1225743661.858791 UnexpectedOutgoingTCPConnection
x.x.x.x/netbios-ns > y.y.y.y/netbios-ns : Restricted
Outgoing TCP Connection

Similar entries will also show up in your $BRO_DIR/logs/restricted-outgoing.file.

This script considers a connection to be a tuple composed of the following values: (src_ip, dst_ip, dst_port). When it alerts, that connection is placed on a temporary ignore list to suppress further alerts, and a per-connection timer starts counting down. Additional identical connections reset the timer. When the timer finally expires, the connection is removed from the ignore list, so you'll receive another alert the next time it happens. The default timeout is 60 seconds, but you can change it with the following code:

redef RestrictedOutgoing::restricted_connection_timeout = 120 secs;
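If you're curious how that kind of suppression can be expressed in Bro, here's a rough sketch of the general idea: a set whose entries expire on their own. To be clear, this is just an illustration of the technique, not the actual internals of restricted-outgoing.bro, so treat the names and details as assumptions:

# Sketch only: a set keyed by (src, dst, dst_port) whose entries expire
# after the timeout. &write_expire restarts an entry's clock every time
# the same tuple is added again.
global recently_alerted: set[addr, addr, port] &write_expire = 60 secs;

event new_connection(c: connection)
    {
    local suppressed = [c$id$orig_h, c$id$resp_h, c$id$resp_p] in recently_alerted;

    # (Re)adding the tuple refreshes its expiration timer.
    add recently_alerted[c$id$orig_h, c$id$resp_h, c$id$resp_p];

    if ( suppressed )
        return;

    # ... this is where the check against the restricted networks and the
    # actual alarm would go ...
    }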

Even though your policy may state that outgoing connections are not allowed from these sources, it may be the case that you have certain exceptions. For example, Microsoft Update servers are useful for Windows systems. There are three ways to create exceptions for this module:

First, you can define a list of hosts that any of the restricted nets is allowed to access:

redef RestrictedOutgoing::allowed_outgoing_dsts = {
    72.246.0.0/15,  # Akamai, usually updates
};

Second, you can list services which particular subnets or hosts are allowed to contact:

redef RestrictedOutgoing::allowed_outgoing_network_service_pairs = {
    [192.168.4.7/32, 25/tcp],  # SMTP server
    [192.168.4.8/32, 80/tcp],  # Web proxy
    [192.168.4.9/32, 53/udp],  # DNS server
    [10.0.0.0/8, 123/udp],     # NTP
};

Finally, you can list specific pairs of hosts which are allowed to communicate:

redef RestrictedOutgoing::allowed_outgoing_dst_pairs = {
    [myhost.myorg.org, www.google.com],
    [192.168.4.23, www.business-partner.com],
};

Note that this is the only one of the three options that allows you to specify individual IPs without CIDR block notation, or to use hostnames. The hostnames are especially useful, as Bro automatically keeps track of all the different IPs a hostname resolves to, so the "www.google.com" above would match any of the IPs returned by DNS.

Overall, I found the Bro language pretty easy to learn, and very well suited for the types of things a security analyst typically wants to look for. I was able to bang out a rough draft of this policy script on my second day working with Bro, and I refined it a bit more on the third day. Of course, I'm sure an actual Bro expert could tell me all sorts of things I did wrong. If you're that Bro expert, please leave a comment below!

Update 2008-11-06 14:45 I have uploaded a new version of the code with some additional functionality. Check the comments below for details. And thanks to Seth for his help!

Getting started with Bro

Lately, I've been playing with Bro, a very cool policy-based IDS. I say "policy-based" because, unlike Snort, Bro doesn't rely on a database of signatures in order to detect suspicious traffic. Rather, Bro breaks things down into different types of network events, and Bro analysts write scripts to process these events based on their particular detection policies and emit alarms.

At first, I was pretty puzzled about how to get started. The Bro website has some quick start docs, but they direct you to use the "brolite" configuration (a kind of simplified, out-of-the-box setup). The bad news there is two-fold: first, that configuration is listed as deprecated in the files that come with the source tarball, and second, the brolite installation process doesn't work right under Linux.

So for the record, here's what you need to get started with Bro. (Thanks to Bro Guru Scott Campbell for helping me out with this):


  1. # ./configure --prefix=/usr/local/bro
  2. # make && make install
  3. # cp /usr/local/bro/share/bro/mt.bro /usr/local/bro/site/mybro.bro

After that, create the file runbro.sh:

#!/bin/sh
export BROPATH=/usr/local/bro/policy:/usr/local/bro/policy/sigs:/usr/local/bro/site

./bro -i eth1 --use-binpac -W mybro.bro

Now you can just run runbro.sh and it'll do the right thing. The new mybro.bro file will be a very stripped-down default set of policies. It won't do that much, but you can then add to it as you see fit. You can find more details about this in the Bro User Manual and Bro Reference Manual.

By the way, this example uses the --use-binpac option to enable some new-style compiled binary detectors. This caused Bro to crash frequently on my RHEL testbed, so if the same happens to you, you might need to leave that option out.
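If you do have to drop it, the rest of the wrapper script stays the same and the last line simply becomes:

./bro -i eth1 -W mybro.bro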

Wednesday, September 10, 2008

Catastrophic consequences of running JavaScript

Wow, I had no idea simply running JavaScript could be this bad. I'm really happy now to be running the excellent NoScript Firefox extension.

First, take a look at HasTheLargeHadronColliderDestroyedTheWorldYet.com. DO NOT VISIT THIS PAGE WITH A JAVASCRIPT-ENABLED BROWSER!

Notice that the page displays a simple "Nope", indicating that the world has not yet ended. Whew!

Next, view the source for that page. You'll see the following snippet of code:


<script type="text/javascript">
if (!(typeof worldHasEnded == "undefined")) {
document.write("YUP.");
} else {
document.write("NOPE.");
}
</script>
<noscript>NOPE.</noscript>


Let me walk you through that code... First, if you have JavaScript enabled, everything between <script> and </script> is executed, which comes out to be a single if statement, where one of the possible outcomes is that the world, in fact, has been destroyed.

On the other hand, if your browser doesn't support JavaScript, the page simply renders whatever is inside the <noscript> stanza, which always evaluates to "NOPE."

In other words, with JavaScript enabled, there's a small (but finite) chance that the world could end! However, with no JavaScript, there's zero chance. Therefore, JavaScript is demonstrably dangerous! The risk far outweighs any temporary benefit we could gain from this technology!

Do not tempt fate! Disable JavaScript everywhere IMMEDIATELY! You have been warned!

Monday, September 08, 2008

How those Internets work

If you're like a lot of analysts, you probably have a very good grasp of how networks work at the LAN, WAN and maybe even MAN level. You also probably know that "the Internet" is really just a collection of independent networks that have mutually agreed to talk to each other, but you're not exactly sure how all that works.

If that sounds like you, I highly recommend Rudolph van der Berg's article How the 'Net works: an introduction to peering and transit. It's a great introduction to peering and transit agreements, and it explains how traffic gets from one network to another without using a lot of technical jargon. I particularly like the discussion of the economics of choosing peering vs. transit.

Great article. I have to say, it's one of the most informative networking articles I've read in a while.

Monday, July 28, 2008

Chinese Hacker Podcast

The good folks over at The Dark Visitor have just completed their first podcast. These guys are a great resource for those of us who don't speak Chinese, and if you're not reading and listening, you probably should be.

Thursday, June 19, 2008

Integrating domain reputation search into Firefox 3

This happens to me every day. I find a domain name somewhere, usually through my NSM work, and I wonder, "Is this domain known to be malicious?" Now, I don't personally know every domain on the Internet, but I've had some success using McAfee's SiteAdvisor. You feed it a domain name, and it'll tell you not only if it thinks it's suspicious, but also whether or not it offers any sort of downloads, what other sites it's most closely associated with, and what its users have to say about it (if anything).

Pretty good stuff, but I'm so lazy. Opening a new tab and typing in the SiteAdvisor URL is just sooo hard! So I decided to add it to my list of search plugins, so I can use the integrated search bar instead. Here's how to do it.

  1. Find your searchplugins directory. For a typical Unix system, this is ~/.mozilla/firefox/XXXXXXXX.default/searchplugins (where the XXXXXXXX is a random string)
  2. Create a file in this directory called siteadvisor.xml with the contents below.
  3. Restart Firefox.


There you go! Three simple steps, and now "Siteadvisor" should be listed when you drop down the search menu.

<SearchPlugin xmlns="http://www.mozilla.org/2006/browser/search/" 
xmlns:os="http://a9.com/-/spec/opensearch/1.1/">
<os:ShortName>Siteadvisor</os:ShortName>
<os:Description>Search McAfee Siteadvisor</os:Description>
<os:InputEncoding>UTF-8</os:InputEncoding>
<os:Url type="text/html"
method="GET" template="http://siteadvisor.com/lookup/?q={searchTerms}">
</os:Url>
</SearchPlugin>


Now, the question of the day: What other sites do you use to easily check a domain's reputation? Leave a comment and let us know!

Monday, June 16, 2008

OSSEC Project Acquired

Congratulations to Daniel Cid, whose OSSEC project has just been acquired. Now that Daniel will be working on the project full time, I think we can look forward to some great things!

Wednesday, June 11, 2008

Unintentional hilarity

I subscribe to the Info Security News RSS feed, which is a pretty nice way to keep up with various goings on in the industry.

This morning, the top headline was:

Unencrypted AT&T laptop stolen, details of managers pay lost


I have to admit, I don't really feel too bad about the poor AT&T managers. However, the really funny part was the very next headline:

AT&T Launches Encryption Services to Help Businesses Secure E-Mail and Data


I can't make this stuff up, folks!

Monday, June 09, 2008

Tor server lists revisited

Way back in 2006, I posted about a way to list active Tor servers by querying the Tor directory. Since then, the Tor project has updated its directory protocol, so that old method no longer works. Since I had someone ask me about it today, I thought this would be a great time to go ahead and update that post.

The principle is still basically the same:

  1. Identify an authoritative Tor server
  2. Connect to it via HTTP and ask for the router list
  3. Parse the list to get the info you want.

Here's an updated script you can use to dump the information about active routers. The output contains 5 columns, separated by pipe characters ('|'). The columns are:
server name|IP address|onion routing port| \
directory services port|last update timestamp

Now, the first two fields are fairly self-explanatory. The onion routing port (sometimes referred to as the OR port) carries the actual data in a Tor session. The directory services port carries directory traffic (the sort of thing this script does). Not all Tor routers offer directory services, so you will often see a 0 in this column. Finally, the last column simply shows the time the router last updated its status in the directory.

Here's the script:
#!/usr/bin/perl
#
# Fetch the list of known Tor servers (from an existing Tor server) and
# display some of the basic info for each router.

use LWP::Simple;

# Hostname of an existing Tor router. We use one of the directory authorities
# since that's pretty much what they're for.
$INITIAL_TOR_SERVER = "128.31.0.34"; # peacetime/moria1/moria2
$DIR_PORT = 9031;

# Fetch the list of servers
$content = get("http://$INITIAL_TOR_SERVER:$DIR_PORT/tor/status/all");
@lines = split /\n/,$content;

foreach $router (@lines) {
    if($router =~ m/^r\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(\S+)\s+(\d+)\s+(\d+)$/) {
        ($name, $address, $or_port, $directory_port, $update_time) =
            ($1, $5, $6, $7, $4);
        print "$name | $address | $or_port | $directory_port | $update_time\n";
    }
}
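
To run it, just save the script and invoke it with perl. You'll get one pipe-delimited line per router, in the format described above. The line below is purely illustrative: the script name, router name and addresses are made up, and 9001/9030 are just the customary default OR and directory ports:

$ perl tor-server-list.pl
somerouter | 192.0.2.10 | 9001 | 9030 | 2008-06-09 14:21:07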


Of course, there is much more information in the directory than this script shows. As an NSM analyst, I'm more concerned with IPs and port numbers, but if you poke around, you can also find what OS and Tor software versions are running, what capabilities the routers offer, their default exit policies, and other cool stuff. This is all left as an exercise for the reader. If you're interested, read the spec.

Friday, May 16, 2008

Alternative PCAP subsystems for Sguil

If you read my previous post on pcap indexing, you'll know that I've been playing around with some alternatives to the packet capture and retrieval subsystem in Sguil. I'm happy to announce that I've just committed two replacement subsystems to Sguil's CVS HEAD, one for daemonlogger and one for SANCP.

The daemonlogger subsystem should be fairly stable, as I've been running it in production for some time. It's basically a direct replacement for the snort packet logging instance. It's probably a bit more efficient, and has a smaller memory footprint, but it's still substantially similar.

The SANCP system, on the other hand, is very experimental. It uses the pcap indexing functions of SANCP 1.6.2C6 (and above) to dramatically speed up the retrieval of pcap data from huge captures. If your capture files are routinely over 2GB or 3GB, you might benefit from this. However, it does come at a cost, which is that the index files can consume 25% - 35% more disk space than the pcaps alone. Break out the RAID!

Of course, these are simply alternatives to the existing Snort-based packet logging system. That's not going away; we're simply offering choices for advanced users.

Also, even though I've been a member of the Sguil project for some time now, these are my first commits into the source tree. I'm officially a Sguil developer!

Thursday, April 03, 2008

PCAP Indexing

There's been some talk inside the Sguil project lately of improving the performance of PCAP data retrieval. That is, when you're trying to fetch the PCAP data for a single session out of a file that contains all the data your sensor saw during that time period, things can get pretty slow.

Our current method involves simply identifying which file probably (note the strategic use of that word) contains the data you're looking for, then using tcpdump with a BPF filter to find it. This usually works well, but it's often very slow, especially if the PCAP file you're searching through is large, say a few GB.
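Just to make that concrete, the current retrieval boils down to something like the following (the capture filename and filter values here are made up for illustration):

tcpdump -r snort.log.1207180800 -w extracted.pcap \
  'host 10.1.1.1 and host 192.0.2.5 and port 80'

Against a multi-gigabyte capture file, that's a full sequential read every single time.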

We've discussed a few approaches we could take to improve the performance of these retrievals. One promising way involves creating an index of the sessions inside each PCAP file. It turns out that the tool we're using to collect network session data, SANCP, is actually a pretty decent packet logger, even though we're not using it in that manner. The newest 1.6.2 release candidate includes support for PCAP indexing, so I thought I'd take it for a spin and see just what kind of performance improvement we could expect, if any.

My good friend Geek00L recently blogged his experience with SANCP's indexing feature, but he didn't get into the performance; he just wrote about how it worked. You probably should read that, as well as the official docs, if you're interested in this subject.

Another thing I was interested in was figuring out how to use SANCP to index existing PCAP files, as Sguil is already capturing these with Snort. By default, SANCP will only index PCAP files that it creates, usually by sniffing the network interface. I prefer to keep my data collection processes separate if I can, and the existing PCAP collection is working well enough for now. So my first goal was to see if I could convince SANCP to create an index without creating an entirely new PCAP file.

It turns out that this is possible, though kinda kludgey. I was able to use the following "sancp-pcapindex.conf" file to create an index:


default index log
default pcap filename /dev/null
format index delimiter=| sancp_id,output_filename,\
start_pos,stop_pos,src_ip_dotted,\
dst_ip_dotted,ip_proto,src_port,dst_port

This is pretty close to the version listed in SANCP's docs, except that I added the "default pcap filename /dev/null" line in there. So SANCP will still create a PCAP file, but it'll be written to /dev/null so I'll never see it.

I also had to use two additional command-line options to turn off the "realtime" and "stats" output SANCP likes to generate by default. So in the final analysis, here's the command line I ended up using:

sancp -r snort.log.12347263 -c sancp-pcapindex.conf -d sancp-output -R -S


Ok, so on to the actual indexing tests! I was curious about several things:

  1. How long does it take to create an index?
  2. How large is the index, compared to the size of the PCAP file itself?
  3. Is there a performance increase by using the index, and if so, how much?
  4. Does extracting PCAP by index return different data than extracting it by tcpdump and a BPF?

I decided that I would choose two PCAP files of different sizes for my tests. One file was 2.9GB, and the other was 9.5GB. For each file, I tested index creation speed and retrieval speed using tcpdump compared to retrieval speed using the index and SANCP's "getpcapfromsancpindex.pl" tool. Each of these tests was conducted three times, and the results averaged to form the final result. In addition, I examined the size of the index file once (the last time), under the assumption that a properly-created index would be the same each time it was generated.

PCAP size    Index size    Index creation    Tcpdump extraction    Indexed extraction
2.9GB        446MB         7m58s             39s                   5s
9.5GB        1300MB        23m20s            2m21s                 11s

The larger file is roughly (very roughly) three times the size of the smaller file. As this data suggests, the index size and the index creation time are linear, as the larger of each value is about 3x the size of the smaller. The same is true of the time necessary to extract the data with Tcpdump, though not to quite the same extent (it's just a bit over 3x).

However, the interesting part is that the time required to extract the data using the indices is not linear. It only took a little over 2x as long, though to be honest, it could easily be a matter of the amount of data that was contained in the individual network session or something. Still, using the index was about 87% faster in the small file, and about 92% faster with the larger file.

I think it's pretty clear that these indices speed up PCAP retrieval substantially. I think the drawbacks are that the index files are fairly large, and that they take a long time to generate.

As for the index file size, the indices look like they are about 13% - 15% of the size of the original file. For the drastic performance improvement they provide, this could be worth it. What's one more drive in the RAID array? Also, it's possible that they could be compressed, with maybe only a relatively small impact on retrieval speed. I'll have to try that out.

Index generation time is potentially more serious. Obviously, it'd be nicer to generate the index at the same time the PCAP is originally written, but as I'm unwilling to do that (for the moment, at least), I think the obvious speed-up would be to somehow allow SANCP to generate the indices without trying to write a new PCAP file. Even when I've directed it to /dev/null, there has to be some performance overhead here, and any time spent writing PCAP we're throwing away is just time wasted. This would be my first choice for future work: make a good, quick index for an existing PCAP file.

All in all, I'm impressed with the retrieval speed of SANCP's indexed PCAP. Now, if we can get the index creation issue sorted out, this could be a really great addition to Sguil!

Update 2008-04-03 20:15: I was in such a rush to complete this post, that I accidentally forgot to answer my question #4! It turns out that the data returned by both extraction methods is exactly the same. Even the MD5 checksums of the PCAP files match. Great!

Also, note that I edited the above to add details on the command line I used to generate the indices. Another stupid rush mistake on my part. Sorry!

Wednesday, April 02, 2008

Automating malware analysis with Truman

Let me start by saying, I'm no malware analyst. I've done a little reversing with IDA Pro, but really only in class. However, during an incident investigation, I frequently come across an unknown Windows binary that is likely to be some sort of malware, and would really, really like to know what it does, and stat! I can do a few basic tests on the binary myself, like examining the strings for clues as to its purpose, or maybe unpacking it if it's using some standard packer. For the most part, though, modern malware is too hard to get a good handle on quickly unless you're aces with a debugger.

A couple of years ago, though, Joe Stewart from LURHQ (now SecureWorks) released an analysis framework just for folks like me. The Reusable Unknown Malware Analysis Net (TRUMAN) is a sandnet environment that allows the malware to run on a real Windows system attached to a protected network with a bunch of fake services, then collects data about the program's network traffic, files and registry entries created, and even captures RAM dumps of the infected system. The great thing about TRUMAN is that it not only makes it easy to collect this information, it automates most of the process of creating a secure baseline, analyzing changes against that, and restoring the baseline to the Windows system when it's all over.

The terrible thing about Truman, though, is that it is quite a complex system, with a lot of moving parts. Which would be OK, except that it comes with almost no documentation. Seriously. None. There's an INSTALL file that gives a brief overview, but leaves out most of the important steps. Frankly, except for one Shmoocon presentation video, there's nothing on the Internet that really tells you what Truman is, how it works or how to go about installing and using it.

Until now!

I just added a TRUMAN page to the NSMWiki. This contains a lot of information, much more than would comfortably fit into a blog post. Most importantly, it contains a detailed step-by-step process for getting TRUMAN up and running with RHEL5 and Windows XP.

Using Truman, I can collect a substantial amount of information about how a suspicious binary acts when it runs, and do it in a matter of 20 or 30 minutes, rather than hours. Admittedly, it's not foolproof, but it should come in extremely handy next time I run across an unknown Trojan.

Monday, March 31, 2008

Switching to Sguil: A whole new meaning

Many of you may have wondered why I haven't yet blogged about the recent release of Sguil 0.7.0. Did I forget? No. Am I disappointed with it? Not at all! Am I just lazy? Yes, but that's not why.

The truth is, I've held off blogging about that because there's some even bigger news with the Sguil project!

You probably didn't know this, as we've tried hard to keep it under wraps until it could be formally announced, but the Sguil project has just received an extremely large vote of confidence, in the form of it being acquired lock, stock and barrel by Cisco!

Yes, you read that right! From the press release:

Under terms of the transaction, Cisco has acquired the Sguil™ project and related trademarks, as well as the copyrights held by the five principal members of the Sguil™ team, including project founder Robert "Bamm" Visscher. Cisco will assume control of the open source Sguil™ project including the Sguil.net domain, web site and web site content and the Sguil™ Sourceforge project page. In addition, the Sguil™ team will remain dedicated to the project as Cisco employees, continuing their management of the project on a day-to-day basis.

Really, I didn't blog about Sguil 0.7.0 yet because I didn't want to say anything that could have interfered with this deal.

The great thing about this is that both Cisco and Sguil have made significant investments in Tcl, as it's already found in the OS on many Cisco products. Of course, Sguil is written almost entirely in Tcl, so this should provide for some great synergy going forward. You should start seeing Sguil being pushed out into the carrier-grade Cisco gear by 3Q08, with the rest of the Cisco-branded products following in phases through 4Q09. Linksys-branded gear will be supported too, though there's not an official timetable for that yet.

On a personal note, I would like to congratulate Bamm (AKA "qru"), Sguil's lead developer. He's put a lot of time into this project over the years, and is finally going to reap some rewards:

Although the financial details of the agreement have not been announced, Sguil™ developer Robert Visscher will become the new VP of Cisco Rapid Analysis Products for Security. “This deal means a lot to the Sguil™ project and to me personally,” Visscher explains. “Previously, we had to be content with simply being the best technical solution to enable intrusion analysts to collect and analyze large amounts of data in an extraordinarily efficient manner. But now, we’ll have the additional advantage of the world’s largest manufacturer of networking gear shoving it down their customers’ throats! We will no longer have to concern ourselves with mere technical excellence. Instead, I can worry more about which tropical island to visit next, and which flavor daiquiri to order. You know, the important things.”

I know that many of you will have questions about this major evolution in the Sguil project and our continuing roles as Cisco employees, so please feel free to leave them here as comments, or ask in freenode IRC's #snort-gui channel.

Tuesday, March 25, 2008

Temporarily speed up SANCP insertions in Sguil

It's Monday morning, you're half asleep, you haven't finished your first diet soda yet, and -- oh no! Sguild has been down all weekend! Worse yet, SANCP inserts are backed up, to the tune of 17,000+ files in the queue!

As you know, the Sguil sensors are pretty much independent of the actual Sguil server. They'll happily continue collecting data, even when sguild has been down for a while. When the server comes back up, the sensors will automagically reconnect and send all the queued data. This is by design, of course. You don't want to lose all that data due to a failure of the central server.

Even after an extended outage, most of the data collected by the Sguil sensors poses no real problem. There are relatively few Snort alerts (maybe a few thousand), and probably even fewer PADS events, and these get added to the database in no time. Network session records collected by SANCP, however, can pose a bigger problem.

If you recall, SANCP works by keeping an in-memory list of active network "sessions" (including pseudo-sessions created from UDP and ICMP traffic). By default, it will dump these to a file every minute or so (or more often, on a busy network). The Sguil sensor contains a SANCP agent process that monitors the filesystem for these files, and sends them to Sguild as they are created, deleting them from the sensor.

Now here's the problem: there are just so many darned network sessions on a busy network that even a short outage can result in a few hundred files waiting to be queued, especially if you have multiple sensors. Longer outages, though, can be disastrous. Let's say that you have six sensors, and your Sguil server has been down for the weekend (48 hours). How many files is that?

60 * 48 * 6 = 17,280

Now, at an average rate of about 5 seconds to insert each file, how many hours would that take to catch up?
17,280 * 5 / (60 * 60) = 24

That's right! It'd take a full 24 hours to catch up! In the meantime, you're missing a few days of valuable network data (probably the few days you're most likely to want to query on Monday morning) and your MySQL database is spending all its time inserting, which means not only that it's slower to respond to your analyst console, but also slower to process incoming events. In fact, it can easily get caught in a sharp downward spiral, where the incoming data gets even further backed up.

So what can you do about this? Actually, it's quite simple. If you find that you're getting behind while processing your backlog of SANCP records, you can dramatically speed things up by temporarily disabling the indices on your SANCP tables.

First, figure out which days you have to catch up on. If you know your server crashed on Friday the 8th, and it's now Monday the 11th, you probably want to go through all SANCP tables from Friday - Monday.

Second, determine what the table names will be. Remember that Sguil creates one SANCP table per day, per sensor. These are all merged into a single virtual table, but for indexing purposes, ignore that one and concentrate on the individual tables. They will be named something like:
sancp_$SENSORNAME_$DATE

So for example, if you have two sensors named "external" and "internal", you'd have the following tables:
sancp_external_20080208
sancp_internal_20080208

sancp_external_20080209
sancp_internal_20080209

sancp_external_20080210
sancp_internal_20080210

sancp_external_20080211
sancp_internal_20080211


Next, you simply issue the SQL command to disable indexing for each table:
ALTER TABLE sancp_external_20080208 DISABLE KEYS;

MySQL will perform a quick table check before returning to the prompt. This may take a minute, and I personally find it annoying to wait after each table, so I usually just create a text file with all the commands in it, one per line, and run it in batch mode:
mysql -u sguil -p sguildb < DISABLE-KEYS.txt
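If typing out all those ALTER statements sounds tedious, a quick shell loop can generate the batch file for you. The sensor names and dates below are just the ones from the example above, so adjust to taste; swapping DISABLE for ENABLE gives you the matching re-enable file for later:

# Emit one ALTER TABLE ... DISABLE KEYS statement per sensor per day
for sensor in external internal; do
    for day in 20080208 20080209 20080210 20080211; do
        echo "ALTER TABLE sancp_${sensor}_${day} DISABLE KEYS;"
    done
done > DISABLE-KEYS.txt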

Based on my experience, I've seen the insert speed go from about 5 seconds per file to about 5 files per second, which is quite significant! At that rate, it would take less than an hour to insert everything!
17,280 / (5 * 60 * 60) = 0.96

Of course, you have to be extra careful to re-enable indices on all those tables. You can run a similar set of SQL commands to turn indices back on for a table:
ALTER TABLE sancp_external_20080208 ENABLE KEYS;

Again, I usually run this as a batch job.

The act of disabling and then later re-enabling indices does take a little while, but usually not more than a few minutes for each. Even given this overhead, it is still significantly faster to process a bunch of SANCP files without indices, then reindex them after you're all caught up.

Sure wish I didn't need to know this... 8-)

Update 2008-03-25 11:27: After you re-enable keys, you may need to also do a quick db check to make everything sane again:
mysqlcheck -o -a -u sguil -p sguildb

This will recheck all your tables and make sure they're still consistent. I've had a few situations where Sguil has been returning error messages like "ERROR 1030 (HY000): Got error 124 from storage engine" until I did this.

Friday, March 21, 2008

In which I attempt a metaphor

So I was explaining the poisoned search results threat to several people yesterday, and I hit upon a good metaphor to explain why this is particularly serious: it increases the attacker's "shots on goal".

If you know Hockey at all (which I don't, but I've been to a few games), you know that the scoreboard typically lists "Shots on Goal" right beside each team's score. Why? Because you can't score if you don't shoot!

The more times you get to try to score, the more likely it is that you will do so, and it's the same with security. Tracking the number of exploit attempts, even if they are unsuccessful, is just like reporting shots on goal.

It happens that poisoned search results are a great way to increase your shots on goal with very little effort, and if the analogy holds, that means this will prove to be an extremely effective strategy for the attackers. I believe current events are proving this to be true.

Of course, now that I have not only made a metaphor linking digital security to real life, but a hockey metaphor at that, I expect that I have invoked The Bejtlich, and he will no doubt be forced to appear shortly and leave an insightful comment.

New ZLob spreads through poisoned search results

You may have seen this technique before, but in the last few days, it seems that the creators of the ZLob trojan have found an effective way to spread their malware: poisoned search results.

In case you're wondering how this works, it goes something like this:


  1. The attackers identify a set of "hot" search terms that users are most likely to be looking for. Popular products, current events, celebrities, scandals, you name it. I don't know for sure where they come up with these terms, but if it were me, I'd get them from Google Trends or some place like that. To really be effective, the attackers need to gather as many of these terms as possible, perhaps several thousand. They need to be updated frequently, too.
  2. The attackers identify an otherwise legitimate website that happens to be vulnerable to some sort of file upload attack.
  3. The attackers create a set of HTML files, one per search term they're targeting. The HTML is crafted to look highly relevant for that term, with what looks to me like snippets of text from other legitimate web pages on that subject. In addition, each of the files links to each of the other files, artificially inflating their number of incoming links in an attempt to fool the search engine into placing them nearer to the top of the result list.
  4. When a user searches on one of the terms, they will see poisoned results interspersed with legitimate ones. If they click on the poisoned link, obfuscated Javascript in the page will redirect them to a site that claims to have a relevant video. It shows a static GIF that looks like the YouTube video interface, but then pops up a dialog telling the user they need a new CODEC to view the clip.
  5. Of course, you know where this is going... The "CODEC" is an EXE file containing the ZLob trojan. SCORE!

It used to be that if you avoided browsing pr0n, gambling sites and similar shady sites, you were less likely to come into contact with this sort of thing. But now, legitimate users doing regular, every day searches are being exposed a lot more often. This is kinda scary.

So what can you do to protect your users against this type of attack? On a technical level, not that much. You can't really get much done on the Internet without a search engine, and it's going to be up to them to improve their ability to vet the pages they index. Individually, something like the NoScript Firefox plugin would be effective, but that's difficult to impose on an entire user community.

However, the most effective security is not technical. Get the message out to your users, "There are malicious web pages out there; you're likely to find some of them inside the search engine results; be careful what you click on, and never download things you weren't expecting to download."

Of course, I can't let this go by without at least some sort of NSM advice. Here's a quick Snort rule I wrote to detect these trojan CODEC downloads:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"WATCHLIST Possible ZLob Codec Download"; uricontent:".exe"; nocase; pcre:"/.*codec.*\.exe/smi"; flow:to_server,established; classtype:trojan-activity; sid:10000000; rev:1;)

This looks for HTTP downloads of files that match "*codec*.exe" (case insensitive, of course). A simple file name change or something would evade this, but it's not too hard to see how to customize this to look for other things. And if your version of Snort is compiled with flexible response support, you can even add "resp:rst_all;" to try to block the download attempts by sending spoofed RST packets, which should provide some extra security.
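For example, the flexresp variant of the rule above would look like this (identical except for the added resp option and a bumped revision number):

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"WATCHLIST Possible ZLob Codec Download"; uricontent:".exe"; nocase; pcre:"/.*codec.*\.exe/smi"; flow:to_server,established; resp:rst_all; classtype:trojan-activity; sid:10000000; rev:2;)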

Friday, February 29, 2008

NSM and VLANs

Sometimes I run across a pcap-aware utility that does something very cool, but just doesn't work right when it encounters 802.1Q VLAN tags in traffic. Most commonly, it fails to recognize these packets as IP.

To see why, take a look at this diagram of a sample Ethernet II frame. Most simple-minded code simply checks bytes 12 & 13 (the EtherType field) to see if they contain 0x08 0x00, which is the code for IPv4 traffic.

However, 802.1Q tags throw things off a bit. If present, the tag occupies an additional 4 bytes between the source address and the EtherType field. The first two bytes are a standard code for VLAN tags, 0x81 0x00. Then the following two bytes are an arbitrary numeric value identifying which VLAN the traffic belongs to.

The code is often something like:


if(etherframe[12] == 0x08 && etherframe[13] == 0x00) {
    /* Process an IP packet */
}

Adding VLAN support simply involves an extra check. For a quick and dirty solution, if the VLAN tag is present, I usually just adjust the pointer to the beginning of the frame's data (etherframe, in the above case) by 4 bytes, then proceed as usual.
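Here's roughly what that quick-and-dirty adjustment looks like in code (a sketch only, assuming etherframe is a pointer to the start of the Ethernet frame):

if(etherframe[12] == 0x81 && etherframe[13] == 0x00) {
    /* 802.1Q tag present: skip the 4-byte tag (and throw away the VLAN ID) */
    etherframe += 4;
}

if(etherframe[12] == 0x08 && etherframe[13] == 0x00) {
    /* Process an IP packet as usual; the IP header now starts at etherframe + 14 */
}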

In most cases, this has the desired effect, but it does mean that I'm throwing away all VLAN tag information. In my experience, this has only been a problem once, on Shmoocon's "hack or halo" network, where each participant's computers were on a unique VLAN but had identical IPs. In the real world, I have never seen this, but I'm certain it exists somewhere.

In the meantime, if you can live with this restriction, you can try out some VLAN patches I wrote for PADS, tcpflow and tcpxtract. You may also want to check out this wiki page which tracks VLAN support in various libpcap-based analysis tools.

Wednesday, February 27, 2008

The awesomest

This is the awesomest thing I've ever heard on the Internet. Some guy recorded a 10 minute phone call with a phisher. My favorites are the wife, the FBI and the .357.

This is SFW, though there are two or three things bleeped out.

Go ahead, get pwn3d, you've got Norton.

So I open my inbox today and find an e-newsletter from Symantec. Normally, they barely register with me, and I just delete them and move on. This one, though, had a great subject line:

Go ahead, You've got Norton


Really? That's the idea you're going with? It's safe to open that attachment/click on that link/view that malicious site, just because you've got Norton AV?

I know this was probably written by a marketdroid, as I sincerely doubt that the Norton AV product engineers would encourage you to engage in risky Internet behavior no matter which AV product you've got installed. Still, you'd think that someone, somewhere, when planning their marketing strategy, would notice the fundamental disconnect between that slogan and any actual good security practice.


As Richard Bejtlich is fond of saying, prevention eventually fails. C'mon, Symantec. How can you expect customers to trust your product if your own marketing efforts display an ignorance of fundamental security principles?