Friday, May 16, 2008

Alternative PCAP subsystems for Sguil

If you read my previous post on pcap indexing, you'll know that I've been playing around with some alternatives to the packet capture and retrieval subsystem in Sguil. I'm happy to announce that I've just committed two replacement subsystems to Sguil's CVS HEAD, one for daemonlogger and one for SANCP.

The daemonlogger subsystem should be fairly stable, as I've been running it in production for some time. It's basically a direct replacement for the snort packet logging instance. It's probably a bit more efficient, and has a smaller memory footprint, but it's still substantially similar.

The SANCP system, on the other hand, is very experimental. It uses the pcap indexing functions of SANCP 1.6.2C6 (and above) to dramatically speed up the retrieval of pcap data from huge captures. If your capture files are routinely over 2GB or 3GB, you might benefit from this. However, it does come at a cost, which is that the index files can consume 25% - 35% more disk space than the pcaps alone. Break out the RAID!
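To get a rough feel for that cost, here's a back-of-the-envelope sketch in Python. The 25% and 35% figures are the ones quoted above; the function name and the 500 GB capture volume are made up purely for illustration.

```python
def index_overhead_gb(pcap_gb, low=0.25, high=0.35):
    """Return (low, high) estimates, in GB, of the extra disk space
    the index files might need on top of the pcaps themselves."""
    return pcap_gb * low, pcap_gb * high


# Hypothetical sensor that keeps 500 GB of pcap data on disk.
pcap_gb = 500
low, high = index_overhead_gb(pcap_gb)
print(f"pcaps: {pcap_gb} GB, indexes: roughly {low:.0f}-{high:.0f} GB extra")
```

Plug in your own retention volume to see whether the indexes will fit on the disks you already have.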

Of course, these are simply alternatives to the existing Snort-based packet logging system. That's not going away; we're simply offering choices for advanced users.

Also, even though I've been a member of the Sguil project for some time now, these are my first commits into the source tree. I'm officially a Sguil developer!

5 comments:

Martin said...

That's great to hear that you've updated SGUIL to use the pcap indexing. For those of us who deal with big pipes, SGUIL really doesn't scale without some help. I first suggested the indexing idea to John over a year ago, and he took it and ran with it, and now it's pretty solid.

My disk cost results were similar to yours. Obviously, the smaller the average packet is, the larger the index will be with respect to the pcap. Searches were several orders of magnitude faster. My pcaps are somewhere in the neighborhood of 30 GB for a 15 minute time period, and indexing was able to retrieve arbitrary packets in about 5 seconds.

However, the indexing speed increase is only really apparent if the needle in the haystack you're looking for is fairly unique. That is, if you're searching on some characteristic that exists many times throughout the index (e.g. searching traffic for a very busy host), then you will see less of a speed increase because the bottleneck is not search but retrieval. That said, typical searches are usually unique enough to get the big performance boost.

Many thanks to John for another great feature in SANCP and thanks to you for getting it into the hands of the larger community via SGUIL.

DavidJBianco said...

Martin, thanks for the informative comment. I found a similar speed increase, although I also found a performance bottleneck with the speed of the disk reads for the index files. To get the best performance, it looks like the index files should probably be on a different disk than the pcaps, especially for more active networks such as yours. I don't think SANCP provides a convenient way to do this yet, but perhaps I've simply overlooked it.

As for your point about typical searches and the retrieval time, you're right for UDP and TCP searches. They're usually unique enough to give good performance, because at least one of the port numbers is bound to be ephemeral, and therefore acts like a database key. For things like ICMP, which doesn't use ports, you could be getting more data than you wanted, though hopefully the sheer volume of ICMP between any two hosts isn't going to be very high.
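To make the "database key" idea concrete, here's a toy Python sketch. It is not how SANCP actually stores its indexes; the key layout, names, addresses, and offsets are all made up for illustration. It only shows why a key that includes an ephemeral port picks out a single session, while a portless ICMP key picks out everything between the two hosts.

```python
from collections import defaultdict

# Toy index: flow key -> list of byte offsets into a pcap file.
# The key layout (proto, src_ip, src_port, dst_ip, dst_port) is an assumption.
index = defaultdict(list)


def add_packet(proto, src_ip, src_port, dst_ip, dst_port, offset):
    """Record where a packet lives in the (imaginary) capture file."""
    index[(proto, src_ip, src_port, dst_ip, dst_port)].append(offset)


# Two TCP sessions from the same client to the same web server.
# The ephemeral source port is what tells them apart.
add_packet("tcp", "10.0.0.5", 49152, "10.0.0.9", 80, 0)
add_packet("tcp", "10.0.0.5", 49152, "10.0.0.9", 80, 1500)
add_packet("tcp", "10.0.0.5", 49153, "10.0.0.9", 80, 3000)

# ICMP has no ports (stored here as 0), so every packet between
# the two hosts lands under the same key.
add_packet("icmp", "10.0.0.5", 0, "10.0.0.9", 0, 4500)
add_packet("icmp", "10.0.0.5", 0, "10.0.0.9", 0, 4600)
add_packet("icmp", "10.0.0.5", 0, "10.0.0.9", 0, 4700)

tcp_hits = index[("tcp", "10.0.0.5", 49152, "10.0.0.9", 80)]
icmp_hits = index[("icmp", "10.0.0.5", 0, "10.0.0.9", 0)]
print(len(tcp_hits), "packets for the one TCP session")
print(len(icmp_hits), "packets for all ICMP between the hosts")
```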

KS Lee said...

Hi,

Is SGUIL capable of handling a huge amount of raw data capture at an ISP, with, say, a minimum of 300 Mbytes of data?

In that case we would need a huge hard disk to hold one week of captured data. Does that make sense in practice?

Or will SGUIL only store raw data traces for the IDS alerts that fire? Thanks

John Curry said...

David,

You can log index files to a different disk using the configuration file directive:

default index filename [directory]/index

where: [directory] is relative to the output directory as specified by -d

In this case, [directory] could be a link to a directory located on a separate disk.

Alternatively, you can use an absolute path as well.

e.g. default index filename /disk2/sensor2/indexes/today/index

-John

John Curry said...

Correction: my previous comment should have read:

default index filename [directory]/index

where: [directory] is relative to the output directory specified by -d
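
As a rough illustration of the two forms, here is a small Python sketch. It assumes the index filename is resolved the way these comments describe: a relative name is joined onto the -d output directory, and an absolute name is used as-is. The function name and the /data/sensor2 output directory are hypothetical, and this is not SANCP code.

```python
import posixpath


def resolve_index_path(output_dir, index_filename):
    """Return where the index file would end up, under the assumption
    stated above. posixpath.join discards output_dir when
    index_filename is already an absolute path."""
    return posixpath.join(output_dir, index_filename)


# Relative form: lands under the -d output directory.
print(resolve_index_path("/data/sensor2", "indexes/index"))

# Absolute form: the output directory is ignored.
print(resolve_index_path("/data/sensor2", "/disk2/sensor2/indexes/today/index"))
```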