Friday, July 21, 2006

Extracting gzipped or Unix script files from pcap data

During incident response, it's often useful to examine the actual attack traffic, and if you're using Sguil, you probably have it on hand. One common situation is that an intruder has transferred files to or from your network, and you'd really like to see what's in them.

There's a great tool for extracting arbitrary files from pcap dumps, tcpxtract. Similar to the way hard drive forensic tools look through bytes on the disk to find the "magic headers" at the beginning of various types of files, tcpxtract combs through pcaps looking for file types it knows, regardless of the transport protocol used to ship them over the network. When it finds them, it writes them out to individual files for manual analysis.
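To make the "magic header" idea concrete, here is a minimal Perl sketch of signature scanning over a buffer of bytes. It is not tcpxtract's code and ignores everything tcpxtract does to read pcaps and follow sessions; the find_magic function, the %magic table, and the sample payload are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical signature table (illustration only, not tcpxtract's own).
my %magic = (
    gzip   => "\x1f\x8b\x08",   # gzip ID bytes plus the "deflate" method byte
    script => "#!/",            # the bytes \x23\x21\x2f
);

# Return a list of [type, offset] pairs, one for every signature hit in $data.
sub find_magic {
    my ($data) = @_;
    my @hits;
    for my $type (sort keys %magic) {
        my $pos = 0;
        while (($pos = index($data, $magic{$type}, $pos)) >= 0) {
            push @hits, [$type, $pos];
            $pos++;   # keep searching after this hit
        }
    }
    return @hits;
}

# Made-up payload: filler, a gzip header, more filler, then a shell script.
my $payload = "junk" . "\x1f\x8b\x08\x00" . "more junk" . "#!/bin/sh\necho hi\n";
printf "%s at offset %d\n", @$_ for find_magic($payload);
# prints:
#   gzip at offset 4
#   script at offset 17
```

A real extractor would then copy bytes from each offset into its own output file, which is the part tcpxtract handles for you.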

Tcpxtract is a great tool for NSM practitioners and should be in everyone's standard kit. It doesn't support a few common file types out of the box, but if you know their magic numbers, you can add them by editing the tcpxtract.conf file.

My friend geek00L has already blogged about adding Windows PE executable support. Now I'm here to tell you how to add support for gzipped files and for Unix script files, which begin with a "#!/some/file" line ("#!/bin/sh" or "#!/usr/bin/perl", for example).

Just add the following two lines to the end of tcpxtract.conf:


gzip(1000000, \x1f\x8b\x08);
script(1000000, \x23\x21\x2f);

A little anti-climactic after all that buildup, wasn't it? I've had some advice that the script detection is likely to throw lots of false positives in an SSL session, so maybe you should keep it commented out until you know there are script files in the session that you need to find.
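For a rough feel of why a short signature misfires on encrypted traffic, here is a back-of-the-envelope sketch. It assumes the encrypted bytes are uniformly random, so any particular three-byte sequence turns up by chance about once every 256^3 bytes. The expected_false_hits function and the 100 MB session size are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Expected number of chance matches for a signature of $sig_len bytes
# in $n_bytes of uniformly random data (a rough estimate, not a measurement).
sub expected_false_hits {
    my ($n_bytes, $sig_len) = @_;
    return $n_bytes / (256 ** $sig_len);
}

# Hypothetical session size: 100 MB of encrypted traffic.
my $session = 100 * 1024 * 1024;
printf "3-byte signature: about %.2f chance hits\n",
    expected_false_hits($session, 3);
# prints: 3-byte signature: about 6.25 chance hits
```

The same arithmetic applies to any three-byte signature, so the longer the magic number you can supply, the fewer junk files you should expect.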

Thursday, July 13, 2006

Canonicalizing funky IP addresses

I've been playing around with some phishing data recently, to see if I can correlate known phishing attempts with my outgoing network sessions in order to alert me when my users fall victim. I don't have any results to report for this yet, but I did have to do some research into canonicalizing IP addresses.

Most people don't realize this, but there are a few different IP address formats. We're used to seeing the decimal dotted quad form (e.g. "192.168.1.1") but there are others, and the phishers are using them. Thus, it behooves us to know how to work with them.

I was unable to find a comprehensive list of available formats online for some reason (if you know of one, please leave a comment!) so I examined my corpus of phishing URLs and found the following examples:


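  1. 192.168.1.1 (i.e., the usual decimal dotted quad)
  2. 1127353292 (i.e., the whole address as a single decimal integer)
  3. 0x43320bcc (i.e., the whole address as a single hex integer)
  4. 192.0x43.9.3 (i.e., mixed decimal and hex quads)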


Of course, phishers are using normal hostnames as well, giving us at least 5 different types of input to deal with. Most of our tools use the decimal dotted quad format, so we need to normalize these formats into something useful.
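As a worked example of what normalizing means here, the integer and hex forms in the list above turn out to be the same address. This small sketch uses a hypothetical helper, int_to_quad, that packs a 32-bit integer in network byte order and reads back the four bytes.

```perl
use strict;
use warnings;

# Convert a 32-bit integer to dotted quad text: pack it as a big-endian
# ("network order") 32-bit value, then unpack the four individual bytes.
sub int_to_quad {
    my ($n) = @_;
    return join '.', unpack('C4', pack('N', $n));
}

print int_to_quad(1127353292), "\n";          # integer form -> 67.50.11.204
print int_to_quad(hex('0x43320bcc')), "\n";   # hex form     -> 67.50.11.204
```

By hand: 0x43 = 67, 0x32 = 50, 0x0b = 11 and 0xcc = 204, and 67*2^24 + 50*2^16 + 11*2^8 + 204 = 1127353292.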

Here's a Perl function that should do this for you. If you pass it any of the above formats (including the hostname), it will normalize the input and return a decimal dotted quad in string form, suitable for your favorite security tool. If it can't be converted (maybe it's a hostname that no longer resolves) the function returns undef instead.

I've wrapped this in a simple command-line tool and also incorporated it into other Perl scripts. I'm sure you'll find other uses for it as well. And hey, if you come across any other legal address formats, please let me know.


use Socket;

# Take any valid IP address or hostname and return the normalized IP address.
# Recognizes the following formats:
# some.host.com
# 192.168.0.1
# 0x43320bcc
# 1127353292
# 192.0x1c.0x10.9 (i.e., mixed decimal and hex quads)
# The function returns the string value of the IP address if successful, or
# undef if not.
#
# Warnings:
# 1) This will return only a single address, so if the hostname resolves
# to more than one address, you won't get them all.
# 2) It will generate a DNS query when looking up hostnames
sub normalize_funky_addrs {
    my($host) = @_;
    my($addr);

    # If this is a hex address (e.g., "0x01234FAc") then first convert it to
    # decimal form
    if($host =~ m/^0x[a-f0-9]+$/i) {
        $host = hex($host);
    }

    # If the entire address wasn't hex, individual octets might be. If so,
    # find them and convert them. Split the address into octets, check and
    # convert hex in each octet as necessary, then paste them all back together
    # into one string and continue processing. Yes, I could just return here,
    # but I'd rather have only one function exit point.
    if($host =~ m/^([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)$/) {
        # Copy the captures into a lexical array before running more matches.
        my @octets = ($1, $2, $3, $4);
        foreach my $octet (@octets) {
            # Only convert parts that are entirely hex ("0x1c"), so that a
            # four-label hostname containing "0x" somewhere is left alone.
            $octet = hex($octet) if ($octet =~ m/^0x[a-f0-9]+$/i);
        }
        $host = join(".", @octets);
    }

    # Now we've either got a hostname, a normal IP address or an integer-form
    # IP address. We can work with any of those.
    $addr = gethostbyname($host);
    return $addr ? inet_ntoa($addr) : undef;
}
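
If you'd rather not generate DNS queries (warning 2 above), here is a sketch of a numeric-only variant. It is a hypothetical helper, not part of the function above: normalize_numeric_addr handles just the four numeric formats with plain arithmetic and returns undef for hostnames or anything else it doesn't recognize. As a deliberate simplification, it rejects numbers with leading zeros rather than guessing how they should be read.

```perl
use strict;
use warnings;

# Hypothetical DNS-free helper: normalize numeric address forms only.
# Returns a dotted quad string, or undef for anything else (incl. hostnames).
sub normalize_numeric_addr {
    my ($host) = @_;

    # Whole-address hex ("0x43320bcc") or plain integer ("1127353292").
    if ($host =~ m/^(?:0x[0-9a-f]{1,8}|0|[1-9]\d{0,9})$/i) {
        my $n = $host =~ m/^0x/i ? hex($host) : $host;
        return undef if $n > 0xFFFFFFFF;
        return join '.', unpack('C4', pack('N', $n));
    }

    # Four dot-separated parts, each decimal or 0x-prefixed hex.
    my @octets = split /\./, $host, -1;
    return undef unless @octets == 4;
    for my $o (@octets) {
        return undef unless $o =~ m/^(?:0x[0-9a-f]{1,2}|0|[1-9]\d{0,2})$/i;
        $o = hex($o) if $o =~ m/^0x/i;
        return undef if $o > 255;
    }
    return join '.', @octets;
}

for my $in ('192.168.1.1', '1127353292', '0x43320bcc', '192.0x43.9.3',
            'some.host.com') {
    my $out = normalize_numeric_addr($in);
    printf "%-15s -> %s\n", $in, defined $out ? $out : 'undef';
}
```

One way to combine the two would be to try the numeric-only helper first and fall back to the resolver-based function only for inputs that look like hostnames.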