Info_hash decoding in URLs [Updated]

Update: I have written a small utility that will take the encoded hash and decode it for us. After that, it will do a Google Search, which will often turn up the name of the file being downloaded. If not, you can take the decoded hash and search with it on several torrent tracker sites. You can download the code here.


Just about anyone who has performed investigations in corporate settings has run across a URL like this while reviewing proxy logs:

http://tracker.ccc.de:80/announce?info_hash=%25ED%25D4%25AD%2589%25A8%2508%25D8%2623%25C4%25EA%253C%2512%2512%25E6%250D_%25B73%259D%25CF%2514&peer_id=….

Clearly someone is torrenting on our network! If your logs are good enough, you probably even know who was doing it. But wouldn’t it be nice to know what they were trying to download?

I like to think I’m pretty good at tracking down information on the internet, but I had a really hard time figuring out how to decipher the information above to show me what the target of this torrent was. So here is what I discovered, in hopes that it will help someone else avoid the cursing and frustration that I went through.

For those who don’t know, torrents are identified by a SHA1 hash of part of the metadata in the .torrent file (the “info” section, hence the name). This is not a post about how BitTorrent works, and I only understand bits of it myself, so please forgive me if that description is not completely accurate. Suffice it to say, the “info_hash” is the value that uniquely identifies a particular torrent.

But the value above is not helpful, unless you happen to be a torrent tracker. And pasting the string directly into Google will only work if someone else has already posted it somewhere (probably to ask what it is). To get something useful, we need to decode the value so that it looks like a SHA1 hash.
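It can help to see the encoding direction first. The sketch below hashes a made-up byte string (a stand-in, not real torrent metadata), percent-encodes the 20-byte digest the way a client might for a tracker request, and then encodes it a second time. That second round, which turns each % into %25, is an assumption about how the value ended up in the log; the sample URL suggests it, but which component does it is not established here. It also assumes the client leaves characters such as “_” and digits unescaped, as the sample suggests.

```python
import hashlib
from urllib.parse import quote

# Hypothetical stand-in for the hashed metadata; a real client hashes
# part of the .torrent file, not this string.
digest = hashlib.sha1(b"hypothetical torrent metadata").digest()
print(len(digest))  # a SHA1 digest is always 20 bytes

# First round: raw bytes -> percent-encoded query value.
tracker_form = quote(digest, safe="")

# Second round (assumed): every % becomes %25.
logged_form = quote(tracker_form, safe="")
print(tracker_form)
print(logged_form)

# The same two rounds applied to three bytes taken from the sample hash.
fragment = bytes([0xE6, 0x0D, 0x5F])           # 0x5F is the "_" character
fragment_once = quote(fragment, safe="")        # "%E6%0D_"
fragment_twice = quote(fragment_once, safe="")  # "%25E6%250D_"
print(fragment_once, fragment_twice)
```

Bytes that happen to be letters, digits, or characters like “_” pass through unescaped, which is why the logged value is a mix of %XX groups and stray literal characters.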

The first step is to undo the URL encoding on the string. There are websites that will decode a string for you, and most programming languages have a function that does the same. If you are decoding by hand, nearly every escape you find will be %25, which is just the URL-encoded form of the % character, so the job mostly amounts to replacing each %25 with %.

If we URLDecode the info_hash above, we get the following string:

%ED%D4%AD%89%A8%08%D8&23%C4%EA%3C%12%12%E6%0D_%B73%9D%CF%14
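In Python, that single pass of decoding might look like the sketch below. It uses `urllib.parse.unquote` from the standard library on the info_hash value copied from the URL above; the variable names are just for illustration.

```python
from urllib.parse import unquote

# The info_hash value exactly as it appears in the logged URL.
logged = (
    "%25ED%25D4%25AD%2589%25A8%2508%25D8%2623%25C4%25EA%253C"
    "%2512%2512%25E6%250D_%25B73%259D%25CF%2514"
)

# unquote() makes exactly one pass, so %25ED becomes %ED rather than a raw byte.
once = unquote(logged)
print(once)
```

Pulling the value out of the URL by hand (rather than with a query-string parser that decodes automatically) keeps the two decoding steps separate and easy to inspect.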

Still doesn’t look like a hash value, but we’re getting closer. The final step is to get rid of those % signs for good. The rule is simple: the two characters after each % sign are already a hex byte, so keep them as is and drop the %. Any character that is not part of a %XX group is a literal byte, so replace it with its two-digit ASCII hex value. For example:

%E6 becomes E6, but %0D_ becomes 0D5F because the ASCII hex value for the “_” character is 5F.
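The by-hand rule can be sketched as a short loop. The function name `percent_to_hex` is made up for this example, and the sketch assumes every % is followed by two valid hex digits. The standard library's `urllib.parse.unquote_to_bytes` followed by `.hex()` gives the same result and is shown for comparison.

```python
from urllib.parse import unquote_to_bytes

def percent_to_hex(s: str) -> str:
    """Turn a percent-encoded string into uppercase hex, following the by-hand rule."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "%":
            # The two characters after % are already a hex byte: keep them.
            out.append(s[i + 1:i + 3].upper())
            i += 3
        else:
            # A literal character: replace it with its ASCII value in hex.
            out.append(format(ord(s[i]), "02X"))
            i += 1
    return "".join(out)

print(percent_to_hex("%E6"))    # E6
print(percent_to_hex("%0D_"))   # 0D5F
print(percent_to_hex("%B73"))   # B733

# Standard-library equivalent for the same input.
print(unquote_to_bytes("%0D_").hex().upper())  # 0D5F
```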

When you are done, you will have a 40-character hex string, which is the 20-byte SHA1 hash written out in hex. If you enter that value into a Google search, you will usually find a link with the title of the file the user was trying to download.
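Putting the steps together, a minimal sketch of the whole process might look like this. The function names are hypothetical, the input is assumed to be encoded exactly twice, and the test value is a made-up digest rather than a real torrent. The length check is there because anything other than 20 bytes means the value was not decoded correctly (or was damaged in the log).

```python
import hashlib
from urllib.parse import quote, unquote_to_bytes, urlencode

def logged_info_hash_to_hex(logged_value: str) -> str:
    """Undo two layers of percent-encoding and return the hash as 40 hex characters."""
    once = unquote_to_bytes(logged_value)  # %25ED -> %ED
    raw = unquote_to_bytes(once)           # %ED   -> the byte 0xED
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes for a SHA1 hash, got {len(raw)}")
    return raw.hex().upper()

def google_search_url(hex_hash: str) -> str:
    """Build a plain Google search URL for the decoded hash."""
    return "https://www.google.com/search?" + urlencode({"q": hex_hash})

# Round trip with a made-up digest, encoded twice like the logged value.
digest = hashlib.sha1(b"hypothetical torrent metadata").digest()
logged = quote(quote(digest, safe=""), safe="")

hex_hash = logged_info_hash_to_hex(logged)
print(hex_hash)
print(google_search_url(hex_hash))
```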

In this case, the user should probably be punished severely, as they were attempting to download a Taylor Swift album!
