By Paul Asadoorian
This is a nice, easy way to build a custom dictionary for your target. I got some of the original code from SANS Security 560 by Ed Skoudis. With his permission, I’ve published some of my enhancements. The first step is to grab the entire web site:

wget -r -l 2 www.targetwebsite.com

I’m going two levels deep here; you can adjust that with the “-l” flag. How many levels deep to go depends on how big a dictionary you want and how big your target site is. [Editor’s note: This can take you outside of the target website by following links to other sites. As Paul pointed out, this may be valuable. If the sites are linked, there is something in common and valuable between them.] Next, we replace the whitespace with newline characters and produce a unique list:

grep -hr "" www.targetwebsite.com/ | tr '[:space:]' '\n' | sort | uniq > wordlist.lst
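To make the splitting step easier to try without mirroring a real site, here is a minimal sketch of the same pipeline wrapped in a function. It is a slight variation, not the exact command above: it also squeezes runs of whitespace and drops empty entries. The function name `build_wordlist` and the sample page content are made up for illustration.

```shell
# Hypothetical helper: print one unique token per line for every file under a directory.
build_wordlist() {
    # grep -hr "" prints every line of every file, without filename prefixes.
    # tr -s turns each run of whitespace into a single newline.
    # sed removes any empty entry that is left over.
    grep -hr "" "$1" | tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u
}

# Tiny stand-in for a mirrored site (made-up content).
demo=$(mktemp -d)
printf '<p>Acme Widgets</p>\nContact Acme   sales today\n' > "$demo/index.html"
build_wordlist "$demo"
rm -rf "$demo"
```

Note that tokens such as `<p>Acme` still carry markup at this point, which is what the cleanup step deals with.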

The next step is to remove the weird characters. Don’t worry, we can put them back. This drops every entry that contains one of them, which primarily removes the HTML tags and such:

grep -Ev '[][,;{}<>:="/]' wordlist.lst | sort -u > wordlist.clean.lst
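A character-class (bracket expression) form of this filter needs no per-character escaping, so parentheses can be listed alongside the other characters. This is a minimal sketch, assuming a POSIX-style grep; the function name `clean_wordlist` and the sample entries are made up for illustration, and behavior may still vary between grep versions.

```shell
# Hypothetical helper: drop every entry that contains markup-style punctuation.
clean_wordlist() {
    # Inside [...] a leading ']' is taken literally, and '(' and ')' need no escaping.
    grep -v '[][(),;{}<>:="/]' "$1" | sort -u
}

# Made-up sample entries: only the plain words should survive.
sample=$(mktemp)
printf '%s\n' 'acme' '<p>Acme' 'widgets' 'call(now)' 'a[0]' 'sales' > "$sample"
clean_wordlist "$sample"
rm -f "$sample"
```

Because `grep -v` discards the whole line, an entry like `call(now)` is dropped entirely rather than trimmed down to `callnow`.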

Note: I do not remove the parentheses characters “()”. I get a syntax error when I try to remove the “(” or “)”, so we probably need to move to Perl regex or something similar to do that. Also, different versions of grep (and wget) will behave differently, so you might have to tweak. Below, we append the default John the Ripper password list to our custom list:

cat password.lst >> wordlist.clean.lst

Now we might have duplicates, and since we removed all special characters (well, most of them anyhow), we need to put them back. Below, we run John to regenerate our unique wordlist, apply some rules, and write the result to standard output:

john --wordlist=wordlist.clean.lst --rules --stdout | sort -u > final.wordlist.lst
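One detail worth knowing about the deduplication step: `uniq` by itself only collapses repeats that sit next to each other, and sorting first throws away the order in which the candidates were generated. If that order matters to you, here is a minimal sketch of an order-preserving alternative using awk. The function name `dedupe_keep_order` and the sample words are made up for illustration.

```shell
# Hypothetical helper: drop repeated lines from stdin, keeping first-seen order.
dedupe_keep_order() {
    # seen[$0]++ is 0 (false) the first time a line appears, so only first copies print.
    awk '!seen[$0]++'
}

# Made-up sample: 'password' and 'acme' each appear twice, not adjacent.
printf '%s\n' 'password' 'acme' 'password' 'Acme1' 'acme' | dedupe_keep_order
```

In the pipeline above, this would take the place of the final deduplication stage: pipe the output of `john --wordlist=wordlist.clean.lst --rules --stdout` into `dedupe_keep_order` and redirect it to `final.wordlist.lst`. The trade-off is memory, since awk keeps every distinct line it has seen.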

For bonus points, you can modify the rules so that John does a better job of adding in special characters (such as replacing all “i” with “1”). We’ll leave this exercise up to the reader.
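As a starting point for that exercise, here is a rough sketch of what the “i” to “1” substitution produces. It is a standalone awk stand-in, not John’s rule engine (John’s rule documentation describes a substitute command, `sXY`, for doing this natively). The function name `add_i_to_1_variants` and the sample words are made up for illustration.

```shell
# Hypothetical helper: for each word on stdin, print it, then its i->1 variant if one exists.
add_i_to_1_variants() {
    # gsub returns the number of replacements, so words without an "i" print only once.
    awk '{ print; v = $0; if (gsub(/i/, "1", v)) print v }'
}

# Made-up sample words.
printf '%s\n' 'widgets' 'acme' | add_i_to_1_variants
```

Here `widgets` yields both `widgets` and `w1dgets`, while `acme` passes through unchanged.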
Passwords are just so easy to abuse…
- PaulDotCom

About the author