And now a guest post from Justin Hall!!
Members of my incident investigation team are constantly on the move, traveling from customer to customer, gathering data, performing analysis, and writing reports. One of the tools we find ourselves commonly reaching for is a log analysis platform. Many of us are purists – “give me grep and coffee” types – while others enjoy a more organized “browsing” approach. We’ve found one solution that meets both of these needs – Splunk.
(Note: I am not a paid spokesmodel for Splunk, and we do not resell their product or make any money from them. We’re just big fans!)
I wanted to distribute a custom Splunk-based log analysis platform for our crew to use. I had a few key requirements in building this platform:
- It had to be portable. I wanted to build it and distribute it to our crew in a format that would be ready to use immediately, on any system in our fleet.
- It had to be powerful. I wanted it to meet as many of our log analysis needs as possible, in one package.
- It had to be secure. We need to make sure our customers’ data is protected while we’re investigating.
- It had to be free (as in beer). Our budget didn’t have room for crazy expensive tools.
So I went with a virtual machine that could be distributed and re-used easily. The recipe I ended up with included Oracle’s VirtualBox, a free virtual computing platform; Ubuntu Linux Server, my personal favorite flavor of Linux; and Splunk 4.2 for Linux. Note that this article is not a “How to use Splunk” guide. If you are unfamiliar with Splunk, their documentation is pretty good, and I’ve found that just playing around with the tool is also a great learning experience!
Here’s how I built the platform.

1. Install VirtualBox, if it’s not already installed. As of this writing, v4.0.4 is the stable release.
2. Configure host-only networking to use a private subnet between your PC and the virtual machines. We use host-only networking to transfer data between the VM and the host OS so that it cannot be intercepted outside of our machine. This feature is usually enabled by default – but you should verify the subnet you’re using, and the IP you have set on your host OS’s interface.


3. Create a new VM. I use two NICs on the VM – the first interface is bridged to the LAN, and only up when I am patching. The second interface is the host-only network, which will receive an address on that network from VirtualBox’s built-in DHCP server. Make sure to give your VM sufficient RAM – I use at least 1GB – and enough disk space for the largest log set you can imagine storing on the VM’s filesystem. I usually max it out at around 30GB.


4. Install Linux. During the installation, enable full-disk encryption – not just because you might store log data on the VM’s filesystem, but because Splunk will index the logs and those indexes may also contain sensitive data. I use a dummy key initially; when I distribute the VM, our investigators add their own decryption key and remove the dummy key, so that it’s unique to each user’s VM.


5. You’ll also want to install the SSH server, so that you can securely transfer logs to the VM over a network connection.


6. I have two users on the machine – a standard user for normal operation and administration, and for manual log analysis; and a Splunk service user. I call my first one ninja, but you can use whatever less awesome name you like. Then I add a Splunk user (called splunk), and give that user a home directory of /opt/splunk.
7. Once your OS is installed, configure the SSH server to disable root login, and if you don’t want to use passwords every time you copy a file to the VM, add your public SSH key to ~/.ssh/authorized_keys (although you may not want to do so before distributing!).


8. What’s the first thing you do after installing an OS? Patch! I bring up my bridged adapter and run sudo apt-get update; sudo apt-get upgrade once a week or so. Remember to shut the bridged interface (eth0 on my VM) down afterwards.
9. Install Splunk on the VM. I’ve found the easiest method is to download the .deb package from Splunk, SCP it to the VM and run dpkg -i. It will install by default in the splunk user’s home directory, /opt/splunk.
10. Add your ninja and splunk users to a shared group (I call mine “logs”). We will restrict the logs directory so only users in that group can write to it.
11. You can get your logs to the system in many ways – copy from mounted removable media, download from a webserver, direct syslog, etc. Personally I like to SCP them to a logs directory on the filesystem, like /logs. Give the logs group read and write permissions on this directory (plus execute, which a directory needs before its contents can be accessed) – this is important, because Splunk will need to read the logs from this directory, and we’re going to run Splunk as the splunk user. I use the ninja user to SCP logs to this directory, from my host OS, using the host-only network. You can remove read/write permissions for other users on the directory.
12. Fire up Splunk! Su to your Splunk user, make sure that user has +rw on the /opt/splunk/ directory tree, and run /opt/splunk/bin/splunk start. You’ll have to accept a EULA.
13. Open your browser on your host OS and hit the web interface (http://vm-host-only-address:8000) on the host-only network. Log in with the default credentials, which are listed at the login page.


14. Switch to the Free license: go to Manager -> Licensing -> Change license group. The Free license has some features disabled (but probably nothing you’ll miss with this setup) and can only index 500MB of logs per day. If you go over once, Splunk will nag you; if you go over 3 times in a rolling 30-day period, Splunk disables your search capability until you upgrade to a paid license. So be careful!


15. Configure Splunk to monitor the /logs directory. This will make Splunk read log data as soon as it’s written to /logs. In Splunk, go to Manager -> Data Inputs and open Files and Directories. Add a new file input and point it to the /logs directory on the filesystem. You can leave all the other options at the defaults. You may also want to disable the other inputs listed under Files and Directories – the defaults are just Splunk logfiles, but you might not want that data inadvertently showing up alongside your customers’ logs.


16. Test your setup – SCP a logfile to /logs and make sure it gets eaten by Splunk and is searchable. Watch Splunk’s main search page until events start showing up – the Events Indexed counter will increase. Run a search to verify the data is being processed properly.
17. You might want Splunk to start when you boot your VM. If so, enable Splunk boot-start with the splunk user: as root, run /opt/splunk/bin/splunk enable boot-start -user splunk
18. That’s it! You’re ready to analyze. Note, when you are done with an investigation, switch to the splunk user, stop Splunk with /opt/splunk/bin/splunk stop (the clean command will not run while Splunk is up), and run /opt/splunk/bin/splunk clean eventdata to remove all of the indexed data and prepare a blank slate for future investigations. Then start Splunk again.
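For anyone who would rather script the group, permissions, and boot-start setup from steps 10 through 12 and 17, here is a minimal sketch, not a definitive implementation. It assumes the user names (ninja, splunk), group name (logs), and paths (/logs, /opt/splunk) used above, and that it is run as root on the VM. The run helper and the DRY_RUN variable are hypothetical names introduced only for this example, and the 2770 mode (group read/write/execute, no access for others, plus setgid so new files inherit the logs group) is one reasonable reading of the permissions described, not necessarily the exact mode used on the original VM. By default the script only prints the commands so you can review them.

```shell
#!/bin/sh
# Sketch of steps 10-12 and 17. Intended to be run as root on the VM.
# DRY_RUN and run() are illustrative helpers, not part of Splunk or Ubuntu.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 10-11: shared group and a restricted /logs directory.
setup_logs_dir() {
    run groupadd logs
    run usermod -a -G logs ninja
    run usermod -a -G logs splunk
    run mkdir -p /logs
    run chgrp logs /logs
    # Assumption: 2770 = rwx for owner and group, nothing for others,
    # setgid so files copied in pick up the logs group.
    run chmod 2770 /logs
}

# Steps 12 and 17: splunk owns its tree and starts at boot.
setup_splunk() {
    run chown -R splunk:splunk /opt/splunk
    run /opt/splunk/bin/splunk enable boot-start -user splunk
}

setup_logs_dir
setup_splunk
```

Saved under a name of your choosing, running it with DRY_RUN=0 in the environment applies the changes instead of printing them.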
We’ve found this platform extremely convenient for log analysis work. When we want to do full-scale analysis, we have Splunk; when we want to quickly grep | cut | sort | uniq, we have the command line. I handed the VM files – the VBOX config file, and the .VDI virtual disk it uses – to my investigation team, had them change the appropriate passwords, and they were ready to go.
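As an illustration of the quick command-line approach, here is a small sketch that counts failed SSH logins per source IP. The log lines are made-up sample data (using documentation-only IP ranges), the sshd message format is assumed, and top_failed_ips is a hypothetical helper name used only for this example.

```shell
#!/bin/sh
# Count "Failed password" events per source IP, busiest source first.
# top_failed_ips is an illustrative name, not a standard tool.
top_failed_ips() {
    grep 'Failed password' "$1" |
        grep -o 'from [0-9.][0-9.]*' |   # keep only "from <ip>"
        cut -d' ' -f2 |                   # drop the word "from"
        sort |
        uniq -c |                         # count occurrences of each IP
        sort -rn                          # highest count first
}

# Made-up sample data for demonstration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Mar 10 09:14:01 web01 sshd[2211]: Failed password for root from 192.0.2.10 port 4022 ssh2
Mar 10 09:14:03 web01 sshd[2211]: Failed password for root from 192.0.2.10 port 4023 ssh2
Mar 10 09:15:10 web01 sshd[2290]: Accepted password for ninja from 198.51.100.7 port 5100 ssh2
Mar 10 09:16:44 web01 sshd[2301]: Failed password for invalid user admin from 198.51.100.23 port 6001 ssh2
EOF

top_failed_ips "$sample"
rm -f "$sample"
```

On the sample data this lists 192.0.2.10 with a count of 2, followed by 198.51.100.23 with a count of 1. Against real logs you would point the function at a file under /logs instead.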
We hope you find this recipe useful as well. Happy hunting!
Quick shameless plug: I am mentoring a session of SANS’ SEC504 – Hacker Techniques, Exploits & Incident Handling in Cincinnati, Ohio, starting May 11. You can find more information about the class, dates & times, and sign up here. Hope to see you there!

About the author