Where Is Your Big Data?
Welcome
You'd think that massive amounts of data wouldn't have the opportunity to be elusive, but we know this isn't true from the sheer number of data breaches in the past couple of years. Big data seems to be one of the greatest sources of pain for enterprises and online businesses alike. But where does all that data come from, where does it go, and why is it so hard to maintain? At first glance, the answers seem simple. Upon further inspection, the answers are still pretty simple.
Where does big data come from? The short answer to this somewhat loosely framed question is logfiles. Logfiles are by far the biggest generators of big data. Every device on your network generates some type of logfile. Those logfiles either are kept on the local systems that produce them or are sent to some type of log aggregator for further processing. Or not, meaning that someone might collect them but never bother parsing them. Preserving logfiles simply for posterity is a waste of bandwidth and disk space. If you collect logs, then you should parse, scrape, and process them for relevant and actionable information, including security breach data.
Where does the data go? The answer to this question shouldn't be much of a mystery, because logfiles are either saved locally or sent to another system for processing. Unfortunately, logfiles are often forgotten. Someone once called logfiles our digital exhaust. The moniker is accurate enough, because once we've jettisoned those logfiles, they're out of sight and out of mind. For a lot of us, their fate falls into the "good riddance" category. "No one looks at those stupid logfiles anyway" goes the swan song of many well-meaning but shortsighted system administrators. If you're not looking at your logfiles with some sort of aggregator and alerting system, then ignoring your big data is destined to become your biggest mistake, because you're missing the security information, performance data, and user behaviors that would help you better maintain your systems and your security.
Why is big data so hard to maintain? To slightly change a quote from Douglas Adams' Hitchhiker's Guide to the Galaxy series about the size of space, big data is so hard to maintain because it's big. Really big. You just won't believe how vastly hugely mind-bogglingly big it is, until you try to archive it, retrieve it, or search through it. Maintain your big data, in this case logfiles, with a log aggregator. If you don't want to pay for a commercial solution like Splunk, you can take a chance on one of the many free options, such as Loggly, which also offers commercial plans.
If you've made it this far, you might get the idea that I think you should collect and use those logfiles for something more than an excuse to upgrade to a 10GbE network. You're correct. Those logfiles, once collected into terabytes of "digital exhaust," are actually digital gold for those who care to have a look inside them. I know that no one has the time or the patience to go plowing through even a few gigabytes of logs, but you can set up some automated scripts that capture interesting entries and send them to an email distribution group or to an alert console that hopefully someone watches with interest.
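One way such an automated script might look is sketched below in Python. It is only an illustration under stated assumptions: the two patterns (an OpenSSH-style "Failed password" line and a generic ERROR/CRITICAL marker), the function names, the sample log lines, and the example.com addresses are all hypothetical and would need to be adapted to your own log formats and mail setup. The script scans log lines for interesting entries and packages them into an email digest without actually sending anything.

```python
import re
from email.message import EmailMessage

# Hypothetical patterns; tune these to the log formats you actually collect.
PATTERNS = {
    "failed_login": re.compile(r"Failed password for (?:invalid user )?\S+ from \S+"),
    "error": re.compile(r"\b(?:ERROR|CRITICAL)\b"),
}


def find_interesting(lines):
    """Return (label, line) pairs for every line that matches a pattern."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line))
                break  # one label per line is enough for a digest
    return hits


def build_digest(hits, sender, recipient):
    """Package the matching entries into an email message (not sent here)."""
    msg = EmailMessage()
    msg["Subject"] = f"Log digest: {len(hits)} interesting entries"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"[{label}] {line}" for label, line in hits))
    return msg


if __name__ == "__main__":
    # Made-up sample lines standing in for a real logfile.
    sample = [
        "Jan  1 00:00:01 host sshd[1]: Failed password for root from 10.0.0.5 port 22 ssh2",
        "Jan  1 00:00:02 host cron[2]: job finished",
        "Jan  1 00:00:03 host app[3]: ERROR disk full on /var",
    ]
    digest = build_digest(find_interesting(sample),
                          "logs@example.com", "admins@example.com")
    print(digest["Subject"])
    print(digest.get_content())
```

In practice you would feed `find_interesting` an open logfile instead of the sample list and hand the resulting message to your mail server, for example with `smtplib.SMTP(...).send_message(msg)` from the standard library, run on a schedule that suits you.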
Where is your big data? It's all around you. You're collecting it. You're backing it up. You're probably ignoring it. Don't ignore it. Ignoring it could cost your company a lot of money and its reputation. If you don't have a budget sufficient to purchase some decent tools, there are always those junior-level system administrators eager to learn what "real" system administrators do.
Ken Hess * ADMIN Senior Editor