
Lead Image © Artsem Martysiuk, 123RF.com
An introduction to a special tool for transforming data formats
Data Lego
The process of preparing data is called "extract, transform, and load" (ETL), and specialty tools like Jaspersoft ETL help you carry out these tasks. In this article, I introduce the community version 6.0.1 of Jaspersoft ETL, which is available as a free download [1].
Data Gold
Panning for gold is tedious. Nuggets of precious metal don't simply sit around on a riverbed. Instead, the prospector has to sift through multiple pans of sand and stones, often retrieving just a few flakes for all of the trouble expended. Data is the equivalent of gold in today's world. Given this reality, you must tediously filter huge volumes of data to extract tiny particles of information that have real value.
When looking for gold, a prospector first has to get a fix on the location of a deposit and then get access to it. The same can be said for data: The process of locating a promising source involves procedures like aligning fonts, converting values and data formats, and importing results into databases. These preparatory steps need to be performed before you can effectively pan for gold in the form of data.
Converting Files
One of the easiest exercises for learning the ETL tool involves converting an input file to a different format. A simple example of a text-based book list should suffice for presenting the first practical steps (Listing 1). Columns separated by semicolons have a particular width. The lines are not in a particular order.
Listing 1
Book List (Excerpt)
Year;Number;Author ;Title ;Publisher
Buy this article as PDF
(incl. VAT)
Buy ADMIN Magazine
Subscribe to our ADMIN Newsletters
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Most Popular
Support Our Work
ADMIN content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.
