An introduction to a special tool for transforming data formats

Data Lego

The process of preparing data is called "extract, transform, and load" (ETL), and specialty tools like Jaspersoft ETL help you carry out these tasks. In this article, I introduce the community version 6.0.1 of Jaspersoft ETL, which is available as a free download [1].

Data Gold

Panning for gold is tedious. Nuggets of precious metal don't simply sit around on a riverbed. Instead, the prospector has to sift through multiple pans of sand and stones, often retrieving just a few flakes for all of the trouble expended. Data is today's equivalent of gold, and it is just as hard to come by: You must tediously filter huge volumes of data to extract the tiny particles of information that have real value.

When looking for gold, a prospector first has to get a fix on the location of a deposit and then get access to it. The same can be said for data: Tapping a promising source involves procedures like reconciling character sets, converting values and data formats, and importing the results into databases. These preparatory steps need to be performed before you can effectively pan for gold in the form of data.

Converting Files

One of the easiest exercises for learning the ETL tool involves converting an input file to a different format. A simple text-based book list is enough to demonstrate the first practical steps (Listing 1). The columns are separated by semicolons and padded with spaces to a fixed width. The lines are in no particular order.

Listing 1

Book List (Excerpt)

Year;Number;Author                        ;Title                                             ;Publisher         
...
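
To make the conversion task concrete, the following Python sketch does by hand what such a job has to do with the book list: read the semicolon-separated lines, strip the padding from each column, sort the records, and write them out in another format. It is not how Jaspersoft ETL does the work. The target format (comma-separated values), the sort order (by year), the function name convert_book_list, and the sample rows are all assumptions made for illustration; only the column names come from Listing 1.

```python
import csv
import io

# Made-up sample rows for illustration; only the column names come from Listing 1.
SAMPLE = (
    "Year;Number;Author      ;Title           ;Publisher   \n"
    "2014;2     ;Author B    ;Second Title    ;Publisher Y \n"
    "2009;1     ;Author A    ;First Title     ;Publisher X \n"
)


def convert_book_list(source, target, sort_key="Year"):
    """Read semicolon-separated, space-padded rows and write them as sorted CSV.

    Returns the number of data rows written.
    """
    reader = csv.reader(source, delimiter=";")
    # Strip the padding that gives each column its fixed width.
    header = [name.strip() for name in next(reader)]
    rows = [
        dict(zip(header, (field.strip() for field in row)))
        for row in reader
        if any(field.strip() for field in row)  # skip blank lines
    ]
    # Assumption: four-digit years, so a plain string sort orders them correctly.
    rows.sort(key=lambda row: row[sort_key])
    writer = csv.DictWriter(target, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    output = io.StringIO()
    convert_book_list(io.StringIO(SAMPLE), output)
    print(output.getvalue(), end="")
```

The function takes file-like objects rather than file names, so it can be tried on an in-memory string as shown or pointed at real files opened with open(). An ETL tool expresses the same read, clean, sort, and write steps as connected components instead of code.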


