HTML to database with a Perl script
Clean Start
At work, I was asked to solve a problem for a web developer who needed the HTML markup stripped from some 1,200 static HTML files, leaving only the text. Keeping 1,200 separate files on a web server is cumbersome and adds needless complexity, so a database-driven approach made more sense: Only a few Perl scripts would be needed to serve the pages. A directory of static files might look attractive as a low-tech solution, but its sheer bulk is unwieldy. Editing the text as database entries would also be much simpler and would require only a few more easy-to-implement Perl scripts.
In the solution described in this article, a Perl script strips the HTML markup with a simple regular expression and creates text files that are then loaded into a database.
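As a rough illustration of the idea, the following minimal sketch removes markup with a single substitution. It is not the article's script: the strip_html helper name and the whitespace cleanup are assumptions added for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the article): delete anything that
# looks like an HTML tag, then tidy up the remaining whitespace.
sub strip_html {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;      # drop every <...> run; [^>] also matches newlines
    $html =~ s/[ \t]+/ /g;      # collapse runs of spaces and tabs
    $html =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $html;
}

print strip_html("<p>Hello, <b>world</b>!</p>\n"), "\n";
```

A pattern this simple is easily fooled by a > character inside an attribute value, by comments, and by script or style blocks, and it leaves entities such as &amp; untouched. For input that is not under your control, a real parser such as the CPAN module HTML::Parser is likely a safer choice.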
The Solution
First, an HTML file containing three iframes is created (Listing 1) [1]. Each iframe is given a name with the name attribute (line 7), and a target attribute in an anchor element sends the linked content to the named iframe. The first iframe, named menu, holds a list of reference categories. Links in the menu iframe target the iframe below it, named list (line 11), which contains the commands for the language category selected in the menu iframe.
Listing 1: HTML iframes
01 <table border="0" cellpadding="0" cellspacing="0" align="center">
02 <tr>
03 <td align="left" valign="top">
04 <table border="0" cellpadding="0"