
HTML to database with a Perl script

Clean Start

Article from ADMIN 60/2020

By Thomas Valentine

A Perl script strips the HTML markup from a set of files, saves the results as text files, and makes each file an entry in a database.

At work, a web developer asked me to solve a problem: some 1,200 static HTML files had to be stripped of their HTML markup, leaving only the text. Keeping 1,200 separate files on a web server is cumbersome and adds needless complexity, so a database-driven approach was called for, in which only a few Perl scripts are needed to serve the pages. A collection of 1,200 static files might look attractive as a low-tech solution, but its sheer bulk is unwieldy. Editing the text as database entries is also much simpler and requires only a few more Perl scripts that are fairly simple to implement.

In the solution described in this article, a Perl script strips the HTML markup with a simple regular expression and creates the text files that are put into a database.
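The stripping step can be sketched in a few lines of Perl. This is only a minimal illustration, not the article's script: the helper name strip_html, the handful of entities it decodes, and the convention of writing page.html to page.txt are all assumptions made for the example. It uses only core Perl and leaves the database step out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the article): reduce an HTML string to plain text.
sub strip_html {
    my ($html) = @_;

    # Drop script and style elements together with their contents.
    $html =~ s{<(script|style)\b[^>]*>.*?</\1\s*>}{}gis;

    # Drop comments, then replace every remaining tag with a space.
    $html =~ s{<!--.*?-->}{}gs;
    $html =~ s{<[^>]*>}{ }g;

    # Decode a few common entities in a single pass; real pages may need more.
    my %entity = (nbsp => ' ', lt => '<', gt => '>', quot => '"', amp => '&');
    $html =~ s/&(nbsp|lt|gt|quot|amp);/$entity{$1}/g;

    # Collapse runs of whitespace and trim both ends.
    $html =~ s/\s+/ /g;
    $html =~ s/^\s+|\s+$//g;

    return $html;
}

# Convert each file named on the command line (assumed naming: page.html -> page.txt).
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    (my $out_name = $file) =~ s/\.html?$//i;
    $out_name .= '.txt';

    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    print {$out} strip_html($html), "\n";
    close $out;
}
```

A simple pattern such as <[^>]*> is enough for predictable, hand-written pages, but it will misfire on attribute values that contain a > character. For less regular input, a parser-based CPAN module such as HTML::Parser is likely a safer choice.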

The Solution

The first step is to create an HTML file that uses three iframes (Listing 1) [1]. Each iframe is given a name with the name attribute (line 7). A target attribute in an anchor element then loads the linked document into the iframe of that name. The first iframe, named menu, holds a list of reference categories. Links in the menu iframe target the iframe below it, named list (line 11), which contains the commands for the categories representing the language chosen in the menu iframe.

Listing 1

HTML iframes

01 <table border="0" cellpadding="0" cellspacing="0" align="center">
02 <tr>
03   <td align="left" valign="top">
04   <table border="0" cellpadding="0"

...
