RegEx Text Import w. MySQL Query Browser - ad ETL methods

Introduction:
MySQL Query Browser is a no longer supported cross-platform SQL Toolkit based on TrollTech's QT-Platform. Significantly it allows many automated or semi-automated routines for creating, running and editing various MySQL resources such as Views, Functions, Procedures, Triggers among others and allows syntax-highlighted editing of multi-line SQL statements. An undocumented feature is RegEx-Text Import, which is one of several useful additions to performing ETL (Extract Transform Load) tasks of naive (or slightly pre-processed) data.


The undocumented RegEx Text Import Feature of the MySQL Query Browser for running simple ETL tasks.
Doing a quick google search turned up a range of user comments such as 'never-fully implemented', 'non-functional' to 'being a future implementation', which prompted me to write a short commentary/post.

Like most aspects within the Query Browser UI, the RegEx Text Import assumes a Drag-and-Drop user interaction. RegEx adheres to MySQL string conventions, which will make it necessary to escape RegEx syntax with triple-slashes. For instance:

This simple SQL query performs a query that searches for compounds with an α-amino-acid  function :
SELECT DISTINCT a.Compound_common_name FROMaracyc_compounds a WHERE a.Smiles REGEXP '(C\\\(.?N.{0,2}\\\)\\\(?C\\\(=O\\\).?O.{0,2}\\\)?)';

Methods:

The first order of business is thus loading the source file containing the naive data and adding a RegEx, - which evaluates on the fly (- a green text underneath the RegEx textbox literally gives you the green light). You can further subdivide your RegEx by right clicking on the match and select 'Create Nested Expression'. At any time you can remove a RegEx by right clicking on it and selecting 'Remove RegEx'.
Underneath the source-data textfield, is a textfield containing INSERT INTO.. serving as the text-transform template. Simply drag and drop your desired matched into the corresponding field (e.g. $1, $2) - and in case of the match being a string, double quote accordingly - unless your string contains double quotes, in which case single quotes or encapsulating the entire expression in an  MySQL String escape function will do the job.

Notably the RegEx match variable is close to most RegEx implementations, as demonstrated in the following JavaScript example:

var src = "\ 00010 Glycolysis / Gluconeogenesis\n\ 00020 Citrate cycle (TCA cycle)\n\ 00030 Pentose phosphate pathway\n\ ";


//heed the multiline-flag!

src.match(/(\d{5})\t(.*?)$/im)
["00010 Glycolysis / Gluconeogenesis", "00010", "Glycolysis / Gluconeogenesis"]



RegExp.$1
"00010"
RegExp.$2
"Glycolysis / Gluconeogenesis"

You can find more ETL related resources and script on my gist account. Be advised that this is but one of many ETL methods I employ, depending on the scenario and the nature of the data. I intend to provide a comparison in a future post.
LihatTutupKomentar