Sequence Viewer: FASTA to GFF - Interactively pretty printing a Protein / DNA Sequence

This blog post discusses a viewer for arbitrary character sequences, implemented in essentially one line of JavaScript code.

Introduction:
So you have a long linear character sequence of nucleotides or amino acids... who doesn't these days, right? ;) An established norm for printing protein sequences is to split the sequence into blocks of ten, with several such blocks per line and the sequence position number at the start of each new line. This formatting is, for instance, specified in the GCG file format.
In a web-application scenario, hundreds of sequences may be retrieved from a database and rendered at once. Styling individual sequence letters with CSS allows great flexibility in outlining (i.e. color highlighting), but at the cost of many HTML DOM elements and thus browser resources. Each HTML element carries a comprehensive set of methods and attributes, which the browser has to monitor for changes in order to support the live updates that are customary on web pages. The more elements are added to a document, the less headroom remains for the rest of the project later on during web-application development.
It would be ideal to update the sequence view on the fly and outline only those sequences that are currently in view or that the user has interacted with, for example by hovering the mouse over the sequence at least once. A similar approach is seen in today's syntax highlighters, which color only those code keywords that are currently in view. For instance, the source-code viewer of Google's Chrome browser highlights only the code lines that are on screen.

A simple implementation to facilitate on-the-fly color outlining of sequence data is provided below, along with code comments. Interactivity is provided through the DOM.

Showing the sequence viewer with a custom CSS class applied
The entire implementation is written in one essential line of JavaScript code. First, the raw sequence seq is split into blocks of 10 arbitrary characters, and empty elements are filtered out. Each block is then processed: every sixth block, starting with the first, is prefixed with a newline and a space-padded sequence position number, and every letter is wrapped in an HTML tag that carries the letter as its CSS class. Newline characters are honored by setting the CSS property white-space to pre on the enclosing HTML div container.
seq.split(/(.{10})/g).filter(Boolean).map(function(e,i){return(i%6?'':'\n'+'    '.slice(0,3-(''+((i*10)+1)).length)+(i*10+1)+' ')+e.replace(/(.{1})/g,'<b class="$1">$1</b>')}).join(' ')
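For readers who prefer to see the steps spelled out, here is a minimal expanded sketch of the same idea. The function name formatSequence and its parameters blockSize and blocksPerLine are hypothetical and not part of the original code. One deliberate difference, labeled as an assumption: the position column grows with the sequence length instead of being fixed at three characters, so line numbers of 1000 and above stay aligned. For sequences shorter than 1000 letters the output should match the one-liner.

```javascript
// Hypothetical helper; an expanded sketch of the one-liner above.
function formatSequence(seq, blockSize, blocksPerLine) {
  blockSize = blockSize || 10;        // letters per block
  blocksPerLine = blocksPerLine || 6; // blocks per printed line

  // Assumption: size the position column by the sequence length (minimum 3),
  // whereas the one-liner always pads to a width of 3.
  var width = Math.max(3, String(seq.length).length);

  // Split into fixed-size blocks; the last block may be shorter.
  var blocks = seq.match(new RegExp('.{1,' + blockSize + '}', 'g')) || [];

  return blocks.map(function (block, i) {
    var prefix = '';
    if (i % blocksPerLine === 0) {
      // Start a new line with the 1-based position of its first letter.
      var position = String(i * blockSize + 1).padStart(width, ' ');
      prefix = '\n' + position + ' ';
    }
    // Wrap every letter in a tag whose CSS class is the letter itself.
    return prefix + block.replace(/(.)/g, '<b class="$1">$1</b>');
  }).join(' ');
}
```

The returned string is meant to be assigned to the innerHTML of a container styled with white-space: pre, as described above.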


Demo:



All that is required to create a new Sequence Viewer instance is to pass the raw sequence data as the first argument and, optionally, the target element's name in selector syntax as the second argument to seq2gff, as follows (place the call at the end of the HTML document!):
var myseq2 = new seq2gff("MDCLQMVFKLFPNWKREAEVKKLVAGYKVHGDPFSTNTRRVLAVLHEKRLSYEPITVKLQTGEHKTEPFLSLNPFGQ", "#myseq-Q84TK0");

Upon hovering the mouse over the sequence, the sequence will be dynamically outlined. All outlined attributes can be styled via CSS. Each letter is assigned a CSS class named after the letter itself; for instance, the rule .D,.E {...} decorates the negatively charged amino acids. Likewise, two CSS rules can differentiate the purines (.A,.G) from the pyrimidines (.C,.T,.U). The CSS class .sequence {...} is applied to all sequence views.
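The internals of seq2gff are not reproduced in this post, so the following is only a rough sketch of how outlining on first hover could be wired up, not the actual implementation. The names wrapLetters and outlineOnFirstHover are hypothetical. The container starts out holding a single plain text node and is only expanded into per-letter tags once the mouse enters it.

```javascript
// Hypothetical renderer: wrap every letter in a tag classed by the letter.
function wrapLetters(seq) {
  return seq.replace(/(.)/g, '<b class="$1">$1</b>');
}

// Hypothetical helper: show plain text first, add markup on first hover only.
function outlineOnFirstHover(container, rawSeq, render) {
  render = render || wrapLetters;
  container.textContent = rawSeq; // cheap: a single text node

  function onEnter() {
    // Run once, then detach so later hovers cost nothing.
    container.removeEventListener('mouseenter', onEnter);
    container.innerHTML = render(rawSeq); // one element per letter
  }
  container.addEventListener('mouseenter', onEnter);
}

// In a browser (assuming an element with this id exists):
//   outlineOnFirstHover(document.querySelector('#myseq-Q84TK0'), 'MDCLQMVFKL');
```

Deferring the markup this way keeps the initial DOM small when hundreds of sequences are rendered at once.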

At some point, or upon certain (DOM) events, it may be useful to clear all markup code (i.e. color formatting) to free browser resources. A convenient way of achieving this is to select the elements with a DOM query selector and apply forEach to the resulting NodeList. The selection step looks as follows:
// get all DOM elements whose id attribute starts with the letters 'myseq'
var nlist = document.querySelectorAll('[id^=myseq]');
For more information, see the code comments of the implementation that follows.
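The clearing step itself could look roughly like the sketch below. The function name clearMarkup is hypothetical, and this is one possible approach rather than the code used by the viewer. It relies on the standard DOM behavior that assigning to textContent replaces all child nodes with a single text node, so the per-letter tags disappear while position numbers and spacing stay in place.

```javascript
// Hypothetical helper: strip per-letter tags from each selected element.
function clearMarkup(nodeList) {
  // Array.prototype.forEach.call also works on NodeLists in older browsers
  // that lack NodeList.prototype.forEach.
  Array.prototype.forEach.call(nodeList, function (el) {
    // Reading textContent yields the text without tags; writing it back
    // replaces all child elements with one plain text node.
    el.textContent = el.textContent;
  });
}

// In a browser:
//   clearMarkup(document.querySelectorAll('[id^=myseq]'));
```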

Implementation - Code:
PS: As a small plea, please share your CSS styles in the comments section via Disqus or on the code-gist page. Adjacent, unchanging letters may be merged into one HTML tag via a regular expression with back-references, but at the loss of some flexibility in terms of dynamic effects.
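As a small illustration of the back-reference idea, the sketch below wraps each run of identical letters in a single tag. The function name wrapRuns is hypothetical.

```javascript
// Hypothetical helper: one tag per run of identical letters.
function wrapRuns(seq) {
  // (.)\1* matches a letter followed by any repeats of that same letter.
  // In the replacement, $1 is the letter and $& is the whole matched run.
  return seq.replace(/(.)\1*/g, '<b class="$1">$&</b>');
}
```

For example, wrapRuns('AAACCG') produces three tags instead of six, which saves DOM elements in low-complexity regions but no longer allows each letter to be addressed individually.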
Why? The mini-project came about as a demonstration of splitting strings at constant length in JavaScript, Python, R, or any language with a Perl-compatible (PCRE-style) regular expression engine.
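Reduced to its core, the constant-length split can be sketched in two equivalent ways in JavaScript. The function names chunkBySplit and chunkByMatch are hypothetical, and both assume size is an integer of at least 1.

```javascript
// Variant 1: split on a capturing group, as in the viewer. The captured
// blocks are kept in the result, and the empty strings between them
// are filtered out afterwards.
function chunkBySplit(str, size) {
  return str.split(new RegExp('(.{' + size + '})')).filter(Boolean);
}

// Variant 2: match runs of 1..size characters directly.
function chunkByMatch(str, size) {
  return str.match(new RegExp('.{1,' + size + '}', 'g')) || [];
}
```

Both return the same blocks for input without line breaks, with a shorter final block when the length is not a multiple of size. Since . does not match newline characters, input that contains line breaks should be cleaned first.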
