SOAP - WSDL: KEGG Error - Not an Array reference

SOAP with WSDL is a powerful but often dated way of transmitting typed data over HTTP. It is still popular in enterprise and academic settings, alongside its increasingly popular counterpart JSON-RPC. Because SOAP is XML-based, the overhead of validating, parsing and wrapping data in XML is considerably greater than that of the much simpler JavaScript Object Notation (JSON). JSON maps almost one-to-one onto the data objects used in scripting languages and many databases, so parsing it costs less than parsing XML. The format is also more flexible, yet it can just as easily be validated against a schema.
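To put a rough number on that overhead, here is a minimal Python sketch that serializes the same call both ways and compares the sizes. It assumes a JSON-RPC 2.0 request on one side and a SOAP 1.1 RPC/encoded envelope on the other; the parameter name compound_id_list and the second compound ID are illustrative assumptions, not taken from KEGG.

```python
import json
from xml.sax.saxutils import escape

# cpd:C00033 appears in the KEGG error quoted later; the second ID is only an
# illustrative placeholder.
COMPOUNDS = ["cpd:C00033", "cpd:C00158"]


def as_json_rpc(ids):
    """Encode the call as a compact JSON-RPC 2.0 request."""
    request = {"jsonrpc": "2.0", "method": "get_pathways_by_compounds",
               "params": [ids], "id": 1}
    return json.dumps(request, separators=(",", ":"))


def as_soap_rpc(ids):
    """Encode the same call as a SOAP 1.1 RPC/encoded envelope.

    The parameter name compound_id_list is a hypothetical choice for this sketch.
    """
    items = "".join(f'<item xsi:type="xsd:string">{escape(i)}</item>' for i in ids)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<SOAP-ENV:Envelope'
        ' xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"'
        ' xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"'
        ' xmlns:xsd="http://www.w3.org/2001/XMLSchema"'
        ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
        ' xmlns:ns1="SOAP/KEGG">'
        '<SOAP-ENV:Body><ns1:get_pathways_by_compounds>'
        '<compound_id_list xsi:type="SOAP-ENC:Array"'
        f' SOAP-ENC:arrayType="xsd:string[{len(ids)}]">{items}</compound_id_list>'
        '</ns1:get_pathways_by_compounds></SOAP-ENV:Body>'
        '</SOAP-ENV:Envelope>'
    )


if __name__ == "__main__":
    j, s = as_json_rpc(COMPOUNDS), as_soap_rpc(COMPOUNDS)
    print(f"JSON-RPC: {len(j.encode('utf-8'))} bytes, "
          f"SOAP: {len(s.encode('utf-8'))} bytes")
```

Size is only part of the cost: the namespace handling and schema validation described above add parsing work that a byte count does not capture.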

KEGG, the Kyoto Encyclopedia of Genes and Genomes, is one of the most widely used biological databases in the world. It stores and links hierarchical information ranging from genes and functional proteins to metabolites and entire pathways. KEGG owes its fame to its comprehensive biochemical pathway datasets and to providing open, unimpeded, direct access to its database resources. After all, its SOAP interface is now a decade old.

KEGG offers many code samples (I tested them with SOAPpy, which I ported to Python 2.7; a port for Python 3.x will follow eventually.)

KEGG's extensive API reference includes plenty of SOAP examples in Python and Perl: http://www.genome.jp/kegg/docs/keggapi_manual.html

Most of the examples provided work as intended as long as the parameters passed are primitive types such as boolean or string. However, objects and arrays, which are nested types, are treated differently across the various XML Schema specifications.
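The difference is easiest to see on the wire. In SOAP 1.1 section 5 encoding, a primitive parameter is a single element with an xsi:type attribute, while an array becomes a SOAP-ENC:Array container with an arrayType declaration and one child per entry. The following Python sketch builds both shapes with xml.etree.ElementTree; the function name encode_param and the element names are hypothetical, and it only handles booleans, strings and lists of strings.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
ET.register_namespace("xsi", XSI)
ET.register_namespace("SOAP-ENC", SOAP_ENC)


def encode_param(name, value):
    """Encode one RPC parameter in the spirit of SOAP 1.1 section 5 encoding.

    Primitives become a single element carrying an xsi:type; a list becomes a
    SOAP-ENC:Array container with one <item> child per entry. The "xsd:" and
    "SOAP-ENC:" prefixes inside attribute values assume the surrounding
    envelope declares them.
    """
    elem = ET.Element(name)
    if isinstance(value, bool):
        # Primitive: boolean, written in its XML Schema lexical form.
        elem.set(f"{{{XSI}}}type", "xsd:boolean")
        elem.text = "true" if value else "false"
    elif isinstance(value, str):
        # Primitive: string.
        elem.set(f"{{{XSI}}}type", "xsd:string")
        elem.text = value
    elif isinstance(value, list):
        # Nested (complex) type: the container declares item type and length.
        elem.set(f"{{{XSI}}}type", "SOAP-ENC:Array")
        elem.set(f"{{{SOAP_ENC}}}arrayType", f"xsd:string[{len(value)}]")
        for entry in value:
            item = ET.SubElement(elem, "item")
            item.set(f"{{{XSI}}}type", "xsd:string")
            item.text = str(entry)
    else:
        raise TypeError(f"unsupported parameter type: {type(value).__name__}")
    return elem


if __name__ == "__main__":
    print(ET.tostring(encode_param("flag", True), encoding="unicode"))
    print(ET.tostring(encode_param("ids", ["cpd:C00033", "cpd:C00158"]),
                      encoding="unicode"))
```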

The remote function get_pathways_by_compounds on the KEGG server demonstrates how complex types are treated.


PHP script initially used (error handling not shown):
When the PHP SOAP extension (.dll/.so) is used, the remote Perl script on the KEGG SOAP server throws an error.




On the left: Python with SOAPpy, exactly as featured in the KEGG API reference.
On the right: PHP with the SOAP extension, showing that the SOAP client correctly encapsulated the data as defined in the WSDL file for the function get_pathways_by_compounds.

With the XMLSchema set to the 2001 specification:
<xsd:schema targetNamespace="SOAP/KEGG" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
the PHP SOAP Client will return the following response:
  Uncaught SoapFault exception: [SOAP-ENV:Server] Can't use string ("cpd:C00033") as an ARRAY ref while "strict refs" in use at /usr/local/WWW/pub/kegg/soap/private/v6.2/SOAP/KEGG_PATHWAY.pm line 184

When the PHP SOAP Client is set to the 1999 XMLSchema definition (xsd):
<xsd:schema targetNamespace="SOAP/KEGG" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
the PHP SOAP Client will return the following response:
Uncaught SoapFault exception: [SOAP-ENV:Server] Not an ARRAY reference at /usr/local/WWW/pub/kegg/soap/private/v6.2/SOAP/KEGG_PATHWAY.pm line 184.
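Switching between the two drafts changes nothing but the namespace URIs bound to the xsd and xsi prefixes, so a type such as xsd:string resolves to a different namespace on the server side. A quick way to check which draft a captured request actually uses is to look at its namespace declarations. The Python sketch below does that with ElementTree's start-ns events; minimal_envelope and schema_year are hypothetical helper names.

```python
import io
import xml.etree.ElementTree as ET

# (xsd, xsi) namespace URIs for the two XML Schema drafts.
SCHEMA_NAMESPACES = {
    1999: ("http://www.w3.org/1999/XMLSchema",
           "http://www.w3.org/1999/XMLSchema-instance"),
    2001: ("http://www.w3.org/2001/XMLSchema",
           "http://www.w3.org/2001/XMLSchema-instance"),
}


def minimal_envelope(year):
    """Build an empty SOAP 1.1 envelope that binds xsd/xsi to the given draft."""
    xsd, xsi = SCHEMA_NAMESPACES[year]
    return ('<SOAP-ENV:Envelope'
            ' xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"'
            f' xmlns:xsd="{xsd}" xmlns:xsi="{xsi}">'
            '<SOAP-ENV:Body/></SOAP-ENV:Envelope>')


def schema_year(envelope):
    """Report which XML Schema draft namespace an envelope declares, if any."""
    source = io.BytesIO(envelope.encode("utf-8"))
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        for year, (xsd, _xsi) in SCHEMA_NAMESPACES.items():
            if uri == xsd:
                return year
    return None


if __name__ == "__main__":
    for year in (1999, 2001):
        print(year, schema_year(minimal_envelope(year)))
```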

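When the SOAP version is set to SOAP 1.2, the PHP script quits with: Fatal error: Uncaught SoapFault exception: [SOAP-ENV:Client] Content-Type must be 'text/xml' instead of 'application/soap+xml' in...

The server evidently only speaks SOAP 1.1, whose HTTP binding uses the text/xml media type, whereas SOAP 1.2 uses application/soap+xml.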
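SOAP 1.1 and 1.2 differ in both the envelope namespace and the HTTP media type: 1.1 uses text/xml plus a separate SOAPAction header, 1.2 uses application/soap+xml. The Python sketch below models the headers each version sends, together with a stand-in for the check a SOAP 1.1-only endpoint performs; that check is an assumption for illustration, not KEGG's code.

```python
# Envelope namespace and HTTP media type for each SOAP version.
SOAP_VERSIONS = {
    "1.1": {"envelope": "http://schemas.xmlsoap.org/soap/envelope/",
            "media_type": "text/xml"},
    "1.2": {"envelope": "http://www.w3.org/2003/05/soap-envelope",
            "media_type": "application/soap+xml"},
}


def request_headers(version, action=""):
    """HTTP headers a client sends for a SOAP request of the given version."""
    content_type = SOAP_VERSIONS[version]["media_type"] + "; charset=utf-8"
    if version == "1.1":
        # SOAP 1.1: the action travels in its own (quoted) SOAPAction header.
        return {"Content-Type": content_type, "SOAPAction": f'"{action}"'}
    # SOAP 1.2: the optional action becomes a parameter of the media type.
    if action:
        content_type += f'; action="{action}"'
    return {"Content-Type": content_type}


def soap11_only_server_accepts(headers):
    """Hypothetical stand-in for an endpoint that only speaks SOAP 1.1."""
    media_type = headers["Content-Type"].split(";", 1)[0].strip().lower()
    return media_type == SOAP_VERSIONS["1.1"]["media_type"]


if __name__ == "__main__":
    for version in ("1.1", "1.2"):
        headers = request_headers(version)
        print(version, headers, soap11_only_server_accepts(headers))
```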

The transmission request for SOAP 1.2 will then look like this:


Several other SOAP applications failed as well when a simple array was passed to get_pathways_by_compounds. Testing with the open-source PHP library NuSOAP provided the crucial hint: it sent an empty XML array, even though the same two-element array that had been passed through the other SOAP client implementations was passed to get_pathways_by_compounds.



The NuSOAP (NuSphere) request as captured with Wireshark 1.6; the key line is highlighted.
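When comparing captures from several clients, it helps to check mechanically whether an array parameter actually carries items. This Python sketch parses an RPC-style SOAP 1.1 request and reports, per SOAP-ENC array parameter, the declared length from arrayType next to the number of child elements. The embedded request is a hypothetical example of an empty array, not the NuSOAP capture itself, and the parameter name compound_id_list is assumed.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"

# Hypothetical request whose array parameter arrived empty.
CAPTURED = b"""<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:ns1="SOAP/KEGG">
 <SOAP-ENV:Body>
  <ns1:get_pathways_by_compounds>
   <compound_id_list xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:string[0]"/>
  </ns1:get_pathways_by_compounds>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def array_params(envelope):
    """Map each SOAP-ENC array parameter to (declared length, actual item count)."""
    body = ET.fromstring(envelope).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # RPC style: the first child of Body is the method element
    sizes = {}
    for param in call:
        array_type = param.get(f"{{{SOAP_ENC}}}arrayType")
        if array_type is None:
            continue  # not an encoded array
        # "xsd:string[0]" -> "0"; "xsd:string[]" (no size given) -> ""
        declared = array_type[array_type.rfind("[") + 1:].rstrip("]")
        sizes[param.tag] = (int(declared) if declared.isdigit() else None,
                            len(param))
    return sizes


if __name__ == "__main__":
    print(array_params(CAPTURED))
```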


Together with the initial investigation, this suggested that the outer array gets reduced to a SOAP array container (as it should). The Perl implementation at the SOAP endpoint, however, seems to require an additional layer of XML nesting for the parameters passed to the remote function. Indeed, merely nesting the actual array inside another array in the PHP script produces a request similar to the one SOAPpy sends, which is what the remote Perl script was built around.
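To illustrate what one extra layer does to the request, here is a Python sketch that encodes a flat and a wrapped list, assuming the extra layer ends up as an outer SOAP array. SOAP 1.1 writes an array of string arrays with an arrayType such as xsd:string[][1]. The helper names and the parameter name compound_id_list are hypothetical, and this is not the PHP workaround itself.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
TYPE = f"{{{XSI}}}type"
ARRAY_TYPE = f"{{{SOAP_ENC}}}arrayType"


def list_depth(values):
    """Nesting depth of a homogeneous list: ["a"] -> 1, [["a"]] -> 2."""
    if values and isinstance(values[0], list):
        return 1 + list_depth(values[0])
    return 1


def soap_array(name, values):
    """Encode a (possibly nested, homogeneous) list of strings as SOAP-ENC:Array."""
    depth = list_depth(values)
    # SOAP 1.1 writes an array of arrays of strings as xsd:string[][n].
    array_type = "xsd:string" + "[]" * (depth - 1) + f"[{len(values)}]"
    elem = ET.Element(name, {TYPE: "SOAP-ENC:Array", ARRAY_TYPE: array_type})
    for value in values:
        if isinstance(value, list):
            elem.append(soap_array("item", value))  # one more level of nesting
        else:
            ET.SubElement(elem, "item", {TYPE: "xsd:string"}).text = value
    return elem


def array_depth(elem):
    """Count how many SOAP-ENC:Array layers wrap the innermost items."""
    nested = [child for child in elem if child.get(TYPE) == "SOAP-ENC:Array"]
    return 1 + max((array_depth(child) for child in nested), default=0)


ids = ["cpd:C00033", "cpd:C00158"]               # second ID is illustrative
flat = soap_array("compound_id_list", ids)       # one array of strings
wrapped = soap_array("compound_id_list", [ids])  # the array nested once more

if __name__ == "__main__":
    ET.register_namespace("xsi", XSI)
    ET.register_namespace("SOAP-ENC", SOAP_ENC)
    print(ET.tostring(wrapped, encoding="unicode"))
```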
Then again, had the KEGG implementation been open source, a much larger developer community would have formed around the project. Being able to look at the script running at the SOAP endpoint, which raised the error in the first place, would also have let me understand the error directly in this context. The solution:

Here is a cheat sheet for XMLSchema 2001:
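As a textual companion, the Python sketch below maps common Python values to the XMLSchema 2001 built-in types a SOAP client would typically declare for them, along with their lexical forms. The mapping (for example int to xsd:int, falling back to xsd:long and xsd:integer for larger values) is a common convention, not something the KEGG WSDL prescribes, and the helper names are hypothetical.

```python
import base64
import math


def xsd_type(value):
    """Map a Python value to the XMLSchema 2001 type usually declared for it."""
    if isinstance(value, bool):  # bool first: bool is a subclass of int
        return "xsd:boolean"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return "xsd:int"
        if -2**63 <= value < 2**63:
            return "xsd:long"
        return "xsd:integer"
    if isinstance(value, float):
        return "xsd:double"
    if isinstance(value, str):
        return "xsd:string"
    if isinstance(value, bytes):
        return "xsd:base64Binary"
    if isinstance(value, list):
        return "SOAP-ENC:Array"  # SOAP 1.1 encoding type, not an XMLSchema type
    raise TypeError(f"no mapping for {type(value).__name__}")


def xsd_literal(value):
    """Render a scalar in the lexical form its xsd type expects.

    Escaping of characters such as < and & is left to the XML serializer.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "INF" if value > 0 else "-INF"
        return repr(value)
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, (int, str)):
        return str(value)
    raise TypeError(f"no lexical form for {type(value).__name__}")


if __name__ == "__main__":
    for sample in (True, 42, 2**40, 1.5, "cpd:C00033", b"hi"):
        print(repr(sample), xsd_type(sample), xsd_literal(sample))
```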


Conclusion: Neither the Perl scripts at the SOAP endpoint nor the SOAP libraries listed as compatible in the KEGG SOAP reference have been maintained for quite some time. The narrow range of supported SOAP libraries and the lack of open-source access to the project's implementation allowed this faulty behavior to persist for almost a decade. The WSDL file declares its own data type, ArrayOfString, for the function's parameter; together with the captured HTTP traffic, this made it clear that the PHP SOAP extension was behaving correctly (as one would expect from an extension distributed with PHP itself).
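For reference, this is how a SOAP-encoded string array is commonly declared in a WSDL 1.1 types section: a restriction of SOAP-ENC:Array whose wsdl:arrayType is xsd:string[]. Such a declaration describes a single array of strings, matching how the PHP extension encoded the parameter. The snippet is the generic pattern, assumed rather than copied from KEGG's WSDL, and array_item_types is a hypothetical helper that reads the item type back out.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
WSDL = "http://schemas.xmlsoap.org/wsdl/"

# Typical WSDL 1.1 declaration of a SOAP-encoded string array (illustrative,
# not copied from KEGG's WSDL file).
WSDL_TYPES = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
  <types>
    <xsd:schema targetNamespace="SOAP/KEGG">
      <xsd:complexType name="ArrayOfString">
        <xsd:complexContent>
          <xsd:restriction base="SOAP-ENC:Array">
            <xsd:attribute ref="SOAP-ENC:arrayType" wsdl:arrayType="xsd:string[]"/>
          </xsd:restriction>
        </xsd:complexContent>
      </xsd:complexType>
    </xsd:schema>
  </types>
</definitions>"""


def array_item_types(wsdl_text):
    """Return {complexType name: item type} for every SOAP-encoded array type."""
    root = ET.fromstring(wsdl_text)
    found = {}
    for complex_type in root.iter(f"{{{XSD}}}complexType"):
        for attribute in complex_type.iter(f"{{{XSD}}}attribute"):
            array_type = attribute.get(f"{{{WSDL}}}arrayType")
            if array_type:
                # "xsd:string[]" -> "xsd:string"
                found[complex_type.get("name")] = array_type.split("[", 1)[0]
    return found


if __name__ == "__main__":
    print(array_item_types(WSDL_TYPES))
```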

Apparently, others have struggled with this before me.