Department of Chemistry

Murray-Rust Research Group



Jumbo, XML and other distributions

This page summarises our current computing architecture, which is to be presented and released at the NeSC meeting on Computational chemistry and physics. See Cml At Nesc for more information and downloads.


This article describes the architecture and components that have been deployed to support high-throughput computational chemistry and physics in the areas of molecules, crystals, substances and their properties. The architecture is designed with portability in mind (across platforms and languages) and could be extended to other domains. It concentrates on abstracting the computation at the semantic and ontological level and promotes a component-based approach to computation. It is based on XML and relevant W3C protocols and is designed to be used in a Grid environment. All components are open source, but the architecture can also be used to wrap closed-source and commercial products.


All of the software is intended to be distributed openly, but not every package is mounted publicly. There is a crude grading system for software status and robustness:

  • +++ distributed and in use elsewhere
  • ++ works on friend's machine
  • + works for me
  • . vapourware

JUMBO Schema

(+++) These tools allow the construction of bespoke XML schemas from a set of schema components. The default set (CMLComp) covers much of computational chemistry but can be extended through XML namespaces. Since most XML documents in physical science are used for computation, there is a sophisticated code generator for creating DOMs and related code. It is designed to produce DOMs in several languages (Java, C++, Python, F90) so that the resulting code can interoperate with mainstream programs. JUMBO will automatically create:

  • A schema
  • Its documentation in PDF, HTML and Wiki
  • A set of examples, validated against the schema
  • Java (C++, Python, F90) code
  • jars and javadoc
  • test programs for the examples
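The idea of generating DOM code from a schema component can be illustrated with a toy generator. This is a minimal sketch, not JUMBO's generator: the `SPEC` dictionary, the `generate_class` function and the accessor naming are all hypothetical, and a real schema component carries far more information (types, content models, documentation).

```python
# Toy code generator: turns a (hypothetical) element description into
# Python source for a class with get/set accessors, then loads it.

SPEC = {"element": "atom", "attributes": ["id", "elementType", "x3"]}  # invented example


def generate_class(spec):
    """Return Python source code for a simple DOM-like element class."""
    class_name = spec["element"].capitalize()
    lines = [
        f"class {class_name}:",
        f"    TAG = {spec['element']!r}",
        "",
        "    def __init__(self):",
        "        self._attrs = {}",
    ]
    for attr in spec["attributes"]:
        cap = attr[0].upper() + attr[1:]
        lines += [
            "",
            f"    def get{cap}(self):",
            f"        return self._attrs.get({attr!r})",
            "",
            f"    def set{cap}(self, value):",
            f"        self._attrs[{attr!r}] = str(value)",
        ]
    return "\n".join(lines) + "\n"


source = generate_class(SPEC)

# Load the generated source and use the generated class.
namespace = {}
exec(source, namespace)
atom = namespace["Atom"]()
atom.setElementType("C")
print(source)
print(atom.getElementType())
```

The generated class offers only get/set access, which is the kind of limited DOM functionality that separate tools would then build on.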

JUMBO Tools and Legacy Converters

(+++) In many cases the functionality of the DOM is limited (get/set/append/remove) and richer operations are required. Examples could be:

The Tools are a way of adding these richer operations without recompiling the DOM.
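As an illustration of a richer operation layered on top of a plain DOM, the sketch below computes a molecular formula from a CML molecule. It uses Python's standard `xml.etree.ElementTree` rather than a JUMBO DOM, and the namespace URI, the sample molecule and the function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML namespace

# Invented sample: methanol, with element types only.
SAMPLE = f"""<molecule xmlns="{CML_NS}" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
    <atom id="a6" elementType="H"/>
  </atomArray>
</molecule>"""


def element_counts(molecule):
    """Count atoms per element type; uses only generic DOM traversal."""
    atoms = molecule.iter(f"{{{CML_NS}}}atom")
    return Counter(atom.get("elementType") for atom in atoms)


def formula(molecule):
    """Build a formula string in Hill order (C, H first if carbon is present)."""
    counts = element_counts(molecule)
    first = [s for s in ("C", "H") if s in counts] if "C" in counts else []
    rest = sorted(s for s in counts if s not in first)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in first + rest)


molecule = ET.fromstring(SAMPLE)
print(formula(molecule))
```

The functions live outside the element classes, so new operations of this kind can be added without touching the DOM itself.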

In many domains there are standard legacy file formats. We provide converters for molecules, reactions, etc.:

  • Mol2CML


JUMBO Marker

(++) For closed codes it is necessary to write output converters ("parsers"). JUMBO Marker is a generic technology for doing this, based on regular expressions and yacc-like structuring of the document.
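The regular-expression half of this approach can be sketched as follows. This is not JUMBO Marker itself: the log format, the `RULES` table and the `demo:` dictionary prefix are invented for illustration, and the yacc-like structuring step is omitted. Each rule pairs a pattern with a dictionary reference, and every matching line becomes a CML-style `scalar` element.

```python
import re
import xml.etree.ElementTree as ET

# Invented program output; a real code would need its own rules.
LOG = """\
Job title: water test
Total energy = -76.0267 hartree
Dipole moment = 1.85 debye
Converged after 12 cycles
"""

# (pattern, dictRef) pairs; names and prefixes are hypothetical.
NUMBER_WITH_UNITS = r"\s*=\s*(?P<value>-?\d+\.\d+)\s+(?P<units>\w+)$"
RULES = [
    (re.compile(r"^Total energy" + NUMBER_WITH_UNITS), "demo:totalEnergy"),
    (re.compile(r"^Dipole moment" + NUMBER_WITH_UNITS), "demo:dipole"),
]


def mark_up(log_text):
    """Convert recognised log lines into <scalar> children of a <module>."""
    module = ET.Element("module")
    for line in log_text.splitlines():
        for pattern, dict_ref in RULES:
            match = pattern.match(line.strip())
            if match:
                scalar = ET.SubElement(
                    module,
                    "scalar",
                    dictRef=dict_ref,
                    units="units:" + match.group("units"),
                )
                scalar.text = match.group("value")
                break  # first matching rule wins
    return module


module = mark_up(LOG)
print(ET.tostring(module, encoding="unicode"))
```

Lines that match no rule are skipped silently here; a production converter would probably want to report them.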


(+++) A handcrafted DOM for CML in C++. This allows CML to be distributed as a Windows DLL or a Unix *.so.


(++) A set of tools in Java for converting the crystallographic CIF format to XML.


(+++) Specs and tools for creating and viewing an RSS feed containing scientific components (molecules, spectra, etc.). Includes Jmol, JChempaint and RSSViewer from SF.


(+++) A demonstrator of storage of molecules and properties using the Apache Xindice database. Includes:

  • Xindice DB from Apache
  • Tomcat server
  • JChempaint editor/viewer for entry of search
  • Pre-constructed database of molecules and properties indexed on IChI.

(Note: the IUPAC IChI converter is freely available from NIST, but we cannot yet distribute it. However, we run an IChI server on our site.)


(+++) A set of F90 library routines for CML/XML for incorporation into computational chemistry and physics programs. These include:

  • F90 output library for scalars, arrays and matrices and molecular components
  • sample code
  • sample dictionaries to demonstrate dictRef
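The kind of output such a library produces can be sketched in Python. This is not the F90 API: the function names, the `demo:` dictionary prefix and the unit names are hypothetical. The sketch writes a scalar, an array and a matrix as CML-style elements, each linked to a dictionary entry through `dictRef`.

```python
import xml.etree.ElementTree as ET


def add_scalar(parent, value, dict_ref, units):
    """Append a <scalar> holding one floating-point value."""
    el = ET.SubElement(parent, "scalar", dictRef=dict_ref, units=units,
                       dataType="xsd:double")
    el.text = repr(float(value))
    return el


def add_array(parent, values, dict_ref, units):
    """Append an <array> of whitespace-separated values with its size."""
    el = ET.SubElement(parent, "array", dictRef=dict_ref, units=units,
                       dataType="xsd:double", size=str(len(values)))
    el.text = " ".join(repr(float(v)) for v in values)
    return el


def add_matrix(parent, rows, dict_ref, units):
    """Append a <matrix>, stored row by row, with its dimensions."""
    el = ET.SubElement(parent, "matrix", dictRef=dict_ref, units=units,
                       dataType="xsd:double",
                       rows=str(len(rows)), columns=str(len(rows[0])))
    el.text = " ".join(repr(float(v)) for row in rows for v in row)
    return el


module = ET.Element("module")
energy = add_scalar(module, -76.0267, "demo:totalEnergy", "units:hartree")
charges = add_array(module, [-0.8, 0.4, 0.4], "demo:charges", "units:e")
cell = add_matrix(module, [[5.0, 0.0], [0.0, 5.0]], "demo:cell", "units:angstrom")
print(ET.tostring(module, encoding="unicode"))
```

A program calling routines like these never writes angle brackets itself, and the meaning of each quantity is resolved by looking up its `dictRef` in a dictionary.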


(+) A DOM for XML-CML in F90. Jon Wakelin is building the basic F90 infrastructure, and we shall then autogenerate the DOM from schemas.


(+++) A stylesheet library for rendering XML documents containing CML components. In principle every element in the schema will have a stylesheet for default rendering or processing.


(+) Glueware used to create input to MOPAC from CML. It is intended mainly as an example of how to code glueware, for demos and pedagogy, and will probably require editing when ported.
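Glueware of this kind can be sketched in a few lines. The example below is not the distributed glueware: it assumes the CML namespace shown, atoms carrying `x3`/`y3`/`z3` coordinates, and a simplified MOPAC Cartesian layout of three header lines (keywords, title, comment) followed by one atom per line with an optimisation flag after each coordinate. The function name and defaults are hypothetical.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML namespace

# Invented sample: water with 3D coordinates in angstroms.
WATER = f"""<molecule xmlns="{CML_NS}" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.0" y3="0.0" z3="0.1173"/>
    <atom id="a2" elementType="H" x3="0.0" y3="0.7572" z3="-0.4692"/>
    <atom id="a3" elementType="H" x3="0.0" y3="-0.7572" z3="-0.4692"/>
  </atomArray>
</molecule>"""


def cml_to_mopac(cml_text, keywords="PM3", title="generated from CML"):
    """Return MOPAC-style input text for the atoms in a CML molecule."""
    root = ET.fromstring(cml_text)
    lines = [keywords, title, ""]  # keyword line, title line, comment line
    for atom in root.iter(f"{{{CML_NS}}}atom"):
        x, y, z = (float(atom.get(name)) for name in ("x3", "y3", "z3"))
        symbol = atom.get("elementType")
        # The flag 1 after each coordinate marks it as free to optimise.
        lines.append(f"{symbol:<2} {x:10.5f} 1 {y:10.5f} 1 {z:10.5f} 1")
    return "\n".join(lines) + "\n"


mopac_input = cml_to_mopac(WATER)
print(mopac_input)
```

Porting to another code would mostly mean changing the header lines and the per-atom format string.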



  • CML schema for dictionaries
  • Dictionaries for
    • GULP
    • MOPAC
    • CML
    • scientific units

Other open source

  • Xindice
  • tomcat
  • ant
  • xerces
  • xalan
  • fop
  • saxon
  • batik

Other chemical open source

We shall probably include:

These will be specific versions rather than bleeding-edge CVS snapshots.

Distribution format

For each chapter we need the following sections:

  • overview of what the software does
  • installation instructions
  • instructions on how to run a demo
  • example files
  • known bugs and problems/limitations
  • authorship and ?support?