Jumbo
Jumbo, XML and other distributions
This page summarises our current architecture for computing, to be presented and released at the [http://www.nesc.ac.uk/esi/events/394/ NeSC meeting on Computational chemistry and physics]. See CmlAtNesc for more info and download.
Overview
This article describes the architecture and components which have been deployed to support high-throughput computational chemistry and physics in the areas of molecules, crystals, substances and their properties. It is designed with portability in mind (across platforms and languages) and could be extended to other domains. It concentrates on the abstraction of the computation at the semantic and ontological level and promotes a component-based approach to computation. It is based on XML and relevant W3C protocols and is designed to be used in a Grid environment. All components are open source, but the architecture can also be used to wrap closed-source and commercial products.
Software
All software is intended to be distributed openly, but not all of it is publicly hosted yet. There is a crude grading system for software status and robustness.
- +++ distributed and in use elsewhere
- ++ works on friend's machine
- + works for me
- . vapourware
JUMBO Schema
(+++) These tools allow the construction of bespoke XML schemas from a set of schema components. The default set (CMLComp) covers much of computational chemistry but can be extended through XML namespaces. Since most XML documents in physical science are used for computation, there is a sophisticated code generator for creating DOMs and related code. This is designed to produce DOMs in several languages (Java, C++, Python, F90) so that the resultant code can interoperate with mainstream programs. JUMBO will automatically create:
- A schema
- Its documentation in PDF, HTML and Wiki
- A set of examples, validated against the schema
- Java (C++, Python, F90) code
- jars and javadoc
- test programs for the examples
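The idea of generating DOM classes from schema components can be sketched as follows. This is only an illustration of the technique, not JUMBO's generator: the `COMPONENTS` table, the templates and the function names are all hypothetical, and real schema components are XML Schema fragments rather than Python dictionaries.

```python
# Hypothetical schema components: element name -> attribute names.
# (Illustrative only; real components are XML Schema fragments.)
COMPONENTS = {
    "atom": ["id", "elementType"],
    "bond": ["id", "order"],
}

# Source of a minimal DOM-like base class shared by all generated classes.
BASE_SOURCE = '''\
class Element:
    """Minimal DOM-like base: attributes plus append/remove of children."""
    TAG = None

    def __init__(self):
        self.attributes = {}
        self.children = []

    def append(self, child):
        self.children.append(child)

    def remove(self, child):
        self.children.remove(child)
'''

CLASS_TEMPLATE = '''\
class {cls}(Element):
    """Generated accessor class for <{tag}>."""
    TAG = "{tag}"
    ATTRIBUTES = {attrs!r}
'''

ACCESSOR_TEMPLATE = '''
    def get_{attr}(self):
        return self.attributes.get("{attr}")

    def set_{attr}(self, value):
        self.attributes["{attr}"] = value
'''


def generate_source(components):
    """Return Python source text with one class per schema component."""
    parts = [BASE_SOURCE]
    for tag, attrs in components.items():
        parts.append(CLASS_TEMPLATE.format(cls=tag.capitalize(), tag=tag, attrs=attrs))
        for attr in attrs:
            parts.append(ACCESSOR_TEMPLATE.format(attr=attr))
    return "\n".join(parts)


def load_generated(components):
    """Execute the generated source and return the namespace holding the classes."""
    namespace = {}
    exec(generate_source(components), namespace)
    return namespace
```

Calling `load_generated(COMPONENTS)["Atom"]()` gives an object with `get_elementType()` and `set_elementType()` accessors. Extending the schema then means adding a component and regenerating, rather than hand-editing the classes.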
JUMBO Tools and Legacy Converters
(+++) In many cases the functionality of the DOM is limited (get/set/append/remove) and richer operations are required. Examples could be:
- molecule.getMolWeight()
- eigen.diagonalize()
- crystal.getReciprocalCell()
The Tools are a way of adding such operations without recompiling the DOM.
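A minimal sketch of the tool pattern, assuming a plain ElementTree DOM and a small table of approximate atomic weights: the `MoleculeTool` class and its method name are hypothetical stand-ins, and the CML namespace is omitted for brevity. The point is that a molecular-weight operation lives in a wrapper, so the DOM class itself is untouched.

```python
import xml.etree.ElementTree as ET

# Approximate standard atomic weights for a few elements (illustrative subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


class MoleculeTool:
    """Adds richer operations to a <molecule> element without changing the DOM class."""

    def __init__(self, molecule):
        self.molecule = molecule

    def get_mol_weight(self):
        """Sum the atomic weights of all <atom> descendants."""
        total = 0.0
        for atom in self.molecule.iter("atom"):
            total += ATOMIC_WEIGHTS[atom.get("elementType")]
        return total


WATER = """
<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
</molecule>
"""

molecule = ET.fromstring(WATER.strip())
weight = MoleculeTool(molecule).get_mol_weight()
```

Because the tool only holds a reference to the element, new tools can be added or replaced without regenerating or recompiling the DOM code.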
In many domains there are standard legacy file formats. We provide converters for molecules, reactions and so on:
- Mol2CML
- SDF2CML
- RXN2CML
etc.
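As an illustration of what a legacy converter does, the sketch below reads the atom and bond blocks of an MDL V2000 molfile by fixed column positions and builds a CML-style `molecule` element. It is not the Mol2CML code: the function name and the sample file are made up for the example, charges, stereo flags and property lines are ignored, and the CML namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Small hand-written V2000 molfile (water); the header lines are invented.
SAMPLE_MOLFILE = "\n".join([
    "water",
    "  hypothetical-program",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
])


def molfile_to_cml(text):
    """Convert the atom and bond blocks of a V2000 molfile to a <molecule> element."""
    lines = text.splitlines()
    counts = lines[3]  # counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    molecule = ET.Element("molecule", id=lines[0].strip() or "m1")
    atom_array = ET.SubElement(molecule, "atomArray")
    for i, line in enumerate(lines[4:4 + n_atoms], start=1):
        ET.SubElement(atom_array, "atom", {
            "id": f"a{i}",
            "elementType": line[31:34].strip(),
            "x3": line[0:10].strip(),
            "y3": line[10:20].strip(),
            "z3": line[20:30].strip(),
        })

    bond_array = ET.SubElement(molecule, "bondArray")
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        first, second, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        ET.SubElement(bond_array, "bond", {
            "atomRefs2": f"a{first} a{second}",
            "order": str(order),
        })
    return molecule
```

`ET.tostring(molfile_to_cml(SAMPLE_MOLFILE), encoding="unicode")` then gives the XML text. Fixed-width slicing is used instead of whitespace splitting because adjacent fields in molfiles are not guaranteed to be separated by spaces.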
JUMBO Marker
(++) For closed codes it is necessary to write output converters ("parsers"). JUMBO Marker is a generic technology for doing this, based on regular expressions and yacc-like structuring of the document.
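The regular-expression half of this approach can be sketched as a table of markers applied line by line to program output. Everything specific here is an assumption made for illustration: the log format is invented, the `demo:` and `units:` prefixes are hypothetical, and the yacc-like structuring of the document is not attempted.

```python
import re
import xml.etree.ElementTree as ET

# Each marker: (dictRef, dataType, pattern). Group 1 is the value and an
# optional group 2 is the unit. The patterns match an invented log format.
MARKERS = [
    ("demo:totalEnergy", "xsd:double",
     re.compile(r"^\s*TOTAL ENERGY\s*=\s*(-?\d+\.\d+)\s+(\S+)\s*$")),
    ("demo:cycles", "xsd:integer",
     re.compile(r"^\s*CYCLES\s*=\s*(\d+)\s*$")),
]

SAMPLE_LOG = """\
 *** hypothetical program output ***
 TOTAL ENERGY =   -53.4200 EV
 CYCLES = 12
 job finished
"""


def mark_up(log_text):
    """Return a <module> holding one <scalar> per log line that matches a marker."""
    module = ET.Element("module")
    for line in log_text.splitlines():
        for dict_ref, data_type, pattern in MARKERS:
            match = pattern.match(line)
            if match is None:
                continue
            scalar = ET.SubElement(module, "scalar",
                                   dictRef=dict_ref, dataType=data_type)
            scalar.text = match.group(1)
            if pattern.groups > 1:
                scalar.set("units", "units:" + match.group(2).lower())
            break  # first matching marker wins for this line
    return module


module = mark_up(SAMPLE_LOG)
```

Lines that match no marker are simply skipped, so a converter can start with a few patterns and grow as more of the output needs to be captured.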
CML++
(+++) A handcrafted DOM for CML in C++. Allows CML to be distributed as a Windows DLL or a Unix *.so.
JUMBO-CIF
(++) A set of tools in Java for converting the crystallographic CIF format to XML.
CMLRSS
(+++) Specs and tools for creating and viewing an RSS feed containing scientific components (molecules, spectra, etc.). Includes Jmol, JChempaint and RSSViewer from SF.
XindiceCML
(+++) A demonstrator of storage of molecules and properties using the Apache Xindice database. Includes:
- Xindice DB from Apache
- Tomcat server
- JChempaint editor/viewer for entering search queries
- Pre-constructed database of molecules and properties indexed on IChI.
(Note: the IUPAC IChI converter is freely available from NIST but we cannot yet distribute it. However we run an IChI server on our site).
F90-LIB
(+++) A set of F90 library routines for CML/XML for incorporation into computational chemistry and physics programs. These include:
- F90 output library for scalars, arrays and matrices and molecular components
- sample code
- sample dictionaries to demonstrate dictRef
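To show the kind of output such a library produces, here is a rough Python analogue that writes scalars, arrays and matrices as CML elements carrying a `dictRef`. It is not the F90 API: the function names and the `demo:` prefix are hypothetical, and every value is written as `xsd:double` for simplicity.

```python
import xml.etree.ElementTree as ET


def add_scalar(parent, value, dict_ref, units=None):
    """Append a <scalar> holding one floating-point value."""
    element = ET.SubElement(parent, "scalar", dictRef=dict_ref, dataType="xsd:double")
    if units is not None:
        element.set("units", units)
    element.text = repr(float(value))
    return element


def add_array(parent, values, dict_ref):
    """Append an <array> whose content is the whitespace-separated values."""
    element = ET.SubElement(parent, "array", dictRef=dict_ref,
                            dataType="xsd:double", size=str(len(values)))
    element.text = " ".join(repr(float(v)) for v in values)
    return element


def add_matrix(parent, rows, dict_ref):
    """Append a <matrix> written row by row; all rows must have equal length."""
    n_rows, n_cols = len(rows), len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all matrix rows must have the same length")
    element = ET.SubElement(parent, "matrix", dictRef=dict_ref,
                            dataType="xsd:double",
                            rows=str(n_rows), columns=str(n_cols))
    element.text = " ".join(repr(float(v)) for row in rows for v in row)
    return element


module = ET.Element("module")
add_scalar(module, -53.42, "demo:totalEnergy", units="units:ev")
add_array(module, [1, 2, 3], "demo:eigenvalues")
add_matrix(module, [[1, 0], [0, 1]], "demo:rotation")
```

Each value is tagged with a `dictRef` at the point of output, so a reader of the file can look up its meaning in a dictionary instead of inferring it from its position.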
F90-DOM
(+) A DOM for XML-CML in F90. Jon Wakelin is building the basic F90 infrastructure, and we shall then autogenerate the DOM from schemas.
JUMBO-XSL
(+++) A stylesheet library for rendering XML documents containing CML components. In principle every element in the schema will have a stylesheet for default rendering or processing.
Demonstrators
(+) Glueware used to create input to MOPAC from CML. Intended mainly for demos and pedagogy, as an example of how to code glueware. Probably requires editing for porting.
Dictionaries
(++)
- CML schema for dictionaries
- Dictionaries for
  - GULP
  - MOPAC
  - CML
  - scientific units
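How a dictionary reference might be resolved can be sketched as below. The dictionary content, the `demo` prefix and the helper names are hypothetical; the only assumptions taken from CML are that a reference has the form `prefix:id` and that a dictionary holds `entry` elements with an `id`, a `term` and a `definition`.

```python
import xml.etree.ElementTree as ET

# A made-up one-entry dictionary (namespace omitted for brevity).
DEMO_DICTIONARY = """
<dictionary>
  <entry id="totalEnergy" term="Total energy">
    <definition>Total energy of the system as reported by the program.</definition>
  </entry>
</dictionary>
"""


def load_entries(xml_text):
    """Index the <entry> elements of one dictionary by their id."""
    root = ET.fromstring(xml_text.strip())
    return {entry.get("id"): entry for entry in root.iter("entry")}


def resolve(dict_ref, dictionaries):
    """Look up 'prefix:id' in a mapping of prefix -> entries; None if not found."""
    prefix, _, entry_id = dict_ref.partition(":")
    return dictionaries.get(prefix, {}).get(entry_id)


dictionaries = {"demo": load_entries(DEMO_DICTIONARY)}
entry = resolve("demo:totalEnergy", dictionaries)
```

Here `entry.get("term")` gives the human-readable name and `entry.find("definition").text` the definition. An unknown prefix or id returns `None` rather than raising, so the caller decides how strict to be.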
Other open source
- Xindice
- Tomcat
- Ant
- Xerces
- Xalan
- FOP
- Saxon
- Batik
Other chemical open source
We shall probably include:
- Jmol
- JChempaint
- CDK
- Open Babel
- JOELib
These will be specific versions rather than bleeding-edge CVS snapshots.
Distribution format
For each chapter we need the following sections:
- overview of what the software does
- installation instructions
- instructions on how to run a demo
- example files
- known bugs and problems/limitations
- authorship and ?support?
Discussion
???
