Golem is a lightweight dictionary/ontology language designed primarily, though not exclusively, for use with CML, the Chemical Markup Language. pyGolem is its supporting toolkit, written in Python and available as Free Software. Together, the language and toolkit help scientists use and write tools that process scientific data by reference to the concepts it contains, rather than having to fight with the formats and syntax the data happens to be serialized in.

The problem that Golem addresses is this: every code or resource that uses CML is trying to represent a subtly different set of concepts and, as a result, uses CML syntax slightly differently. These differences are encapsulated in Golem/CML dictionaries, each of which specifies the concepts and syntax particular to a given domain of CML usage.
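To make the idea concrete, here is a minimal sketch in Python of looking up the same concept in two differently structured CML documents. It does not use pyGolem or the real Golem dictionary format: the two CML fragments, the dialect names `codeA` and `codeB`, the `a:totalEnergy` reference, the `DICTIONARIES` table and the `find_concept` helper are all hypothetical, invented for illustration, and only the Python standard library is assumed.

```python
import xml.etree.ElementTree as ET

# The CML namespace, bound to the prefix used in the paths below.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Two hypothetical CML fragments that record the same concept
# (a total energy) using slightly different syntax.
DIALECT_A = """<cml xmlns="http://www.xml-cml.org/schema">
  <property dictRef="a:totalEnergy">
    <scalar units="units:eV">-12.5</scalar>
  </property>
</cml>"""

DIALECT_B = """<cml xmlns="http://www.xml-cml.org/schema">
  <propertyList>
    <property title="Total Energy">
      <scalar units="units:eV">-12.5</scalar>
    </property>
  </propertyList>
</cml>"""

# Toy stand-ins for dictionaries (NOT the Golem format): each maps a
# concept name to the path where that dialect happens to store it.
DICTIONARIES = {
    "codeA": {
        "total_energy": ".//cml:property[@dictRef='a:totalEnergy']/cml:scalar",
    },
    "codeB": {
        "total_energy": ".//cml:propertyList/cml:property[@title='Total Energy']/cml:scalar",
    },
}


def find_concept(xml_text, dialect, concept):
    """Return (value, units) for a concept, using the dialect's own mapping."""
    root = ET.fromstring(xml_text)
    path = DICTIONARIES[dialect][concept]
    node = root.find(path, CML_NS)
    if node is None:
        raise KeyError(f"{concept!r} not found in {dialect!r} document")
    return float(node.text), node.get("units")


# The caller asks for the concept; the per-dialect syntax stays hidden.
energy_a = find_concept(DIALECT_A, "codeA", "total_energy")
energy_b = find_concept(DIALECT_B, "codeB", "total_energy")
```

In this sketch the calling code names only the concept (`total_energy`), and the knowledge of where each dialect stores it lives entirely in the per-dialect table, which is the role the text above assigns to a dictionary.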

We have developed dictionaries for many CML-emitting codes and resources. The codes include CASTEP, SIESTA, MOPAC and GULP; we also have a dictionary for the CrystalEye crystallographic structure database. Golem also includes tools that make it straightforward to develop dictionaries for new CML dialects.


  • Representing, indexing and mining scientific data using XML and RDF: Golem and CrystalEye, A. D. Walkingshaw, Toby O. H. White, N. E. Day, O. J. Downing, P. M. Murray-Rust, Proceedings of XTech2008

Invited presentations:

  • COST D37 Working Group (CCWF), Rome, October 2007