Department of Chemistry

Murray-Rust Research Group


OPSIN (Open Parser for Systematic IUPAC Nomenclature) is a project to create a parser that converts IUPAC chemical names to connection tables, and hence to CML, InChI or other formats. It does this by using finite state parsing to tokenise a chemical name and assign a meaning to each token. The majority of organic chemical names are currently supported. Other areas of nomenclature, such as inorganic and carbohydrate nomenclature, have not yet been addressed in any detail. An example of a moderately complicated chemical name that OPSIN can interpret is shown below (the 2D diagram is generated by another program from OPSIN's output):

Figure: 2D structure diagram of the example name (Opsin example1.png)

(1R,2R,3R,4S)-11-diazo-2,3,4,9-tetrahydroxy-2-methyl-5,10-dioxo-2,3,4,5,10,11-hexahydro-1H-benzo[b]fluoren-1-yl acetate
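
To make the idea of finite state tokenisation concrete, here is a minimal sketch in Python. It is not OPSIN's code or grammar: the lexicon, the token classes, the state names and the `tokenise` function are all assumptions invented for illustration, and the sketch only handles very simple names (it would not cope with the example above). A small state machine decides which token classes may appear next, and at each position the longest matching token of an allowed class is consumed and paired with its meaning.

```python
# Toy lexicon: token text -> (token class, meaning).
# Hypothetical entries for illustration; not taken from OPSIN's resources.
LEXICON = {
    "chloro": ("substituent", "Cl"),
    "bromo": ("substituent", "Br"),
    "methyl": ("substituent", "CH3"),
    "meth": ("stem", 1),   # number of carbon atoms in the chain
    "eth": ("stem", 2),
    "prop": ("stem", 3),
    "but": ("stem", 4),
    "an": ("unsaturation", "single bonds only"),
    "en": ("unsaturation", "one double bond"),
    "e": ("suffix", "no principal group"),
    "ol": ("suffix", "OH"),
}

# Finite state machine over token classes (assumed toy grammar):
# any number of substituents, then stem, unsaturation and suffix.
TRANSITIONS = {
    ("start", "substituent"): "start",
    ("start", "stem"): "after_stem",
    ("after_stem", "unsaturation"): "after_unsaturation",
    ("after_unsaturation", "suffix"): "done",
}


def tokenise(name):
    """Split a simple name into (text, class, meaning) tokens.

    Uses greedy longest match with no backtracking, which is enough
    for this toy lexicon but not for real nomenclature.
    """
    name = name.lower()
    state, pos, tokens = "start", 0, []
    while pos < len(name):
        # Only consider tokens whose class is allowed in the current state.
        candidates = [
            text
            for text, (cls, _) in LEXICON.items()
            if (state, cls) in TRANSITIONS and name.startswith(text, pos)
        ]
        if not candidates:
            raise ValueError(f"cannot tokenise {name!r} at position {pos}")
        text = max(candidates, key=len)
        cls, meaning = LEXICON[text]
        tokens.append((text, cls, meaning))
        state = TRANSITIONS[(state, cls)]
        pos += len(text)
    if state != "done":
        raise ValueError(f"{name!r} ended before a complete name was read")
    return tokens
```

Calling `tokenise("chloroethane")` yields the tokens `chloro`, `eth`, `an` and `e`, each tagged with its class and meaning, while an unrecognised string such as `"xyz"` raises `ValueError`. Restricting candidates by state is what keeps `e` from being read as a suffix at the start of `ethane`.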

OPSIN is employed by Oscar3 to obtain structural information from IUPAC names that have been identified in text, although it can also be used as a standalone application. OPSIN is believed to differ from commercial solutions to this problem in being more accurate and more readily extensible, since its resources are kept separate from the code. Future work is likely to focus on areas of organic chemistry nomenclature that are not yet fully supported and are believed to be important.
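
The benefit of keeping resources separate from the code can be illustrated with a short Python sketch. The XML layout, the element and attribute names, and the `load_tokens` function below are all hypothetical and are not OPSIN's actual resource format. The point is only that new vocabulary can be added by editing data, leaving the parsing code unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource file contents, invented for this sketch.
RESOURCE_XML = """
<tokenList class="stem">
  <token value="1">meth</token>
  <token value="2">eth</token>
  <token value="3">prop</token>
</tokenList>
"""


def load_tokens(xml_text):
    """Read a token list into a dict: token text -> (class, value)."""
    root = ET.fromstring(xml_text.strip())
    token_class = root.get("class")
    return {
        tok.text.strip(): (token_class, tok.get("value"))
        for tok in root.iter("token")
    }


stems = load_tokens(RESOURCE_XML)

# Extending the vocabulary only requires a change to the data.
extended_xml = RESOURCE_XML.replace(
    "</tokenList>", '  <token value="4">but</token>\n</tokenList>'
)
extended_stems = load_tokens(extended_xml)
```

Here `extended_stems` contains the additional `but` entry even though `load_tokens` itself was not modified.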

Demo page

Additional Information

Source Code and Binary Downloads