Department of Chemistry

Murray-Rust Research Group

Green Chain Reaction

The original meeting website with all the details can be found at

Reproduced below is a brief background summary and, most importantly, the instructions on how YOU can get involved!

Why is this so important?

Chemical synthesis is an essential part of the modern world. But traditional methods are becoming increasingly unacceptable because:

  • the processes are hazardous (explosion, toxicity)
  • they consume scarce resources (metals, petrochemicals, etc.)
  • the by-products (unwanted materials, which are often discharged to the environment) are hazardous (toxic, etc.)
  • they are energy-intensive

How it works...

The overall workplan is as follows:

  • extract documents from the Open literature or contributed documents
  • locate and extract the paragraphs containing chemical reactions (experiment.xml)
  • parse the paragraphs using Lezan Hawizy's ChemicalTagger to give POS and chemistry-based trees (chemicalTagger.xml)
  • add CML chemical semantics to these (chemicalTreeBank.xml)
  • extract the reaction data (not yet written)
  • process this to RDF (ditto)
  • ingest into triple store
  • query the triple store to answer green questions
  • present results
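The last steps of the workplan (ingest into a triple store, then query it) can be sketched in a few lines. This is only a toy illustration in plain Python, not the project's code: a real triple store would hold RDF and be queried with something like SPARQL. The predicate names (usesSolvent, publishedIn) and all the data values are invented for illustration and are not results from the project.

```python
# Toy stand-in for a triple store: a list of (subject, predicate, object) tuples.
# All identifiers and values below are hypothetical, not extracted results.
TRIPLES = [
    ("reaction1", "usesSolvent", "dichloromethane"),
    ("reaction1", "publishedIn", 1985),
    ("reaction2", "usesSolvent", "water"),
    ("reaction2", "publishedIn", 2005),
    ("reaction3", "usesSolvent", "dichloromethane"),
    ("reaction3", "publishedIn", 1990),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def solvent_use_by_year(triples, solvent):
    """Count the reactions that use `solvent`, grouped by publication year."""
    counts = {}
    for reaction, _, _ in match(triples, predicate="usesSolvent", obj=solvent):
        for _, _, year in match(triples, subject=reaction, predicate="publishedIn"):
            counts[year] = counts.get(year, 0) + 1
    return counts


if __name__ == "__main__":
    print(solvent_use_by_year(TRIPLES, "dichloromethane"))
```

A "green question" then becomes a pattern query, for example how often a given solvent appears in each year, which is the kind of trend the project sets out to measure.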

Instructions for running the patent analysis

What is going on here?

What you are going to do is download a small program that runs in Java. You almost certainly have Java installed on your computer if you have a web browser. The program reads an instruction file, which tells it how to work through a list of chemistry-related patents. You will also need to download the instruction file and the list of patents; instructions are given below.

Why would I want to do this?

This project is attempting to ask a question by getting computers to "read" as many patents as possible from the recent to the quite old. The question we are asking is "Is chemistry becoming more green in the processes and reagents that it uses?" To do this work we are asking volunteers to become involved by contributing their computing resources to help read the patents. No knowledge of chemistry is necessary!

More generally, we are trying to demonstrate the feasibility of collecting information from across a wide range of science-related documents to ask wider questions. The results of this work will be presented at Science Online London 2010 in a few weeks' time.

Sounds great! How do I do it?

Prerequisites: Java

N.B. Please always use the code from Hudson...

Latest instructions for the computationally confident:

1. Download latest jar from$patent-analysis/patent-analysis-0.0.1-jar-with-dependencies.jar

2. Download into anywhere convenient (let's call this <yourDir>)

3. Download to anywhere convenient (<yourDir>)

4. Create a subdirectory (folder) of <yourDir>, named e.g. patentData, to hold the patent index and the results

5. Download a random patent catalogue (though pre-1990 may be lacking Chemistry patents) from into the patentData folder

6. Run "java -Xmx512m -jar patent-analysis-0.0.1-jar-with-dependencies.jar -p <yourDir>/parsePatent.xml -d <patentData>"

7. Then run "java -Xmx512m -jar patent-analysis-0.0.1-jar-with-dependencies.jar -p <yourDir>/uploadWeek.xml -d <patentData>" to upload the results
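If you prefer to script steps 6 and 7, the two command lines can be assembled programmatically. The sketch below is a convenience wrapper, not part of the project: the function names are made up, and it assumes that <patentData> is the patentData folder created inside <yourDir> in step 4 and that the jar sits in the directory you run the commands from. It only builds and prints the commands; it does not run Java.

```python
import shlex
from pathlib import Path

# Jar name as given in step 1.
JAR = "patent-analysis-0.0.1-jar-with-dependencies.jar"


def build_command(your_dir, instruction_file, data_dir, jar=JAR, heap="512m"):
    """Return the argument list for one run of the patent-analysis jar."""
    return [
        "java", f"-Xmx{heap}", "-jar", str(jar),
        "-p", str(Path(your_dir) / instruction_file),
        "-d", str(data_dir),
    ]


def build_steps(your_dir, data_dir_name="patentData"):
    """Return the parse command (step 6) followed by the upload command (step 7).

    Assumption: the data folder is <yourDir>/patentData, as created in step 4.
    """
    your_dir = Path(your_dir)
    data_dir = your_dir / data_dir_name
    return [
        build_command(your_dir, "parsePatent.xml", data_dir),
        build_command(your_dir, "uploadWeek.xml", data_dir),
    ]


if __name__ == "__main__":
    # Print the commands so they can be checked before being run by hand.
    for command in build_steps("."):
        print(shlex.join(command))
```

Each command is a list of arguments, so it can be passed unchanged to subprocess.run(command, check=True) once you are happy with it; check=True stops the upload step from running if the parse step fails.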