CIF2CML: Adding bond orders and charges
>>>>>>>>>>> To be completed soon <<<<<<<<<<<<<<<<
Once we have the complete connection table for the formula unit of the crystal, we can go about adding bond orders and charges to the moieties. This can be difficult to get right if you do not know the charge on the moiety, as there may be more than one correct-looking set of bond orders that can be assigned, each corresponding to a different charge on the molecule. The CIF has a data item called
_chemical_formula_moiety
where the authors are meant to give the formula of each moiety and its corresponding charge. Unfortunately, this is often left blank, or the empirical formula is provided instead. Even when the moiety formula is provided, it is only really a help for organic structures. For metal-complex moieties, usually only the overall charge of the metal plus its organic ligands is given, so we are left not knowing the exact charge on any individual component.
Thus, this is a very inexact science. In the future we might be able to extract the charge on the metal from the CIF's title or systematic name, but that would take longer than we currently have time for.
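When _chemical_formula_moiety is filled in, the per-moiety charges can be pulled out with a little string handling. The following is only a rough sketch and not the CIF2CML code: the function name parse_moieties and the returned tuple layout are made up for illustration, and it assumes the value follows the layout used in the CIF core dictionary examples (moieties separated by commas, charge written last as e.g. "1+" or "2-", multipliers written as "2(...)" or "(...)2").

```python
import re

# Hypothetical helper, not part of CIF2CML.
# Trailing charge such as "1+", "2-" or a bare "+", separated by whitespace.
_CHARGE = re.compile(r"(?:^|\s)(\d*)([+-])$")
# A whole moiety wrapped in parentheses with a leading or trailing multiplier.
_WRAPPED = re.compile(r"^(\d*)\((.*)\)(\d*)$")


def parse_moieties(value):
    """Return a list of (formula, charge, count) tuples.

    Returns an empty list when the item is blank, '?' or '.'.
    """
    value = value.strip().strip("'\"")
    if value in ("", "?", "."):
        return []
    result = []
    for part in value.split(","):
        part = part.strip()
        count = 1
        wrapped = _WRAPPED.match(part)
        if wrapped:
            prefix, part, suffix = wrapped.groups()
            count = int(prefix or suffix or 1)
            part = part.strip()
        charge = 0
        found = _CHARGE.search(part)
        if found:
            magnitude = int(found.group(1) or 1)
            charge = magnitude if found.group(2) == "+" else -magnitude
            part = part[:found.start()].strip()
        result.append((part, charge, count))
    return result


if __name__ == "__main__":
    print(parse_moieties("C12 H17 N4 O1 S1 1+, C6 H2 N3 O7 1-"))
    print(parse_moieties("(Cd 2+)3, (C6 N6 Cr 3-)2, 2(H2 O)"))
```

A blank or '?' value gives an empty list, which is the signal to fall back on the heuristic approach. An empirical formula entered in this field by mistake is not detected by this sketch.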
Our current procedure is as follows:
- for each moiety in the unit cell,
  - add a 'bonded to metal' flag to any atom bonded to a metal,
  - remove all metal atoms and their bonds (saving them for re-addition later),
  - split the moiety minus metal atoms into its new component moieties, and for each of these:
    - check whether it matches one of a set of predefined common molecules and, if so, mark it up with the correct bond orders and charges (e.g. SiF6^2-, CO3^2-, HPO4^2-, CO, CN- etc.),
    - from the connections of each atom, calculate the pi-electrons available,
    - get a list of the pi-systems in the moiety,
    - using a set of heuristics, add charges to certain pi-systems, for instance:
      - if there is a pi-system consisting of a ring containing an odd number of atoms, and all these atoms also have the metal flag, then one of them will be negatively charged,
      - an O or S atom that has the metal flag and is adjacent to a ring pi-system with an even number of atoms can be marked up with a negative charge,
      - and so on, though there are further heuristics that we still need to put into action,
    - with the remaining pi-systems, an iterative algorithm is used to go through the bonds and mark them up as double or triple,
  - the last step is to reattach the metal atoms and bonds and remove the 'bonded to metal' flags.
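The first few steps of the list (flagging metal-bonded atoms, removing the metals, and splitting what is left into component moieties) amount to a connected-components search on the connection table. Here is a minimal sketch of that idea, not the actual CIF2CML implementation: the name strip_metals, the dictionary and tuple input layout, and the deliberately short METALS set are all assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative subset only; a real list would cover every metallic element.
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Co", "Ni", "Cu", "Zn", "Pd", "Pt"}


def strip_metals(elements, bonds):
    """Remove metal atoms and split the remainder into fragments.

    elements: {atom_id: element_symbol}
    bonds:    iterable of (atom_id, atom_id) pairs
    Returns (fragments, flagged, removed_bonds) where
      fragments     is a list of sets of atom ids (the new component moieties),
      flagged       is the set of non-metal atoms that were bonded to a metal,
      removed_bonds is the list of metal bonds, kept for re-addition later.
    """
    metal_ids = {atom for atom, symbol in elements.items() if symbol in METALS}
    flagged = set()
    removed_bonds = []
    neighbours = defaultdict(set)
    for a, b in bonds:
        if a in metal_ids or b in metal_ids:
            # Save the bond and flag its non-metal end(s).
            removed_bonds.append((a, b))
            flagged.update(x for x in (a, b) if x not in metal_ids)
        else:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search over the metal-free graph to find the fragments.
    fragments = []
    seen = set()
    for start in sorted(elements):
        if start in metal_ids or start in seen:
            continue
        seen.add(start)
        stack = [start]
        fragment = set()
        while stack:
            atom = stack.pop()
            fragment.add(atom)
            for other in neighbours[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        fragments.append(fragment)
    return fragments, flagged, removed_bonds


if __name__ == "__main__":
    elements = {"Cu1": "Cu", "O1": "O", "C1": "C", "O2": "O", "N1": "N", "C2": "C"}
    bonds = [("Cu1", "O1"), ("O1", "C1"), ("C1", "O2"), ("Cu1", "N1"), ("N1", "C2")]
    print(strip_metals(elements, bonds))
```

Keeping removed_bonds and flagged as separate return values means the final step is just a matter of adding the saved bonds back and discarding the flags once the fragments have been marked up.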
Using this method we are getting almost all (though I have no stats yet) neutral organic molecules correct, most charged organic molecules correct and somewhere above 50% of organometallic structures correct. We haven't a clue about the inorganics at present.
