Stigmata: An Algorithm To Determine Structural Commonalities in Diverse Datasets

Abstract
Stigmata, an algorithm that extracts structural commonalities from chemical datasets, is described. It is illustrated with several examples and a pharmaceutically interesting set of dopamine D2 agonists. The commonalities are determined from two-dimensional topological chemical descriptions and are incorporated into the key feature of the algorithm, the modal fingerprint. Flexibility is provided by a user-defined threshold value, which affects the information content of the modal fingerprint. The use of the modal fingerprint as a diversity assessment tool, as a database similarity query, and as a basis for color mapping the determined commonalities back onto the chemical structures is demonstrated.
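
The role of the threshold can be made concrete with a small sketch. The abstract does not give the exact construction, so the following assumes that each molecule is represented by a fixed-length binary fingerprint and that a bit is set in the modal fingerprint when at least a threshold fraction of the molecules have that bit set. The function name `modal_fingerprint` and the toy bit vectors are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def modal_fingerprint(fingerprints: Sequence[Sequence[int]],
                      threshold: float) -> List[int]:
    """Return a consensus bit vector for a set of binary fingerprints.

    Assumption (not stated in the abstract): bit i is set in the result
    when the fraction of input fingerprints with bit i set is at least
    `threshold`, where 0 < threshold <= 1.
    """
    if not fingerprints:
        raise ValueError("at least one fingerprint is required")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in the interval (0, 1]")

    n_bits = len(fingerprints[0])
    if any(len(fp) != n_bits for fp in fingerprints):
        raise ValueError("all fingerprints must have the same length")

    n_molecules = len(fingerprints)
    modal = []
    for i in range(n_bits):
        # Count how many molecules have bit i set.
        count = sum(1 for fp in fingerprints if fp[i])
        modal.append(1 if count / n_molecules >= threshold else 0)
    return modal


if __name__ == "__main__":
    # Four toy 4-bit fingerprints (made-up data, not real molecules).
    toy = [
        [1, 1, 0, 1],
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [1, 1, 1, 0],
    ]
    for t in (1.0, 0.75, 0.5):
        print(t, modal_fingerprint(toy, t))
```

Under this assumption, a threshold of 1.0 keeps only the bits shared by every molecule, while lower thresholds admit bits that are common but not universal, so the modal fingerprint carries more information as the threshold decreases.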