Selecting XFEL single-particle snapshots by geometric machine learning

Open Access

1 January 2021

Research article
Published by AIP Publishing in Structural Dynamics

Vol. 8 (1), 014701
https://doi.org/10.1063/4.0000060

Abstract

A promising new route for structural biology is single-particle imaging with an X-ray Free-Electron Laser (XFEL). This method has the advantage that the samples do not require crystallization and can be examined at room temperature. However, high-resolution structures can only be obtained from a sufficiently large number of diffraction patterns of individual molecules, so-called single particles. Here, we present a method that efficiently identifies single particles in very large XFEL datasets, operates at low signal levels, and tolerates background. The method uses supervised Geometric Machine Learning (GML) to extract low-dimensional feature vectors from a training dataset, embed test datasets into the feature space of the training dataset, and separate the data into two classes: "single particles" and "non-single particles." As a proof of principle, we tested simulated and experimental datasets of the coliphage PR772 virus. We created a training dataset and classified three types of test datasets. First, a noise-free simulated test dataset, which gave near-perfect separation. Second, simulated test datasets modified to reflect different levels of photon counts and background noise; these were used to quantify the predictive limits of our approach. Third, an experimental dataset collected at the Stanford Linear Accelerator Center. Compared with previously published results for this experimental dataset, GML identifies single particles over a wide photon-count range, outperforming other single-particle identification methods. Moreover, a major advantage of GML is its ability to retrieve single particles in the presence of structural variability.
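The abstract describes the pipeline only at a high level: embed training data in a low-dimensional feature space, place test data in that same space, then split it into two classes. The sketch below illustrates one way such a pipeline could look, under loudly stated assumptions. It uses diffusion maps as the geometric embedding, a Nyström extension to embed test patterns, and a nearest-centroid rule for the binary decision. The toy 1D sphere profiles (with larger spheres standing in for "non-single particles"), the Poisson-plus-flat-background noise model, and every name (`sphere_profile`, `degrade`, `make_dataset`, `DiffusionMapClassifier`) are hypothetical illustrations, not the published implementation or PR772 data.

```python
import numpy as np

# ASSUMPTION: toy 1D radial profiles stand in for real PR772 diffraction patterns.
def sphere_profile(q, radius):
    """Radial intensity of a uniform sphere: squared, normalised form factor."""
    x = q * radius
    return (3.0 * (np.sin(x) - x * np.cos(x)) / x**3) ** 2

def degrade(profile, photons, background, rng):
    """Hypothetical detector model: scale to `photons` expected signal photons in
    total, add a flat `background` per pixel, then draw Poisson photon counts."""
    expected = profile / profile.sum() * photons + background
    return rng.poisson(expected).astype(float)

def make_dataset(q, n_per_class, photons, background, rng):
    """Label 1: small spheres ("single particles"); label 0: larger spheres as a
    crude, hypothetical stand-in for clusters ("non-single particles")."""
    radii = np.concatenate([rng.uniform(0.9, 1.1, n_per_class),
                            rng.uniform(1.8, 2.2, n_per_class)])
    labels = np.repeat([1, 0], n_per_class)
    counts = np.stack([degrade(sphere_profile(q, r), photons, background, rng)
                       for r in radii])
    # Unit-normalise each pattern so only its shape, not its brightness, matters.
    return counts / np.linalg.norm(counts, axis=1, keepdims=True), labels

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

class DiffusionMapClassifier:
    """ASSUMPTION: diffusion map + Nystrom extension + nearest-centroid labels."""

    def __init__(self, n_components=3):
        self.n_components = n_components

    def fit(self, X, y):
        self.X = X
        d2 = sq_dists(X, X)
        self.eps = np.median(d2)                    # heuristic kernel bandwidth
        K = np.exp(-d2 / self.eps)
        deg = K.sum(axis=1)
        # Symmetric conjugate of the Markov matrix P = D^-1 K (same eigenvalues).
        A = K / np.sqrt(np.outer(deg, deg))
        evals, evecs = np.linalg.eigh(A)
        order = np.argsort(evals)[::-1]
        evals, evecs = evals[order], evecs[:, order]
        psi = evecs / evecs[:, [0]]                 # right eigenvectors of P; psi[:, 0] == 1
        k = self.n_components
        self.evals, self.psi = evals[1:k + 1], psi[:, 1:k + 1]   # drop trivial pair
        self.train_coords = self.psi * self.evals   # diffusion coordinates (t = 1)
        self.classes = np.unique(y)
        self.centroids = np.stack([self.train_coords[y == c].mean(axis=0)
                                   for c in self.classes])
        return self

    def transform(self, X_new):
        """Nystrom extension: a new pattern's diffusion coordinates are the
        kernel-weighted average of the training eigenvectors."""
        K = np.exp(-sq_dists(X_new, self.X) / self.eps)
        P = K / K.sum(axis=1, keepdims=True)
        return P @ self.psi

    def predict(self, X_new):
        """Assign each pattern to the class whose training centroid is nearest."""
        d2 = sq_dists(self.transform(X_new), self.centroids)
        return self.classes[np.argmin(d2, axis=1)]

# Usage: train on one simulated set, classify an independent one.
rng = np.random.default_rng(0)
q = np.linspace(0.5, 10.0, 64)
X_train, y_train = make_dataset(q, 100, photons=1e4, background=1.0, rng=rng)
X_test, y_test = make_dataset(q, 50, photons=1e4, background=1.0, rng=rng)
clf = DiffusionMapClassifier().fit(X_train, y_train)
accuracy = np.mean(clf.predict(X_test) == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Applied to training points, the Nyström step reproduces their diffusion coordinates exactly, which is what lets test data share the training feature space. Lowering `photons` or raising `background` in the hypothetical `make_dataset` mimics, in spirit, the modified simulated test sets used to probe predictive limits. Any accuracy the toy prints says nothing about real XFEL data.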

Funding Information

National Science Foundation (STC 1231306)
National Science Foundation (DBI-2029533)
U.S. Department of Energy (DE-SC0002164)
