DoubleHelix: nucleic acid sequence identification, assignment and validation tool for cryo-EM and crystal structure models

Nucleic Acids Res. 2023 Aug 25;51(15):8255-8269. doi: 10.1093/nar/gkad553.

Abstract

Sequence assignment is a key step of the model-building process in both cryogenic electron microscopy (cryo-EM) and macromolecular crystallography (MX). If the assignment fails, it can result in difficult-to-identify errors that affect the interpretation of a model. There are many model-validation strategies that help experimentalists in this step of protein model building, but they are virtually non-existent for nucleic acids. Here, I present doubleHelix, a comprehensive method for assignment, identification, and validation of nucleic acid sequences in structures determined using cryo-EM and MX. The method combines a neural network classifier of nucleobase identities with a sequence-independent secondary structure assignment approach. I show that the presented method can successfully assist the sequence-assignment step in nucleic-acid model building at lower resolutions, where visual map interpretation is very difficult. Moreover, I present examples of sequence assignment errors detected using doubleHelix in cryo-EM and MX structures of ribosomes deposited in the Protein Data Bank, which escaped the scrutiny of available model-validation approaches. The doubleHelix program source code is available under the BSD-3 license at https://gitlab.com/gchojnowski/doublehelix.

MeSH terms

  • Cryoelectron Microscopy / methods
  • Crystallography, X-Ray
  • Models, Molecular
  • Nucleic Acids*
  • Protein Conformation
  • Software*

Substances

  • Nucleic Acids