E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands


Summary
DNA aptamers are short sequences of single-stranded DNA with untapped potential in molecular medicine, drug design, and materials design due to their strong, selective, and, most importantly, tunable binding affinity to target molecules (Tucker et al., 2012; Zhou & Rossi, 2017). For instance, DNA aptamers can be used as therapeutics (Corey et al., 2021) for a wide range of diseases such as epilepsy (Zamay et al., 2020) and cancer (Morita et al., 2018). They can also be used to detect a wide variety of molecular ligands, including antibiotics (Mehlhorn et al., 2018), neurotransmitters (Sinha & Das Mukhopadhyay, 2020), metals (Qu et al., 2016), proteins (Kirby et al., 2004), nucleotides (Shen et al., 2007), and metabolites (Dale, 2021; Dauphin-Ducharme et al., 2022) in real time, even in harsh environments (McConnell et al., 2020).
We present E2EDNA 2.0: End-2-End DNA 2.0, a Python simulation pipeline that offers a unified and automated solution to computational modeling of DNA aptamers with molecular ligands. It is broadly aimed at researchers developing therapeutics and sensors based on DNA aptamers who require detailed atomistic information on the behavior of aptamers and ligands in realistic media. Similar to its predecessor E2EDNA (Kilgour et al., 2021), E2EDNA 2.0 predicts DNA aptamers' secondary and tertiary structures, and if a ligand is present, the configuration of the solvated aptamer-ligand complex.

Statement of Need
With E2EDNA 2.0, our goal is to create a Python-interfacing simulation package for single-stranded DNA with small ligands that is easy to install and use in Python-based workflows. The pipeline is automated yet flexible, taking us from DNA aptamer sequence to folded aptamer and aptamer-ligand complex. Currently available software packages designed for computationally studying DNA aptamers predominantly focus on partial feature analyses of RNA and DNA aptamers. For example, APTANI (Caroli et al., 2016) and APTANI2 (Caroli et al., 2020), commonly used by the aptamer community, select potentially relevant aptamers from SELEX (Systematic Evolution of Ligands by EXponential enrichment) (McKeague & DeRosa, 2014; Tuerk & Gold, 1990) experimental data sets through a sequence-structure analysis, while AEGIS is a platform equipped with a generative deep learning model that proposes novel aptamer sequences (Biondi & Benner, 2018). These approaches are aimed at fast black-box analysis of large numbers of sequences. A detailed analysis of a small number of high-promise candidate sequences is often required, but no automated and easy-to-use simulation package exists for this purpose aside from E2EDNA (Kilgour et al., 2021) and E2EDNA 2.0, with the latter preferable for its ease of installation and user-friendly implementations of eight modes of simulation.
The gap that the E2EDNA family of programs addresses is the absence of a non-black-box, one-stop-shop aptamer simulation package capable of providing in silico predictions of 2D structure, 3D structure, and aptamer-ligand binding while keeping methodology flexible and analysis open-ended. E2EDNA 2.0 achieves this in two key ways: a Python-interfacing implementation that makes installation and access easier for users than the original E2EDNA package, and the automation of the dozens of tasks required in setting up, running, and interpreting atomistic aptamer-ligand simulations.

Components and Features
As shown in Figure 1, the complete simulation pipeline in E2EDNA 2.0 consists of four main steps: (1) secondary structure prediction, (2) tertiary structure prediction, (3) molecular dynamics simulation, and (4) aptamer-ligand docking. The key inputs into E2EDNA 2.0 are the DNA sequence in FASTA format, the structure of the ligand in PDB format (optional), and the choice of simulation mode. The output includes the secondary structure in dot-bracket notation, tertiary structures in PDB format of the free aptamer and the aptamer-ligand complex (optional), and simulation trajectories in DCD format. Other parameters besides these key inputs, such as solvent ionic strength, can also be customized. Detailed analysis of the generated trajectories is not performed by E2EDNA 2.0, though it is straightforward to set up for a particular workflow using built-in or user-specified functions.
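To make the dot-bracket output format concrete, the following is a minimal sketch of how such a string can be converted into a list of paired bases. It is an illustration only and is not taken from the E2EDNA 2.0 code base: the function name `dot_bracket_to_pairs` and the choice of 1-indexed positions are assumptions made for this example, and pseudoknot symbols such as `[` and `]` are deliberately not handled.

```python
from typing import List, Tuple


def dot_bracket_to_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the base pairs encoded by a dot-bracket string.

    Positions are 1-indexed (an assumption of this sketch). A '.' marks an
    unpaired base; each '(' pairs with the ')' that closes it.
    """
    open_positions = []  # stack of positions of still-unmatched '('
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            # the most recent unmatched '(' is the pairing partner
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {position}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A six-base hairpin: bases 1-6 and 2-5 are paired, bases 3-4 form the loop.
    print(dot_bracket_to_pairs("((..))"))
```

A pair list of this kind is the sort of intermediate representation that can be handed from a secondary structure predictor to a folding tool; the exact format used inside the pipeline may differ.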
Next, we briefly discuss the external software packages used in the E2EDNA 2.0 pipeline. For developers, we point out that the pipeline is modular, and any of these packages may be replaced, as a drop-in, with equivalent or competing software.
The first module, NUPACK (Zadeh et al., 2011), generates a predicted secondary structure given a DNA FASTA sequence, temperature, and ionic strength. It can output the explicit probability of observing the most likely secondary structure for a given sequence, as well as suboptimal structures and their probabilities.
The second module, MacroMoleculeBuilder (MMB) (Flores et al., 2011), is a multifunctional software package from SimTK that allows for rapid directed folding of oligonucleotides and peptides via straightforward inputs on various platforms. MMB initializes a given ssDNA sequence in a single-helix configuration and folds it according to user-specified base-pairing conditions via simulation with fictitious forces that pull the respective bases together. E2EDNA 2.0 includes scripts that automatically take in secondary structure instructions from NUPACK in the form of a list of paired bases and generate MMB command files accordingly. The MMB outputs are then passed, as initial structures, to MD simulation for relaxation.
The third module, OpenMM (Eastman et al., 2017), is the molecular dynamics engine powering E2EDNA 2.0. OpenMM is used to sample 3D structures of both the 'free' aptamer and its aptamer-target complex. The representative DNA aptamer structure is chosen from the MD trajectory via a principal component analysis on backbone dihedrals: a free energy is constructed in the space of top-ranked principal components (5) as reaction coordinates, and the lowest free energy structure is passed on to the ligand docking step.
The fourth module, LightDock (Jiménez-García et al., 2018; Roel-Touris et al., 2020), automates the docking between a free DNA aptamer and a given target ligand. LightDock uses a glowworm swarm algorithm; we compute the number of swarms required as being proportional to the approximate surface area of the aptamer and use the best-scored glowworm as the docked complex structure.
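The representative-structure selection described for the OpenMM module can be sketched as follows. This is a simplified illustration under stated assumptions, not the E2EDNA 2.0 implementation: the function name `representative_frame`, the cosine/sine encoding of the dihedral angles, the histogram-based free energy estimate, the default number of components and bins, and the rule for picking a single frame within the lowest free energy bin are all choices made for this example.

```python
import numpy as np


def representative_frame(dihedrals, n_components=2, n_bins=10, kT=1.0):
    """Pick a representative frame from a trajectory of backbone dihedrals.

    dihedrals: array of shape (n_frames, n_angles), angles in radians.
    Returns (frame_index, free_energy), where free_energy holds one value
    per frame in units of kT, shifted so that the minimum is zero.
    """
    angles = np.asarray(dihedrals, dtype=float)
    # Encode each periodic angle as (cos, sin) so that -pi and pi coincide
    # (an assumption of this sketch).
    features = np.hstack([np.cos(angles), np.sin(angles)])
    centered = features - features.mean(axis=0)

    # PCA via SVD: rows of vt are principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T

    # Histogram estimate of the population in principal component space.
    counts, edges = np.histogramdd(projected, bins=n_bins)
    # Locate the histogram bin of every frame along every component.
    bin_index = tuple(
        np.clip(np.digitize(projected[:, d], edges[d][1:-1]), 0, n_bins - 1)
        for d in range(projected.shape[1])
    )
    population = counts[bin_index] / len(projected)

    # Free energy F = -kT ln(p), shifted so the most populated bin is at zero.
    free_energy = -kT * np.log(population)
    free_energy -= free_energy.min()

    # Among frames in the lowest free energy bin, take the one closest to
    # the mean position of that bin's frames.
    basin = np.flatnonzero(free_energy == 0.0)
    center = projected[basin].mean(axis=0)
    distances = np.linalg.norm(projected[basin] - center, axis=1)
    return int(basin[np.argmin(distances)]), free_energy
```

In practice the dihedral angles would be computed from the MD trajectory with a trajectory analysis library, and the returned frame index would identify the structure to export for docking. The number of bins should be kept small when several principal components are used, since the number of histogram cells grows exponentially with the number of components.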
We conclude the presentation of E2EDNA 2.0 by listing the available simulation modes. We refer the reader to the documentation for more details on the modes, as well as for instructions on installing the package and running simulations.