buildH: Build hydrogen atoms from united-atom molecular dynamics of lipids and calculate the order parameters

1 CNRS, Université de Paris, UPR 9080, Laboratoire de Biochimie Théorique, 13 Rue Pierre et Marie Curie, F-75005 Paris, France 2 Institut de Biologie Physico-Chimique–Fondation Edmond de Rothschild, PSL Research University, Paris, France 3 Laboratoire Coopératif “Lipotoxicity and Channelopathies ConicMeds,” Université de Poitiers, F-86000 Poitiers, France 4 Université de Paris, CNRS, Institut Jacques Monod, F-75006, Paris, France 5 Sorbonne Université, Ecole Normale Supérieure, PSL Research University, CNRS, Laboratoire des Biomolécules (LBM), F-75005 Paris, France 6 Université de Paris, UFR Sciences du Vivant, F-75013 Paris, France DOI: 10.21105/joss.03521


Background
Molecular dynamics (MD) simulations of lipids are widely used to understand the complex structure and dynamics of biological or model membranes (Feller, 2000;Lyubartsev & Rabinovich, 2011;Tieleman et al., 1997). They are very complementary to biophysical experiments and have thus become important to get insights at the microscopic scale. Many of them are performed using all-atom (AA) or united-atom (UA) representations. In AA force fields (such as CHARMM36 (Klauda et al., 2010)), all the atoms are considered whereas in UA force fields (such as Berger (Berger et al., 1997)) the aliphatic hydrogen atoms (Hs) are merged with their parent carbon into a larger particle representing a CH, CH2 or CH3 (e.g. a methyl group is represented by a single CH3 particle). In simulations of phospholipids, the use of UA representations allows one to reduce the number of particles that are simulated to almost a third of the true number of atoms. This is because lipid molecules contain many aliphatic Hs. This simplification thus reduces the computational cost without losing important chemical details.
MD simulations of lipids are usually validated against experimental data (Klauda et al., 2010) or used to help interpret experiments (Feller, 2007). One type of experiment which is often used for that is 2 H NMR. In this type of experiment, aliphatic Hs are replaced by deuterons. 2 H NMR allows one to measure the order parameter of a given C-H bond (where the H is replaced by a deuteron): where θ is the angle between the C-H bond and the magnetic field. The symbol ⟨...⟩ means averaging over time and molecules.
This order parameter is useful because it is directly related to the flexibility of the given C-H bond. It describes the amount of possible orientations visited by the C-H bond, ranges between -0.5 and 1 (inclusive), and is unitless. Values close to 0 indicate high mobility, while values with higher magnitudes (either positive or negative) indicate that the C-H bond is less mobile. Although it varies theoretically between -0.5 and 1, its absolute value |S CH | is often reported because its sign is usually difficult to measure experimentally (Ollila, 2014). In MD simulations, θ is the angle between the C-H bond and the normal to the membrane (usually the z axis). For AA simulations, S CH is trivial to calculate. However, it is more difficult for UA simulations since the aliphatic Hs are not present. There are two main strategies to compute S CH from UA simulations (Piggot et al., 2017): i) expressing S CH as a function of the coordinates of other atoms (Douliez et al., 1995), ii) reconstructing hydrogen coordinates and calculating S CH as in AA simulations. The trend in the recent years has been towards strategy ii), such as in the NMRlipids project (Antila et al., 2019;Bacle et al., 2021;Botan et al., 2015;Catte et al., 2016). NMRlipids is an open science project which uses MD simulations and experimental S CH with the goal of improving lipid force fields or conducting fundamental research on the structure and dynamics of lipid membranes.

Statement of Need
Reconstructing Hs from the heavy atom coordinates can be done using standard geometric rules respecting stereochemistry. In the first NMRlipids project (Botan et al., 2015), H reconstruction was performed using a program called g_protonate from the software GROMACS version 3.* (Berendsen et al., 1995). However, g_protonate has been removed from GRO-MACS version 4.* or higher. So, there was a need to find other solutions. Currently, there are many programs in the field of chemoinformatics that are able to build Hs such as OpenBabel (O'Boyle et al., 2011) or other proprietary softwares. It is also possible to use pdb2gmx from the GROMACS software (Abraham et al., 2015) to build Hs, but it is not initially intended for that since its main purpose is to build a topology. Many of these solutions remain workarounds where one uses a sophisticated software for just doing one basic task. Using these open or proprietary software usually has a slow learning curve. Moreover, it is not always easy to use them in the Unix command line when they have complicated graphical interfaces, thus complicating their use in automatic pipelines of trajectory analyses. Here, we propose the buildH software to meet this need. buildH is very light and usable in the Unix command line or as a Python module, making it a tool of choice to integrate in analysis pipelines or Jupyter notebooks. It can be easily extended or customized if needed. buildH is currently widely used in the NMRlipids project IVb dealing with the conformational plasticity of lipid headgroups in cellular membranes and protein-lipid complexes (Bacle et al., 2021). In addition, it is planned to use buildH in the next NMRlipids project dealing with a databank containing MD trajectories of lipids (Kiirikki & Ollila, 2021).

Overview
buildH is a Python software that enables automatic analyses of order parameter calculations from UA trajectories of lipids. The software has the following features: • It reads a single structure or a trajectory of lipids in a UA representation.
• It reconstructs the aliphatic Hs using standard geometric rules.
• From the reconstructed Hs, it calculates and outputs the order parameters on each requested C-H bond. • Optionally, it outputs a structure (in pdb format) and a trajectory (in xtc format) with all reconstructed Hs.
Beyond order parameter calculations, the trajectory with Hs can be used for any further analyses (e.g. precise molecular volume calculation).
buildH has been natively developed for a use in the Unix command line. It possesses a minimum number of options, making it easy to use. It is also possible to utilize it as a Python module, which may be convenient in some cases.
To reconstruct H atoms, buildH uses standard geometric rules. These rules require so-called helper atoms. For example, the reconstruction of the two Hs of a CH2 on carbon C i , requires two helpers which are C i−1 and C i+1 , that is, the two neighbors of C i in the chain (note that helpers can also be other heavy atoms such as oxygen or nitrogen). The list of helpers used for the reconstruction of each H is written in a json file. Many json files are already present on the buildH repository representing the major lipids: phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylglycerol (PG), phosphatidylserine (PS) for the polar heads and palmitoyl, myristoyl, oleoyl for the aliphatic chains, as well as cholesterol.
Major UA force fields are also represented (Berger (Berger et al., 1997), GROMOS-CKP (Piggot et al., 2012), CHARMM-UA (Lee et al., 2014)). In case a user wants to analyze a lipid which is not present in buildH, a step-by-step documentation guides the user in the process of creating and supplying his/her lipid description as a json file.
All structure and trajectory read / write operations are handled by the MDAnalysis module (Gowers et al., 2016;Michaud-Agrawal et al., 2011). Mathematical vector operations are performed by Numpy (Harris et al., 2020) and accelerated with Numba (Lam et al., 2015), leading to very decent performances. For example, the reconstruction of all Hs and order parameter calculation on a trajectory of 2500 frames with 128 POPC molecules can be handled in approximately 7 minutes using a single core Xeon processor at 3.60 GHz, whereas it was almost 30 minutes long without Numba.
Some notebooks are provided in the GitHub repository to explain how buildH works and how to analyze the data produced. In case of trouble, any user can post an issue on GitHub.
buildH is available in the Python Package Index (PyPI) as well as in the Bioconda repository. The current version 1.6.0 of buildH is archived in the Zenodo repository (https://zenodo.org/record/5356246) and in the Software Heritage archive (swh:1:dir:4c63d5ca3497726a1e54ac152ce1667d7c004d2b).