TrackSegNet: a tool for trajectory segmentation into diffusive states using supervised deep learning

TrackSegNet is a command-line Python program for classifying and segmenting trajectories into diffusive states. For each use case, a deep neural network is trained on synthetic data, with trajectory features as inputs. After the trained network has classified the experimental data, the trajectories are segmented and grouped per diffusive state. TrackSegNet then estimates the motion parameters (the diffusion constant $D$ and the anomalous exponent $\alpha$) for each segmented track using mean squared displacement (MSD) analysis.


Statement of need
Recent advances in microscopy make it possible to capture, at nanometer resolution, the motion of fluorescently labeled particles, such as proteins or chromatin loci, in live cells. Methods to characterize the dynamics of a population of particles have therefore become essential (Muñoz-Gil et al., 2021). A typical analysis is the classification and segmentation of trajectories into distinct diffusive states when several types of motion (e.g., confined, superdiffusive) are present in a dataset because of the properties of the labeled molecule (e.g., a protein bound to or unbound from DNA).
Several trajectory classification methods, based on a wide range of approaches, have recently been developed by the community. For instance, Wagner et al. (2017) use random forests, Hansen et al. (2018) rely on histograms of displacements, Pinholt et al. (2021) employ a hidden Markov model (HMM), and Kabbech & Smal (2022) use an unsupervised denoising technique. However, not all of these methods come with readily available tools for direct application. There is consequently a growing need for user-friendly software tailored to practical use.

Method
This software is based on the method of Arts et al. (2019), with major improvements, and uses a stack of LSTM layers trained on features of synthetic trajectories. The improvements include angles as an additional feature to better detect trajectory confinement, better handling of gaps in trajectories, and the use of mean squared displacement (MSD) analysis instead of moment scaling spectrum (MSS) analysis to better estimate the dynamics. This version is packaged as user-friendly software so that the analysis can be reproduced on other datasets.
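The exact definition of the angle feature is not given here. As a hedged illustration, the sketch below assumes it is the signed turning angle between two successive displacement vectors; the function name `turning_angles` is hypothetical.

```python
import numpy as np

def turning_angles(xy: np.ndarray) -> np.ndarray:
    """Signed turning angle (radians, in [-pi, pi]) between consecutive steps.

    Assumption for illustration: the angle between displacement vectors
    d_t = p_{t+1} - p_t and d_{t+1}. Returns len(xy) - 2 values.
    """
    d = np.diff(xy, axis=0)                        # displacement vectors, shape (T-1, 2)
    a, b = d[:-1], d[1:]
    cross = a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]  # z-component of a x b (turn direction)
    dot = np.sum(a * b, axis=1)                    # alignment of the two steps
    return np.arctan2(cross, dot)

# A path that turns left by 90 degrees, then right by 90 degrees.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
print(np.degrees(turning_angles(path)))  # [ 90. -90.]
```

Subdiffusive or confined tracks tend to reverse direction more often, so their angles cluster near ±π, which is why an angle feature can help separate them from freely diffusing or directed motion.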

Neural Network
Tracking particles in 2-dimensional images yields a set $\mathcal{T}$ of trajectories $T_i \in \mathcal{T}$, $i \in \{1, \ldots, N\}$, where $N$ is the total number of trajectories and $T_i(t) = (x_i(t), y_i(t))$ are the 2D coordinates of particle $i$ at time $t$.
The network is built with functions from the Keras library and consists of a bidirectional long short-term memory (LSTM) layer with 200 hidden units, followed by a fully connected, time-distributed layer with a softmax activation function. The inputs to the network are six previously computed trajectory features, and the outputs are, for each trajectory point, the probabilities of belonging to each of the $K$ diffusive states; the predicted state is the one with the highest probability.
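The Keras model itself is not reproduced here. As a dependency-light sketch of the output stage only, the code below applies one shared dense layer to every time step (which is what a time-distributed layer does), converts the scores into per-point probabilities with a softmax, and takes the argmax as the predicted state. The random arrays stand in for the LSTM outputs and trained weights, and the hidden size of 400 assumes the two directions of the 200-unit bidirectional layer are concatenated (the Keras default merge mode); both are assumptions for illustration.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def time_distributed_softmax(h, W, b):
    """Apply the same dense layer to every time step, then softmax over states.

    h: (batch, T, H) recurrent outputs; W: (H, K); b: (K,)
    Returns probabilities of shape (batch, T, K).
    """
    return softmax(h @ W + b, axis=-1)

rng = np.random.default_rng(0)
batch, T, H, K = 2, 27, 400, 3            # H = 2 x 200 if directions are concatenated (assumption)
h = rng.normal(size=(batch, T, H))        # placeholder for bidirectional LSTM outputs
W = rng.normal(scale=0.05, size=(H, K))   # placeholder (untrained) weights
b = np.zeros(K)

probs = time_distributed_softmax(h, W, b)
states = probs.argmax(axis=-1)            # predicted state for each trajectory point
print(probs.shape, states.shape)          # (2, 27, 3) (2, 27)
```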

Training
The network is trained on synthetic fractional Brownian motion (fBm) trajectories that mix several diffusive states. For each independent training, 10,000 fBm trajectories of 27 frames each, switching between states, are generated. The fBm process is characterized by the fBm kernel (Lundahl et al., 1986), defined as
$$\gamma_{\mathrm{FBM}}(k) = D\left[\,|k+1|^{\alpha} - 2|k|^{\alpha} + |k-1|^{\alpha}\,\right],$$
with $k = t/\Delta t$ ($\Delta t$ being the time between two frames) and the predefined motion parameters $\theta = (D, \alpha)$.
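The kernel gives the covariance between increments $k$ frames apart, so one standard way to sample an fBm track from it is to build the covariance matrix of the increments and multiply white noise by its Cholesky factor. The sketch below does this for a single state; the Cholesky approach, the function names, and the parameter values are illustrative assumptions, not necessarily what TrackSegNet uses.

```python
import numpy as np

def fbm_kernel(k, D, alpha):
    """Covariance of fBm increments at lag k (in frames), per coordinate."""
    k = np.abs(np.asarray(k, dtype=float))
    return D * (np.abs(k + 1) ** alpha - 2 * k ** alpha + np.abs(k - 1) ** alpha)

def simulate_fbm_2d(n_frames, D, alpha, rng):
    """Sample one 2D fBm track of n_frames positions via a Cholesky factor."""
    n_steps = n_frames - 1
    lags = np.abs(np.subtract.outer(np.arange(n_steps), np.arange(n_steps)))
    cov = fbm_kernel(lags, D, alpha)                # Toeplitz covariance of the increments
    L = np.linalg.cholesky(cov)                     # cov = L @ L.T
    steps = L @ rng.standard_normal((n_steps, 2))   # x and y drawn independently
    return np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])

rng = np.random.default_rng(42)
track = simulate_fbm_2d(27, D=0.1, alpha=0.5, rng=rng)   # hypothetical parameter values
print(track.shape)  # (27, 2)
```

At lag 0 the kernel equals $2D$, the per-coordinate step variance of Brownian motion with $\Delta t = 1$, and for $\alpha = 1$ it vanishes at every other lag, recovering independent steps. As $\alpha$ approaches 2 the covariance matrix becomes nearly singular, so the Cholesky factorization can fail numerically.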
During training, the model is optimized with Adam using a categorical cross-entropy loss function.
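For reference, the sketch below writes out the categorical cross-entropy over one-hot point labels and a single textbook Adam update in NumPy. It illustrates the standard definitions only; the hyperparameter defaults shown (learning rate 1e-3, epsilon 1e-7) are common Keras defaults and are an assumption about TrackSegNet's settings.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over points of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-7):
    """One standard Adam update; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias corrections
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Three trajectory points, two states; labels are one-hot.
y_true = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.2798

theta, grad = np.zeros(3), np.array([2.0, -0.5, 0.0])
theta, m, v = adam_step(theta, grad, np.zeros(3), np.zeros(3), t=1)
print(theta)  # roughly [-0.001, 0.001, 0.0]: the first step has size lr per parameter
```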

Model parameters
The main training parameters can be tuned in the params.csv file to create a new variant of the model:
• num_states sets the number $K$ of diffusive states. It can range from 2 to 6, but 2 to 4 states are recommended.
• state_i_diff and state_i_alpha are the approximate motion parameters $\theta_i = (D_i, \alpha_i)$ of each diffusive state $i$. The diffusion constant $D$ is dimensionless, and the anomalous exponent $\alpha$ ranges from 0 to 2 ($0 < \alpha < 1$: subdiffusion; $\alpha = 1$: Brownian motion; $1 < \alpha < 2$: superdiffusion).
• pt_i_j is the probability of transitioning from state i to state j. There must be $K^2$ such probabilities in total.
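The layout of params.csv is not shown here, so the sketch below starts from a plain dictionary whose keys follow the parameter names above; 0-based state indices, the uniform initial state, and the helper names are assumptions. It collects the $(D_i, \alpha_i)$ pairs, assembles the $K \times K$ transition matrix from the $K^2$ pt_i_j entries, checks that each row sums to 1, and samples a 27-frame state sequence from the resulting Markov chain, one plausible way to produce state switches in synthetic tracks.

```python
import numpy as np

def read_model_params(params):
    """Return ([(D_i, alpha_i), ...], P) from a flat dict of parameter values."""
    k = int(params["num_states"])
    if not 2 <= k <= 6:
        raise ValueError("num_states must be between 2 and 6")
    thetas = [(float(params[f"state_{i}_diff"]), float(params[f"state_{i}_alpha"]))
              for i in range(k)]
    P = np.array([[float(params[f"pt_{i}_{j}"]) for j in range(k)] for i in range(k)])
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of pt_i_j must sum to 1")
    return thetas, P

def sample_states(P, n_frames, rng):
    """Sample a Markov chain of state labels (uniform initial state, an assumption)."""
    states = np.empty(n_frames, dtype=int)
    states[0] = rng.integers(P.shape[0])
    for t in range(1, n_frames):
        states[t] = rng.choice(P.shape[0], p=P[states[t - 1]])
    return states

params = {"num_states": 2,                                  # hypothetical values
          "state_0_diff": 0.1, "state_0_alpha": 0.5,
          "state_1_diff": 1.0, "state_1_alpha": 1.5,
          "pt_0_0": 0.9, "pt_0_1": 0.1,
          "pt_1_0": 0.2, "pt_1_1": 0.8}
thetas, P = read_model_params(params)
seq = sample_states(P, 27, np.random.default_rng(1))
print(thetas)
print(seq)
```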
The remaining parameters relate to the experimental dataset:
• data_path, the path to the dataset of trajectories to segment.
• track_format, the format of the files containing the trajectory coordinates, either MDF (see the MTrackJ data file format) or CSV.
• time_frame, the time interval between two trajectory points, in seconds.
• pixel_size, the size of a pixel in µm.
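The sketch below shows how pixel_size and time_frame could be used to bring CSV tracks into physical units. The column names (track_id, frame, x, y), the micrometre unit, and the function `load_tracks` are hypothetical; TrackSegNet's expected CSV layout may differ.

```python
import csv
import io
from collections import defaultdict

import numpy as np

# Hypothetical CSV layout: one row per detected point.
CSV_TEXT = """track_id,frame,x,y
1,0,10.0,20.0
1,1,10.5,20.2
1,2,11.0,19.9
2,0,40.0,5.0
2,1,40.2,5.4
"""

def load_tracks(text, pixel_size, time_frame):
    """Group rows by track and convert pixels to µm and frames to seconds."""
    rows = defaultdict(list)
    for r in csv.DictReader(io.StringIO(text)):
        rows[r["track_id"]].append((int(r["frame"]), float(r["x"]), float(r["y"])))
    tracks = {}
    for tid, pts in rows.items():
        pts.sort()                                     # order points by frame
        arr = np.array(pts)
        tracks[tid] = {"t": arr[:, 0] * time_frame,    # seconds
                       "xy": arr[:, 1:] * pixel_size}  # µm (assumed unit)
    return tracks

tracks = load_tracks(CSV_TEXT, pixel_size=0.1, time_frame=0.03)
print(tracks["1"]["xy"])
```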

Classification and MSD analysis
Before the features are computed for each experimental trajectory, gaps of a single frame are filled with a randomly generated point, while longer gaps split the trajectory into two separate trajectories. Each point is then classified into one of the $K$ diffusive states using the trained LSTM model. Based on this classification, the trajectories are segmented, and the motion parameters are estimated for each segmented track longer than 5 frames using MSD analysis, which consists of a least-squares fit to the logarithmic form of the MSD power law (Metzler et al., 2014). The distributions of $D$ and $\alpha$ can both be plotted in a scatterplot, as shown in Figure 1. The transition probability matrix of the classified data and the proportion of tracklet points in each diffusive state are also computed.
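The post-classification steps above can be sketched as follows. Several details are assumptions made only for illustration: the random fill point is drawn around the midpoint of its neighbours, the MSD is fitted over lags 1 to 4, the 2D power law is taken as $\mathrm{MSD}(\tau) = 4D\tau^{\alpha}$, and the state labels are made up rather than produced by the trained network. The helper names are hypothetical.

```python
import numpy as np

def fill_and_split(frames, xy, rng, noise=0.05):
    """Fill one-frame gaps with a random point; split the track at longer gaps.

    Assumption: the random point is Gaussian around the midpoint of its
    two neighbours, with standard deviation `noise`.
    """
    pieces = []
    cur_f, cur_xy = [int(frames[0])], [np.asarray(xy[0], dtype=float)]
    for f, p in zip(frames[1:], xy[1:]):
        f, p = int(f), np.asarray(p, dtype=float)
        gap = f - cur_f[-1] - 1                      # number of missing frames
        if gap == 1:
            mid = (cur_xy[-1] + p) / 2
            cur_f.append(cur_f[-1] + 1)
            cur_xy.append(mid + rng.normal(scale=noise, size=2))
        elif gap > 1:                                # close the current piece
            pieces.append((np.array(cur_f), np.array(cur_xy)))
            cur_f, cur_xy = [], []
        cur_f.append(f)
        cur_xy.append(p)
    pieces.append((np.array(cur_f), np.array(cur_xy)))
    return pieces

def split_by_state(states):
    """(state, start, stop) for every run of identical consecutive labels."""
    change = np.flatnonzero(np.diff(states)) + 1
    bounds = np.concatenate(([0], change, [len(states)]))
    return [(int(states[a]), int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]

def fit_msd(xy, dt, max_lag=4):
    """Least-squares line through log MSD vs log lag; returns (D, alpha)."""
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean(np.sum((xy[lag:] - xy[:-lag]) ** 2, axis=1)) for lag in lags])
    alpha, intercept = np.polyfit(np.log(lags * dt), np.log(msd), 1)
    return np.exp(intercept) / 4.0, alpha            # assumes MSD = 4 D tau**alpha

rng = np.random.default_rng(0)
frames = np.array([0, 1, 3, 4, 5, 6, 7, 8, 12, 13])  # frame 2 missing; 9-11 missing
xy = rng.normal(scale=0.1, size=(len(frames), 2)).cumsum(axis=0)
pieces = fill_and_split(frames, xy, rng)             # frames 0..8 and 12..13

states = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])       # hypothetical classifier output
f0, xy0 = pieces[0]
for state, a, b in split_by_state(states):
    if b - a > 5:                                    # keep segments longer than 5 frames
        D, alpha = fit_msd(xy0[a:b], dt=0.03)
        print(f"state {state}: frames {a}-{b - 1}, D={D:.3g}, alpha={alpha:.2f}")
```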
In brief, the software splits trajectories into shorter segments according to their type of diffusion. Measurements (MSD analysis, angles, and distributions) are then performed to evaluate the dynamics of each state group or segment.
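The transition matrix and state proportions of the classified data can be estimated by counting, as in the sketch below. This count-based estimate is a straightforward choice, but whether TrackSegNet computes them exactly this way is an assumption; the function names are hypothetical. Sequences are kept per trajectory so that no transition is counted across two different tracks.

```python
import numpy as np

def empirical_transitions(state_seqs, k):
    """Row-normalized counts of i -> j transitions between consecutive points."""
    counts = np.zeros((k, k))
    for s in state_seqs:
        np.add.at(counts, (s[:-1], s[1:]), 1)   # accumulate each observed transition
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def state_proportions(state_seqs, k):
    """Fraction of all classified points assigned to each state."""
    all_states = np.concatenate(state_seqs)
    return np.bincount(all_states, minlength=k) / all_states.size

seqs = [np.array([0, 0, 1, 1, 1]), np.array([1, 0, 0])]   # hypothetical predictions
print(empirical_transitions(seqs, 2))  # [[2/3, 1/3], [1/3, 2/3]]
print(state_proportions(seqs, 2))      # [0.5 0.5]
```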

Figure 1: Analysis pipeline of TrackSegNet described in two steps.