Traja: A Python toolbox for animal trajectory analysis

There are generally four categories of trajectory data: mobility of people, mobility of transportation vehicles, mobility of animals, and mobility of natural phenomena (Zheng, 2015). Animal tracking is important for fields as diverse as ethology, optimal foraging theory, and neuroscience. Mouse behavior, for example, is a widely studied in biomedical and brain research in models of neurological disease such as stroke.1


Summary
There are generally four categories of trajectory data: mobility of people, mobility of transportation vehicles, mobility of animals, and mobility of natural phenomena (Zheng, 2015). Animal tracking is important for fields as diverse as ethology, optimal foraging theory, and neuroscience. Mouse behavior, for example, is a widely studied in biomedical and brain research in models of neurological disease such as stroke. 1 Several tools exist which allow analyzing mouse locomotion. Tools such as Ethovision (Spink et al., 2001) and DeepLabCut (Mathis et al., 2018) allow converting video data to pose coordinates, which can further be analyzed by other open source tools. DLCAnalyzer 2 provides a collection of R scripts for analyzing positional data, in particular visualizing, classifying and plotting movement. B-SOiD (Hsu & Yttri, 2020) allows unsupervised clustering of behaviors, extracted from the pose coordinate outputs of DeepLabCut. SimBA (sgoldenlab et al., 2021) provides several classifiers and tools for behavioral analysis in video streams in a Windowsbased graphical user interface (GUI) application.
These tools are primarily useful for video data, which is not available for the majority of animal studies. For example, video monitoring of home cage mouse data is impractical today due to housing space constraints. Researchers using Python working with non-visual animal tracking data sources are not able to fully leverage these tools. Thus, a tool that supports modeling in the language of state-of-the-art predictive models (Amirian et al., 2019;Chandra et al., 2019;Liang et al., 2019), and which provides animal researchers with a high-level API for multivariate time series feature extraction, modeling and visualization is needed.
Traja is a Python package for statistical analysis and computational modelling of trajectories. Traja extends the familiar pandas (McKinney, 2010;team, 2020) methods by providing a pandas accessor to the df.traja namespace upon import. The API for Traja was designed to provide an object-oriented and user-friendly interface to common methods in analysis and visualization of animal trajectories. Traja also interfaces well with relevant spatial analysis packages in R (e.g., trajr (McLean & Volponi, 2018) and adehabitat (Calenge, 2006)), Shapely , and MovingPandas (Graser, 2019) allowing rapid prototyping and comparison of relevant methods in Python. A comprehensive source of documentation is provided on the home page (http://traja.readthedocs.io).

Statement of Need
The data used in this project includes animal trajectory data provided by http://www.tecniplast.it, manufacturer of laboratory animal equipment based in Varese, Italy, and Radboud University, Nijmegen, Netherlands. Tecniplast provided the mouse locomotion data collected with their Digital Ventilated Cages (DVC). The extracted coordinates of the mice requires further analysis with external tools. Due to lack of access to equipment, mouse home cage data is rather difficult to collect and analyze, thus few studies have been done on home cage data. Furthermore, researchers who are interested in developing novel algorithms must implement from scratch much of the computational and algorithmic infrastructure for analysis and visualization. By packaging a library that is particularly useful for animal locomotion analysis, future researchers can benefit from access to a high-level interface and clearly documented methods for their work.
Other toolkits for animal behavioral analysis either rely on visual data (Mathis et al., 2018;Sridhar, 2017) to estimate the pose of animals or are limited to the R programming language (McLean & Volponi, 2018). Prototyping analytical approaches and exploratory data analysis is furthered by access to a wide range of methods which existing libraries do not provide. Python is the de facto language for machine learning and data science programming, thus a toolkit in Python which provides methods for prototyping multivariate time series data analysis and deep neural network modeling is needed.

Overview of the Library
Traja targets Python because of its popularity with data scientists. The library leverages the powerful pandas library (McKinney, 2010), while adding methods specifically for trajectory analysis. When importing Traja, the Traja namespace registers itself within the pandas dataframe namespace via df.traja.
The software is structured into three parts. These provide functionality to transform, analyse and visualize trajectories. Full details are available at https://traja.readthedocs.io/. The traj ectory module provides analytical and preprocessing functionalities. The models subpackage provides both traditional and neural network-based tools to determine trajectory properties. The plotting module allows visualizing trajectories in various ways.
Data, e.g., x and y coordinates, are stored as one-dimensional labelled arrays as instances of the pandas native Series class. Further, subclassing the pandas DataFrame allows providing an API that mirrors the pandas API which is familiar to most data scientists, thus reducing the barrier for entry while providing methods and properties specific to trajectories for rapid prototyping. Traja depends on Matplotlib (Hunter, 2007) and Seaborn (Waskom, 2021) for plotting and NumPy (Harris et al., 2020) for computation.

Trajectory Data Sources
Trajectory data as time series can be extracted from a wide range of sources, including video processing tools as described above, GPS sensors for large animals or via home cage floor sensors, as described in the section below. The methods presented here are implemented for orthogonal coordinates (x, y) primarily to track animal centroids, however with some modification they could be extended to work in 3-dimensions and with body part locations as inputs. Traja is thus positioned at the end of the data scientist's chain of tools with the hope of supporting prototyping novel data processing approaches. A sample dataset of jaguar movement (Morato et al., 2018) is provided in the traja.dataset subpackage.

Mouse Locomotion Data
The data samples presented here 3 are in 2-dimensional location coordinates, reflecting the mouse home cage (25x12.5 cm) dimensions. Analytical methods relevant to 2D rectilinear analysis of highly constrained spatial coordinates are thus primarily considered.
High volume data like animal trajectories has an increased tendency to have missing data due to data collection issues or noise. Filling in the missing data values, referred to as data imputation, is achieved with a wide variety of statistical or learning-based methods. As previously observed, data science projects typically require at least 95% of the time to be spent on cleaning, pre-processing and managing the data (Bosch et al., 2021). Therefore, several methods relevant to preprocessing animal data are demonstrated throughout the following sections.

Spatial Trajectory
A spatial trajectory is a trace generated by a moving object in geographical space. Trajectories are traditionally modelled as a sequence of spatial points like: Generating spatial trajectory data via a random walk is possible by sampling from a distribution of angles and step sizes (Kareiva & Shigesada, 1983;McLean & Volponi, 2018). A correlated random walk ( Figure 1) is generated with traja.generate.

Spatial Transformations
Transformation of trajectories can be useful for comparing trajectories from various geospatial coordinates, data compression, or simply for visualization purposes.

Feature Scaling
Feature scaling is common practice for preprocessing data for machine learning (Grus, 2015) and is essential for even application of methods to attributes. For example, a high dimensional feature vector x ∈ R n where some attributes are in (0, 100) and others are in (−1, 1) would lead to biases in the treatment of certain attributes. To limit the dynamic range for multiple data instances simultaneously, scaling is applied to a feature matrix X = {x 1 , x 2 , ..., x N } ∈ R n×N , where n is the number of instances.

Min-Max Scaling
To guarantee that the algorithm applies equally to all attributes, the normalized feature matrixX is rescaled into range (0, 1) such that The result of standardization is that the features will be rescaled to have the property of a standard normal distribution with µ = 0 and σ = 1 where µ is the mean (average) of the data and σ is the standard deviation from the mean. Standard scores (also known as z-scores are calculated such that

Rotation
Rotation of a 2D rectilinear trajectory is a coordinate transformation of orthonormal bases x and y at angle θ (in radians) around the origin defined by

Trip Grid
One strategy for compressing the representation of trajectories is binning the coordinates to produce an image as shown in Figure 2. Allowing computation on discrete variables rather than continuous ones has several advantages stemming from the ability to store trajectories in a more memory efficient form. 4 The advantage is that computation is generally faster, more data can fit in memory in the case of complex models, and item noise can be reduced.
Creation of an M * N grid allows mapping trajectory T k onto uniform grid cells. Generalizing the nomenclature of (Wang, 2017) to rectangular grids, C mn (1 ≤ m ≤ M ; 1 ≤ n ≤ N ) denotes the cell in row m and column n of the grid. Each point P ki is assigned to a cell indicates the relative number of points assigned to cell C mn . Partionining of spatial position into separate grid cells is often followed by generation of hidden Markov models (Jeung et al., 2007) (see below) or visualization of heat maps ( Figure  3).

Figure 3:
Visualization of heat map from bins generated with df.trip_grid. Note regularly spaced artifacts (bright yellow) in this sample due to a bias in the sensor data interpolation. This type of noise can be minimized by thresholding or using a logarithmic scale, as shown above.

Resampling and Rediscretizing
Trajectories can be resampled by time or rediscretized by an arbitrary step length. This can be useful for aligning trajectories from various data sources and sampling rates or reducing the number of data points to improve computational efficiency. Care must be taken to select a time interval which maintains information on the significant behavior. If the minimal time interval observed is selected for the points, calculations will be computationally intractable for some systems. If too large of an interval is selected, we will fail to capture changes relevant to the target behavior in the data.
Resampling by time is performed with resample_time ( Figure 4). Rediscretizing by step length is performed with rediscretize. For example, the Fortasyn dataset (Shenk et al., 2020) demonstrated in this paper was sampled at 4 Hz and converted to single-precision floating point data. Pandas dataframes store this data in 4 bytes, thus there are approximately 4.15 MB 5 bytes required to store data for x and y dimensions plus an index reference for a single day. In the case of (Shenk et al., 2020), 24 mice were observed over 35 days. This translates to 3.4 GB (10 9 ) of storage capacity for the uncompressed datasets prior to feature engineering. Thus resampling can be a useful way to reduce the memory footprint for memory constrained processes that have to fit into a standard laptop with 8 GB memory space. A demonstration of how reduction in precision for trajectory data analysis is provided in Figure 4, as applied to a sample from the Fortasyn experiment (Shenk et al., 2020). Broad effects such as cage crossings, for example, can still be identified while downsampling data to a lower frequency, such as 0.1 Hz, reducing the memory footprint by a factor of 40 (4 Hz/0.1 Hz) and providing significant speedups for processing.

Movement Analysis
Traja includes traditional as well as advanced methods for trajectory analysis.

Distance traveled
Distance traveled is a common metric in animal studies -it accounts for the total distance covered by the animal within a given time interval. The distance traveled is typically quantified by summing the square straight-line displacement between discretely sampled trajectories (Rowcliffe et al., 2012;Solla et al., 1999). Alternative distance metrics for the case of animal tracking are discussed in (Noonan et al., 2019).
Let p(t) = [p x (t), p y (t)] be a 2 × 1 vector of coordinates on the ground representing the position of the animal at time t. Then, the distance traveled within the time interval t 1 and t 2 can be computed as a sum of step-wise Euclidean distances is the Euclidean distance between two positions in adjacent time samples.

Speed
Speed or velocity is the first derivative of centroids with respect to time. Peak velocity in a home cage environment is perhaps less interesting than a distribution of velocity observations, as in Figure 5. Additionally, noise can be eliminated from velocity calculations by using a minimal distance moved threshold, as demonstrated in (Shenk et al., 2020). This allows identifying broad-scale behaviors such as cage crossings.

Turn Angles
Turn angles are the angle between the movement vectors of two consecutive samples. They can be calculated with calc_turn_angles.

Laterality
Laterality is the preference for left or right turning and a laterality index is defined as: where RT is the number of right turns observed and LT is the number of left turns observed. Turns are counted within a left turn angle ∈ (θ, 90) and right turn angle ∈ (−θ, −90). A turn is considered to have a minimal step length.

Periodicity
Periodic behaviors are a consequence of the circadian rhythm as well as observing expression of underlying cognitive traits. Some basic implementations of periodic analysis of mouse cage data are presented.

Autocorrelation
Autocorrelation is the correlation of a signal with a delayed copy of itself as a function of the decay. Basically, it is similarity of observations as a function of the time lag between them. An example is shown in Figure 6.

Power Spectrum
Power spectrum of a time series signal can be estimated (Figure 7). This is useful for analyzing signals, for example, the influence of neuromotor noise on delays in hand movement (Van Galen et al., 1990).

Algorithms and Statistical Models Machine Learning for Time Series Data
Machine learning methods enable researchers to solve tasks computationally without explicit instructions by detecting patterns or relying on inference. Thus they are particularly relevant for data exploration of high volume datasets such as spatial trajectories and other multivariate time series.

Principal Component Analysis
Principal Component Analysis projects the data into a linear subspace with a minimum loss of information by multiplying the data by the eigenvectors of the covariance matrix.

Clustering
Clustering of trajectories is an extensive topic with applications in geospatial data, vehicle and pedestrian classification, as well as molecular identification. K-means clustering is an iterative unsupervised learning method that assigns a label to data points based on a distance function (Bishop, 2006) (Figure 10).

Hierarchical Agglomerative Clustering
Clustering spatial trajectories has broad applications for behavioral research, including unsupervised phenotyping (Huang et al., 2020). For mice, hierarchical agglomerative clustering can also be used to identify similarities between groups, for example periodic activity and location visit frequency (Bak et al., 2009).

Gaussian Processes
Gaussian Processes is a non-parametric method which can be used to model spatial trajectories. This method is not currently implemented in Traja and is thus outside the scope of the current paper, however the interested reader is directed to the excellent text on Gaussian processes by Rasmussen and Williams (Rasmussen & Williams, 2006) for a complete reference and (Cox et al., 2012) for an application to spatial trajectories.

Fractal Methods
Fractal (i.e. multiscale) methods are useful for analyzing transitions and clustering in trajectories. For example, search trajectories such as eye movement, hand-eye coordination, and foraging can be analyzed by quantifying the spatial distribution or nesting of temporal point processes using spatial Allen Factor analysis (Huette et al., 2013;Kerster et al., 2016).
Recurrence plots and derivative recurrence factor analysis can be applied to trajectories to identify multiscale temporal processes to study transition or nonlinear parameters in a system, such as postural fluctuation (Ross et al., 2016) and synchrony (Shockley et al., 2003) in humans and to movement of animals such as ants (Neves et al., 2017) and bees (Ayers et al., 2015). These methods are not yet implemented in Traja, but are planned for a future release.

Graph Models
A graph is a pair G = (V, E) comprising a set of vertices and a set of connecting edges. A probabilistic graphical model of a spatial occupancy grid can be used to identify probabilities of state transitions between nodes. A basic example is given with hidden Markov models below. Figure 11: Transition matrix. Rows and columns are flattened histogram of a grid 20 cells high and 10 cells wide. Spatially adjacent grid cells are visible at a spacing of -11, -10, -9, 1, 10, and 11 cells from the diagonal. The intensity of pixels in the diagonal represents relative likelihood to stay in the same position.

Hidden Markov Models
Transition probabilities are most commonly modelled with Hidden Markov Models (HMM) because of their ability to capture spatial and temporal dependencies. A recent introduction to these methods is available provided by . HMMs have successfully been used to analyze movement of caribou (Franke et al., 2004), fruit flies (Holzmann et al., 2006), and tuna , among others. Trajectories are typically modelled as bivariate time series consisting of step length and turn angle, regularly spaced in time.
Traja implements the rectangular spatial grid version of HMM with transitions.
The probability of transition from each cell to another cell is stored as a probability within the transition matrix. This can visualized as a heatmap and plotted with plot_transition_ma trix (Figure 11).

Convex Hull
The convex hull of a subtrajectory is the set X of points in the Euclidean plane that is the smallest convex set to include X. For computational efficiency, a geometric k-simplex can be plotted covering the convex hull by converting to a Shapely object and using Shapely's convex_hull method.

Recurrent Neural Networks
In recent years, deep learning has transformed the field of machine learning. For example, the current state of the art models for a wide range of tasks, including computer vision, speech to text, and pedestrian trajectory prediction, are achieved with deep neural networks. Neural networks are essentially sequences of matrix operations and elementwise function application based on a collection of computing units known as nodes or neurons. These units perform operations, such as matrix multiplication on input features of a dataset, followed by backpropagation of errors, to identify parameters useful for approximating a function.

Figure 12: Neural network architectures available in Traja
Recurrent Neural Networks (RNNs) are a special type of Neural Networks that use a state S(t i−1 ) from the previous timestep t i−1 alongside X(t i ) as input. They output a prediction Y (t i ) and a new state S(t i ) at every step. Utilising previous states makes RNNs particularly good at analyzing time series like trajectories, since they can process arbitrarily long inputs. They remember information from previous time steps X(t i−k ), ..., X(t i−1 ) when processing the current time step X(t i ).
Trajectory prediction lets researchers forecast the location and trajectory of animals (Wijeyakulasuriya et al., 2020). Where this technique works well, it is also a sign that the trajectory is highly regular and, fundamentally, follows certain rules and patterns. When tracking an animal live, it would also let researchers predict when it will arrive at a particular location, or where it will go, letting them rig cameras and other equipment ahead of time.
A particularly interesting type of RNN is the Long Short Term Memory (LSTM) architecture. Their layers use stacks of units, each with two hidden variables -one that quickly discards old states and one that slowly does so -to consider relevant information from previous time steps. They can thus look at a trajectory and determine a property of the animal -whether it is sick or injured, say -something that is time-consuming and difficult to do by hand. They can also predict future time steps based on past ones, letting researchers estimate where the animal will go next. LSTMs can also classify trajectories, determining whether a trajectory comes from an animal belonging in a specific category. This lets researchers determine how a controlled or semi-controlled variable (e.g., pregnancy) changes the movement pattern of an animal.
Traja implements neural networks by extending the widely used open source machine learning library PyTorch (Paszke et al., 2019), primarily developed by Facebook AI Research Group. Traja allows framework-agnostic modeling through data loaders designed for time series. In addition, the Traja package comes with several predefined model architectures which can be configured according to the user's requirements.
Because RNNs work with time series, the trajectories require special handling. The traja.d ataset.MultiModalDataLoader efficiently groups subsequent samples and into series and splits these series into training and test data. It represents a Python iterable over the dataset and extends the PyTorch DataLoader class, with support for • random, weighted sampling, • data scaling, • data shuffling, • train/validation/test split.
MultiModalDataLoader accepts several important configuration parameters and allows batched sampling of the data. The two constructor arguments n_past and n_future specify the number of samples that the network will be shown and the number that the network will have to guess, respectively. batch_size is generally in the dozens and is used to regularise the network.
The RNNs also need to be trained -this is done by the high-level Trainer class below. It performs nonlinear optimisation with a Stochastic Gradient Descent-like algorithm. The Trainer class by default implements the Huber loss function (Huber, 1964), also known as smooth L 1 loss, which is a loss function commonly used in robust regression: for |a| ≤ δ, δ(|a| − 1 2 δ), otherwise.
In comparison to mean-squared error loss, Huber loss is less sensitive to outliers in data: it is quadratic for small values of a, and linear for large values. It extends the PyTorch SmoothL1Loss class, where the d parameter is set to 1. 6 A common optimization algorithm is ADAM and is Traja's default, but several others are provided as well. Although training with only a CPU is possible, a GPU can provide a 40 − 100x speedup (Arpteg et al., 2018).

Recurrent Autoencoder Networks
Traja can also train autoencoders to either predict the future position of a track or classify the track into a number of categories. Autoencoders embed the time series into a time-invariant latent space, allowing representation of each trajectory or sub-trajectory as a vector. A class of well-separated trajectories would then be restricted to a region of the latent space. The technique is similar to Word2vec (Mikolov et al., 2013), where words are converted to a 100+ dimensional vector. In this approach, forecasting and classification are both preceded by training the data in an autoencoder, which learns an efficient representation of the data for further computation of the target function.
Traja allows training a classifier that works directly on the latent space output -since each class of trajectories converges to a distinct region in the latent space, this technique is often superior to classifying the trajectory itself. Traja trains classifiers for both Autoencoder-style and Variational Autoencoder-style RNNs. When investigating whether animal behavior has changed, or whether two experimental categories of animals behave differently, this unstructured data mining can suggest fruitful avenues for investigation.