High-performance neural population dynamics modeling enabled by scalable computational infrastructure

Advances in neural interface technology are facilitating parallel, high-dimensional time series measurements of the brain in action. A powerful strategy for analyzing these measurements is to apply unsupervised learning techniques to uncover lower-dimensional latent dynamics that explain much of the variance in the high-dimensional measurements (Cunningham & Yu, 2014; Golub et al., 2018; Vyas et al., 2020). Latent factor analysis via dynamical systems (LFADS) (Pandarinath et al., 2018) provides a deep learning approach for extracting estimates of these latent dynamics from neural population data. The recently developed AutoLFADS framework (Keshtkaran et al., 2022) extends LFADS by using Population Based Training (PBT) (Jaderberg et al., 2017) to effectively and scalably tune model hyperparameters, a critical step for accurate modeling of neural population data. As hyperparameter sweeps are among the most computationally demanding processes in model development, these workflows should be deployed in a computationally efficient and cost-effective manner given the compute resources available (e.g., local, institutionally supported, or commercial computing clusters). The initial implementation of AutoLFADS used the Ray library (Moritz et al., 2018) to enable support for specific local and commercial cloud workflows. We extend this support by providing additional options for training AutoLFADS models on local hardware in a container-native approach (e.g., Docker, Podman), on unmanaged clusters using Ray, and on managed clusters using KubeFlow on Kubernetes.

As the neurosciences increasingly employ deep learning-based models that require compute-intensive hyperparameter optimization (Keshtkaran & Pandarinath, 2019; Willett et al., 2021; Yu et al., 2021), standardization and dissemination of computational methods become increasingly challenging. Although this work specifically provides implementations of AutoLFADS, the tooling provided demonstrates strategies for employing computation at scale while facilitating dissemination and reproducibility.

Statement of need
Machine learning models enable neuroscience researchers to uncover new insights regarding the neural basis of perception, cognition, and behavior (Vu et al., 2018). However, models are often developed with hyperparameters tuned for a specific dataset, despite their intended generality. Application to new datasets requires computationally intensive hyperparameter searches for model tuning. Given the diversity of data across tasks, species, neural interface technologies, and brain regions, hyperparameter tuning is common and presents a significant barrier to the evaluation and adoption of new algorithms. With the maturation of "AutoML" hyperparameter exploration libraries (HyperOpt, SkOpt, Ray), it is now easier to effectively search an extensive hyperparameter space. Solutions like KubeFlow (Kubeflow, 2018) additionally enable scaling on managed clusters and provide near codeless workflows for the entire machine learning lifecycle. This lifecycle typically begins with data ingest and initial evaluation of machine learning algorithms with respect to the data, and then matures to compute-intensive model training and tuning. Building upon these tools, we empower researchers with multiple deployment strategies for leveraging AutoLFADS on local compute, on ad-hoc or unmanaged compute, and on managed or cloud compute, as illustrated in Figure 1.
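Conceptually, these libraries all automate the same loop: sample a hyperparameter configuration, score it, and keep the best. A minimal, library-free random-search sketch of that loop follows; the objective function is a hypothetical stand-in for training and validating a model, not any actual LFADS objective:

```python
import random

# Hypothetical objective: in practice this would train a model with the
# given hyperparameters and return a validation loss.
def validation_loss(hparams):
    return (hparams["lr"] - 0.01) ** 2 + 0.1 * hparams["dropout"]

def random_search(n_trials, seed=0):
    """Minimal random search over a small hyperparameter space."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        hparams = {
            "lr": 10 ** rng.uniform(-4, -1),   # log-uniform learning rate
            "dropout": rng.uniform(0.0, 0.5),
        }
        loss = validation_loss(hparams)
        if best is None or loss < best[0]:
            best = (loss, hparams)
    return best

loss, hparams = random_search(100)
```

Dedicated AutoML libraries replace the sampling step with smarter strategies (e.g., Bayesian optimization) and distribute the trials across workers, but the outer loop is the same.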
When training models on a novel dataset, it is often helpful to probe hyperparameters and investigate model performance locally prior to conducting a more exhaustive, automated hyperparameter search. This need can be met by installing the LFADS package locally or in a virtual environment. To isolate the workflow from local computational environments, we provide a pair of reference container images targeting CPU and GPU architectures. This allows users to treat the bundled algorithm as a portable executable: they simply provide the input neural data and the desired LFADS model configuration to initiate model training. This approach eliminates the need for users to configure their environments with compatible interpreters and dependencies. Instead, the user installs a container runtime engine (e.g., Docker, Podman), generally a well-supported, cross-platform tool, to run the image-based solution. In addition to streamlining configuration, this approach enables reproducibility, as the software environment employed for computation is fully defined and version-controlled.
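As a sketch of this usage pattern, a containerized training run amounts to mounting the data and configuration into the image and invoking it. The image name, mount points, and flags below are hypothetical illustrations, not the project's actual published images:

```python
# Sketch: assemble a container invocation for a bundled training image.
# The image name ("example/lfads:latest") and container paths are
# hypothetical placeholders.
def docker_command(data_dir, config_path, image="example/lfads:latest"):
    return [
        "docker", "run", "--rm", "--gpus", "all",
        "-v", f"{data_dir}:/data",            # mount the neural data
        "-v", f"{config_path}:/config.yaml",  # mount the model/HP config
        image,
    ]

cmd = docker_command("/home/lab/mc_maze", "/home/lab/lfads.yaml")
print(" ".join(cmd))
```

Because the image pins every dependency, re-running the same command on another machine reproduces the same software environment.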
Scaling initial investigations may involve evaluating data on internal lab resources, which may comprise a set of loosely connected compute devices. In such a heterogeneous environment, we leverage Ray to efficiently create processing jobs. In this approach, Ray spawns a set of workers on the compute nodes, and the primary node then dispatches jobs to those workers. This approach requires users to provide a mapping of machine locations (e.g., IP address, hostname) and access credentials. It provides useful flexibility beyond single-node local compute, but requires users to manage compute cluster configuration details. Ray can also be deployed in managed compute environments, but similarly requires users to have knowledge of the underlying compute infrastructure configuration defined by the managed environment. In short, the Ray-based solution requires researchers to specifically target, and potentially modify, the compute cluster configuration.
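For reference, Ray's cluster launcher captures this node mapping in a configuration file along the following lines. The addresses, user, and key path are placeholders; the field names follow Ray's on-premise ("local") provider:

```yaml
# Hypothetical on-premise Ray cluster config: the user must enumerate
# node addresses and SSH credentials themselves.
cluster_name: lab-cluster
provider:
  type: local
  head_ip: 192.168.0.10
  worker_ips:
    - 192.168.0.11
    - 192.168.0.12
auth:
  ssh_user: researcher
  ssh_private_key: ~/.ssh/id_rsa
```

This explicit inventory is the flexibility and the burden noted above: any change to the underlying machines requires a corresponding change to the configuration.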
To more effectively leverage large-scale compute in managed infrastructure, such as that provided by commercial and academic cloud providers, we use KubeFlow, a comprehensive machine learning solution designed to be operated as a service on top of Kubernetes-based orchestration. This approach enables code-less workflows and provides a rich set of tooling around development (e.g., notebooks, algorithm exploration) and automation (e.g., Pipelines) that reduces research iteration time. In contrast to Ray, configuration requirements are algorithm-focused and generally agnostic to the lower-level details of compute cluster configuration. With this solution, Kubernetes manages the underlying compute resource pool and is able to efficiently schedule compute jobs. Within KubeFlow, we leverage Katib (George et al., 2020), KubeFlow's "AutoML" framework, to efficiently explore the hyperparameter space and specify individual sweeps. As KubeFlow is an industry-grade tool, many cloud providers offer KubeFlow as a service or provide supported pathways for deploying a KubeFlow cluster, facilitating replication and compute resource scaling.
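As an illustration of this algorithm-focused configuration, a Katib sweep is specified declaratively in an Experiment resource roughly like the following. The experiment name, metric, counts, and parameter are hypothetical, and the trial template (which defines the training job itself) is omitted for brevity:

```yaml
# Hypothetical Katib Experiment fragment: note that nothing here refers
# to specific machines; Kubernetes handles scheduling.
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: lfads-hp-search
spec:
  objective:
    type: minimize
    objectiveMetricName: val_loss
  algorithm:
    algorithmName: random
  parallelTrialCount: 4
  maxTrialCount: 40
  parameters:
    - name: dropout
      parameterType: double
      feasibleSpace:
        min: "0.0"
        max: "0.5"
```

In contrast to the Ray cluster file, this specification describes only the search itself; the same experiment can run unchanged on any Kubernetes cluster with sufficient resources.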
The two distributed workflows provided, based on Ray and KubeFlow, each have their respective advantages and disadvantages. The correct choice for a specific researcher depends on their requirements and access to compute resources. Thus, we provide an evaluation in Table 1 as a starting point for this decision-making process.

Evaluation
A core innovation of AutoLFADS is the integration of PBT for hyperparameter exploration.
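PBT trains a population of models in parallel and periodically replaces poorly performing members with perturbed copies of well-performing ones. A minimal, library-agnostic sketch of this exploit/explore cycle follows; the quadratic loss is a stand-in for an interval of actual model training, and this is not the implementation used by AutoLFADS:

```python
import random

rng = random.Random(0)

def step_loss(hp):
    # Stand-in for one interval of model training; lower is better.
    return (hp["lr"] - 0.01) ** 2

def pbt(pop_size=8, generations=20):
    """Minimal Population Based Training loop over one hyperparameter."""
    population = [{"lr": 10 ** rng.uniform(-4, -1)} for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=step_loss)
        n = pop_size // 4
        survivors, losers = scored[:-n], scored[-n:]
        for loser in losers:
            # Exploit: copy hyperparameters from a top performer;
            # explore: perturb them by a random factor.
            parent = rng.choice(scored[:n])
            loser["lr"] = parent["lr"] * rng.choice([0.8, 1.2])
        population = survivors + losers
    return min(population, key=step_loss)

best = pbt()
```

Unlike a static sweep, the hyperparameters of each surviving member form a schedule that changes over the course of training, which is what makes a faithful reimplementation of the scheduler nontrivial across frameworks.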
As the underlying job scheduler and PBT implementation are unique to KubeFlow, we used the MC Maze dataset from the Neural Latents Benchmark (Pei et al., 2021) to train and evaluate two AutoLFADS models. One model was trained with the Ray solution and the other with the KubeFlow solution, using matching PBT hyperparameters and model configurations, to ensure that models of comparable quality can be learned across both solutions. A comprehensive description of the AutoLFADS algorithm, and results from applying the algorithm to neural data using Ray, can be found in Keshtkaran et al. (2022). We demonstrate similar converged model performance on metrics relevant to the quality of inferred firing rates in Table 2 (Pei et al., 2021). In Figure 3, inferred firing rates from the KubeFlow-trained AutoLFADS model are shown along with conventional firing rate estimation strategies. Qualitatively, these example inferences are similar to those described in Keshtkaran et al. (2022), showing similar consistency across trials and resemblance to peristimulus time histograms (PSTHs). In Figure 2, we plot the hyperparameter and associated loss values for the KubeFlow-based implementation of AutoLFADS to provide a visualization of the PBT-based optimization process on these data.
These results demonstrate that although PBT is stochastic, both the original Ray and novel KubeFlow implementations converge to stable, comparable solutions.

Figure 3 caption: AutoLFADS inferred firing rates, relative to conventional estimation strategies, aligned to movement onset time (dashed vertical line at 250 ms) for 3 example neurons (columns) and 6 example conditions (colors; out of 108 conditions).

Table 2 caption: AutoLFADS performance. An evaluation of AutoLFADS performance on Ray and KubeFlow. Test trial performance comparison on four neurally relevant metrics for evaluating latent variable models: co-smoothing on held-out neurons (co-bps), hand trajectory decoding on held-out neurons (vel R2), match to peristimulus time histogram (PSTH) on held-out neurons (psth R2), and forward prediction on held-in neurons (fp-bps). The trained models converge with less than 5% difference between the frameworks on these metrics; the percent difference is calculated with respect to the Ray framework.