Zoobot: Adaptable Deep Learning Models for Galaxy Morphology

1 Jodrell Bank Centre for Astrophysics, Department of Physics and Astronomy, University of Manchester, Manchester, UK
2 Zooniverse.org, University of Oxford, Oxford, UK
3 Institut für Planetologie, Westfälische Wilhelms-Universität Münster, Münster, Germany
4 Astronomical Observatory of the University of Warsaw, Warsaw, Poland
5 Oxford Astrophysics, Department of Physics, University of Oxford, Oxford, UK
6 The Alan Turing Institute, London, UK
7 Theoretical and Scientific Data Science Group, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
8 Université Paris Cité, Université Paris-Saclay, CEA, CNRS, AIM, Gif-sur-Yvette, France
9 Department of Physics, Lancaster University, Lancaster, UK
10 Physical Research Laboratory, Navrangpura, Ahmedabad, India
11 Vanderbilt University, Nashville, USA
12 Center for Astrophysics | Harvard & Smithsonian, Cambridge, USA
13 Dipartimento di Fisica, Università di Roma “Tor Vergata”, Roma, Italy
14 Department of Astronomy, Faculty of Mathematics, University of Belgrade, Belgrade, Serbia
15 Ruprecht Karl University of Heidelberg, Germany
¶ Corresponding author
* These authors contributed equally.
DOI: 10.21105/joss.05312


Summary
Zoobot is a Python package for measuring the detailed appearance of galaxies in telescope images using deep learning. Zoobot is aimed at astronomers who want to solve a galaxy image task such as finding merging galaxies or counting spiral arms. Astronomers can use Zoobot to adapt (finetune) pretrained deep learning models to solve their task. These finetuned models perform better and require far fewer new labels than training from scratch (Walmsley, Slijepcevic, et al., 2022).
The models included with Zoobot are pretrained on up to 92 million responses from Galaxy Zoo volunteers. Each volunteer answers a series of tasks describing the detailed appearance of each galaxy. Zoobot's models are trained to answer all of these diverse tasks simultaneously. The models can then be adapted to new related tasks. Zoobot provides a high-level API and guided workflow for carrying out the finetuning process. The API abstracts away engineering details such as efficiently loading astronomical images, multi-GPU training, iteratively finetuning deeper model layers, and so forth. Behind the scenes, these steps are implemented via either PyTorch or TensorFlow, according to the user's choice. Zoobot is therefore accessible to astronomers with no previous experience in deep learning.
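The core idea of finetuning can be illustrated with a toy sketch in plain Python. This is not Zoobot's API, and every name in it (PRETRAINED_BACKBONE, extract_features, finetune_head, make_toy_dataset) is hypothetical. It assumes a tiny fixed "backbone" standing in for a pretrained network and trains only a new logistic-regression head on a made-up binary task. It covers only the simplest case of a frozen backbone; iteratively finetuning deeper layers is not shown.

```python
import math
import random

# Hypothetical stand-in for a pretrained backbone: three fixed linear filters
# over a 4-pixel "image". A real backbone would be a deep network.
PRETRAINED_BACKBONE = [
    [1.0, 1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
]


def extract_features(image, backbone):
    """Frozen forward pass: linear filters followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in backbone]


def predict(features, head_weights, head_bias):
    """New task head: logistic regression on the frozen features."""
    z = sum(w * f for w, f in zip(head_weights, features)) + head_bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(feature_rows, labels, head_weights, head_bias):
    """Average binary cross-entropy over the dataset."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in zip(feature_rows, labels):
        p = predict(features, head_weights, head_bias)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(labels)


def finetune_head(images, labels, backbone, epochs=200, learning_rate=0.5):
    """Train only the head with full-batch gradient descent; the backbone is never updated."""
    feature_rows = [extract_features(image, backbone) for image in images]
    head_weights = [0.0] * len(backbone)
    head_bias = 0.0
    initial_loss = mean_log_loss(feature_rows, labels, head_weights, head_bias)
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * len(head_weights)
        grad_b = 0.0
        for features, label in zip(feature_rows, labels):
            error = predict(features, head_weights, head_bias) - label
            for i, f in enumerate(features):
                grad_w[i] += error * f
            grad_b += error
        head_weights = [w - learning_rate * g / n for w, g in zip(head_weights, grad_w)]
        head_bias -= learning_rate * grad_b / n
    final_loss = mean_log_loss(feature_rows, labels, head_weights, head_bias)
    return head_weights, head_bias, initial_loss, final_loss


def make_toy_dataset(n_images, seed=0):
    """Made-up task: label is 1 when the first two pixels outweigh the last two."""
    rng = random.Random(seed)
    images = [[rng.random() for _ in range(4)] for _ in range(n_images)]
    labels = [1 if img[0] + img[1] > img[2] + img[3] else 0 for img in images]
    return images, labels


if __name__ == "__main__":
    images, labels = make_toy_dataset(200)
    _, _, initial_loss, final_loss = finetune_head(images, labels, PRETRAINED_BACKBONE)
    print(f"log loss before finetuning: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Because the backbone is frozen, only four numbers (three head weights and a bias) are learned here. The same principle, at a much larger scale, is why a pretrained model needs fewer new labels than one trained from scratch.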
For advanced users, Zoobot also includes the code to replicate and extend our pretrained models. This code is used routinely at Galaxy Zoo to scale up galaxy measurement catalogs (Walmsley, Lintott, et al., 2022) and to prioritise the galaxies shown to volunteers for labelling. Zoobot models have been applied to measure galaxy appearance in SDSS (Walmsley et al., 2020), Hubble, HSC, and DESI, and are included in the data pipeline of the upcoming Euclid space telescope (Laureijs et al., 2011). We hope that Zoobot will help empower astronomers to apply deep learning to answer their own science questions.

Statement of need
One common way to investigate why galaxies look the way they do is by measuring the appearance (morphology) of millions of galaxies and looking for connections between appearance and other physical properties (Masters, 2019). The sheer number of images requires most of these measurements to be made automatically with software (Walmsley et al., 2020).
Unfortunately, making automated measurements of complicated features like spiral arms is difficult because it is hard to write down a set of steps that reliably identify those and only those features. This mirrors many image classification problems back on Earth (LeCun et al., 2015). Astronomers often aim instead to learn the measurement steps directly from data by providing deep learning models with large sets of galaxy images with labels (e.g. spiral or not) (Huertas-Company & Lanusse, 2022).
Gathering large sets of labelled galaxy images is a major practical barrier. Models trained on millions to billions of labelled images consistently perform better (Bommasani et al., 2021; Dehghani et al., 2023), but astronomers cannot routinely label this many images. Neither can most other people; terrestrial practitioners often start with a model already trained ("pretrained") on a broad generic task and then adapt it ("finetune") to their specific measurement task (Ridnik et al., 2021).
Zoobot makes this approach available to astronomers. We provide models pretrained on millions of galaxy images and present a convenient API for finetuning those models. Early results (O'Ryan et al., 2023; Walmsley, Slijepcevic, et al., 2022) show that our pretrained models can be efficiently adapted to new morphology tasks.