Atomic Simulation Interface (ASI): application programming interface for electronic structure codes

1 Cardiff Catalysis Institute, School of Chemistry, Cardiff University, Cardiff, United Kingdom
2 Scientific Computing Department, STFC Daresbury Laboratory, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom
3 Department of Chemistry, Kathleen Lonsdale Materials Chemistry, University College London, London, United Kingdom
4 Thomas Lord Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
5 SUPA, Department of Physics, University of Strathclyde, John Anderson Building, 107 Rottenrow, Glasgow G4 0NG, United Kingdom
6 Department of Chemistry, University of Warwick, Coventry, CV4 7AL, United Kingdom
7 Department of Physics, University of Warwick, Coventry, CV4 7AL, United Kingdom
¶ Corresponding author
DOI: 10.21105/joss.05186


Statement of need
Although numerous modern electronic structure codes have a common mathematical basis and often share core algorithm implementations (such as ESL (Oliveira et al., 2020), ELSI (Yu et al., 2018, 2020), and libxc (Lehtola et al., 2018; Marques et al., 2012)), a portable and efficient way to access the resulting electronic structure variables from the user side remains elusive. For classical, AIMD, and hybrid QM/MM calculations, a similar issue is solved by the widely adopted i-PI interface (Kapil et al., 2019), the MolSSI Driver Interface (Barnes et al., 2021), the ASE library (Larsen et al., 2017), and the ChemShell environment (Lu et al., 2019). For electronic structure data, however, such as wave functions, band structures, and Hamiltonian or density matrices, there is no widely adopted solution, probably because of the diversity of basis sets. Consequently, many well-developed codes for electronic structure analysis and for integrating machine learning are hard to use, because they depend explicitly on a specific electronic structure code. For example, the MPE solvent model (Filser et al., 2022) is implemented only in FHI-aims, and SchNOrb (Schütt, Gastegger, et al., 2019) models are available only for the ORCA code (Neese et al., 2020). Machine-learning models can be trained without regard to the details of a specific code's basis sets; therefore, even without a universal representation of electronic quantities, a convenient and efficient way to access a detailed description of the electronic structure will benefit efforts towards modularizing electronic structure software.

State of the field
The demand for access to electronic structure data is currently driven by the push to apply recent machine-learning advances in quantum chemistry. After numerous successful applications of machine learning to the direct prediction of energies and forces from atomic coordinates (Bartók et al., 2018; Z. Li et al., 2015; Schütt, Kessel, et al., 2019), data-driven models that predict electronic structure beyond energies and forces are now being developed. Such models provide more interpretable outcomes, possess higher transferability, and can be used to predict a wider set of material properties. For example, Carleo & Troyer (2017) employed reinforcement learning to compute the ground state and unitary time evolution of a few prototypical systems, and H. Li et al. (2018) developed a deep-learning model that predicts a Hamiltonian matrix for subsequent DFTB calculations. The SchNOrb deep-learning framework uses a neural tensor network representation of wave functions to predict Hamiltonian matrices (Schütt, Gastegger, et al., 2019).
Today, many electronic structure software packages use files to export or import information about the electronic structure or potentials; this approach is implemented in ORCA (Neese et al., 2020), Quantum Espresso (Giannozzi et al., 2020), CP2K (Kühne et al., 2020), FHI-aims (Blum et al., 2009), DFTB+ (Hourahine et al., 2020), and others. Although file-based data exchange has advantages, it unavoidably introduces performance penalties and often requires additional coding effort for data parsing and formatting, as storage formats are rarely well suited to data exchange during active calculations. Rare exceptions such as GPAW (Mortensen et al., 2005), Psi4 (Turney et al., 2012), and DFTK.jl (Herbst et al., 2021) do provide Python or Julia APIs and thus simplify the development of new functionality.
We believe the field will benefit from a universal API for access to DFT-related quantities in popular quantum chemistry codes. Even without universal specifications of basis sets, an API for access to Hamiltonian, overlap, and density matrices would be helpful for the applications mentioned above. It would pave the way for the implementation of new machine-learning models and electronic structure analysis tools and accelerate their adoption.

Functionality
The Atomic Simulation Interface (ASI) is a specification of pure C functions designed to be implemented in existing quantum chemistry codes. The scope of ASI is focused mostly on the efficient transfer of electronic structure data. The complete ASI specification can be found on the project web page pvst.gitlab.io/asi. A plain C API was chosen because it is simple to implement in Fortran codes and simple to invoke from other languages, including Python and Julia. For convenience, a Python wrapper for the ASI functions has been created: asi4py, which can be installed via pip (the package installer for Python). The asi4py wrapper was designed for use with the ASE framework (Larsen et al., 2017) and implements ASE's Calculator interface. Therefore, any DFT code that implements the ASI API automatically gets an ASE calculator with efficient data transfer.
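Because ASI is a plain C interface, any language with a C foreign-function interface can call it without a compiled binding layer. The minimal sketch below shows the mechanism with Python's standard ctypes module. No ASI library is assumed to be installed, so it calls strlen from the C standard library as a stand-in for an ASI function, declaring the prototype explicitly, which is the step a ctypes-based wrapper has to take for each foreign function.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. It stands in for an ASI shared library,
# which would be loaded the same way from its file path.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype: size_t strlen(const char *s).
# Without these declarations ctypes falls back to default argument conversions
# and assumes a C int return value.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Call the native C function and return its result as a Python int."""
    return libc.strlen(text.encode("ascii"))


print(c_strlen("ASI_energy"))  # 10
```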
Table 1 briefly describes the key ASI functions, which fall into four groups according to their primary purpose: control flow, atomic information, electrostatic potential exchange, and transfer of the large arrays describing the electronic structure of the simulated system (currently, routines for Hamiltonian, overlap, and density matrices are included in the ASI specification). Table 1 lists only a subset of the ASI functions, omitting auxiliary functions such as ASI_get_basis_size, ASI_n_atoms, etc.
Codes implementing the ASI API are expected to be built as shared object libraries for dynamic linking with the client code. Client codes therefore get direct access to the internal data structures of an ASI-implementing code, minimizing the interoperability overhead. The suggested control flow of a client code that employs ASI is shown in Figure 1. Because the ASI API is expected to be implemented in existing codes with the minimal necessary changes to their code bases, we have kept the group of control-flow functions as small as possible and rely on callback functions instead. Registering a callback function gives the client code direct access to the data objects of an ASI-implementing code and eliminates the need to copy them or to manage their lifetime. Callback functions that work with large matrices (Hamiltonian, overlap, and density matrices) support distributed storage via the BLACS (Basic Linear Algebra Communication Subprograms) library (Basic Linear Algebra Communication Subprograms, n.d.). If MPI (Message Passing Interface; Walker & Dongarra, 1996) parallelization is enabled, each callback receives a BLACS descriptor of the matrix. Currently only the dense storage format is supported; support for sparse formats is expected in future versions.
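To make the BLACS descriptor more concrete: in ScaLAPACK, a dense matrix is distributed block-cyclically over a 2D process grid and described by a nine-integer array descriptor (DTYPE, CTXT, M, N, MB, NB, RSRC, CSRC, LLD). The sketch below ports ScaLAPACK's NUMROC rule and the global-to-local index mapping to Python, so that a callback receiving such a descriptor could work out which part of, say, a Hamiltonian matrix the calling process holds. It uses 0-based process coordinates and indices and is illustrative only. A production client would rely on the ScaLAPACK routines linked with the ASI library rather than on this port.

```python
from typing import NamedTuple, Tuple


class BlacsDescriptor(NamedTuple):
    """The nine entries of a ScaLAPACK dense-matrix descriptor, in standard order."""
    dtype: int  # descriptor type, 1 for dense matrices
    ctxt: int   # BLACS context handle
    m: int      # global number of rows
    n: int      # global number of columns
    mb: int     # row block size
    nb: int     # column block size
    rsrc: int   # process row holding the first row block
    csrc: int   # process column holding the first column block
    lld: int    # leading dimension of the local array


def numroc(n: int, nb: int, iproc: int, isrcproc: int, nprocs: int) -> int:
    """Rows (or columns) of an n-long dimension owned by process iproc (port of NUMROC)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks every process receives
    extrablks = nblocks % nprocs                   # leftover complete blocks
    if mydist < extrablks:
        count += nb                                # one extra complete block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block
    return count


def global_to_local(g: int, nb: int, isrcproc: int, nprocs: int) -> Tuple[int, int]:
    """Map a 0-based global index to (owning process, 0-based local index)."""
    block = g // nb
    proc = (block + isrcproc) % nprocs
    local = (block // nprocs) * nb + g % nb
    return proc, local


def local_shape(desc: BlacsDescriptor, myrow: int, mycol: int,
                nprow: int, npcol: int) -> Tuple[int, int]:
    """Shape of the local block held by process (myrow, mycol) of an nprow x npcol grid."""
    rows = numroc(desc.m, desc.mb, myrow, desc.rsrc, nprow)
    cols = numroc(desc.n, desc.nb, mycol, desc.csrc, npcol)
    return rows, cols


# Example: a 7x7 matrix in 2x2 blocks on a 2x2 process grid.
desc = BlacsDescriptor(dtype=1, ctxt=0, m=7, n=7, mb=2, nb=2, rsrc=0, csrc=0, lld=4)
for r in range(2):
    for c in range(2):
        print((r, c), local_shape(desc, r, c, 2, 2))
```

The per-process shapes add up to the global 7x7 matrix, which is a quick sanity check a client can apply before wrapping its local block.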
The atomic information functions are designed to simplify calculation setup and integration with classical simulation codes. The electrostatic potential exchange functions are primarily meant for integration into QM/MM and AIMD workflows, for example into the ChemShell framework (Lu et al., 2019) or into IC-QMMM (Golze et al., 2013) calculations.
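As an illustration of the quantity exchanged by ASI_calc_esp, the sketch below evaluates an electrostatic potential and its gradient at arbitrary points. It is a deliberately simplified stand-in: it uses classical point charges in atomic units, whereas an ASI implementation computes the potential of the quantum-mechanical charge distribution (electrons and nuclei). The helper name and data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def esp_and_gradient(charges: Sequence[float], positions: Sequence[Vec3],
                     points: Sequence[Vec3]) -> List[Tuple[float, Vec3]]:
    """Potential V(r) = sum_i q_i / |r - r_i| and its gradient at each point (atomic units).

    Hypothetical helper: point charges stand in for the QM charge distribution.
    Evaluation points must not coincide with a charge position.
    """
    results = []
    for r in points:
        v = 0.0
        grad = [0.0, 0.0, 0.0]
        for q, ri in zip(charges, positions):
            d = [r[k] - ri[k] for k in range(3)]
            dist = math.sqrt(sum(x * x for x in d))
            v += q / dist
            # d/dr (q / |r - r_i|) = -q (r - r_i) / |r - r_i|^3
            for k in range(3):
                grad[k] -= q * d[k] / dist**3
        results.append((v, (grad[0], grad[1], grad[2])))
    return results


# Potential and gradient of a unit positive charge, 2 bohr away along x.
v, g = esp_and_gradient([1.0], [(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)])[0]
print(v, g)  # 0.5 (-0.25, 0.0, 0.0)
```

In a QM/MM setting, the MM side would use such values (the negative gradient being the electric field) to couple classical charges to the QM region.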
The functions for electronic structure calculations are designed to support the development of new algorithms for density functional theory, for example density-matrix extrapolation (Polack et al., 2021), and the application of machine-learning techniques at the electronic structure level, such as the SchNOrb model (Schütt, Gastegger, et al., 2019) or the atomic cluster expansion of Hamiltonians (Zhang et al., 2022). New use cases are expected to emerge once electronic structure properties are exposed via the ASI API.
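To illustrate how a client could use the density-matrix callbacks for extrapolation, the sketch below forms an initial guess for the next geometry step from the two previous converged density matrices and checks the electron count N = Tr(PS). It uses the simplest linear extrapolation, P_guess = 2 P(t) - P(t - dt), not the more elaborate scheme of Polack et al. (2021). Linear extrapolation does not preserve idempotency, and the electron count is preserved here only because the toy overlap matrix S is held fixed. The helper names are hypothetical, and handing the guess to the code through the callback registered with ASI_register_dm_init_callback is only indicated in a comment.

```python
import numpy as np


def linear_dm_guess(p_curr: np.ndarray, p_prev: np.ndarray) -> np.ndarray:
    """Hypothetical helper: linear extrapolation P_guess = 2 P(t) - P(t - dt)."""
    return 2.0 * p_curr - p_prev


def electron_count(p: np.ndarray, s: np.ndarray) -> float:
    """N = Tr(P S) for a spin-summed density matrix P and overlap matrix S."""
    return float(np.trace(p @ s))


def closed_shell_dm(v: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Toy density matrix: one doubly occupied orbital, P = 2 c c^T with c^T S c = 1."""
    c = v / np.sqrt(v @ s @ v)
    return 2.0 * np.outer(c, c)


# Toy data: a fixed 3x3 overlap matrix and two "previous" density matrices.
s = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.1, 1.0]])
p_prev = closed_shell_dm(np.array([1.0, 0.5, 0.1]), s)
p_curr = closed_shell_dm(np.array([1.0, 0.6, 0.1]), s)
p_guess = linear_dm_guess(p_curr, p_prev)

# In an ASI client, p_guess would be written into the buffer passed to the
# dm_init callback; here we only check that the electron count is preserved.
print(round(electron_count(p_guess, s), 10))  # 2.0
```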
Currently, the ASI API is implemented in the open-source DFTB+ code and in the FHI-aims code. Most of the ASI API is implemented in both codes, with the exception of ASI_register_dm_init_callback, which is unavailable in DFTB+ because of the different nature of its self-consistent loop. In addition, the set of charge partitioning schemes supported by the ASI_atomic_charges function depends on the implementation. Although both currently available implementations use localized basis sets, we do not foresee obstacles to implementing the ASI API for plane-wave or hybrid basis sets.
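Because the set of implemented functions differs between codes, a client may want to check at run time which functions a loaded library exports. A minimal sketch with Python's ctypes follows, assuming that an implementation omitting a function simply does not export the symbol (an implementation could instead export a stub that reports an error, which this probe would not detect). The C standard library stands in for an ASI library, so the ASI symbol is expected to be reported as absent.

```python
import ctypes
import ctypes.util
from typing import Dict, Iterable


def exported_functions(lib: ctypes.CDLL, names: Iterable[str]) -> Dict[str, bool]:
    """Report which of the given symbol names the loaded shared library exports.

    ctypes resolves a symbol on attribute access and raises AttributeError
    if it is missing, so hasattr() works as a cheap probe.
    """
    return {name: hasattr(lib, name) for name in names}


# Stand-in library; an ASI client would load the ASI-implementing library instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
print(exported_functions(libc, ["strlen", "ASI_register_dm_init_callback"]))
```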
To keep implementation simple, we tried to keep the number of ASI functions as small as possible. As a consequence, writing client code in languages such as C or Fortran can be cumbersome. To ease code development with the ASI API, we have created a Python wrapper for it: asi4py. The asi4py package provides a wrapping class, ASIlib, for a dynamically loaded shared library that implements ASI. This class forwards Python calls to native C calls via the ctypes library. Direct access to large matrices and arrays is provided via NumPy (Harris et al., 2020) arrays, so no data is copied during wrapping and the performance overhead is minimal. For redistribution of BLACS matrices, the additional package scalapack4py provides access to the necessary subroutines of the ScaLAPACK (Blackford et al., 1997) implementation linked with the loaded ASI library, so the client code reuses the existing BLACS contexts and MPI communicators. To ease calculation setup, an Atomic Simulation Environment (ASE) (Larsen et al., 2017) calculator interface is implemented by the ASI_ASE_calculator class.
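The zero-copy idea can be illustrated with NumPy's ctypes helpers: numpy.ctypeslib.as_array wraps memory owned by native code, given a pointer and a shape, without copying it. In the sketch below, a ctypes buffer stands in for a matrix owned by an ASI library (no real library is loaded), and column-major storage is an assumption matching Fortran conventions. The last lines show why a client must copy the data if it needs it after a callback returns: the view reflects whatever the native code later writes into that memory.

```python
import ctypes

import numpy as np

n = 3
# Stand-in for a matrix buffer owned by native code (n*n doubles, zero-initialized).
native_buffer = (ctypes.c_double * (n * n))()
ptr = ctypes.cast(native_buffer, ctypes.POINTER(ctypes.c_double))

# Wrap the raw pointer as a NumPy array without copying; reshape with order="F"
# so the memory is interpreted column-major, as Fortran codes store matrices.
flat_view = np.ctypeslib.as_array(ptr, shape=(n * n,))
matrix = flat_view.reshape((n, n), order="F")

# Writing through the NumPy view changes the native memory:
# element (0, 1) lives at offset 0 + 1 * n in column-major order.
matrix[0, 1] = 4.2
print(native_buffer[n])

# Later native writes are visible through the view, so keep a copy if needed.
snapshot = matrix.copy()
native_buffer[0] = 7.0
print(matrix[0, 0], snapshot[0, 0])
```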

Use Case
The minimal example of ASI API usage with DFTB+ for access to Hamiltonian, overlap, and density matrices uses the ASI_ASE_calculator class to load the ASI library. The path to the library is specified in the ASI_LIB_PATH environment variable. To create the DFTB+ input files, the ase.calculators.dftb.Dftb class from ASE is used. The ASIlib object is aggregated by the calculator as its asi property. Three boolean properties, keep_*, enable storing copies of the corresponding matrices in dictionaries named *_storage. The keys of these dictionaries are pairs of 1-based indices of k-points and spin channels (always (1,1) for non-periodic spin-paired systems). The example code simply saves the three matrices of a water molecule and prints the system energy, the number of electrons, the sum of eigenvalues, and the number of density matrix evaluations (iterations) from the dm_calc_cnt dictionary.
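The storage behaviour described above can be sketched without a real ASI library. In the mock below, a hypothetical MockASILib class plays the role of an ASI-implementing code: it owns a density-matrix buffer, overwrites it on every SCF iteration, and invokes a registered callback with 1-based k-point and spin indices plus an opaque auxiliary object. The client callback mirrors what the *_storage and dm_calc_cnt dictionaries are described as holding: a copy keyed by (k-point, spin) and an evaluation count. None of these class or method names (MockASILib, register_dm_callback, run, Client) comes from asi4py, and the real callback signature is defined by the ASI specification.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

import numpy as np

Key = Tuple[int, int]  # (1-based k-point index, 1-based spin index)
Callback = Callable[[object, int, int, np.ndarray], None]


class MockASILib:
    """Hypothetical stand-in for an ASI-implementing code (not the asi4py API)."""

    def __init__(self, n_basis: int, n_scf_iterations: int):
        self._dm = np.zeros((n_basis, n_basis))  # buffer owned by the "library"
        self._n_iter = n_scf_iterations
        self._dm_callback = None
        self._dm_aux = None

    def register_dm_callback(self, callback: Callback, aux: object) -> None:
        self._dm_callback, self._dm_aux = callback, aux

    def run(self) -> None:
        """Fake SCF loop: overwrite the buffer in place and notify the client."""
        for it in range(self._n_iter):
            self._dm[:] = it + 1.0
            if self._dm_callback is not None:
                # Non-periodic, spin-paired system: k-point and spin are always (1, 1).
                self._dm_callback(self._dm_aux, 1, 1, self._dm)


class Client:
    """Keeps copies of the density matrix and counts evaluations per (k, spin)."""

    def __init__(self):
        self.dm_storage: Dict[Key, np.ndarray] = {}
        self.dm_calc_cnt: Dict[Key, int] = defaultdict(int)

    @staticmethod
    def dm_callback(aux: "Client", i_k: int, i_s: int, dm: np.ndarray) -> None:
        # dm is the library's buffer; copy it because the library will overwrite it.
        aux.dm_storage[(i_k, i_s)] = dm.copy()
        aux.dm_calc_cnt[(i_k, i_s)] += 1


lib = MockASILib(n_basis=6, n_scf_iterations=5)
client = Client()
lib.register_dm_callback(Client.dm_callback, client)
lib.run()
print(client.dm_calc_cnt[(1, 1)], client.dm_storage[(1, 1)][0, 0])
```

Passing the client object as an auxiliary argument mirrors the usual C pattern of an opaque user-data pointer, which lets one static callback serve several client instances.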

Figure 1: Suggested control flow of a code that uses the ASI API. The loop condition depends on the particular use case.

Table 1. Key functions of the ASI API. For the full list, see the ASI API Specification.

Function | Description
ASI_energy | Returns the total system energy.
ASI_forces | Returns a pointer to the array of forces acting on atoms.
ASI_calc_esp | Calculates the local electrostatic potential and its gradient at arbitrary points (after an ASI_run call).
Electronic structure calculations
ASI_register_dm_init_callback | Initializes the SCF loop via the density matrix.
ASI_register_dm_callback | Gets the density matrix on each SCF iteration.
ASI_register_overlap_callback | Gets the overlap matrix on each geometry change.
ASI_register_hamiltonian_callback | Gets the Hamiltonian matrix on each SCF iteration.