Krylov.jl: A Julia basket of hand-picked Krylov methods

Krylov.jl solves linear problems of the form Ax = b. Aᴴ denotes the conjugate transpose of A; it coincides with Aᵀ, the transpose of A, if A is real. Krylov methods are iterative methods based on Krylov (1931) subspaces. They are an alternative to direct methods such as Gaussian elimination or QR decomposition when storage requirements or computational costs become prohibitive, which is often the case for large and sparse linear problems. Contrary to direct methods, which require storing A explicitly, Krylov methods support linear operators that model operator-vector products u ← Av, and in some instances u ← Aᴴw, because Krylov processes only require those operations to build Krylov subspaces. The same goes for preconditioners, i.e., transformations that modify a linear system into an equivalent form with favorable spectral properties that may yield faster convergence in finite-precision arithmetic. We refer interested readers to Ipsen & Meyer (1998) for an introduction to Krylov methods, along with Greenbaum (1997) and Saad (2003) for more details.



Largest collection of Krylov processes and methods
Krylov.jl aims to provide a user-friendly and unified interface for the largest collection of Krylov processes and methods, all programming languages taken together, with six and thirty-five implementations, respectively. Some processes and methods are not available elsewhere and are the product of our own research. References for each process and method are available in the extensive documentation.
Beyond the number of methods, Krylov.jl is the only package that offers all of the features that we describe below.

Support for any floating-point system supported by Julia
Krylov.jl works with real and complex data in any floating-point system supported by Julia, which means that Krylov.jl handles any precision T and Complex{T} where T <: AbstractFloat.
Although most personal computers offer IEEE 754 single and double precision computations, new architectures implement native computations in other floating-point systems. In addition, software libraries such as the GNU MPFR, shipped with Julia, let users experiment with computations in variable, extended precision at the software level with the BigFloat data type.
Working in high precision has obvious benefits in terms of accuracy.
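As an illustration of this genericity, the sketch below (ours, not a listing from the paper) solves the same small symmetric positive definite system with cg in three precisions; the matrix and sizes are arbitrary choices for the example.

```julia
# Hypothetical sketch: one generic cg implementation, three floating-point systems.
using Krylov, LinearAlgebra

residuals = map((Float32, Float64, BigFloat)) do T
    A = Matrix{T}(I, 4, 4) + T(0.1) * ones(T, 4, 4)  # SPD matrix in precision T
    b = A * ones(T, 4)                               # exact solution is all ones
    x, stats = cg(A, b)                              # same code path for every T
    norm(b - A * x)                                  # residual in precision T
end
```

Higher precisions yield smaller residuals, at the cost of slower software-emulated arithmetic for BigFloat.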

Support for NVIDIA, AMD and Intel GPUs
Krylov methods are well suited for GPU computations because they only require operator-vector products (u ← Av, u ← Aᴴw) and vector operations (‖v‖, uᴴv, u ← αu + βv), which are highly parallelizable. The implementations in Krylov.jl are generic so as to take advantage of the multiple dispatch and broadcast features of Julia. Those allow the implementations to be specialized automatically by the compiler for both CPU and GPU. Thus, Krylov.jl works with GPU backends that build on GPUArrays.jl, including CUDA.jl, AMDGPU.jl, and oneAPI.jl, the Julia interfaces to NVIDIA, AMD, and Intel GPUs.

Support for linear operators
The input arguments of all Krylov.jl solvers that model A and preconditioners can be any object that represents a linear operator. Krylov methods combined with linear operators make it possible to reduce computation time and memory requirements considerably by avoiding building and storing matrices. In nonlinear optimization, finding a critical point of a continuous function frequently involves linear systems where A is a Hessian or a Jacobian. Materializing such operators as matrices is expensive in terms of operations and memory consumption and is unreasonable for high-dimensional problems. However, it is often possible to implement efficient Hessian-vector and Jacobian-vector products, for example with the help of automatic differentiation tools.
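A minimal matrix-free sketch, assuming LinearOperators.jl: the 1D discrete Laplacian below is never materialized as a matrix, only its action on a vector is defined, and the resulting operator is passed directly to cg. The operator and problem here are our own illustrative choices.

```julia
# Hypothetical sketch: a matrix-free operator passed to a Krylov solver.
using Krylov, LinearOperators

n = 100
# y ← Av for the 1D Laplacian, without forming the tridiagonal matrix
function laplacian!(y, v)
    for i in 1:n
        y[i] = 2v[i] - (i > 1 ? v[i-1] : 0.0) - (i < n ? v[i+1] : 0.0)
    end
    return y
end

# (type, nrows, ncols, symmetric, hermitian, product)
A = LinearOperator(Float64, n, n, true, true, (y, v) -> laplacian!(y, v))
b = ones(n)
x, stats = cg(A, b)  # cg only needs products u ← Av
```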

In-place methods
All solvers in Krylov.jl have an in-place variant that makes it possible to solve multiple linear systems with the same dimensions, precision and architecture. Optimization methods such as the Newton and Gauss-Newton methods can take advantage of this functionality by allocating workspace for the solve only once. The in-place variants only require, as an additional argument, a Julia structure that contains all the storage needed by a Krylov method. In-place methods limit memory allocations and deallocations, which are particularly expensive on GPUs.
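A sketch of this pattern, assuming the CgSolver workspace type and cg! in-place variant from recent Krylov.jl releases (names may differ across versions): the workspace is allocated once and reused for several systems of identical size and precision.

```julia
# Hypothetical sketch: reuse one CG workspace across several systems.
using Krylov, LinearAlgebra

n = 50
solver = CgSolver(n, n, Vector{Float64})  # storage allocated once

for k in 1:3
    B = randn(n, n)
    A = B' * B + n * I        # fresh SPD system, same dimensions
    b = randn(n)
    cg!(solver, A, b)         # in-place solve, no new workspace
end
x = solver.x                  # solution of the last system
```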

Performance optimizations and storage requirements
Operator-vector products and vector operations are the most expensive operations in Krylov.jl. The vectors in Krylov.jl are always dense. One may then expect that taking advantage of an optimized BLAS library, when one is available on CPU and when the problem data is stored in a supported representation, should improve performance. Thus, we dispatch vector-vector operations to BLAS1 routines, and operator-vector operations to BLAS2 routines when the operator is a dense matrix. By default, Julia ships with OpenBLAS and provides multithreaded routines. Since Julia 1.6, users can also switch dynamically to other optimized BLAS backends, such as the Intel MKL, BLIS or Apple Accelerate, thanks to the BLAS demuxing library libblastrampoline.
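For context, the backend swap requires no change to Krylov.jl code; the sketch below inspects the BLAS configuration exposed by libblastrampoline (MKL.jl is an assumed, optional dependency).

```julia
# Sketch: inspecting the BLAS backend selected through libblastrampoline.
using LinearAlgebra

println(BLAS.get_config())  # reports the loaded backend, OpenBLAS by default

# Loading MKL.jl (if installed) swaps the backend for subsequent BLAS calls:
# using MKL
# BLAS.get_config()  # would then report MKL
```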
A "Storage Requirements" section is available in the documentation to provide the theoretical number of bytes required by each method. Our implementations are storage-optimal in the sense that they are guaranteed to match the theoretical storage amount. The match is verified in the unit tests by way of functions that return the number of bytes allocated by our implementations.

Examples
Our first example is a simple implementation of the Gauss-Newton method without linesearch for nonlinear least squares. It illustrates several of the facilities of Krylov.jl: solver preallocation and reuse, genericity with respect to data types, and linear operators. Another example based on a simplistic Newton method without linesearch for convex optimization is also available in the documentation, and illustrates the same concepts in the sections "In-place methods" and "Factorization-free operators". The listing concludes by checking the computed solution:

```julia
# Check the solution returned by the Gauss-Newton method
@test norm(x - x_exact) ≤ 1e-4
```

Our second example concerns the solution of a complex Hermitian linear system from the SuiteSparse Matrix Collection (Davis & Hu, 2011) with an incomplete Cholesky factorization preconditioner on GPU. The preconditioner is implemented as an in-place linear operator that performs the forward and backward sweeps with the Cholesky factor of the incomplete decomposition. Because the system matrix is Hermitian and positive definite, we use the conjugate gradient method. However, other methods for Hermitian systems could be used, including SYMMLQ, CR, and MINRES.
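The paper's full Gauss-Newton listing is not reproduced here; a minimal sketch of the same idea, with a toy residual of our own and ForwardDiff assumed for the Jacobian, might look like:

```julia
# Hypothetical Gauss-Newton sketch (ours, not the paper's listing):
# at each iteration, solve min ‖J(x)d + F(x)‖ with LSMR and update x.
using Krylov, ForwardDiff, LinearAlgebra

function gauss_newton(F, x; maxiter = 10, tol = 1e-10)
    for _ in 1:maxiter
        J = ForwardDiff.jacobian(F, x)
        d, stats = lsmr(J, -F(x))   # least-squares Gauss-Newton step, no linesearch
        x += d
        norm(F(x)) ≤ tol && break
    end
    return x
end

# Toy residual: solutions are (0, 3) and (3, 0)
F(x) = [x[1] + x[2] - 3.0, x[1]^2 + x[2]^2 - 9.0]
x = gauss_newton(F, [1.0, 5.0])
```

In a production variant, an LsmrSolver workspace would be preallocated and reused across iterations, as described under "In-place methods".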

An abridged version of the listing, keeping the lines recoverable from the paper (the setup that reads the matrix, computes the incomplete factor P, and defines the sweep function ldiv_ic0! and its temporary vector z is in the documentation):

```julia
# Transfer the linear system from the CPU to the GPU
A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)

# Incomplete Cholesky factorization LLᴴ ≈ A with zero fill-in

# Operator that models P⁻¹ by performing the forward and backward sweeps in-place
P⁻¹ = LinearOperator(T, m, n, symmetric, hermitian, (y, x) -> ldiv_ic0!(P, x, y, z))

# Solve an Hermitian positive definite system with an incomplete
# Cholesky factorization preconditioner
x_gpu, stats = cg(A_gpu, b_gpu, M=P⁻¹)

# Check the solution returned by the conjugate gradient method
x_cpu = Vector{ComplexF64}(x_gpu)
@test norm(x_cpu - x_exact) ≤ 1e-5
```

Krylov.jl also provides a foundation for comparing existing methods with each other as well as new ones. The number of distinct Krylov methods is twenty-two for PETSc (Balay et al., 2023), eleven for MATLAB (2022) and KrylovMethods.jl, nine for IterativeSolvers.jl, and three for KrylovKit.jl. However, Krylov.jl does not implement recycling Krylov methods or block Krylov methods, unlike some alternatives, except for special cases, including TRICG, TRIMR, and GPMR. Note that we only consider the number of Krylov methods that generate different iterates without preconditioning. Variants with preconditioning are not counted, except flexible ones such as FGMRES.