Quilë: C++ genetic algorithms scientific library

This work discusses Quilë, a general-purpose, header-only scientific library for genetic algorithms (Holland, 1975). The software is written in C++20 and has been released under the terms of the MIT license. It is available at https://github.com/ttarkowski/quile/. The name of the library comes from the fictional language Neo-Quenya and means “color” (cf. the origin of the word chromosome).

The genetic algorithm is a conceptually simple procedure. The aim is to find the best solution of a given optimization problem. First, one begins with some sampling of the space of potential solutions; this can be done randomly or ad hoc. This sampling forms the first population of the evolutionary process, i.e., its first iteration. Each candidate solution from the population is evaluated in terms of its "fitness". A "fitter" candidate solution has a greater probability of becoming a parent and entering the population of the next iteration. However, less "fit" individuals can still propagate, albeit with lower probability. This trait of evolutionary computation helps prevent premature convergence to a local optimum. Iterations of the evolution continue as long as the termination condition is not met. One example of a termination condition is reaching a plateau of the fitness function.
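The procedure described above can be sketched in a few dozen lines of C++20. The sketch below is not Quilë code and does not use the Quilë API: the names `Best` and `evolve`, the one-gene floating-point genotype, the default parameter values, and the particular choice of operators (fitness proportionate selection with a windowing shift, averaging of two parents, Gaussian mutation) are assumptions made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Hypothetical result type for this sketch (not part of Quilë).
struct Best {
  double x;
  double fitness;
};

// Maximizes `fitness` over [lo, hi] using a one-gene floating-point genotype.
Best evolve(const std::function<double(double)>& fitness, double lo, double hi,
            unsigned seed, std::size_t pop_size = 30,
            std::size_t max_generations = 200, std::size_t plateau = 25,
            double sigma = 0.3) {
  assert(lo < hi && pop_size > 0);
  std::mt19937 rng{seed};
  std::uniform_real_distribution<double> init{lo, hi};
  std::normal_distribution<double> noise{0.0, sigma};

  // First population: random sampling of the space of potential solutions.
  std::vector<double> pop(pop_size);
  for (double& x : pop) x = init(rng);

  Best best{lo, -std::numeric_limits<double>::infinity()};
  std::size_t stalled = 0;  // generations without improvement of the best
  // Termination: iteration limit reached OR fitness plateau reached.
  for (std::size_t g = 0; g < max_generations && stalled < plateau; ++g) {
    // Evaluate each candidate solution.
    std::vector<double> fit(pop_size);
    std::transform(pop.begin(), pop.end(), fit.begin(), fitness);

    const auto top = std::max_element(fit.begin(), fit.end());
    if (*top > best.fitness) {
      best = {pop[static_cast<std::size_t>(top - fit.begin())], *top};
      stalled = 0;
    } else {
      ++stalled;
    }

    // Selection weights: shift by the worst fitness (windowing). The small
    // constant keeps the weight sum positive when all fitnesses are equal,
    // so less fit individuals keep a low but nonzero chance to propagate.
    const double worst = *std::min_element(fit.begin(), fit.end());
    std::vector<double> weight(pop_size);
    std::transform(fit.begin(), fit.end(), weight.begin(),
                   [worst](double f) { return f - worst + 1e-9; });
    std::discrete_distribution<std::size_t> pick(weight.begin(), weight.end());

    // Next population: average two selected parents, then mutate the child.
    std::vector<double> next(pop_size);
    for (double& child : next) {
      const double a = pop[pick(rng)];
      const double b = pop[pick(rng)];
      child = std::clamp(0.5 * (a + b) + noise(rng), lo, hi);
    }
    pop = std::move(next);
  }
  return best;
}
```

For example, calling `evolve` with the fitness function f(x) = −(x − 3)² on the interval [−10, 10] should return a candidate close to x = 3; the exact value depends on the seed and on the standard library's implementation of the random distributions.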

Overview
Nature of problem: Floating-point, integer, binary, or permutation single-objective constrained finite-dimensional optimization problems of arbitrary nature, i.e., maximization of f : D → ℝ, D ⊂ Xⁿ, where X is equal to the set of logical values (i.e., false and true) or is a bounded subset of the set of real numbers ℝ or integer numbers ℤ.
Solution method: Single-objective constrained genetic algorithm with pure floating-point, integer, binary, and permutation representation and with a set of exchangeable components (e.g., variation operators, selection probability functions, selection mechanisms, termination conditions).
Unique features: Header-only and easy-to-deploy genetic algorithms C++20 library tailored for problems with computationally expensive fitness functions.

Functionality:
• Floating-point, integer, binary, and permutation genotype representations.
• Automatic compile-time representation/variation compatibility checks.
• Mutation operators: Gaussian, self-adaptive, swap, random-reset, and bit-flipping.
• Recombination operators: arithmetic, single arithmetic, one-point crossover, and cut-and-crossfill.
• Canonical composition of mutation and recombination is available.
• Variations can be applied stochastically.
• Selection mechanisms: stochastic universal sampling, roulette wheel algorithm, and generational selection.
• Selection probability functions: fitness proportionate selection with windowing procedure, linear and exponential pressure ranking selection.
• Termination conditions: reaching the fitness function plateau, reaching the maximum number of iterations, reaching the given fitness function value, reaching some user-defined threshold.
• Conjunction and disjunction of termination conditions are supported.
• Possibility of addition of new variation operators, selection mechanisms, selection probability functions, and termination conditions at client-side code.

Limitations:
• Client-side program compilation time is comparatively long due to header-only nature of the library and the use of templates.
• Set of variation operators is limited. In case of floating-point representation alone, the popular BLX recombination (Eshelman & Schaffer, 1993) is not implemented.
• Very popular mechanisms of selection to the next generation, e.g., (μ + λ)-selection and (μ, λ)-selection, are not implemented.
• Recombination with number of parents different than 2 is not supported.
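Two of the components listed under Functionality, fitness proportionate selection with windowing and stochastic universal sampling, can be illustrated with the following sketch. It follows the textbook formulation of both techniques and is not taken from the Quilë sources; the function names `windowed_fps`, `sus_indices`, and `sus` are hypothetical, and the uniform fallback for a population of equally fit individuals is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Fitness proportionate selection with windowing: subtract the worst fitness
// of the population, then normalize the shifted values to probabilities.
std::vector<double> windowed_fps(const std::vector<double>& fitness) {
  assert(!fitness.empty());
  const double worst = *std::min_element(fitness.begin(), fitness.end());
  std::vector<double> p(fitness.size());
  std::transform(fitness.begin(), fitness.end(), p.begin(),
                 [worst](double f) { return f - worst; });
  const double total = std::accumulate(p.begin(), p.end(), 0.0);
  if (total <= 0.0) {
    // Assumption of this sketch: equally fit individuals get equal chances.
    std::fill(p.begin(), p.end(), 1.0 / static_cast<double>(p.size()));
  } else {
    for (double& x : p) x /= total;
  }
  return p;
}

// Stochastic universal sampling with an explicit offset r in [0, 1/n):
// n equally spaced pointers r, r + 1/n, ... are laid over the cumulative
// distribution, and each pointer selects the individual whose segment it hits.
std::vector<std::size_t> sus_indices(const std::vector<double>& probs,
                                     std::size_t n, double r) {
  assert(!probs.empty() && n > 0);
  const double step = 1.0 / static_cast<double>(n);
  assert(r >= 0.0 && r < step);
  std::vector<std::size_t> chosen;
  chosen.reserve(n);
  double cumulative = probs[0];
  std::size_t i = 0;
  for (std::size_t k = 0; k < n; ++k) {
    const double pointer = r + static_cast<double>(k) * step;
    // The bound on i guards against rounding error in the cumulative sum.
    while (pointer > cumulative && i + 1 < probs.size()) cumulative += probs[++i];
    chosen.push_back(i);
  }
  return chosen;
}

// Convenience wrapper: draws the single random offset and delegates.
template <typename Rng>
std::vector<std::size_t> sus(const std::vector<double>& probs, std::size_t n,
                             Rng& rng) {
  assert(n > 0);
  std::uniform_real_distribution<double> offset{0.0, 1.0 / static_cast<double>(n)};
  return sus_indices(probs, n, offset(rng));
}
```

With probabilities {0.5, 0.25, 0.25}, n = 4, and offset r = 0.1, the pointers are 0.1, 0.35, 0.6, and 0.85, so `sus_indices` selects the indices 0, 0, 1, 2. Only one random number is drawn per selection round, which is what distinguishes stochastic universal sampling from n independent spins of the roulette wheel.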
Note: For information about API documentation, tutorial, installation instructions, test suite, code statistics, reporting problems with the library, support inquiries, and feature requests, please see the README file in the software archive.

Statement of need
Scientific use of C++ genetic algorithms libraries seems to be dominated by four software packages: GAlib (Wall, n.d.), Evolving Objects (Keijzer et al., 2002), OpenBEAGLE (Gagné & Parizeau, 2006), and Evolutionary Computation Framework (ECF) (Jakobović, n.d.). Evolving Objects, OpenBEAGLE, and ECF are written in the C++98 standard of the language, while GAlib is written in a pre-standard version of it. The C++98 libraries, while comprehensive and feature-rich, tend to be relatively hard to use for a novice user inclined toward scientific computation. The reason is a comparatively complex installation process or the complexity of the library itself.
The Quilë library tries to fill the niche of easy-to-use, high-performance genetic algorithms scientific libraries by implementing the features using the modern C++20 standard of the language. The software provides an easy starting point for researchers and academic teachers who need genetic algorithms and use C++ for their work. The library is available as a header-only (one file) implementation that can be installed by simply copying its source code. The user can run the accompanying examples in a matter of minutes. On the other hand, the library also intentionally strives to be minimal, so only a limited set of use cases is covered.
The library is implemented in a generic (template metaprogramming) and partly functional programming style with elements of concurrency. Modern elements of C++ are used, e.g., concepts (Sutton, 2017), alongside established constructs, e.g., substitution failure is not an error (SFINAE) (Vandevoorde & Josuttis, 2002). In order to compile programs developed with the library, a C++ compiler supporting the C++20 standard of the language is needed.
The library employs a database of already-computed fitness function values. The software is therefore well suited for optimization tasks with fitness functions calculated by simulation codes, such as ab initio (condensed matter physics) or finite element method calculations. This feature, however, does not exclude the use of inexpensive fitness functions. For the sake of convenience, the database itself is available through intermediary objects, which are responsible for database cohesion and lifetime management. The intermediary object type is implemented with the use of the smart pointer std::shared_ptr (Alexandrescu, 2001). Fitness function values not yet available in the database are computed concurrently in order to speed up the whole evolutionary process; the thread pool design pattern is used to balance the load on CPU cores (Williams, 2019).