JavaPermutationTools: A Java Library of Permutation Distance Metrics

Permutations can represent a wide variety of ordered data. For example, a permutation may represent an individual’s preferences (or ranking) of a collection of products such as books or music. A permutation may also represent a route for delivering a set of packages. Permutations can also represent one-to-one mappings between sets (e.g., instructors to courses at a fixed time). Some applications require measuring the distance between a pair of permutations. For example, a recommender system may assess the similarity of two individuals’ preferences for music to make song recommendations. Depending upon the application, the permutation features most important to distance calculation may be the absolute positions of the elements (e.g., one-to-one mappings), the adjacency of elements (e.g., the routing example), or the general precedence of pairs of elements (e.g., music preferences). Thus, it is no surprise that there are many permutation metrics in the research literature. Knuth’s seminal books on algorithms (Knuth, 1997, 1998a, 1998b) cover permutation-related algorithms more generally, such as mixed radix representation and permutation inverse computation.
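
To make the three kinds of permutation features concrete, the following minimal Java sketch computes one textbook distance for each view: position (count of mismatched indices), precedence (Kendall tau, the count of pairs in opposite relative order), and adjacency (count of neighboring pairs in one permutation that are not neighbors in the other). It assumes permutations of the integers 0 to n-1 stored as int arrays of equal length. The class and method names are hypothetical and chosen for illustration; this is not the library's API, and the quadratic Kendall tau loop favors clarity over speed.

```java
// Illustrative sketch only: hypothetical names, not the JavaPermutationTools API.
// Assumes both arrays are permutations of 0..n-1 with the same length.
final class PermutationDistanceSketch {

    // Positional view: number of indices at which the permutations differ.
    static int exactMatchDistance(int[] p1, int[] p2) {
        int count = 0;
        for (int i = 0; i < p1.length; i++) {
            if (p1[i] != p2[i]) {
                count++;
            }
        }
        return count;
    }

    // Precedence view: Kendall tau distance, the number of element pairs
    // whose relative order differs. O(n^2) for readability.
    static int kendallTauDistance(int[] p1, int[] p2) {
        int[] pos2 = positions(p2);
        int count = 0;
        for (int i = 0; i < p1.length; i++) {
            for (int j = i + 1; j < p1.length; j++) {
                if (pos2[p1[i]] > pos2[p1[j]]) {
                    count++;
                }
            }
        }
        return count;
    }

    // Adjacency view: number of neighboring pairs in p1 (ignoring direction,
    // no wrap-around) that are not neighbors in p2.
    static int adjacencyDistance(int[] p1, int[] p2) {
        int[] pos2 = positions(p2);
        int count = 0;
        for (int i = 0; i + 1 < p1.length; i++) {
            if (Math.abs(pos2[p1[i]] - pos2[p1[i + 1]]) != 1) {
                count++;
            }
        }
        return count;
    }

    // Inverse permutation: result[e] is the index of element e in p.
    private static int[] positions(int[] p) {
        int[] pos = new int[p.length];
        for (int i = 0; i < p.length; i++) {
            pos[p[i]] = i;
        }
        return pos;
    }
}
```

Comparing {0, 1, 2, 3, 4} with its reversal {4, 3, 2, 1, 0} shows why the choice of feature matters: the positional distance is 4 and the Kendall tau distance is the maximum of 10, yet the adjacency distance is 0 because a reversed route visits the same neighbors.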

The motivation and origin of this library is our research on fitness landscape analysis for permutation optimization (Cicirello, 2014, 2016, 2018a; Cicirello & Cernera, 2013). In a permutation optimization problem, solutions are represented by permutations of some set, and the objective is to maximize or minimize some function. For example, a solution to a traveling salesperson problem is the permutation of the set of cities that corresponds to the minimal cost tour. During our research, we developed a Java library of permutation distance metrics. Most of the distance metrics in the literature are described mathematically with no source code available. Thus, our library offers convenient access to efficient implementations of a variety of metrics with a common programmatic interface.
The library also provides metrics on sequences (strings and arrays of various types). Unlike a permutation, a sequence may contain multiple copies of the same element.
The source repository (https://github.com/cicirello/JavaPermutationTools) contains source code of the library, programs that provide example usage of key functionality, as well as programs that reproduce results from papers that have used the library. API documentation is hosted on the web (https://jpt.cicirello.org/).

Statement of Need
The target audience of this library is those conducting computational research where the similarity of permutations or sequences must be assessed, or for which other computation on permutations is required (e.g., the library includes functionality for generating and mutating permutations in various ways). Permutation distance is important to those developing recommender systems, and also to those applying evolutionary computation to the solution of permutation optimization problems.
Evolutionary computation, such as genetic algorithms, solves problems through simulated evolution (Mitchell, 1998). Such algorithms maintain a population of solutions to the problem, and this population evolves over many generations using operators such as mutation and crossover. Just as in natural evolution, a diverse gene pool is important. In later generations, if variation within the population declines, then search can stagnate. Population management (Sevaux & Sörensen, 2005), such as in scatter search (Campos, Laguna, & Martí, 2005), attempts to maintain population diversity, which requires a measure of distance.
In search landscape analysis, one must often compute the distance between points on the landscape. A fitness (or search) landscape (Mitchell, 1998) is the space of possible solutions to an optimization problem spatially organized on a landscape with similar solutions as neighbors, and where elevation corresponds to fitness (or solution quality). Peaks (maximization problems) and valleys (minimization problems) correspond to locally optimal solutions. The problem is to find an optimal point on that landscape. Search landscape analysis deals with the theoretical and practical techniques for studying questions such as what characteristics of a problem make it hard and how different search operators affect fitness landscape topology. There has been much work on fitness landscape analysis, including for permutation landscapes (Cicirello, 2014, 2016, 2018a; Cicirello & Cernera, 2013; Hernando, Mendiburu, & Lozano, 2016; Schiavinotto & Stützle, 2007; Sörensen, 2007; Tayarani-N & Prugel-Bennett, 2014). Fitness landscape analysis techniques, such as fitness distance correlation (FDC) (Jones & Forrest, 1995) and search landscape calculus (Cicirello, 2016), require distance metrics for the type of structure being optimized.
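
As an illustration of where a distance metric enters such an analysis, FDC is commonly computed as the Pearson correlation coefficient between the fitness of sampled solutions and their distance to the nearest known optimal solution. The sketch below is a minimal version under that assumption; the class and method names are hypothetical, the distances are assumed to have been computed beforehand with whichever permutation metric suits the problem, and this is not code from the library.

```java
// Illustrative sketch only: hypothetical names, not the JavaPermutationTools API.
final class FitnessDistanceCorrelationSketch {

    // Pearson correlation between the fitness of each sampled solution and
    // its distance to the nearest known optimum. Returns NaN if either
    // input has zero variance.
    static double fdc(double[] fitness, double[] distanceToOptimum) {
        if (fitness.length != distanceToOptimum.length || fitness.length < 2) {
            throw new IllegalArgumentException(
                "need two samples of equal length, at least 2");
        }
        int n = fitness.length;

        // Sample means of both variables.
        double meanF = 0.0;
        double meanD = 0.0;
        for (int i = 0; i < n; i++) {
            meanF += fitness[i];
            meanD += distanceToOptimum[i];
        }
        meanF /= n;
        meanD /= n;

        // Sums of squared deviations and of cross products.
        double cov = 0.0;
        double varF = 0.0;
        double varD = 0.0;
        for (int i = 0; i < n; i++) {
            double df = fitness[i] - meanF;
            double dd = distanceToOptimum[i] - meanD;
            cov += df * dd;
            varF += df * df;
            varD += dd * dd;
        }
        return cov / Math.sqrt(varF * varD);
    }
}
```

For a minimization problem in which fitness is a cost, a coefficient near 1 is usually read as cost falling steadily as solutions approach the optimum, while a value near 0 suggests that the chosen distance metric carries little information about solution quality.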

The Metrics of the Library
The following table summarizes the permutation distances in the library, their runtimes (n is permutation length), and whether they satisfy the metric requirements.