OpenSkill: A faster asymmetric multi-team, multiplayer rating system

Assessing and comparing player skill in online multiplayer gaming environments is essential for fair matchmaking and player engagement. Traditional ranking models like Elo and Glicko-2, designed for two-player games, are insufficient for the complexity of multi-player, asymmetric team-based matches. To address this gap, the OpenSkill library offers a suite of sophisticated, fast, and adaptable models tailored for such dynamics. Drawing from Bayesian inference methods, OpenSkill provides a more accurate representation of individual player contributions and speeds up the computation of ranks. This paper introduces the OpenSkill library, featuring a Python implementation of the Plackett-Luce model among others, highlighting its performance advantages and predictive accuracy against proprietary systems like TrueSkill. OpenSkill is a valuable tool for game developers and researchers, ensuring a responsive and fair gaming experience by efficiently adjusting player rankings based on game outcomes. The library's support for time decay and diligent documentation further aid in its practical application, making it a robust solution for the nuanced world of multiplayer ranking systems. This paper also acknowledges areas for future enhancement, such as partial play and contribution weighting, emphasizing the library's ongoing development to meet the evolving needs of online gaming communities.


Statement of need
Bayesian inference of skill ratings from game outcomes is a crucial aspect of online video game development and research. This is usually challenging because the players' performance changes over time and also varies based on who they are competing against. Our project primarily targets game developers and researchers interested in ranking players fairly and accurately. Nevertheless, the problem that the software solves applies to any context where you have multiple players or entities and you need to track their skills over time while they compete against each other.
The OpenSkill library furnishes a versatile suite of models and algorithms designed to support a broad spectrum of applications. While popular use cases include assisting video game developers and researchers dealing with multi-agent scripting environments like Neural MMO (Suarez et al., 2019), its practical use extends far beyond this particular domain. For instance, it is used in recommendation systems, where it gauges user behaviours and preferences to suggest personalized recommendations. The ranking of sports players, as seen at Opta Analyst (Rico, 2022), and matchmaking in dating apps are further areas where OpenSkill proves useful, pairing users based on the comparative ranking of their profiles.
Derived from the research paper by Weng and Lin (Weng & Lin, 2011), OpenSkill offers a pure Python implementation of their Bayesian approximation method for probabilistic models of ranked data. OpenSkill attempts to solve the same problems TrueSkill does. TrueSkill, however, employs factor graphs to model the probability distributions of players' skills, updating their ranks through Bayesian inference after each game by evaluating the likelihood of observed outcomes. Like TrueSkill, this library is specifically designed for asymmetric multi-faction multiplayer games. Here, the term "asymmetric" means that teams might have varying numbers of players. For example, one team could have three players while another has just one. This creates an uneven playing field where the challenge is to balance these differences. The term "multi-faction" means that there are several distinct teams or groups within a single game. Unlike simple one-on-one contests, these games feature multiple teams, each potentially with a different number of players, all competing in the same match. This library aims to assess and balance player skill in such dynamic and complex game environments.
OpenSkill boasts several advantages over implementations of proprietary models like TrueSkill. Notably, it delivers faster rating updates, with three times the performance of the popular open-source Python implementation of TrueSkill (Lee, 2018). OpenSkill also includes five distinct models, each with its own characteristics and tradeoffs. While all the models are general purpose, the recommended model for most use cases is Plackett-Luce. This model extends the regular Plackett-Luce model described in Guiver & Snelson (2009) by incorporating variance parameters to account for the probability that a certain team is the winner among a set of competing teams.
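To make the update concrete, the following is a minimal pure-Python sketch of a Plackett-Luce rating update as we read it from Weng & Lin (2011). It is not the library's code: the function name plackett_luce_update, the (mu, sigma) tuple representation, and the default values chosen for beta and kappa are assumptions made for illustration, and any further adjustments the library applies are omitted.

```python
import math


def plackett_luce_update(teams, ranks, beta=25.0 / 6.0, kappa=1e-4):
    """Hypothetical sketch of a Plackett-Luce update after Weng & Lin (2011).

    teams: list of teams, each a list of (mu, sigma) tuples.
    ranks: one rank per team, lower is better; equal ranks mean a tie.
    beta and kappa are assumed defaults, not values taken from the paper.
    """
    # A team's skill is the sum of its players' means and variances.
    team_mu = [sum(mu for mu, _ in team) for team in teams]
    team_var = [sum(sigma ** 2 for _, sigma in team) for team in teams]
    c = math.sqrt(sum(var + beta ** 2 for var in team_var))
    strength = [math.exp(mu / c) for mu in team_mu]

    n = len(teams)
    # For each team q: total strength of the teams ranked at or below q.
    sum_q = [sum(s for s, r in zip(strength, ranks) if r >= ranks[q]) for q in range(n)]
    # Number of teams sharing each team's rank.
    ties = [ranks.count(r) for r in ranks]

    updated = []
    for i, team in enumerate(teams):
        omega = 0.0  # accumulated shift of the team mean
        delta = 0.0  # accumulated shrinkage of the team variance
        for q in range(n):
            if ranks[q] <= ranks[i]:
                p = strength[i] / sum_q[q]
                omega += ((1.0 if q == i else 0.0) - p) / ties[q]
                delta += p * (1.0 - p) / ties[q]
        omega *= team_var[i] / c
        delta *= (math.sqrt(team_var[i]) / c) * team_var[i] / c ** 2

        new_team = []
        for mu, sigma in team:
            # Each player receives a share proportional to their variance.
            share = sigma ** 2 / team_var[i]
            new_mu = mu + share * omega
            new_sigma = sigma * math.sqrt(max(1.0 - share * delta, kappa))
            new_team.append((new_mu, new_sigma))
        updated.append(new_team)
    return updated


default = (25.0, 25.0 / 3.0)
# Asymmetric, multi-faction match: three teams with 3, 1 and 2 players.
teams = [[default] * 3, [default], [default] * 2]
# The solo player wins, the trio comes second, the duo comes last.
new_teams = plackett_luce_update(teams, ranks=[1, 0, 2])
for team in new_teams:
    print([(round(mu, 3), round(sigma, 3)) for mu, sigma in team])
```

In this sketch a team's change is split among its players in proportion to each player's variance, so more uncertain players move further after the same result.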
The Plackett-Luce model can be thought of as a generalized extension of the Bradley-Terry model originally introduced in Bradley & Terry (1952). Both models follow a logistic distribution, whereas the Thurstone-Mosteller model follows a Gaussian distribution. The Bradley-Terry and Thurstone-Mosteller models can each be used with partial pairing or full pairing approaches for rating updates. Partial pairing models use only a subset of the possible pairings during rating updates. This strategy considerably improves computational efficiency while sacrificing a certain level of accuracy. On the other hand, full pairing models leverage all available information within the paired data to make precise rating updates at the cost of increased computational requirements.
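The difference between the logistic and Gaussian families, and between full and partial pairing, can be sketched in a few lines of Python. This is an illustrative sketch rather than the library's implementation: the function names, the default beta, and the choice of adjacent ranks for partial pairing are assumptions on our part.

```python
import math
from itertools import combinations


def pair_scale(sigma_i, sigma_q, beta=25.0 / 6.0):
    # Combined uncertainty of a pairing; beta is an assumed default.
    return math.sqrt(sigma_i ** 2 + sigma_q ** 2 + 2.0 * beta ** 2)


def logistic_win_probability(mu_i, sigma_i, mu_q, sigma_q):
    # Bradley-Terry style: logistic function of the scaled skill gap.
    c = pair_scale(sigma_i, sigma_q)
    return 1.0 / (1.0 + math.exp((mu_q - mu_i) / c))


def gaussian_win_probability(mu_i, sigma_i, mu_q, sigma_q):
    # Thurstone-Mosteller style: standard normal CDF of the scaled skill gap.
    c = pair_scale(sigma_i, sigma_q)
    return 0.5 * (1.0 + math.erf((mu_i - mu_q) / (c * math.sqrt(2.0))))


def full_pairs(n_teams):
    # Full pairing: every team is compared with every other team.
    return list(combinations(range(n_teams), 2))


def partial_pairs(n_teams):
    # Partial pairing (assumed here): only teams adjacent in rank order.
    return [(i, i + 1) for i in range(n_teams - 1)]


sigma = 25.0 / 3.0
print(logistic_win_probability(30.0, sigma, 25.0, sigma))
print(gaussian_win_probability(30.0, sigma, 25.0, sigma))
print(len(full_pairs(8)), len(partial_pairs(8)))
```

With n teams, full pairing evaluates n(n-1)/2 comparisons while this form of partial pairing evaluates n-1, which is where the efficiency gain comes from.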

Usage
To install the library, run pip install openskill and import it. Each player has a mu and a sigma value corresponding to their skill (µ) and the uncertainty in that skill (σ). Two players can be compared by calling the ordinal() method on their ratings, which in this case are instances of PlackettLuceRating.
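The idea behind comparing players by an ordinal can be sketched without importing the library. The SketchRating class below is a hypothetical stand-in, not PlackettLuceRating itself, and it assumes that the ordinal is a conservative estimate of the form mu minus three sigma with defaults of mu = 25 and sigma = 25/3.

```python
from dataclasses import dataclass


@dataclass
class SketchRating:
    """Hypothetical stand-in for a rating object; not the library's class."""

    mu: float = 25.0
    sigma: float = 25.0 / 3.0

    def ordinal(self, z: float = 3.0) -> float:
        # Conservative skill estimate: the mean minus z standard deviations.
        return self.mu - z * self.sigma


veteran = SketchRating(mu=30.0, sigma=2.0)
hot_streak = SketchRating(mu=32.0, sigma=8.0)

# The newer player has the higher mean but a far less certain rating,
# so the veteran ranks higher on the conservative estimate.
leaderboard = sorted([veteran, hot_streak], key=lambda r: r.ordinal(), reverse=True)
print(leaderboard)
```

Sorting by such an estimate rather than by mu alone keeps players with few games from jumping to the top of a leaderboard on the strength of a handful of results.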

Benchmarks
A reproducible set of benchmarks is available in the benchmark/ folder at the root of the openskill.py repository. Simply run the appropriate Jupyter Notebook file to reproduce the relevant benchmark.
Benchmarks were run on a dataset of Overwatch matches (Joshy, 2023). Using a dataset of chess matches, we also see a similar trend, where OpenSkill gives predictive performance similar to that of TrueSkill, but in less time.
It should be noted that the difference in speed may be partially due to the efficiency of the TrueSkill implementation in question. For instance, switching to the SciPy backend in the TrueSkill implementation slows the inference to around 8 seconds, even though we would expect a speedup since SciPy drops into faster C code.
Finally, running the project against a large dataset of PUBG online matches results in a Rank-Biased Overlap (Webber et al., 2010) of 64.11 and an accuracy of 92.03%.
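For readers unfamiliar with Rank-Biased Overlap, the sketch below computes the extrapolated form of the measure for two rankings evaluated to a common depth. It is an illustration only: the function name, the persistence parameter p = 0.9, and the truncation to the shorter list are assumptions, and the variant and parameters used in the benchmark are not stated here.

```python
def rbo_extrapolated(ranking_a, ranking_b, p=0.9):
    """Extrapolated Rank-Biased Overlap to a common depth (hypothetical helper).

    Assumes each ranking contains no duplicate items and 0 < p < 1.
    Returns a value between 0 (disjoint) and 1 (identical).
    """
    k = min(len(ranking_a), len(ranking_b))
    if k == 0:
        return 0.0
    weighted_agreement = 0.0
    overlap = 0
    for depth in range(1, k + 1):
        # Number of items shared by the two prefixes of length `depth`.
        overlap = len(set(ranking_a[:depth]) & set(ranking_b[:depth]))
        weighted_agreement += (overlap / depth) * p ** depth
    # Extrapolate the agreement at depth k over the unseen tail.
    return (overlap / k) * p ** k + ((1.0 - p) / p) * weighted_agreement


predicted = ["ana", "bo", "cy", "di", "ed"]
observed = ["bo", "ana", "cy", "ed", "di"]
print(rbo_extrapolated(predicted, observed))
```

Because the weights decay geometrically with depth, disagreements near the top of the two rankings reduce the score more than disagreements near the bottom.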

Discussion
Our OpenSkill library has demonstrated significant improvements over proprietary models in terms of both speed and efficiency. However, we recognize that there are still areas that warrant further exploration and improvement.
One such area is partial play. Ideally, a comprehensive skill ranking system should take into account both the winning and losing side of a game and adjust their ratings accordingly. Partial play, where only a subset of players are engaged during a match, presents a unique challenge in this regard. While it is theoretically easy to implement this feature, the lack of relevant data makes it difficult for us to verify its efficacy. Consequently, any modifications we make to such models run the risk of overfitting the available data. The absence of a clearly defined metric for partial play further complicates matters, as different groups interpret it in various ways. Our interpretation of partial play pertains to the duration a player participates in a game, but significant work is required to operationalize this concept in a tangible way within our library.
More substantially, as of now, OpenSkill does not support weight integration, where weights represent a player's contributions to an overall victory. The ability to assign different significance to different players based on their contributions could greatly improve the accuracy of a player's resulting skill rating. We realize the value of this feature, and it is a primary area of focus in our ongoing improvements to the library.
On a positive note, OpenSkill does indeed support time decay, an important aspect of maintaining an accurate skill rating system. Over time, a player's skill can decrease due to inactivity; our library allows users to adjust the sigma value accordingly. This feature ensures that our library maintains its adaptability and relevance even when faced with variable player engagement levels.
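One way such an adjustment could look is sketched below. The helper inflate_sigma, its drift_per_day parameter, and the cap at the default sigma of 25/3 are all hypothetical choices for illustration; the library leaves the decay policy to the user, and this is not a recommended schedule.

```python
import math


def inflate_sigma(sigma, days_inactive, drift_per_day=0.05, max_sigma=25.0 / 3.0):
    """Hypothetical decay rule: variance grows linearly with inactivity.

    drift_per_day and max_sigma are illustrative assumptions, not library values.
    """
    inflated = math.sqrt(sigma ** 2 + (drift_per_day ** 2) * days_inactive)
    # Never exceed the uncertainty assigned to a brand-new player.
    return min(inflated, max_sigma)


# A well-established player (sigma = 2.0) who returns after 90 days away.
print(inflate_sigma(2.0, days_inactive=90))
```

The inflated value would then be written back to the player's rating before their next match, so that the first results after a long absence move the rating more than they otherwise would.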
Despite these limitations, our OpenSkill library remains a powerful tool for video game developers and researchers tasked with competently evaluating player skills. It addresses several long-standing issues encountered in multiplayer video game ranking systems. By continuously seeking out improvements and refining our approach, we hope to make OpenSkill an ever more effective and flexible resource in the world of online gaming.

Related Packages
This project was originally a direct port of the openskill.js project (Busby, 2023) from JavaScript to Python. However, we have deviated slightly from their implementation in that we focus more on Python-specific features and on thorough documentation of every object. All documented objects include the mathematical formulas from their respective papers for easier inspection of the code. We also provide an easy way to customize all the constants used in any model. There are also published ports of OpenSkill in Elixir (Busby, 2020), Kotlin (Brezina, 2022) and Lua (GitHub - Bstummer/Openskill.lua, 2022) on GitHub.
Compared with similar packages such as Lee's TrueSkill implementation, we also provide support for PyPy 3, which uses a Just-In-Time compiler as opposed to the standard CPython implementation. We also support strict typing of objects, to enable auto-completion in Integrated Development Environments (IDEs). Our development workflow also requires a test coverage of 100% for any code to be merged. This serves as a starting point to prevent erroneous math from making it into the library.