Estimate TrueSkill-style rating movement from mu, sigma, beta, tau, draw probability, match result, and team average comparison.

| Player | Old mu | Old sigma | New mu | New sigma | mu - 3 sigma |
|---|---|---|---|---|---|
| A1 | 25.00 | 8.33 | 25.00 | 8.33 | 0.00 |

| Team | Total mu | Average mu | Team sigma | Avg conservative | Comparison |
|---|---|---|---|---|---|
| Team A | 25.00 | 25.00 | 8.33 | 0.00 | Even |

| Scenario | Result Tested | Performance Gap | V Factor | W Factor | Read |
|---|---|---|---|---|---|
| Current | A wins | 0.00 | 0.00 | 0.00 | Even |

| Parameter | Common Default | What It Changes | Practical Use |
|---|---|---|---|
| mu | 25.000 | Center of rating estimate | Higher mu means higher estimated skill. |
| sigma | 8.333 | Uncertainty around mu | High sigma moves faster after results. |
| beta | 4.167 | Game-to-game performance spread | Higher beta softens upset impact. |
| tau | 0.083 | Rating drift before each match | Use more tau when skills change often. |
| Draw probability | 10% | Draw margin width | Higher values make draws less surprising. |

| Preset | Team Size | Key Question | Expected Movement |
|---|---|---|---|
| Fresh 1v1 Match | 1v1 | How do defaults move? | Moderate, symmetric update. |
| Favored Player Upset | 1v1 | How big is an upset? | Large winner gain, favored loss. |
| Balanced 2v2 | 2v2 | How are team wins shared? | Uncertain players move more. |
| Close Rated Draw | 1v1 | What does a draw do? | Ratings pull toward each other. |

| Conservative Rating | Formula | Meaning | Leaderboard Read |
|---|---|---|---|
| Standard | mu - 3 sigma | Skill estimate with uncertainty penalty | New players start low despite mu 25. |
| Loose | mu - 2 sigma | Less penalty for uncertainty | Useful for casual ladders. |
| Strict | mu - 3 sigma | Requires confidence to climb | Useful for competitive ladders. |
| Team average | avg(mu - k sigma) | Average safe rating by team | Highlights roster balance. |

TrueSkill ratings exist to convert game results into skill ratings. A single game result says little about a player’s skill, so it is not enough on its own to determine that skill. TrueSkill interprets each new result in light of the player’s rating, which summarizes all of that player’s previous games.
The TrueSkill system also accounts for the fact that a player’s skill is uncertain. It treats each rating as two numbers rather than one: mu, a best guess at the player’s skill, and sigma, a measure of how far off that best guess could be.
When a game result arrives, the system updates the skill estimate and the uncertainty of every player at the same time. If a player wins, their skill estimate rises (only slightly when the opponent was rated lower, because the win was expected) and the uncertainty in that estimate decreases. If a player loses, their skill estimate and their uncertainty both decrease.
If a game ends in a draw, the two players’ skill estimates are pulled toward each other, but neither changes by much. The size of each change depends on the player’s uncertainty band: a player with a large uncertainty band moves more than a player with a small one.
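The win, loss, and draw updates described above can be sketched for a 1v1 match as follows. This is a minimal sketch that assumes the commonly published two-player TrueSkill update with the default parameters from the table; `rate_1v1` and the `v_*`/`w_*` helper names are illustrative and do not come from any particular library.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def v_win(t, eps):
    """Mean-shift factor when the first player wins."""
    return STD.pdf(t - eps) / STD.cdf(t - eps)


def w_win(t, eps):
    """Variance-shrink factor when the first player wins."""
    v = v_win(t, eps)
    return v * (v + t - eps)


def v_draw(t, eps):
    """Mean-shift factor for a draw; negative when the first player was favored."""
    denom = STD.cdf(eps - t) - STD.cdf(-eps - t)
    return (STD.pdf(-eps - t) - STD.pdf(eps - t)) / denom


def w_draw(t, eps):
    """Variance-shrink factor for a draw."""
    denom = STD.cdf(eps - t) - STD.cdf(-eps - t)
    v = v_draw(t, eps)
    return v * v + ((eps - t) * STD.pdf(eps - t) + (eps + t) * STD.pdf(eps + t)) / denom


def rate_1v1(first, second, draw=False, beta=25 / 6, tau=25 / 300, draw_prob=0.10):
    """Update two (mu, sigma) pairs; `first` is the winner unless draw=True.

    Assumes draw_prob > 0 whenever draw=True (a zero margin cannot hold a draw).
    """
    (mu_1, sigma_1), (mu_2, sigma_2) = first, second
    var_1 = sigma_1 ** 2 + tau ** 2  # drift added before the match
    var_2 = sigma_2 ** 2 + tau ** 2
    c = sqrt(2 * beta ** 2 + var_1 + var_2)
    eps = STD.inv_cdf((draw_prob + 1) / 2) * sqrt(2) * beta / c  # scaled draw margin
    t = (mu_1 - mu_2) / c  # scaled rating gap
    v, w = (v_draw(t, eps), w_draw(t, eps)) if draw else (v_win(t, eps), w_win(t, eps))
    new_first = (mu_1 + var_1 / c * v, sqrt(var_1 * (1 - var_1 / c ** 2 * w)))
    new_second = (mu_2 - var_2 / c * v, sqrt(var_2 * (1 - var_2 / c ** 2 * w)))
    return new_first, new_second


fresh = (25.0, 25 / 3)
print(rate_1v1(fresh, fresh))                          # first player wins
print(rate_1v1((30.0, 3.0), (20.0, 3.0), draw=True))   # favored player draws
```

Each player’s mu shift is scaled by that player’s own variance, which is why the more uncertain player moves further after the same result.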
Many leaderboards do not display the skill estimate for each player, but a conservative rating instead. A conservative rating is calculated by subtracting a multiple of the uncertainty from the player’s skill estimate. A player with few games played may have a high skill estimate, but their uncertainty is still large.
Subtracting a multiple of the uncertainty keeps such a player from outranking established players on the strength of a few results. In effect, this is one way in which the leaderboard penalizes players for having few games played. A calculator can show how conservative ratings change when the multiplier is changed.
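As a small illustration of the conservative rating, the sketch below ranks two hypothetical players under different multipliers. The player rows and the `conservative` and `leaderboard` names are made up for the example; only the formula mu - k sigma comes from the table above.

```python
def conservative(mu, sigma, k=3.0):
    """Conservative rating: skill estimate minus k times the uncertainty."""
    return mu - k * sigma


def leaderboard(players, k=3.0):
    """Sort (name, mu, sigma) rows by conservative rating, best first."""
    return sorted(players, key=lambda p: conservative(p[1], p[2], k), reverse=True)


# Hypothetical players: a newcomer with a high mu and a veteran with a low sigma.
players = [("newcomer", 30.0, 7.0), ("veteran", 27.0, 1.5)]

for k in (0.0, 2.0, 3.0):
    order = [name for name, _, _ in leaderboard(players, k)]
    print(f"k={k}: {order}")
```

With k = 0 the newcomer ranks first on raw mu; with k = 2 or k = 3 the veteran ranks first because the newcomer’s larger sigma is penalized.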
Comparing multipliers helps to decide whether a ladder should be more generous or more strict toward new players. Team games require additional steps beyond those for individual players. The system does not score a team match player against player.
Instead, the system calculates an average skill for the team and an uncertainty in that average skill. Based on the outcome of the game, it then calculates an individual change to each player’s skill. Players who have played fewer games experience larger changes than players whose skill is known with a high level of certainty.
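The way a team result is shared among players can be sketched as below. Note the assumption: this sketch uses the sum-based two-team form of the commonly published TrueSkill update (team skill as the sum of member mu values), which may differ from the team average comparison this calculator reports. `rate_two_teams` is an illustrative name, and only the win case is shown.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def rate_two_teams(winners, losers, beta=25 / 6, tau=25 / 300, draw_prob=0.10):
    """Update lists of (mu, sigma) pairs after `winners` beat `losers`."""
    n_players = len(winners) + len(losers)
    win_vars = [sigma ** 2 + tau ** 2 for _, sigma in winners]   # drift added first
    lose_vars = [sigma ** 2 + tau ** 2 for _, sigma in losers]
    # Combined spread of the team performance difference.
    c = sqrt(sum(win_vars) + sum(lose_vars) + n_players * beta ** 2)
    eps = STD.inv_cdf((draw_prob + 1) / 2) * sqrt(n_players) * beta / c
    t = (sum(mu for mu, _ in winners) - sum(mu for mu, _ in losers)) / c
    v = STD.pdf(t - eps) / STD.cdf(t - eps)   # mean-shift factor for a win
    w = v * (v + t - eps)                     # variance-shrink factor for a win

    def shift(team, variances, sign):
        # Each player's share is proportional to that player's own variance.
        return [(mu + sign * var / c * v, sqrt(var * (1 - var / c ** 2 * w)))
                for (mu, _), var in zip(team, variances)]

    return shift(winners, win_vars, +1), shift(losers, lose_vars, -1)


# Hypothetical 2v2: one uncertain and one settled player on the winning team.
team_a = [(25.0, 25 / 3), (25.0, 2.0)]
team_b = [(25.0, 5.0), (25.0, 5.0)]
new_a, new_b = rate_two_teams(team_a, team_b)
print(new_a)
print(new_b)
```

Because every player’s shift is scaled by that player’s own variance, the uncertain teammate gains more from the same win than the settled one.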
Thus, when new players take part in a team game, their ratings move faster after a win or a loss than those of more experienced teammates. The draw probability for a game can be set high or low. A high draw probability widens the draw margin, so the system expects close matches between similarly skilled players to end in draws; a low draw probability narrows the margin, so it expects those same matches to produce a clear winner and loser.
In some games, such as chess, draws are very common, while in other games they are very rare. The draw probability should therefore be set to match the game being rated. A calculator can show how the skill changes respond when the draw margin is changed.
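The link between draw probability and draw margin can be sketched as follows, assuming the usual TrueSkill conversion in which the margin is the inverse normal CDF of (p + 1) / 2 scaled by beta and the square root of the number of players. `draw_margin` and `draw_chance` are illustrative names, and `draw_chance` covers the 1v1 case only.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf, inv_cdf


def draw_margin(draw_prob, beta=25 / 6, n_players=2):
    """Width of the performance gap that still counts as a draw."""
    return STD.inv_cdf((draw_prob + 1) / 2) * sqrt(n_players) * beta


def draw_chance(mu_a, sigma_a, mu_b, sigma_b, draw_prob=0.10, beta=25 / 6):
    """Predicted chance that a 1v1 match lands inside the draw margin."""
    eps = draw_margin(draw_prob, beta)
    c = sqrt(2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    gap = mu_a - mu_b
    return STD.cdf((eps - gap) / c) - STD.cdf((-eps - gap) / c)


for p in (0.01, 0.10, 0.30, 0.50):
    print(f"draw probability {p:.2f} -> margin {draw_margin(p):.3f}")
```

For two equally rated players with no uncertainty, `draw_chance` returns the configured draw probability, and it falls as the rating gap grows.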
The beta parameter sets how much a player’s performance can vary from game to game. A high beta means that outcomes are noisy and upsets are common, whereas a low beta means that the better player wins the large majority of games against players of lesser skill. This value can be adjusted to match how often upsets occur in a population of players.
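The effect of beta on expected outcomes can be sketched with a simple win probability. This assumes each player’s performance is normally distributed around mu with variance sigma² + beta², and it ignores draws for simplicity; `win_probability` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal cdf


def win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=25 / 6):
    """Chance that player A outperforms player B in one game (draws ignored)."""
    c = sqrt(2 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2)
    return STD.cdf((mu_a - mu_b) / c)


# Same 5-point rating gap under increasingly noisy games.
for beta in (1.0, 25 / 6, 10.0):
    p = win_probability(30.0, 1.0, 25.0, 1.0, beta=beta)
    print(f"beta={beta:.3f}: favorite wins with probability {p:.3f}")
```

As beta grows, the favorite’s win probability moves toward 50%, which is why an upset is less surprising and shifts ratings less.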
The tau parameter is a small drift term that keeps a player’s rating from freezing. If tau were 0, a player’s uncertainty would keep shrinking with every game, and the rating of a long-time player would eventually stop responding to new results. Instead, a little uncertainty is added back before each match.
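A minimal sketch of the drift step, assuming tau is added to the variance as sigma² + tau² once per step. Whether a step is a match (as in the parameter table above) or a period of inactivity is a ladder design choice; `drift` and `steps` are illustrative names.

```python
from math import sqrt


def drift(sigma, tau=25 / 300, steps=1):
    """Sigma after adding tau**2 to the variance `steps` times."""
    return sqrt(sigma ** 2 + steps * tau ** 2)


veteran_sigma = 1.0  # hypothetical settled player
for steps in (1, 10, 100):
    print(f"{steps:>3} steps: sigma {veteran_sigma:.3f} -> {drift(veteran_sigma, steps=steps):.3f}")
```

A single step barely changes sigma, but many accumulated steps widen it noticeably, and with `tau=0` sigma never grows at all.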
Where a ladder applies this drift for time away as well as before each match, a player returning from a long break will find that their early games carry more weight than they would after a short break. These parameters can all be adjusted. Microsoft published the default parameters for the system, and these are the parameters that are used on the Xbox platform for matchmaking.
The defaults can be changed, but each parameter should be adjusted one at a time so that its effect on conservative ratings and update size can be isolated. If new players are climbing too quickly in rank, then the conservative multiplier should be increased. The draw probability can be increased if draws are more common than the setting assumes, or decreased if nearly every game has a clear winner and loser.
If veterans who have not played for many months should update more readily when they return, the tau parameter can be increased. Knowing the values of these parameters is important for understanding how the system is functioning. Each adjustment can be simulated on a set of games, and the simulated outcomes can be used to judge whether a change to the ladder is beneficial.
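One way to try adjustments one at a time is a small parameter sweep. The sketch below assumes an evenly matched 1v1 in which both players share the same sigma, and reports how much mu the winner gains under the commonly published TrueSkill win update; `winner_gain`, `sweep`, and `DEFAULTS` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf

# Common defaults from the parameter table.
DEFAULTS = {"sigma": 25 / 3, "beta": 25 / 6, "tau": 25 / 300, "draw_prob": 0.10}


def winner_gain(sigma, beta, tau, draw_prob):
    """Mu gained by the winner of an evenly matched 1v1 (equal mu, shared sigma)."""
    var = sigma ** 2 + tau ** 2                      # drift added before the match
    c = sqrt(2 * beta ** 2 + 2 * var)
    eps = STD.inv_cdf((draw_prob + 1) / 2) * sqrt(2) * beta / c
    v = STD.pdf(-eps) / STD.cdf(-eps)                # scaled rating gap is 0
    return var / c * v


def sweep(name, values):
    """Vary one parameter while holding the others at their defaults."""
    return [(value, winner_gain(**dict(DEFAULTS, **{name: value}))) for value in values]


for name, values in [("sigma", (2.0, 25 / 3)),
                     ("beta", (25 / 6, 25 / 3)),
                     ("draw_prob", (0.10, 0.40))]:
    for value, gain in sweep(name, values):
        print(f"{name}={value:.3f}: winner gains {gain:.3f} mu")
```

Holding everything else at the defaults makes it clear which parameter caused a change in update size before it is applied to a live ladder.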
The ratings can also be used to check the balance of the teams before each game, which removes the guesswork from team balancing. A TrueSkill rating is not a single number; it is a skill estimate together with a measure of how far off that estimate could be.
Providing both of these numbers for each player is what keeps the system honest about how much it actually knows about that player’s skill.
