Engine Detection In Online Chess

Recently, as a consequence of the quarantines as well as the success of Netflix's The Queen's Gambit, many new players have become interested in chess. A natural consequence is that the number of cheaters has also increased, at least in proportion to the number of new players. Cheating is, and has always been, a problem in online chess.

Since many people are quick to accuse others of cheating (and are sometimes even right!), there are frequent discussions about cheating on various chess forums (Reddit, etc.), and it's not uncommon to encounter bizarre or false claims about cheat detection in online chess. In this article, I will go through some of the methods that Lichess uses for detecting cheaters, based on reading their freely available source code, to hopefully help others understand cheat detection. I'll try not to give hints to cheaters on how to avoid detection, though.

I should also say that I'm not a Lichess developer and not an expert on Scala (the language Lichess is largely written in), so it's quite possible I've misunderstood some things. Without further ado...

Suspicious behavior

Everyone who has played a significant amount of online chess has probably run into a player who seems to make unnatural, engine-like moves way beyond their rating level. This may be frustrating, but moves simply seeming "too good" is no proof that they actually came from an engine. This is why, instead of just relying on people's reports, Lichess analyzes games using objective criteria.

When a user triggers Lichess's cheat detection algorithms (due to reports, etc.), Lichess takes a number of games played by the suspected user and analyzes them for various behaviors. I won't go into detail, to avoid giving cheaters too many hints, but many of these are quite obvious. For example, one criterion is the accuracy of the moves compared to an engine's choices (and how quickly they were played: getting Stockfish's top suggestions in 0.5 seconds is a tad suspicious).

I will also say that if you think Lichess won't detect "clever" browser plugins, think again ;)

Based on these criteria, Lichess then classifies each game into one of five categories:

  1. Cheating
  2. Likely cheating
  3. Unclear
  4. Unlikely cheating
  5. Not cheating

It should be quite obvious what these mean. Though the category is named "cheating", having a game flagged as "cheating" doesn't necessarily prove that the user really did use an engine: this is just a heuristic criterion. Someone could, after playing hundreds of games, play a nearly perfect game by chance, and there is nothing particularly weird about that (even I have played an extremely accurate game once, which could well have been flagged as cheating by Lichess).

Once Lichess has gone through several of the player's games, a total player assessment is performed.

Player assessment

Simply having one or two suspicious games out of hundreds is not grounds for a ban, according to the logic of Lichess. Instead, after the games have been analyzed, Lichess calculates two scores called "weightedCheating" and "weightedLikelyCheating".

These weighted scores are calculated by taking the number of cheating games (and likely cheating games) and multiplying them by factors that depend on what is considered more or less suspicious. For example, an accurate game in the classical time format is obviously less suspicious than an accurate game in bullet. So if a user got flagged as "cheating" in two classical games, that might be counted as fewer than 2 "weighted cheating" games.
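To make the idea concrete, here is a minimal sketch in Python (not Lichess's actual Scala code). The weights, the GameAssessment class and the weighted_count function are all hypothetical names and numbers I made up for illustration; the only thing taken from the above is that slower games count for less.

```python
from dataclasses import dataclass

# Hypothetical weights, NOT the values Lichess uses: slower time
# controls count for less because accuracy there is less suspicious.
SPEED_WEIGHT = {"bullet": 1.0, "blitz": 1.0, "rapid": 0.8, "classical": 0.6}


@dataclass
class GameAssessment:
    speed: str     # "bullet", "blitz", "rapid" or "classical"
    category: str  # "cheating", "likely cheating", "unclear", ...


def weighted_count(games, category):
    """Sum the speed weights of all games flagged with the given category."""
    return sum(SPEED_WEIGHT[g.speed] for g in games if g.category == category)


games = [
    GameAssessment("classical", "cheating"),
    GameAssessment("classical", "cheating"),
    GameAssessment("bullet", "likely cheating"),
]

weighted_cheating = weighted_count(games, "cheating")
weighted_likely_cheating = weighted_count(games, "likely cheating")
```

With these made-up weights, the two classical "cheating" games add up to less than 2 weighted cheating games, while the single bullet game counts in full.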

Another number Lichess calculates is the proportion of (weighted) suspicious games. If a user has a total of 3 "cheating" games but has played 10000 games, they probably aren't really cheating.

The number of weighted cheating games and their proportion are then compared to pre-set limits. New users who come right out of the gate playing like Stockfish are held to stricter limits, while older users are held to slightly more lax ones. Either way, if these numbers get too high, you get flagged as an engine and banned.
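The comparison could look roughly like the sketch below. Again, this is my own illustration in Python: the is_flagged function and every limit in it are assumptions, and so is the choice to require both the count and the proportion to exceed their limits. The real limits and logic live in the Lichess source.

```python
def is_flagged(weighted_cheating, total_games, is_new_account):
    """Return True if the weighted count and the proportion both reach the limits.

    All limits are hypothetical. New accounts get lower (stricter) limits.
    """
    if total_games == 0:
        return False
    # (minimum weighted count, minimum proportion of games)
    min_count, min_ratio = (2.0, 0.05) if is_new_account else (3.0, 0.10)
    ratio = weighted_cheating / total_games
    return weighted_cheating >= min_count and ratio >= min_ratio


# 3 "cheating" games out of 10000: the proportion is far too small.
veteran_flagged = is_flagged(3.0, 10000, is_new_account=False)

# The same numbers can be treated differently depending on account age.
new_flagged = is_flagged(2.5, 30, is_new_account=True)
old_flagged = is_flagged(2.5, 30, is_new_account=False)
```

Under these assumed limits, the veteran with 3 flagged games in 10000 is left alone, while 2.5 weighted cheating games in 30 is enough to flag a new account but not an older one.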

There is one exception, which is what Lichess calls "great users". These are users who have played more than 100 games and whose maximum rating is above 2500. Such users, since they didn't get banned before reaching 100 games, are probably just extremely good. Instead of being flagged as "engine" straight away, they are flagged as "reported", which I assume (though I don't know for certain) means a human will do a manual review.
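As a rough sketch of this exception (in Python, with function names and return values that are my own invention), the decision might be structured like this. The 100-game and 2500-rating cutoffs are the ones described above; everything else is an assumption.

```python
def is_great_user(games_played, max_rating):
    """A "great user": more than 100 games and a maximum rating above 2500."""
    return games_played > 100 and max_rating > 2500


def action_for(exceeds_limits, games_played, max_rating):
    """Pick an action for a user; the action labels here are hypothetical."""
    if not exceeds_limits:
        return "none"
    if is_great_user(games_played, max_rating):
        return "reported"  # presumably left for a human to review
    return "engine"        # flagged automatically


strong_veteran = action_for(True, games_played=500, max_rating=2600)
ordinary_user = action_for(True, games_played=500, max_rating=1800)
```

Here the 2600-rated veteran ends up "reported", while the 1800-rated user with the same suspicious numbers is marked "engine".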

Conclusion

Automatic cheat detection relies on objective criteria and is statistical in nature. It is not possible to determine if someone is cheating from just one game. Such cheat detection is also necessarily incomplete, because you have to try to make sure that you're at least not banning innocent users. Another limitation is that you can't really do extensive and deep statistical analysis on every player, as that would require a huge amount of computing resources.

For those interested in learning more about cheat detection, I recommend googling the name of Ken Regan, an expert on this topic; you can find a lot of fascinating material, like this video.
