Using Machine Learning to Build Fair CS:GO Tournament Teams

Figure: Principal Component Analysis of CS:GO Player Skill Variables

This post deviates a little from the science publication fare and is about applied machine learning techniques and how they can be used in the most unusual places. Let me start by explaining the motivation. I am part of the organizing body of Snooze Ry., which arranges LAN party events for youth. As you might expect, there are computer game (eSports) competitions. However, the skill level of the participants varies a lot, and we aim to maximise fun at the competitions. Many of the games played there are team-based, and we try to keep teams evenly matched so that the matches are as exciting as possible. Recently we have manually assigned the best players to different teams, but this is not a long-term solution and involves a lot of manual work.

Stand Back, I am Going to Try Science

To address this issue in constructing teams for competitions, a good friend of mine set out to program a fairer tournament ladder constructor. For the first test case we chose a popular team-based first-person action game, Counter-Strike: Global Offensive. The game is connected to Steam, a popular gaming platform that also records statistics and creates public profiles for players, so we have a wealth of data available, from hours played to player performance per level and the weapon of choice. This wealth of data also posed a problem: which statistics should we use to decide a player's skill? Win/loss ratio? What if he just had good teammates? Total hours played? What if the guy is just a slow learner?

I had the bright idea to bring machine learning into the mix. Why should we decide, when we could let a machine do it? That always ends well. We chose the naive Bayes classifier for this supervised machine learning experiment (i.e. we provide enough labelled samples to the machine and hope that it learns to classify new data from them).

Anyhow, once we were committed, the system was completed pretty fast. We surveyed our friends, asked them to name good and bad players, downloaded their statistics from the Steam API and used them as examples of good, mediocre and bad players for the machine learning algorithm. Now when the system builds teams, it does the following steps:

  1. Gets desired team sizes and player details as input
  2. Gets the Steam username for each new player and downloads stats using the Steam API
  3. Uses the trained classifier to label a player skilled, mediocre or unskilled
  4. Divides players so that each team has as even a mix of players as possible (the same average skill level for each team)
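Step 4 can be sketched as a greedy balancing pass. This is only an illustration under stated assumptions, not the code our system runs: the numeric scores per label, the function name `build_teams` and the player names are all made up for the example.

```python
from typing import Dict, List

# Hypothetical numeric scores for the three classifier labels.
SKILL_SCORE = {"skilled": 2, "mediocre": 1, "unskilled": 0}


def build_teams(players: Dict[str, str], team_size: int) -> List[List[str]]:
    """Greedily split players into teams with similar total skill.

    players maps a player name to one of the labels in SKILL_SCORE.
    """
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    team_count = -(-len(players) // team_size)  # ceiling division
    teams: List[List[str]] = [[] for _ in range(team_count)]
    totals = [0] * team_count

    # Place the strongest players first; each one goes to the weakest
    # team that still has a free slot.
    ranked = sorted(players, key=lambda name: (-SKILL_SCORE[players[name]], name))
    for name in ranked:
        open_teams = [i for i in range(team_count) if len(teams[i]) < team_size]
        target = min(open_teams, key=lambda i: (totals[i], len(teams[i]), i))
        teams[target].append(name)
        totals[target] += SKILL_SCORE[players[name]]
    return teams


# Made-up example: six players, two teams of three.
example_players = {
    "anna": "skilled", "ben": "skilled",
    "cleo": "mediocre", "dan": "mediocre",
    "eli": "unskilled", "finn": "unskilled",
}
example_teams = build_teams(example_players, team_size=3)
```

A greedy pass like this does not guarantee the best possible split, but it is simple and tends to keep the team totals close when the skill labels are this coarse.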

Initial Results: Player Statistics That Correlate With Our Subjective Measure of Skill

In our rather limited dataset, the Bayesian classifier assigned the most importance to three statistics: total matches won in the safehouse level, win rate and total shots fired with the Tec9 weapon. The relationship between these is visualized in the figure below. The more points awarded, the higher the chance that the player is “good”. In our dataset it appears that a player who wins 50% of safehouse level matches, has a 100% win rate and has the most shots with the Tec9 weapon is an ideal “good” player. For our panel’s arbitrary definition of good, that is.

Figure: “Skill” Points Awarded by the Classifier for Three Most Important Player Statistics


Naive Bayes is a machine learning method where an algorithm uses probabilistic analysis to categorize a collection of values (e.g. measurements from some objects) into one of the predefined categories. Learn more on Wikipedia or from a Coursera course.
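To make the idea concrete, here is a minimal Gaussian naive Bayes sketch in plain Python. It is not the classifier from our system: the two features (win rate and hours played), the sample values and the function names `train` and `classify` are assumptions invented for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def train(samples: Sequence[Tuple[Sequence[float], str]]) -> Dict[str, dict]:
    """Fit a prior per class plus a mean and variance for every feature."""
    by_label: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)

    model = {}
    for label, rows in by_label.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps the variance away from zero.
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = {
            "prior": len(rows) / len(samples),
            "means": means,
            "variances": variances,
        }
    return model


def classify(model: Dict[str, dict], features: Sequence[float]) -> str:
    """Return the label with the highest log posterior score."""
    def log_score(params: dict) -> float:
        score = math.log(params["prior"])
        for x, mean, var in zip(features, params["means"], params["variances"]):
            # Log of the Gaussian density; the "naive" part is treating
            # every feature as independent given the class.
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score

    return max(model, key=lambda label: log_score(model[label]))


# Made-up training data: (win rate, hours played) with a panel label.
training = [
    ([0.62, 900.0], "good"), ([0.58, 1200.0], "good"), ([0.65, 700.0], "good"),
    ([0.40, 100.0], "bad"), ([0.45, 200.0], "bad"), ([0.38, 50.0], "bad"),
]
model = train(training)
label_for_veteran = classify(model, [0.60, 1000.0])
label_for_newcomer = classify(model, [0.41, 120.0])
```

With samples this clearly separated the veteran comes out as "good" and the newcomer as "bad"; with real player data the classes overlap far more, which is why the amount and quality of labelled examples matters.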

Also, Lappeenranta University of Technology is arranging a machine learning course under the course name CT10A7060 Advanced Topics in Software Engineering during the intensive week May 16th – 20th, 2016. If you are a student at our university, sign up using the course management system Weboodi or fire off a message to me on Twitter and I’ll connect you with the lecturer.

Computer-Supported Collaborative Learning in Software Engineering Education: A Systematic Mapping Study

New! My systematic mapping study on Computer-Supported Collaborative Learning in Software Engineering Education has now been published. It cites and summarizes the results of over a hundred publications in the field of CSCL.


A systematic mapping study (SMS) is a secondary study that aims at classification and thematic analysis of earlier research. According to Kitchenham and Charters, performing an SMS can be especially suitable if few literature reviews are available on the topic and there is a need to get a general overview of the field of interest. It can also be used to identify gaps in the current state of research.

Paper Abstract

Computer-supported collaborative learning (CSCL) has been a steady topic of research since the early 1990s, and the trend has continued to this date. The basic benefits of CSCL in the classroom have been established in many fields of education to improve especially student motivation and critical thinking. In this paper we present a systematic mapping study about the state of research of computer-supported collaborative learning in software engineering education. The mapping study examines published articles from 2003 to 2013 to find out how this field of science has progressed. Ongoing research topics in CSCL in software engineering education concern wider learning communities and the effectiveness of different collaborative approaches. We found that while the research establishes the benefits of CSCL in several different environments from local to global ones, these approaches are not always detailed and comparative enough to pinpoint which factors have enabled their success.

Read More

A preprint is available on ResearchGate.

The NAILS Project

       aes(PublicationName, PublicationTotalCitations)) + 
    geom_bar(stat = "identity", fill = "orange") + 
    coord_flip() +
    theme(legend.position = "none") +
    ggtitle("Most cited publication venues") + 
    xlab("Publication venue") + ylab("Total times cited")

Figure: Some visualization code from NAILS

For the first article in my blog I’ll introduce the NAILS project. It is a collection of cloud-based tools for performing statistical and Social Network Analysis (SNA) on citation data. SNA is a new way for researchers to map large datasets and get insights from new angles by analyzing connections between articles. As the number of publications grows in any given field, automatic tools for this sort of analysis are becoming increasingly important before starting research in a new field. NAILS also provides useful data when performing systematic mapping studies of scientific literature.


The basic design and bibliometric principles of the system have been published in a research article:

Antti Knutas, Arash Hajikhani, Juho Salminen, Jouni Ikonen, and Jari Porras. 2015. Cloud-Based Bibliometric Analysis Service for Systematic Mapping Studies. In Proceedings of the 16th International Conference on Computer Systems and Technologies (CompSysTech ‘15). DOI: 10.1145/2812428.2812442

A preprint version of the article is available for download as PDF. The official version is also available in the ACM Digital Library.


Still unclear on the topic but would like to know more? You can find more information and tutorials at the following links.

Gamifying an Introduction to Programming Course: A Case Study

Figure: A Social Network Graph of One Online Community

Through the magic of backdating I’m able to blog about our case study on gamification! A couple years back we did a case study on applying gamified online collaboration to an introduction to programming course with the help of the Q2A platform. The case study on increasing collaborative communications with gamification had positive outcomes in increasing the students’ mutual online support. The case study paper got a pretty good reception, too, and was the 6th most downloaded article in the conference’s digital library at the time. Now we are working on our gamified platform and I hope to blog about the new system soon.

Read More

A preprint version of the case study paper is freely available to view on ResearchGate and the full version in the ACM Digital Library.


We used social network analysis on the communication logs to examine the shape of the community in order to discover if true mutual support was established between the students. After analysing the case study results we were able to conclude that students formed the core of the community, with course staff facilitating, but not dominating the conversation. More details in the article!
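As a rough illustration of the kind of question social network analysis answers, the sketch below computes degree centrality from a reply log. The exact measures used in the paper are not reproduced here; degree centrality is just one common SNA measure, and the `(sender, recipient)` log format, the function name and the participants are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_centrality(replies: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Share of the other participants each person has exchanged replies with."""
    neighbours = defaultdict(set)
    for sender, recipient in replies:
        if sender != recipient:  # ignore self-replies
            neighbours[sender].add(recipient)
            neighbours[recipient].add(sender)
    others = len(neighbours) - 1
    if others <= 0:
        return {person: 0.0 for person in neighbours}
    return {person: len(peers) / others for person, peers in neighbours.items()}


# Made-up reply log: three students and one staff member.
reply_log = [
    ("student1", "student2"),
    ("student2", "student3"),
    ("student1", "student3"),
    ("staff", "student1"),
]
scores = degree_centrality(reply_log)
```

In this toy log the students score higher than the staff member, which is the shape one would expect from a community where students form the core and staff only facilitate.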


In this case study we present an approach for using gamification elements to increase online student collaboration. In the study a gamified online discussion system was added to an introduction to programming course, with the aim of motivating the students to help each other. The actions in the discussion system were analyzed and compared with user profiles and a student survey. The system had a positive impact on the course, increasing student collaboration, reducing response times and making course communications 88% more efficient by reducing email traffic.