Using semantic segmentation to boost reinforcement learning performance

This was my master's thesis project. Although the thesis itself was focused on supervised training for computer vision, I wanted to do something more related to reinforcement learning, so I decided on a project that combined both. As the initial results were promising, I extended the work during my first PhD months, with great results, and published a paper that you can read here! Since publishing my master's thesis, I have seen some similar work done with the Doom videogame along the same lines, and my paper has been cited by others doing their master's theses in a similar vein (but with other games, such as Sonic), so I'm glad it served as inspiration for other people!

In this blog-style entry I will summarize some of the main ideas of the paper and outline the main contributions of the project.

The main idea was: "To complete levels of the Super Mario videogame, I don't really need textures, I just need to know what type of element each thing in the frame is". Semantic segmentation seemed like the perfect tool for this, so I built a system that, given a frame from the game, returns a segmentation map, and then used this segmentation map as input for a reinforcement learning agent instead of the original frame. The final system looks like this:

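To make the idea concrete, here is a minimal sketch (not the paper's actual code) of how the segmentation step can be placed in front of the agent as a Gym observation wrapper, so that the agent only ever sees maps of class ids. The `segment_fn` callable and the class count are placeholders for whatever segmentation model you plug in.

```python
import gym
import numpy as np


class SegmentationWrapper(gym.ObservationWrapper):
    """Replaces each raw RGB frame with its semantic segmentation map."""

    def __init__(self, env, segment_fn, num_classes):
        super().__init__(env)
        self.segment_fn = segment_fn  # hypothetical: HxWx3 uint8 frame -> HxW class ids
        h, w, _ = env.observation_space.shape
        # The agent now observes a single-channel map of class ids instead of RGB.
        self.observation_space = gym.spaces.Box(
            low=0, high=num_classes - 1, shape=(h, w), dtype=np.uint8
        )

    def observation(self, frame):
        return self.segment_fn(frame).astype(np.uint8)
```
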
To train the semantic segmentation model, I generated synthetic frames from cutouts that were combined procedurally, following some rules that depended on the type of element. Afterwards, I trained a DeepLabV3 model with a ResNet-50 backbone and used it as a pre-processing step for the frames seen by the reinforcement learning model (based on Double Deep Q-Learning).

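As a rough illustration of what that front end could look like with torchvision (the class list and pre-processing below are made-up placeholders, not the paper's exact configuration):

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 6  # illustrative: e.g. background, ground, block, enemy, Mario, item

model = deeplabv3_resnet50(num_classes=NUM_CLASSES)
model.eval()


@torch.no_grad()
def segment_frame(frame_rgb):
    """Map an HxWx3 uint8 frame to an HxW array of class ids."""
    x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    logits = model(x.unsqueeze(0))["out"]           # (1, NUM_CLASSES, H, W)
    return logits.argmax(dim=1).squeeze(0).numpy()  # (H, W) class ids
```

A function like `segment_frame` is exactly the kind of thing that could be plugged in as the `segment_fn` of the wrapper sketched above.
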
The result was that the RL model converged in fewer episodes than the baseline model and also achieved a higher score in the game, as can be seen in the following figure:

The colored area represents the 95% confidence interval and the line represents the median score, each calculated over 5 different training runs.

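For reference, one way to build such a band (an assumption on my side, not necessarily the exact procedure used for the figure) is to take the per-episode median across runs together with a bootstrap 95% confidence interval of that median:

```python
import numpy as np


def median_with_ci(scores, n_boot=10_000, seed=0):
    """scores: array of shape (n_runs, n_episodes)."""
    rng = np.random.default_rng(seed)
    n_runs = scores.shape[0]
    median = np.median(scores, axis=0)
    # Resample whole runs with replacement and recompute the median each time.
    boots = np.stack([
        np.median(scores[rng.integers(0, n_runs, size=n_runs)], axis=0)
        for _ in range(n_boot)
    ])
    lower, upper = np.percentile(boots, [2.5, 97.5], axis=0)
    return median, lower, upper
```
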
But my favourite result from the paper came when training the RL model on multiple levels at the same time. Trained in a round-robin fashion, with the model playing one of three levels in each episode, the baseline model without segmentation was unable to learn how to play any of the three levels, with noticeable variation in performance across them. The model that uses segmentation, in contrast, reaches higher performance on all three levels, with all of them performing similarly. In the figures below, horizontal lines represent mean performance.

Figure: per-level reward evolution when training on multiple levels, for the baseline and for the model with semantic segmentation.

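To make the round-robin schedule concrete, here is a minimal sketch of the kind of training loop I mean (the level ids, `make_env`, and the `agent` interface are illustrative placeholders, not the paper's code):

```python
import itertools

LEVELS = ["SuperMarioBros-1-1-v0", "SuperMarioBros-1-2-v0", "SuperMarioBros-1-3-v0"]


def train(make_env, agent, n_episodes):
    level_cycle = itertools.cycle(LEVELS)  # 1-1, 1-2, 1-3, 1-1, ...
    for _ in range(n_episodes):
        env = make_env(next(level_cycle))  # switch to the next level every episode
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)
            next_obs, reward, done, info = env.step(action)
            agent.learn(obs, action, reward, next_obs, done)
            obs = next_obs
        env.close()
```
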
In the paper I also try other RL algorithms and evaluate the performance improvements after fine-tuning. If you are interested, you can read the full paper here, and all the code is available in this repository.