SensAI: Intuitive AI based learning in the game of Go

If you're familiar with the developments in computer Go in recent years and want to skip the intro, you can jump straight to the juicy parts.
Otherwise...

Introduction

In 2016 a tremor ran through the Go-playing world. For the first time, a computer program managed to beat a human world-class player in the game of Go. That program was called AlphaGo, and it managed to best one of the strongest and most exciting players of modern Go history - Lee Sedol - with a score of 4-1.

The AlphaGo documentary

The event marked a turning point for the Go community. Up until that moment, Go players had gleefully enjoyed the fact that their game was still a domain of human excellence, unconquered by soulless machines, which they thought could never grasp essential human qualities like intuition and creativity - qualities they deemed necessary to master their world of 19 by 19 points. A world more intricate, complex and beautiful than the sad 8 by 8 squares of chess, for example - where piece movement is constricted and the possibilities limited enough that dumb calculators have been finding superior moves for decades. A free world - where humanity could thrive and endure forever.

Little, however, did they anticipate the true power of neural networks. Initially trained on thousands of games between human experts, AlphaGo continued to grow by playing against itself over and over again, adjusting its understanding of the game according to the results. Hundreds, thousands, millions of times - the experience of an untold number of human lifetimes of play, amassed in a few days. No need for rest or food or sleep - only the drive to improve for as long as its creators wished.

The victory against Lee Sedol was not the end of the story. DeepMind continued to improve AlphaGo's algorithms and to publish and showcase the results. At the end stood a version of the program that had never seen a single move of human play and had learned exclusively by playing against itself, starting from nothing. It was called AlphaGo Zero.

Even though AlphaGo was eventually retired by its creators, they freely published their methods, making it possible for anyone willing to reimplement the approach. This led to the development of Leela Zero, an open-source Go engine trained collectively by the community pooling their computing resources - an ingenious approach in the absence of Google's computing power. Out of further community-driven improvements, KataGo was born: an even stronger engine with more features supporting analysis and different playing formats. Commercial engines powered by neural networks are also on the market. Together with their free siblings, they make superhuman gameplay and analysis available to everyone with a computer.

Reviewing a game with KataGo and Lizzie

These new possibilities were adopted by the majority of the Go community with great enthusiasm. Suddenly you could have all your games analyzed and corrected by a superhuman entity at any time, forgoing the need for a stronger player to donate (or sell) their time for your improvement. Studying the game on your own has never been easier. For professional players - who often make their livelihood through teaching - the new situation presents a challenging outlook. Yet they too - especially the younger generation - jumped on the new technology to study, train and improve their game. Since then, Go theory has evolved significantly. Established patterns have been discarded in favor of new ones suggested by AI, and moves that were uncommon or even regarded as bad in the past have gained new popularity in professional and amateur play. Most human players have accepted AI engines as their ultimate teachers. However, I believe it is time to go one step further.

A new method

As explained before, AI engines are commonly used by professionals and amateurs alike to analyze games and study variations. The level of play has risen, and players who study heavily with AI dominate the top ranks nowadays. The point I want to make in this article is that this type of learning is a conscious one and relies on heavy processing by the brain. It requires a lot of energy and time before new patterns are established. I believe we can additionally use AI in a different way, augmenting this conscious learning process with an intuitive one.

The statistical search algorithms used to calculate the best moves in Go - most notably Monte Carlo tree search - are actually not recent developments; they have been in use for many years and, in some form, are still used in the new engines today. The real advancement of neural networks is their strength in evaluation. Modern Go engines play at expert level even without a single step of search: one look at a board position lets a Go AI immediately see promising moves, and strong results can be obtained with very little thinking time. This - in my view - is an equivalent of intuition, at least as far as we're willing to ascribe this quality to a computer program. And I think it is still an untapped source of further insights into the world of Go - and other domains.

The main idea of this article is to introduce live feedback to the game for training purposes. Whether it comes from a teacher, a friend or our own body, without feedback there is no learning at all - in any domain. In the game of Go, feedback can be obtained by the actual results of our games, discussing with other (ideally stronger) players and teachers and by analyzing variations and games with AI engines. All these types of feedback happen after the actual game, thus creating a significant delay between the initial action (i.e. playing a move on the board) and the feedback signal. Also, as mentioned before, the feedback has to be consciously processed and integrated into the existing domain knowledge.

I argue that it should be possible to use a form of live feedback generated by AI in order to accelerate the learning process and significantly improve intuition - exactly the elusive quality we need to efficiently prune actual calculations. The feedback should be a very simple signal that can be subconsciously understood - it should be felt, not seen.

I have implemented a prototype called SensAI that works as follows:

  • a background process observes the player during the game
  • the game progress is forwarded to an AI engine
  • the engine evaluates the position live from the player's point of view
  • the program generates audio output based on a feedback metric (the player's win probability)
  • the feedback signal is a heartbeat sound
  • the baseline is 60 bpm at a 50% win probability
  • the beat speeds up as the win probability drops and slows down as it rises
  • a special fast heartbeat signal is generated when there is an urgent move that has to be played (a single move that is significantly better than all others)
  • additional visual feedback complements the audio signal
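The heartbeat mapping and the urgency check described above can be sketched in a few lines. To be clear, this is a hypothetical illustration rather than the prototype's actual code: the function names, the linear bpm formula and the 10% urgency margin are my own assumptions based on the description above.

```python
BASELINE_BPM = 60.0  # heartbeat tempo at a 50% win probability

def heartbeat_bpm(win_prob: float, sensitivity: float = 60.0) -> float:
    """Map the player's win probability (0..1) to a heartbeat tempo.

    At 50% the beat sits at the 60 bpm baseline; as the win probability
    drops the beat speeds up, and as it rises the beat slows down.
    The linear mapping and the sensitivity factor are illustrative choices.
    """
    return BASELINE_BPM + sensitivity * (0.5 - win_prob)

def is_urgent(move_winrates: list[float], margin: float = 0.10) -> bool:
    """Flag an 'urgent move': the best candidate move is significantly
    better (by at least `margin` win probability) than every alternative."""
    if len(move_winrates) < 2:
        return False
    best, second = sorted(move_winrates, reverse=True)[:2]
    return best - second >= margin
```

With this mapping, a position at 20% win probability would beat at 78 bpm, while a comfortable 80% position would slow to 42 bpm, and a candidate list like 70% / 50% / 45% would trigger the urgent-move signal.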

I have created a video in order to showcase the prototype. Please check it out for a demonstration:

From my own tests so far, I can offer my view on immediate benefits of the method:

  • constant, accurate awareness of the state of the game supports good decision making and a calm mind
  • immediately knowing when you've made a mistake makes it easier to remember the shape and to know what to focus on in the review
  • some feedback patterns can give you clues about the game:
    • several big mistakes by both players in a row suggest that both are missing some urgent point
    • the nature of a mistake can sometimes be inferred from context: you might recognize that a slow move was played, or whether a move is sente
    • if the position only gets worse with every move from start to finish, it might be a sign that you're playing against an AI opponent

I sincerely believe that this direction of development and research is worth following up on, not only for the game of Go, but for other domains as well.

Utopian vision of AI working together with humans :)

My vision for the Go community is the creation of a Go server/application (or integration into existing ones) that - apart from normal rated games - enables players to play each other in "intuition training mode", with both receiving the same feedback signal (inverted for each side, of course) to ensure fairness. I believe that consistent, long-term training with this method (in addition to classical training methods) has the potential to significantly improve a player's judgement and feel for the game, and might even raise the human level of play to heights never seen before.

This claim is, of course, impossible to verify without players actually using the technique consistently over long periods of time. I'm currently not planning to release the prototype in the near future, mainly because of fairness concerns - being able to accurately judge the game state is a definite advantage, after all, even if nobody tells you exactly what to play. As mentioned above, the ideal solution would be a Go server that integrates this concept wholeheartedly at its core. I hope there is a good way to make the method available to everyone, but that of course depends on whether there is enough interest in the community.

I'm definitely open to suggestions and collaborations, please feel free to contact me with ideas.

For now I want to continue to showcase the approach with additional video demonstrations to collect feedback from the community. Let me know your thoughts by commenting, and thanks for reading until the end. 🙂