Problem Detail: In some (historical) papers, chess has been referred to as the drosophila of artificial intelligence. While I suppose that in current research the mere application of a search algorithm is at best advanced computer science, I believe that there are still areas where one can apply (and practice) AI techniques. A simple example would be opening book learning, where one can teach the program whether or not to use certain moves in the opening, because the program is unsuited to certain types of position. We could use a form of reinforcement learning and automate this: I could play the program against itself, increasing the probability of winning lines and decreasing the probability of losing lines (a sketch of this idea follows the question below). A more complex example is a learning evaluation function (for example, one could tweak the values of piece-square tables). However, I'm wondering:
- given all the noise due to the enormous number of realistic positions (as opposed to the much smaller number of realistic opening lines),
- and given the cost (duration) of a computer chess game and the need to play a great many of them,
how can one do this effectively? (Or should I look at other techniques, for example neural networks?)
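For the opening-book idea, a minimal self-play sketch in Python might look as follows. Everything here is an assumption for illustration: the example lines, the multiplicative update with its step size and floor, and the play_selfplay_game stub, which a real setup would replace with actual engine self-play.

```python
import random

# Hypothetical opening book: each line is a move sequence mapped to a weight.
# The weights act as unnormalised selection probabilities.
book = {
    ("e2e4", "e7e5", "g1f3"): 1.0,   # illustrative lines only
    ("d2d4", "d7d5", "c2c4"): 1.0,
    ("e2e4", "c7c5", "g1f3"): 1.0,
}

def choose_line(book):
    """Sample an opening line with probability proportional to its weight."""
    lines = list(book)
    weights = [book[line] for line in lines]
    return random.choices(lines, weights=weights, k=1)[0]

def play_selfplay_game(line):
    """Placeholder: play the engine against itself from this opening line and
    return +1 (win), 0 (draw) or -1 (loss) for the side that chose the line.
    A real implementation would call the engine here."""
    return random.choice([1, 0, -1])  # stub result for illustration

def reinforce(book, games=1000, step=1.1, floor=0.01):
    """Very simple multiplicative update: winning lines become more likely,
    losing lines less likely. `step` and `floor` are arbitrary choices."""
    for _ in range(games):
        line = choose_line(book)
        result = play_selfplay_game(line)
        if result > 0:
            book[line] *= step
        elif result < 0:
            book[line] = max(floor, book[line] / step)

reinforce(book)
print(book)
```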
Asked By : ljgw
Answered By : BartoszKP
The whole state space of chess is enormous: it can be roughly estimated at 10^43 positions (the Shannon number; Shannon, 1950; Wikipedia). The idea you present, Reinforcement Learning agents playing against each other to learn the game, was successfully applied to Backgammon in TD-Gammon (Tesauro, 1995; see also the chapter in Reinforcement Learning by Sutton & Barto). TD-Gammon also used a neural network to estimate the game's value function. That problem is much simpler, however, as the number of states in Backgammon is significantly smaller than in chess, namely 18,528,584,051,601,162,496 (Backgammon Forum Archive thread). If, however, you ended the game after a few initial moves and aimed only to learn "good openings", you could succeed with an analogous approach. The main problem would be evaluating the position after the opening phase, which seems hard. A mere similarity measure to the established positions after well-known openings is not enough, because the position can be far from them when the opponent makes a poor move (the deviation would then not be the learning agent's mistake, so the position, even if "incorrect", should still be evaluated as a good outcome).
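To make the value-function idea concrete, here is a minimal TD(0) sketch for a linear evaluation (e.g. piece-square table weights), in the spirit of TD-Gammon's temporal-difference training. The feature extractor, the outcome encoding and the learning rate are all assumptions; the features function is a stub standing in for real board features.

```python
import random

# A minimal TD(0) sketch for a linear evaluation function, in the spirit of
# TD-Gammon. Everything below (features, the game interface) is hypothetical.

NUM_FEATURES = 64 * 6  # e.g. one weight per (square, piece type) pair

weights = [0.0] * NUM_FEATURES

def features(position):
    """Placeholder: map a position to a feature vector, e.g. piece-square
    indicators. A real engine would extract these from the board."""
    return [random.random() for _ in range(NUM_FEATURES)]  # stub values

def value(position):
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features(position)))

def td0_update(trajectory, outcome, alpha=0.01):
    """After a self-play game, nudge each V(s_t) toward V(s_{t+1}), and the
    final state's value toward the game outcome (+1, 0 or -1)."""
    for t, position in enumerate(trajectory):
        phi = features(position)
        if t == len(trajectory) - 1:
            target = outcome                     # terminal: use game result
        else:
            target = value(trajectory[t + 1])    # bootstrap from next state
        error = target - sum(w * f for w, f in zip(weights, phi))
        for i in range(NUM_FEATURES):
            weights[i] += alpha * error * phi[i]
```

One could train such weights on games truncated after the opening, which is exactly where the evaluation problem described above bites: the terminal target is then an estimate, not a game result.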
Best Answer from StackExchange
Question Source : http://cs.stackexchange.com/questions/21970