Analysis of the Impact of Randomization of Search-Control Parameters in Monte-Carlo Tree Search

Main Article Content

Chiara F. Sironi
Mark H. M. Winands


Monte-Carlo Tree Search (MCTS) has been applied successfully in many domains, including games. However, its performance is not uniform on all domains, and it also depends on how parameters that control the search are set. Parameter values that are optimal for a task might be sub-optimal for another. In a domain that tackles many games with different characteristics, like general game playing (GGP), selecting appropriate parameter settings is not a trivial task. Games are unknown to the player, thus, finding optimal parameters for a given game in advance is not feasible. Previous work has looked into tuning parameter values online, while the game is being played, showing some promising results. This tuning approach looks for optimal parameter values, balancing exploitation of values that performed well so far in the search and exploration of less sampled values. Continuously changing parameter values while performing the search, combined also with exploration of multiple values, introduces some randomization in the process. In addition, previous research indicates that adding randomization to certain components of MCTS might increase the diversification of the search and improve the performance. Therefore, this article investigates the effect of randomly selecting values for MCTS search-control parameters online among predefined sets of reasonable values. For the GGP domain, this article evaluates four different online parameter randomization strategies by comparing them with other methods to set parameter values: online parameter tuning, offline parameter tuning and sub-optimal parameter choices. Results on a set of 14 heterogeneous abstract games show that randomizing parameter values before each simulation has a positive effect on the search in some of the tested games, with respect to using fixed offline-tuned parameters. Moreover, results show a clear distinction between games for which online parameter tuning works best and games for which online randomization works best. In addition, the overall performance of online parameter randomization is closer to the one of online parameter turning than the one of sub-optimal parameter values, showing that online randomization is a reasonable parameter selection strategy. When analyzing the structure of the search trees generated by agents that use the different parameters selection strategies, it is clear that randomization causes MCTS to become more explorative, which is helpful for alignment games that present many winning paths in their trees. Online parameter tuning, instead, seems more suitable for games that present narrow winning paths and many losing paths.

Article Details