Dynamic Algorithm Configuration

TNT members involved in this project:
MSc. Theresa Eimer
Prof. Dr. rer. nat. Marius Lindauer

Since configurations should be chosen during runtime depending on the current algorithm state, dynamic algorithm configuration (DAC) can be viewed as a reinforcement learning (RL) problem: at each timestep, an agent selects the configuration to use based on the performance in the last step and the current state of the algorithm. On one hand, this enables us to use powerful RL methods; on the other, RL also brings a set of challenges, like instability, noise and sample inefficiency, that need to be addressed in applications such as DAC. Research on DAC therefore also includes research on reliable, interpretable, general and fast reinforcement learning.
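The loop described above can be sketched with a toy environment. Everything here is a hypothetical illustration, not a real DAC benchmark: `DACEnv`, its state encoding (normalized progress through the run) and the drifting optimal configuration are all made up to show why a dynamic policy can beat the best static choice.

```python
class DACEnv:
    """Toy 'algorithm run' whose single hyperparameter is reconfigured each
    step. The hidden optimal value drifts over the run, so no static choice
    can match a well-adapted dynamic schedule."""

    def __init__(self, horizon=20):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return (self.t / self.horizon,)   # state: normalized progress

    def step(self, config):
        optimum = self.t / self.horizon   # the best config drifts over time
        reward = -abs(config - optimum)   # closer to the optimum is better
        self.t += 1
        done = self.t >= self.horizon
        return (self.t / self.horizon,), reward, done


def run(env, policy):
    """Roll out one episode and return the accumulated reward."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total


env = DACEnv()
adaptive = run(env, lambda s: s[0])  # tracks the drifting optimum
static = run(env, lambda s: 0.5)     # best single value fixed for the run
print(f"adaptive return: {adaptive:.2f}, static return: {static:.2f}")
```

Even this trivially adaptive policy collects strictly more reward than the best fixed configuration, which is the core motivation for treating DAC as sequential decision making.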

Large Search Space

Adapting even a single hyperparameter at each step of an algorithm run already yields a plethora of possible hyperparameter schedules. Combining dynamic adaptation with the optimization of multiple hyperparameters and algorithm components makes the problem much harder than the original static algorithm configuration problem and calls for new optimization methods.
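The blow-up is easy to quantify: with k candidate values for one hyperparameter and T algorithm steps, a static configuration picks one of k values, while a dynamic schedule is one of k^T possibilities (the numbers below are illustrative):

```python
# Size of the dynamic search space vs. the static one for a single
# hyperparameter with k candidate values over a run of T steps.
k, T = 5, 100
static_choices = k                  # one value fixed for the whole run
dynamic_schedules = k ** T          # one value chosen at every step
print(static_choices)               # 5
print(len(str(dynamic_schedules)))  # a 70-digit number of schedules
```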

Complex Reasoning

It is often not immediately clear which action at which timestep will cause an improvement later in an algorithm's run. Any DAC method therefore needs to model complex dependencies between algorithm states, domain and instance information, performance and configurations. This requires solutions that extract knowledge from the given information to guide the learning process efficiently.

New State & Domain Descriptions

In contrast to traditional algorithm configuration, DAC includes a temporal component and therefore requires additional information for the learning process. The state information as well as the domain information should include meaningful descriptions that help distinguish between different configuration options. Finding new ways to quantify this information content for DAC methods will help to explain their performance and behaviour as well as improve and accelerate learning.

AI Planning

In AI planning (e.g., solving complex scheduling or logistics problems), efficiency is key. An important hyperparameter is therefore the heuristic that is used to solve the planning problem. Learning heuristic schedules for planning domains is an example of a successful application of DAC, but there are also other possibilities, like learning new search strategies from algorithm components or extending heuristic schedules across domain boundaries.
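As a sketch of what a per-step heuristic schedule looks like, the toy greedy search below switches its node-evaluation heuristic at every expansion. The grid task, both heuristics and the simple alternating schedule are illustrative stand-ins, not a real planner or a learned DAC policy.

```python
import heapq

GOAL = (4, 4)  # target cell on a 5x5 grid


def h_manhattan(s):
    return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])


def h_chebyshev(s):
    return max(abs(s[0] - GOAL[0]), abs(s[1] - GOAL[1]))


def schedule(step):
    # A DAC policy would choose this from the search state; here we
    # hard-code a simple alternating schedule for illustration.
    return h_manhattan if step % 2 == 0 else h_chebyshev


def greedy_search(start):
    """Greedy best-first search whose heuristic can change every step.
    Returns the number of expansions until the goal is popped."""
    frontier = [(0, start)]
    seen = {start}
    step = 0
    while frontier:
        _, s = heapq.heappop(frontier)
        if s == GOAL:
            return step
        h = schedule(step)  # pick this step's heuristic
        step += 1
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            nxt = (s[0] + dx, s[1] + dy)
            if 0 <= nxt[0] <= 4 and 0 <= nxt[1] <= 4 and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), nxt))
    return None


print("expansions until goal:", greedy_search((0, 0)))
```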

Evolutionary Algorithms

Evolutionary algorithms are well-established methods for solving complex black-box optimization problems. Several theoretical results prove that dynamic step-size adaptation for evolution strategies is superior to static values, and it has also been empirically shown to outperform simple hand-designed schedules. Since many different hyperparameters and algorithm components determine the behaviour of an ES, there is a variety of possibilities to explore with DAC.
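A classic hand-crafted example of such dynamic adaptation is the one-fifth success rule for a (1+1)-ES. The sketch below applies it to the sphere function; it is meant only to illustrate step-size adaptation during a run, not the learned policies studied in DAC work.

```python
import random


def sphere(x):
    """Sphere benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def one_plus_one_es(dim=5, steps=500, seed=0):
    """(1+1)-ES with one-fifth success rule step-size adaptation."""
    rng = random.Random(seed)
    x = [1.0] * dim
    sigma = 1.0
    fx = sphere(x)
    for _ in range(steps):
        y = [v + sigma * rng.gauss(0, 1) for v in x]
        fy = sphere(y)
        if fy <= fx:                 # success: accept and widen the search
            x, fx = y, fy
            sigma *= 1.22
        else:                        # failure: narrow the search
            sigma *= 1.22 ** -0.25   # keeps roughly a 1/5 success rate
    return fx


print(f"final fitness after step-size adaptation: {one_plus_one_es():.3e}")
```

Because only improving offspring are accepted, the fitness is monotonically non-increasing; the adaptation of sigma determines how fast it shrinks.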

Deep Learning

Deep learning is one of the key technologies in AI today. However, the training process of deep neural networks is often guided by hand-designed heuristics, e.g., learning rate schedules. DAC opens up a way to learn these heuristics from meta-data describing the learning process, thus speeding up training and improving learning quality in a robust fashion.
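As a minimal illustration of reacting to such meta-data, the sketch below runs gradient descent on a 1-D quadratic and halves the learning rate whenever the loss stops improving. This hand-written rule is a stand-in for a learned DAC policy, and the task and numbers are purely illustrative.

```python
def train(steps=100, lr=1.2):
    """Gradient descent on L(w) = w^2 with reactive learning-rate control.
    The initial learning rate is deliberately too large and diverges;
    the plateau rule rescues the run by shrinking it."""
    w, prev_loss = 5.0, float("inf")
    for _ in range(steps):
        loss = w * w                # current training loss
        if loss >= prev_loss:       # no improvement: halve the learning rate
            lr *= 0.5
        prev_loss = loss
        w -= lr * 2 * w             # gradient of w^2 is 2w
    return prev_loss


print(f"final loss: {train():.3e}")
```

With the fixed initial rate of 1.2 the iterates would oscillate and diverge; a single halving triggered by the observed loss turns the run into a fast, stable descent.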

The young but very promising field of DAC enables us to address fundamental research questions as well as to apply new insights and methods to exciting applications. The basic research for the development of new methods is funded by the DFG; an industry partner additionally provides funding to address applications.

Publications
  • Gresa Shala, Andre Biedenkapp, Noor Awad, Steven Adriaensen, Marius Lindauer, Frank Hutter
    Learning Step-Size Adaptation in CMA-ES
    Proceedings of the Sixteenth International Conference on Parallel Problem Solving from Nature (PPSN'20), September 2020
  • Theresa Eimer, Andre Biedenkapp, Frank Hutter, Marius Lindauer
    Towards Self-Paced Context Evaluations for Contextual Reinforcement Learning
    Workshop on Inductive Biases, Invariances and Generalization in Reinforcement Learning (BIG@ICML'20), July 2020
  • Andre Biedenkapp, Raghu Rajan, Frank Hutter, Marius Lindauer
    Towards TempoRL: Learning When to Act
    Workshop on Inductive Biases, Invariances and Generalization in Reinforcement Learning (BIG@ICML'20), July 2020