ProjectB: Bayesian Optimisation

ProjectB was my Master’s thesis at Imperial College London, supervised by Dr Ruth Misener and Dr Marc Deisenroth, in collaboration with Dr Caroline Baroukh and Dr Benoit Chachuat.

Below is the academic description of my work; in future posts, I plan to break the project down into smaller pieces.

Bayesian Optimisation (BO) is a data-efficient, global black-box method for optimising an expensive-to-evaluate fitness function. BO uses Gaussian Processes (GPs) to describe a posterior distribution over fitness functions given the available experiments. Similar to experimental design, an acquisition function is applied to the GP posterior to suggest the next (optimal) experiment.

Dynamic models of biological processes allow us to test biological hypotheses without running costly real-world experiments. But model construction requires estimating biological parameters (e.g. reaction rate kinetics) from costly experiments. BO efficiently estimates the parameters and thereby reduces the number of model simulations. We focus on parameter estimation for a dynamic model of microalgae metabolism [1].

In biological parameter estimation, BO is challenging because the parameters interact nonlinearly and the broad parameter bounds result in a huge search space. Due to the high problem dimensionality (in this context, 10 parameters), balancing exploration against exploitation becomes more intricate, and traditional Bayesian methods do not scale well. Therefore, we introduce a new Dimension Scheduling Algorithm (DSA) to deal with high-dimensional models. At each iteration, the DSA selects a random set of dimensions and optimises the fitness function only along those dimensions. This reduces the necessary computation time and allows the dimension scheduling method to find good solutions faster than the traditional method. The increased computational speed stems from the reduced number of data points per GP and the reduced number of input dimensions in each GP; GPs scale linearly in the number of dimensions but cubically in the number of data points. Additionally, considering a limited number of dimensions at each node allows us to easily parallelise the algorithm.
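To make the dimension scheduling idea concrete, here is a minimal NumPy sketch. It is not the thesis code and does not use pybo. It assumes a search space scaled to the unit hypercube, an RBF kernel with fixed hyperparameters, a lower-confidence-bound acquisition function maximised by random candidate search, and a GP fitted to all observations projected onto the active dimensions. All function and parameter names (`dimension_scheduling_minimise`, `n_active`, `kappa`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Unit-variance squared-exponential kernel between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Zero-mean GP posterior mean and standard deviation at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_tt)  # cubic cost in the number of data points
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)


def dimension_scheduling_minimise(objective, n_dims, n_active=2, n_init=5,
                                  n_iter=30, n_candidates=500, kappa=2.0,
                                  seed=0):
    """Minimise objective on [0, 1]^n_dims, a few random dimensions at a time."""
    rng = np.random.default_rng(seed)
    x_obs = rng.random((n_init, n_dims))
    y_obs = np.array([objective(x) for x in x_obs])

    for _ in range(n_iter):
        best = x_obs[np.argmin(y_obs)]
        # Schedule: pick a random subset of dimensions for this iteration.
        active = rng.choice(n_dims, size=n_active, replace=False)
        # Standardise the targets so the zero-mean, unit-variance prior fits.
        y_std = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-12)
        # Assumption: fit the GP on observations projected onto active dims.
        candidates = rng.random((n_candidates, n_active))
        mean, std = gp_posterior(x_obs[:, active], y_std, candidates)
        lcb = mean - kappa * std  # lower confidence bound (minimisation)
        # Inactive dimensions stay fixed at the best point found so far.
        x_next = best.copy()
        x_next[active] = candidates[np.argmin(lcb)]
        x_obs = np.vstack([x_obs, x_next])
        y_obs = np.append(y_obs, objective(x_next))

    i_best = np.argmin(y_obs)
    return x_obs[i_best], y_obs[i_best], y_obs


if __name__ == "__main__":
    # Toy 10-parameter objective standing in for an expensive model.
    def toy(x):
        return float(np.sum((x - 0.5) ** 2))

    x_best, y_best, _ = dimension_scheduling_minimise(toy, n_dims=10)
    print(x_best, y_best)
```

Because each GP only sees `n_active` input dimensions, the candidate search stays cheap even when the full model has many parameters. Projecting all observations onto the active dimensions is a simplification made here for brevity; the observation noise term absorbs the resulting inconsistency.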

Compared to commercial parameter estimation for biological models and a traditional Bayesian Optimisation algorithm, our approach achieves strong performance with significantly fewer experiments and reduced computation time. We also design and provide a graphical user interface (GUI), which allows untrained users to optimise any model that can be invoked through a command line. The GUI is built on top of a modular Bayesian Optimisation library, pybo [2], which includes most common acquisition functions and kernels. The framework removes the programming-language barrier by providing the user with a straightforward interface to set BO parameters, observe the optimisation as the code runs, and examine the GP after the experiment has been completed.
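As an illustration of what "invoked through a command line" can mean in practice, the sketch below wraps an external program as a Python fitness function. The calling convention is an assumption made for this example, not the one used by the ProjectB GUI: parameters are appended as command-line arguments, and the model prints its fitness value as the last line of standard output. The name `make_cli_objective` is hypothetical.

```python
import subprocess
import sys


def make_cli_objective(command):
    """Wrap a command-line model as a fitness function f(params) -> float.

    Assumed convention (hypothetical): parameters are passed as extra
    arguments and the fitness is the last line the model writes to stdout.
    """
    def objective(params):
        args = list(command) + [repr(float(p)) for p in params]
        result = subprocess.run(args, capture_output=True, text=True,
                                check=True)
        return float(result.stdout.strip().splitlines()[-1])

    return objective


if __name__ == "__main__":
    # Stand-in "model": a Python one-liner that prints a sum of squares.
    model = [sys.executable, "-c",
             "import sys; print(sum((float(a) - 0.5) ** 2 "
             "for a in sys.argv[1:]))"]
    fitness = make_cli_objective(model)
    print(fitness([0.2, 0.9]))
```

Since the optimiser only ever sees a function from parameters to a number, the model itself can be written in any language.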

You can find the code and the GUI on my GitHub profile, ProjectB, and the thesis is available here.

ProjectB has been presented at: