Job description

Institution: Université Côte d'Azur
Doctoral school: STIC - Sciences et Technologies de l'Information et de la Communication
Research laboratory: I3S - Informatique, Signaux et Systèmes de Sophia-Antipolis
Thesis supervisor: Denis PALLEZ (ORCID 0000-0001-5358-8037)
Thesis start date: 2026-10-01
Application deadline: 2026-05-03T23:59:59

Mixed-integer optimization (MIO), which combines continuous and discrete variables, is
a key area in operations research and artificial intelligence. It is used in complex problems where the variables
to be optimized are not only continuous, such as weights or proportions, but also discrete, such as binary choices
or integer assignments.
This type of optimisation is fundamental to a wide range of applications, such as planning and scheduling
(production management, logistics), design of complex systems (engineering, communication networks, bioinformatics),
energy optimisation (smart grids, resource allocation) and machine learning (setting hyperparameters
in hybrid models). For instance, in production, a factory making several types of product has to decide how
much of each product to make to maximise its profit, while respecting the production capacity and market demand
constraints. In this case, the quantities of products produced could be considered as continuous variables
and decisions to switch certain machines on or off could be integer variables. More formally, the goal is to
minimize (or maximize) a function f defined on d discrete and c continuous variables (d, c > 0). Furthermore,
in most real-world applications, the function f is not defined analytically and is often the result of numerical
simulations, which makes it impossible to use the derivative. As a consequence, derivative-free methods such as
metaheuristics or Artificial Evolution (AE) (Del Ser et al. 2019) may be used (Ploskas and Sahinidis 2022).
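
To make the formulation above concrete, the following minimal sketch writes the factory example as a black-box mixed-integer objective and optimises it with plain random search, the simplest derivative-free baseline. All numbers (profits, capacities, costs) and the names `profit` and `random_search` are hypothetical and chosen only for illustration; they do not come from the proposal.

```python
import random

# Hypothetical toy data: two products, two machines (all numbers are made up).
PROFIT_PER_UNIT = [3.0, 5.0]      # profit for one unit of each product
MACHINE_CAPACITY = [40.0, 60.0]   # units each machine can produce when on
MACHINE_COST = [50.0, 120.0]      # fixed cost of switching each machine on


def profit(machines_on, quantities):
    """Black-box objective f: binary machine decisions, continuous quantities."""
    capacity = sum(cap for cap, on in zip(MACHINE_CAPACITY, machines_on) if on)
    if sum(quantities) > capacity or min(quantities) < 0:
        return float("-inf")  # infeasible: capacity or sign constraint violated
    revenue = sum(p * q for p, q in zip(PROFIT_PER_UNIT, quantities))
    fixed = sum(cost for cost, on in zip(MACHINE_COST, machines_on) if on)
    return revenue - fixed


def random_search(n_samples=2000, seed=0):
    """Derivative-free baseline: sample mixed points and keep the best one."""
    rng = random.Random(seed)
    best_value, best_point = float("-inf"), None
    for _ in range(n_samples):
        machines_on = [rng.randint(0, 1) for _ in MACHINE_CAPACITY]
        quantities = [rng.uniform(0.0, 100.0) for _ in PROFIT_PER_UNIT]
        value = profit(machines_on, quantities)
        if value > best_value:
            best_value, best_point = value, (machines_on, quantities)
    return best_value, best_point
```

Only evaluations of `profit` are used, never its gradient, which is the setting in which metaheuristics are typically applied.
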
AE techniques are nature-inspired stochastic algorithms that mimic Darwin's theory for problem optimisation by
evolving a set of candidate solutions in silico using selection, mutation, crossover and reproduction operators.
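
As an illustration of these operators, here is a minimal sketch of an evolutionary loop over individuals made of integer and real genes. The objective `sphere_mixed`, the bounds, the population size and the mutation steps are all hypothetical choices made for this example; it is not the AE technique that the thesis will develop.

```python
import random


def sphere_mixed(ints, reals):
    """Hypothetical toy objective to minimise; its optimum is 0 at the origin."""
    return sum(z * z for z in ints) + sum(x * x for x in reals)


def evolve(d=3, c=3, pop_size=20, generations=100, seed=0):
    rng = random.Random(seed)

    def random_individual():
        return ([rng.randint(-5, 5) for _ in range(d)],
                [rng.uniform(-5.0, 5.0) for _ in range(c)])

    def fitness(ind):
        return sphere_mixed(*ind)

    def tournament(pop):
        # Selection: keep the better of two randomly drawn individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    def crossover(p1, p2):
        # Uniform crossover, applied gene by gene to both variable types.
        ints = [rng.choice(pair) for pair in zip(p1[0], p2[0])]
        reals = [rng.choice(pair) for pair in zip(p1[1], p2[1])]
        return ints, reals

    def mutate(ind):
        # Integer genes take a step in {-1, 0, +1}, real genes a Gaussian step.
        ints = [min(5, max(-5, z + rng.choice((-1, 0, 1)))) for z in ind[0]]
        reals = [min(5.0, max(-5.0, x + rng.gauss(0.0, 0.3))) for x in ind[1]]
        return ints, reals

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        elite = min(population, key=fitness)  # elitism keeps the best so far
        children = [
            mutate(crossover(tournament(population), tournament(population)))
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return min(population, key=fitness)
```

Because the elite individual is copied unchanged into each new generation, the best objective value never gets worse from one generation to the next.
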
According to Talbi (2024), MIO techniques follow one of two approaches: global or decomposition-based.
In the global approach, the optimisation process is performed on the entire mixed variable space,
either by treating the discrete variables as continuous or by discretising the continuous variables. In contrast,
the decomposition-based approach optimises the continuous variables separately from the discrete
ones. In this case, a collaboration strategy performs well: it decomposes the initial problem into several
sub-problems, each of which is solved in a separate process to generate partial optimal solutions. All search
processes then collaborate to construct complete solutions to the initial problem. Most of the techniques
studied by Talbi (2024) rely on population-based metaheuristics for optimising both continuous
and discrete variables. Some hybrid approaches combining global and decomposition-based approaches
are mentioned, but most are based on a single AE technique such as DE, PSO or ACO. In most cases, the
algorithm, initially dedicated to continuous optimisation, is then adapted to deal with discrete variables.

Research questions. In this work, we are convinced that machine learning, and more specifically reinforcement
learning (RL) with the Monte Carlo tree search (MCTS) algorithm (Świechowski et al. 2023; Browne
et al. 2012), should give better results as it is well suited for combinatorial optimization. It has been combined
with deep learning (DL) to beat the world champion in Go (Silver et al. 2016). This technique uses a policy to
balance exploration and exploitation when selecting the most interesting state in the search space. Sabharwal
et al. (2012) initiated this approach by combining the upper confidence bound for trees (UCT) with CPLEX
solvers. The basic idea is to hybridise MCTS with AE, since MCTS has proven its superiority for combinatorial
optimisation and AE for continuous optimisation. In addition, both techniques have been adapted for the other
type of optimisation. So, which combination would be more suitable for MIO? As we have recently improved
the MCTS algorithm in continuous space (Michelucci et al. 2024) and verified it on standard benchmarks (Amoussou
et al. 2026), we wonder which AE technique should be replaced by the MCTS approach. It may also be
possible to consider MCTS for both discrete and continuous optimisation, and as far as we know this has not
been investigated.

Methodology and Work Plan. This PhD proposal is ambitious because it combines two different subfields
of computer science: stochastic optimisation with artificial evolution, and reinforcement learning with
MCTS. The PhD work could be divided into the following phases:
1. start with Artificial Evolution by developing an AE technique applied to MIO and comparing the
results of the different approaches (global and decomposition-based) on the benchmark functions;
2. use the MCTS4R library, developed at the I3S laboratory for educational purposes, to apply MCTS to
MIO, following the same experimental protocol as in the previous step;
3. investigate different hybridisations of AE techniques and MCTS for the optimisation of continuous
and discrete variables.
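
To give an idea of what applying MCTS to MIO can look like, the sketch below builds a search tree over binary variables, selects children with the UCB1 score used by UCT, and samples the continuous variables at random during rollouts. It is written from scratch for illustration and does not use or reproduce the MCTS4R API; the names `Node`, `ucb1` and `mcts_discrete`, the exploration constant and the sampling range are assumptions.

```python
import math
import random


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # value of the next discrete variable -> Node
        self.visits = 0
        self.total_reward = 0.0


def ucb1(child, parent_visits, c=1.4):
    """UCB1 score: mean reward (exploitation) plus a bonus (exploration)."""
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts_discrete(objective, d, c_dim, iterations=500, seed=0):
    """Tree over d binary variables; continuous ones are sampled in rollouts."""
    rng = random.Random(seed)
    root = Node()
    best_value, best_point = float("inf"), None
    for _ in range(iterations):
        node, bits = root, []
        # Selection and expansion: descend one level per discrete variable.
        while len(bits) < d:
            if len(node.children) < 2:
                value = 0 if 0 not in node.children else 1
                node.children[value] = Node(parent=node)
                node = node.children[value]
                bits.append(value)
                break
            value = max(node.children,
                        key=lambda v: ucb1(node.children[v], node.visits))
            node = node.children[value]
            bits.append(value)
        # Rollout: complete the discrete part and draw the continuous part.
        bits += [rng.randint(0, 1) for _ in range(d - len(bits))]
        reals = [rng.uniform(-1.0, 1.0) for _ in range(c_dim)]
        value_f = objective(bits, reals)
        if value_f < best_value:
            best_value, best_point = value_f, (bits, reals)
        # Backpropagation: the reward is -f because f is minimised.
        while node is not None:
            node.visits += 1
            node.total_reward -= value_f
            node = node.parent
    return best_value, best_point
```

In a hybrid scheme, the random draw of `reals` in the rollout is one place where an AE technique could be plugged in; rewards are left unnormalised here, so the exploration constant would need tuning for a real objective.
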

Candidate profile

The candidate must hold a Master's degree (M2) or an equivalent qualification at the time of recruitment.
