Job title:

Post-Doctoral Research Visit F/M Theoretical Foundations of Online Convex Reinforcement Learning

Company:

Job description

Offer Description

This proposal is supported by the Inria Thoth project-team and may involve collaborations with the Inria Ghost project-team. It will be supervised by Pierre Gaillard.

The position will be based in the Inria Center at the University Grenoble-Alpes.

Assignments

The project will focus on theoretical aspects of convex reinforcement learning (CURL). In recent years, deep reinforcement learning (RL) has seen remarkable success in fields such as language modeling, computer vision, and robotics. However, RL relies on assumptions of linearity in the objective function, which are not always satisfied.

The CURL problem generalizes RL to a convex objective. More precisely, it consists in minimizing a convex function f over the state-action distributions μ induced by an agent's policy π by solving:

$\min_{\pi} f(\mu_{\pi})$

Beyond RL, CURL generalizes several frameworks in machine learning, including:
– Pure exploration [1],
– Imitation learning [2],
– Certain instances of mean-field control [3],
– Mean-field games [4],
– Risk-averse reinforcement learning [5].

The non-linearity of CURL breaks the linear structure inherent in standard RL, rendering the classical Bellman equations invalid. The theoretical performance analysis of algorithms in this general framework remains largely unexplored [6-8], and existing solutions rely on strong assumptions and require finite state and action spaces, leading to poor scalability as these spaces grow.

In this postdoctoral project, we aim to lift these restrictive assumptions and extend this line of work to parametrized state and action spaces. The main challenge will be to develop an efficient solution that adapts to the effective dimension of these spaces.
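To make the CURL objective $\min_{\pi} f(\mu_{\pi})$ from the Assignments above more concrete, here is a minimal Python sketch on a toy problem. Everything in it is an illustrative assumption rather than part of the project: the finite-horizon, two-state, two-action model, the helper names (`occupancy`, `linear_objective`, `neg_entropy_objective`), and the two choices of f. It computes the state-action distributions induced by a policy, then evaluates a linear objective (standard RL, written as a minimization of negative reward) and a convex one (negative entropy, in the spirit of pure exploration).

```python
import math

def occupancy(P, pi, rho0):
    """State-action distributions mu[t][s][a] induced by policy pi.

    P[s][a][x] : probability of moving from state s to state x under action a
    pi[t][s][a]: probability of choosing action a in state s at step t
    rho0[s]    : initial state distribution
    (Finite horizon and tabular spaces are assumptions of this sketch.)
    """
    T, S, A = len(pi), len(rho0), len(pi[0][0])
    mu, state_dist = [], list(rho0)
    for t in range(T):
        # Joint distribution over (state, action) at step t.
        mu_t = [[state_dist[s] * pi[t][s][a] for a in range(A)] for s in range(S)]
        mu.append(mu_t)
        # Push the joint distribution through the transition kernel.
        state_dist = [
            sum(mu_t[s][a] * P[s][a][x] for s in range(S) for a in range(A))
            for x in range(S)
        ]
    return mu

def linear_objective(mu, r):
    # Standard RL as a special case: f(mu) = -<r, mu>, linear in mu.
    return -sum(
        mu[t][s][a] * r[s][a]
        for t in range(len(mu))
        for s in range(len(r))
        for a in range(len(r[0]))
    )

def neg_entropy_objective(mu):
    # Convex example: f(mu) = sum of mu * log(mu), with 0 * log(0) taken as 0.
    return sum(m * math.log(m) for mu_t in mu for row in mu_t for m in row if m > 0)

# Toy model (hypothetical): action a deterministically leads to state a.
P = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(2)]
rho0 = [1.0, 0.0]
r = [[1.0, 0.0], [0.0, 0.0]]  # reward only for action 0 in state 0

horizon = 3
uniform = [[[0.5, 0.5] for _ in range(2)] for _ in range(horizon)]
always_0 = [[[1.0, 0.0] for _ in range(2)] for _ in range(horizon)]

mu_uniform = occupancy(P, uniform, rho0)
mu_always_0 = occupancy(P, always_0, rho0)

print(linear_objective(mu_always_0, r), linear_objective(mu_uniform, r))
print(neg_entropy_objective(mu_always_0), neg_entropy_objective(mu_uniform))
```

On this toy model, the policy that always plays action 0 attains the lower linear objective (-3.0 against -1.0), whereas the uniform policy attains the lower negative entropy (about -3.47 against 0.0). The two objectives therefore favor different policies, which is one way to see that a convex f can encode goals that a reward-linear objective does not.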
We also anticipate that new research directions may emerge during the visit.

Skills

A PhD degree in mathematics or theoretical computer science, with a specialisation in optimization, machine learning, statistical learning or game theory, as evidenced by publications in relevant venues including NeurIPS, COLT, ICML, ALT, AISTATS, FOCS, STOC, SODA, EC, JMLR, GEB.

References

[1] E. Hazan, S. Kakade, K. Singh and A. Van Soest. "Provably Efficient Maximum Entropy Exploration". In: International Conference on Machine Learning. Vol. 97. 2019, pp. 2681-2691.
[2] J. W. Lavington, S. Vaswani and M. Schmidt. "Improved Policy Optimization for Online Imitation Learning". In: Proceedings of The 1st Conference on Lifelong Learning Agents. Ed. by S. Chandar, R. Pascanu and D. Precup. Vol. 199. Proceedings of Machine Learning Research. PMLR, 22-24 Aug 2022, pp. 1146-1173.
[3] A. Bensoussan, P. Yam and J. Frehse. Mean Field Games and Mean Field Type Control Theory. SpringerBriefs in Mathematics. Springer, 2013.
[4] P. Lavigne and L. Pfeiffer. Generalized conditional gradient and learning in potential mean field games. 2023.
[5] J. Garcia and Fernando Fernandez. "A Comprehensive Survey on Safe Reinforcement Learning". In: Journal of Machine Learning Research 16.42 (2015), pp. 1437-1480.
[6] B. M. Moreno, M. Bregere, P. Gaillard and N. Oudjane. "Efficient model-based concave utility reinforcement learning through greedy mirror descent". In: International Conference on Artificial Intelligence and Statistics. PMLR. 2024, pp. 2206-2214.
[7] B. M. Moreno, M. Bregere, P. Gaillard and N. Oudjane. "MetaCURL: Non-stationary Concave Utility Reinforcement Learning". In: NeurIPS'24: Advances in Neural Information Processing Systems. 2024.
[8] B. M. Moreno, K. Eldowa, P. Gaillard, M. Bregere and N. Oudjane. "Online Episodic Convex Reinforcement Learning". In: arXiv preprint arXiv:2505.07303 (2025).

The research mission includes the production of both theoretical and practical contributions, to be promoted through:
– publications and presentations in machine learning or optimization conferences or journals,
– creation of Python packages.

Requirements

Languages: French, level Basic
Languages: English, level Good

Additional Information

Benefits

Subsidized meals
Partial reimbursement of public transport costs
Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
Possibility of teleworking (90 days / year) and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Complementary health insurance under conditions

2788€ gross salary / month

Selection process

Applications must be submitted online on the Inria website. Processing of applications sent by other channels is not guaranteed.

Work Location(s)

Number of offers available: 1
Company/Institute: Inria
Country: France
City: Montbonnot

Contact

City: LE CHESNAY CEDEX
Street: Domaine de Voluceau – Rocquencourt
Postal Code: 78153

STATUS: EXPIRED

Expected salary

Location

Montbonnot-Saint-Martin, Isère

Job date

Mon, 23 Jun 2025 02:37:13 GMT


