Data Sciences Institute - Data Speaker Series
Overview
LLM Post-Training and Reasoning via Efficient Value-Based RL
Reinforcement learning (RL) has a newfound killer application: post-training LLMs, which are pre-trained to predict the next token, to adapt them to tasks such as instruction following, math problem solving, and generating content or recommendations that maximize user outcomes. But are the same RL algorithms that animated robots and conquered Atari the right ones to post-train LLMs? In this talk I will present new value-based algorithms for post-training and for scaling test-time compute that leverage both the unique structure of autoregressive LLMs and recent advances in increasing efficiency by changing the Q-learning loss function. I will show how (and argue why) these new algorithms achieve state-of-the-art performance on frontier math reasoning tasks with smaller models and at a fraction of the test-time FLOPs.
Biography:
Prof. Kallus’ research interests include causal inference, especially when combined with machine learning; the statistics of optimization under uncertainty; sequential and dynamic decision making; and algorithmic fairness. He is the author of the book “Applied Causal Inference Powered by ML and AI”. Before coming to Cornell, Nathan was a Visiting Scholar at USC’s Department of Data Sciences and Operations and a Postdoctoral Associate at MIT’s Operations Research and Statistics group.
This talk is co-sponsored by the Data Sciences Institute and the Master of Management Analytics Program (MMA), Rotman School of Management, University of Toronto.
For more information, please visit https://datasciences.utoronto.ca/dsi-home/data-sciences-speaker-series/.
Good to know
Highlights
- 1 hour
- In person
Location
Data Sciences Institute, University of Toronto
700 University Avenue
10th floor, Toronto, ON M7A 2S4, Canada
Organized by
Data Sciences Institute