Data Sciences Institute - Data Speaker Series

By Data Sciences Institute

Overview

Prof. Nathan Kallus, Cornell Tech, Cornell University

LLM Post-Training and Reasoning via Efficient Value-Based RL

Reinforcement learning (RL) has a newfound killer application: post-training LLMs, which are pre-trained to predict the next token, to adapt them to tasks such as instruction following, math problem solving, and generating content or recommendations that maximize user outcomes. But are the same RL algorithms that animated robots and conquered Atari the right ones to post-train LLMs? In this talk I will present new value-based algorithms for post-training and for scaling test-time compute that leverage both the unique structure of autoregressive LLMs and recent advances in improving efficiency by changing the Q-learning loss function. I will show how (and argue why) these new algorithms achieve state-of-the-art performance on frontier math reasoning tasks with smaller models and at a fraction of the test-time FLOPs.

Biography:

Prof. Kallus’ research interests include causal inference, especially when combined with machine learning; the statistics of optimization under uncertainty; sequential and dynamic decision making; and algorithmic fairness. He is the author of the book “Applied Causal Inference Powered by ML and AI”. Before coming to Cornell, Nathan was a Visiting Scholar at USC’s Department of Data Sciences and Operations and a Postdoctoral Associate at MIT’s Operations Research and Statistics group.

This talk is co-sponsored by the Data Sciences Institute and the Master of Management Analytics Program (MMA), Rotman School of Management, University of Toronto.

For more information, please visit https://datasciences.utoronto.ca/dsi-home/data-sciences-speaker-series/.

Category: Science & Tech, Science

Good to know

Highlights

1 hour
In person

Location

Data Sciences Institute, University of Toronto

700 University Avenue

10th floor, Toronto, ON M7A 2S4, Canada
