TECoSA Research Seminar: Learn and Align RL Policies from Human Feedback

November 24, 2023, 12:00 – 13:00

Speaker: Daniel Simões Marta, TECoSA PhD student
(Venue, Zoom link and sign-up link circulated to members)
Please email vickid@kth.se if you have any questions.

ABSTRACT: Reinforcement learning from human feedback (RLHF) has emerged as a novel domain in machine learning, where human insights are crucial in shaping the behavior of an AI agent. Within this domain, a significant strategy is preference-based reinforcement learning, in which a human-informed reward model is built from people's evaluations of, and choices among, alternative action sequences. In this talk, I will present several works conducted by our group on aligning RL policies with human feedback. I will primarily focus on aligning relevant features such as safety and perceived safety (although the work can be extended to any desired feature) and will discuss how these principles apply to RL policies. Additionally, I will provide concrete examples of leveraging human intrinsic knowledge through methods such as ranking, stating preferences, and analyzing text.

Details

Date:
November 24, 2023
Time:
12:00 – 13:00