TECoSA Research Seminar: Learn and Align RL Policies from Human Feedback
November 24, 12:00 – 13:00
Speaker: Daniel Simões Marta, TECoSA PhD student
(Venue, Zoom link and sign-up link circulated to members)
Please email email@example.com if you have any questions.
ABSTRACT: Reinforcement learning from human feedback (RLHF) has emerged as a novel domain in machine learning in which human insight is crucial to shaping the behavior of an AI agent. A significant strategy within this domain is preference-based reinforcement learning, in which a human-informed reward model is learned from evaluations and selections among different sets of action sequences. In this talk, I will present several works conducted by our group on aligning RL policies with human feedback. I will primarily focus on aligning with relevant features such as safety and perceived safety (although the work extends to any desired feature) and will discuss how these principles apply in the context of RL policies. Additionally, I will provide concrete examples of leveraging intrinsic human knowledge through methods such as ranking, stating preferences, and analyzing text.
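The preference-based setup described above can be illustrated with a minimal sketch. This is not the speaker's method, only a standard Bradley-Terry-style example: a hypothetical linear reward model over trajectory features (e.g. a "safety" feature) is fitted by gradient ascent on the log-likelihood of synthetic pairwise preferences. All names and the feature design are illustrative assumptions.

```python
import math
import random

random.seed(0)

def reward(w, feats):
    # Hypothetical linear reward model: r(traj) = w . features(traj)
    return sum(wi * fi for wi, fi in zip(w, feats))

def pref_prob(w, fa, fb):
    # Bradley-Terry probability that the trajectory with features fa
    # is preferred over the one with features fb
    return 1.0 / (1.0 + math.exp(reward(w, fb) - reward(w, fa)))

# Synthetic "human" preferences: an unknown true weighting that
# values the first feature (say, safety) and penalizes the second
true_w = [2.0, -1.0]

# Build a dataset of preference pairs (preferred, rejected)
data = []
for _ in range(500):
    fa = [random.uniform(-1, 1), random.uniform(-1, 1)]
    fb = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((fa, fb) if reward(true_w, fa) > reward(true_w, fb) else (fb, fa))

# Fit reward weights by gradient ascent on the preference log-likelihood;
# d/dw log sigma(r_a - r_b) = (1 - p) * (fa - fb)
w = [0.0, 0.0]
lr = 0.5
for _ in range(200):
    grad = [0.0, 0.0]
    for fa, fb in data:
        p = pref_prob(w, fa, fb)
        for i in range(len(w)):
            grad[i] += (1.0 - p) * (fa[i] - fb[i])
    w = [wi + lr * gi / len(data) for wi, gi in zip(w, grad)]

# The learned reward should reproduce most of the recorded preferences
acc = sum(pref_prob(w, fa, fb) > 0.5 for fa, fb in data) / len(data)
print(f"preference accuracy: {acc:.2f}")
```

In a full RLHF pipeline, the learned reward would then be used as the reward signal for training an RL policy; here only the reward-learning step is sketched.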