Artificial Intelligence

DeepLearning.AI

Reinforcement Learning from Human Feedback

  • Up to 1 hour
  • Intermediate

Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF) and of the datasets this technique requires. Fine-tune the Llama 2 model using RLHF with the open source Google Cloud Pipeline Components Library, and evaluate the tuned model against the base model by comparing loss curves and running side-by-side comparisons.

  • Reinforcement Learning from Human Feedback
  • Fine-tuning LLMs
  • Google Cloud Pipeline Components Library
  • Model evaluation
  • Loss curve analysis

Overview

In this course, you will gain a conceptual understanding of the RLHF training process and then practice applying RLHF to tune an LLM. You will explore the two datasets used in RLHF training: the 'preference' and 'prompt' datasets. You will use the open source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF, and you will assess the tuned LLM against the original base model by comparing loss curves and using the 'Side-by-Side (SxS)' evaluation method.
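As a first orientation, the sketch below shows what records in these two datasets might look like as JSON Lines. The field names (input_text, candidate_0, candidate_1, choice) follow the Vertex AI RLHF tuning documentation at the time of the course and should be treated as illustrative assumptions, as should the example text.

    # Illustrative JSON Lines records for the two RLHF datasets. Field names and
    # example text are assumptions for illustration, not a spec.
    import json

    # Preference dataset: a prompt, two candidate completions, and the human choice.
    # This is what the reward model is trained on.
    preference_example = {
        "input_text": "Summarize the following article: ...",
        "candidate_0": "The article argues that remote work boosts productivity.",
        "candidate_1": "Article about work.",
        "choice": 0,  # index of the completion the human rater preferred
    }

    # Prompt dataset: prompts only. During the reinforcement learning step the
    # model generates completions for these and the reward model scores them.
    prompt_example = {
        "input_text": "Write a short, friendly reply to this customer email: ...",
    }

    # Both datasets are typically stored as JSON Lines (one JSON object per line).
    with open("preference_dataset.jsonl", "w") as f:
        f.write(json.dumps(preference_example) + "\n")
    with open("prompt_dataset.jsonl", "w") as f:
        f.write(json.dumps(prompt_example) + "\n")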

  • Online (course location)
  • English (course language)
  • Self-paced (course format)
  • Live classes (delivered online)

Who is this course for?

Python Developers

Anyone with intermediate Python knowledge who is interested in learning how to apply the Reinforcement Learning from Human Feedback technique.

AI Enthusiasts

Individuals looking to understand how to align large language models with human values and preferences.

Data Scientists

Professionals aiming to fine-tune language models using advanced techniques like RLHF.

This course offers a deep dive into Reinforcement Learning from Human Feedback (RLHF), a key method for aligning large language models with human values and preferences. Ideal for Python developers, AI enthusiasts, and data scientists, this course will help you fine-tune LLMs and evaluate their performance, advancing your skills and career in AI.

Pre-Requisites

  • Intermediate Python knowledge

  • Basic understanding of machine learning concepts

  • Familiarity with large language models (LLMs)

What will you learn?

Introduction to RLHF
Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF) and its importance in aligning LLMs with human values and preferences.
Datasets for RLHF
Explore the two datasets used in RLHF training: the 'preference' and 'prompt' datasets.
Using Google Cloud Pipeline Components Library
Learn how to use the open source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF (a minimal usage sketch follows this section).
Model Evaluation
Assess the tuned LLM against the original base model by comparing loss curves and using the 'Side-by-Side (SxS)' method.
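As a preview of the pipeline work above, the sketch below compiles the library's prebuilt RLHF pipeline and submits it to Vertex AI Pipelines. The module path (google_cloud_pipeline_components.preview.llm.rlhf_pipeline) and the parameter names reflect the library's preview API at the time of the course and may have changed since; all project IDs, bucket paths, and step counts are placeholder assumptions, not recommendations.

    # Sketch: compile the prebuilt RLHF pipeline and submit it to Vertex AI Pipelines.
    # Module path and parameter names follow the library's preview namespace at the
    # time of the course; project, region, bucket paths, and step counts are placeholders.
    from kfp import compiler
    from google_cloud_pipeline_components.preview.llm import rlhf_pipeline
    from google.cloud import aiplatform

    # 1. Compile the prebuilt pipeline definition to a reusable YAML package.
    RLHF_PIPELINE_PKG_PATH = "rlhf_pipeline.yaml"
    compiler.Compiler().compile(
        pipeline_func=rlhf_pipeline,
        package_path=RLHF_PIPELINE_PKG_PATH,
    )

    # 2. Point the pipeline at the preference/prompt datasets and the base model.
    parameter_values = {
        "preference_dataset": "gs://my-bucket/preference_dataset.jsonl",  # placeholder URI
        "prompt_dataset": "gs://my-bucket/prompt_dataset.jsonl",          # placeholder URI
        "large_model_reference": "llama-2-7b",  # base model to tune
        "reward_model_train_steps": 1410,       # example values only
        "reinforcement_learning_train_steps": 320,
        "kl_coeff": 0.1,  # KL penalty that keeps the tuned policy close to the base model
        "instruction": "Summarize in less than 50 words.",
    }

    # 3. Run the compiled pipeline as a Vertex AI PipelineJob.
    aiplatform.init(project="my-project-id", location="us-central1")  # placeholders
    job = aiplatform.PipelineJob(
        display_name="rlhf-llama2-tuning",
        pipeline_root="gs://my-bucket/pipeline_root",  # placeholder staging location
        template_path=RLHF_PIPELINE_PKG_PATH,
        parameter_values=parameter_values,
    )
    job.run()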
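For the 'Side-by-Side (SxS)' part of the evaluation, the snippet below is only an illustrative way to lay out completions from the base and tuned models for manual comparison; the helper name and the hard-coded strings are made up for the example.

    # Illustrative 'Side-by-Side (SxS)' layout: line up completions from the base
    # and tuned models for the same prompts so they can be compared by a human
    # rater. The helper name and the hard-coded strings are hypothetical.
    import pandas as pd

    def side_by_side(prompts, base_completions, tuned_completions):
        """Return a table with one row per prompt and one column per model."""
        return pd.DataFrame(
            {
                "prompt": prompts,
                "base_model": base_completions,
                "tuned_model": tuned_completions,
            }
        )

    # Example usage with hard-coded strings standing in for real model calls.
    prompts = ["Summarize: The meeting covered Q3 results and hiring plans."]
    base = ["Meeting about results."]
    tuned = ["The meeting reviewed Q3 results and discussed upcoming hiring plans."]
    print(side_by_side(prompts, base, tuned).to_string(index=False))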

Meet your instructor

  • Nikita Namjoshi

    Product Manager | Google.org Fellow, Woodwell Climate Research Center

    Nikita Namjoshi is a Product Manager and a Google.org Fellow at the Woodwell Climate Research Center, where she creates products that use machine learning and Google Cloud to investigate ways to lower environmental footprints.

Upcoming cohorts

  • Dates: start now
  • Price: Free