Artificial Intelligence

DeepLearning.AI

Preprocessing Unstructured Data for LLM Applications

  • Beginner

Learn to extract and normalize content from a wide variety of document types to expand the information accessible to your LLM. Enrich your content with metadata and explore advanced document image analysis techniques to improve retrieval-augmented generation (RAG) results.

  • Data Preprocessing
  • Metadata Enrichment
  • Document Image Analysis
  • RAG System Enhancement
  • Unstructured Data Representation

Overview

This course teaches you how to preprocess data for LLM application development, with a focus on working with different document types. You will learn to extract content from various documents and normalize it into a common JSON format, enrich that output with metadata, and apply document image analysis techniques to preprocess PDFs, images, and tables. By the end of the course, you will be able to build a RAG bot that can ingest different document types, and you will be able to apply these skills to real-world scenarios.

  • Location: Online
  • Language: English
  • Format: Self-paced
  • Live classes delivered online

Who is this course for?

Data Scientists

Individuals interested in processing and using diverse data types and formats to build high-performing LLM RAG systems.

Machine Learning Engineers

Professionals looking to enhance their RAG application and expand its versatility with unstructured data.

AI Enthusiasts

Anyone who is interested in learning advanced techniques for representing and processing unstructured data like text, images, and tables.

By taking this course, you will learn to preprocess unstructured data for LLM applications and to handle a variety of document types and formats. The practical skills you gain apply directly to improving RAG systems.

Meet your instructor

  • Matt Robinson

    Head of Product, Unstructured.IO

    Matt Robinson is the instructor for this DeepLearning.AI course.

Upcoming cohorts

  • Dates

    Start now

Free