DeepLearning.AI
Learn to extract and normalize content from a wide variety of document types to expand the information accessible to your LLM. Enrich your content with metadata and explore advanced document image analysis techniques to enhance retrieval augmented generation (RAG) results.
This course will teach you how to preprocess data for LLM application development, focusing on working with different document types. You will learn to extract content from various documents and normalize it into a common JSON format, enrich that content with metadata, and apply document image analysis techniques to preprocess PDFs, images, and tables. By the end of the course, you will be able to build a RAG bot capable of ingesting different document types.
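As a rough illustration of what "normalizing into a common JSON format" can mean, the sketch below converts an HTML snippet and a plain-text snippet into the same list-of-elements shape, each with a type, its text, and a metadata record. This is not the course's code or any library's API: the function names (`normalize_html`, `normalize_text`), the type labels (`Title`, `NarrativeText`), and the metadata fields are all assumptions chosen for the example, and it uses only the Python standard library.

```python
import json
import re
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _BlockParser(HTMLParser):
    """Collects text inside heading and <p> tags as (type, text) pairs."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # hypothetical type label of the open block
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current, self._buffer = "Title", []
        elif tag == "p":
            self._current, self._buffer = "NarrativeText", []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and (tag in HEADING_TAGS or tag == "p"):
            text = "".join(self._buffer).strip()
            if text:
                self.items.append((self._current, text))
            self._current = None


def _to_elements(items, filename, filetype):
    """Wrap (type, text) pairs in the shared element shape with metadata."""
    return [
        {
            "type": kind,
            "text": text,
            # Metadata fields are illustrative assumptions, not a fixed schema.
            "metadata": {
                "filename": filename,
                "filetype": filetype,
                "element_index": index,
            },
        }
        for index, (kind, text) in enumerate(items)
    ]


def normalize_html(source, filename):
    """Normalize headings and paragraphs of an HTML string."""
    parser = _BlockParser()
    parser.feed(source)
    parser.close()
    return _to_elements(parser.items, filename, "html")


def normalize_text(source, filename):
    """Normalize plain text, treating blank-line-separated blocks as paragraphs."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    return _to_elements([("NarrativeText", b) for b in blocks], filename, "txt")


if __name__ == "__main__":
    elements = normalize_html("<h1>Report</h1><p>Hello world.</p>", "a.html")
    elements += normalize_text("First paragraph.\n\nSecond paragraph.", "b.txt")
    print(json.dumps(elements, indent=2))
```

Because both inputs end up in one shape, downstream steps such as chunking, embedding, or filtering by metadata can treat every source the same way, which is the practical point of a common format in a RAG pipeline.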
Data Scientists
Individuals interested in processing and using diverse data types and formats to build high-performing LLM RAG systems.
Machine Learning Engineers
Professionals looking to enhance their RAG application and expand its versatility with unstructured data.
AI Enthusiasts
Anyone who is interested in learning advanced techniques for representing and processing unstructured data like text, images, and tables.
By joining this course, you will learn to preprocess unstructured data for LLM applications and to handle a variety of document types and formats. You will gain practical skills that are directly applicable to enhancing RAG systems.
Matt Robinson
Head of Product, Unstructured.IO
Matt Robinson is an instructor at DeepLearning.AI.
Cost
Free