Building and Evaluating a Retrieval Augmented Generation (RAG) Pipeline to Land a Job!

By Keith Bourne

Elevator Pitch

I recently completed an interview process, including a coding exercise, for a popular job in tech: Generative AI Data Scientist. In this coding workshop, we will use that experience as the backdrop to step through how to build and evaluate a RAG pipeline in a fun and educational way!

Description

In today’s rapidly evolving tech landscape, Generative AI is at the forefront of innovation, creating exciting career opportunities for data scientists and engineers. This hands-on workshop offers a unique and practical approach to mastering one of the most in-demand skills in the field: building and evaluating a Retrieval Augmented Generation (RAG) pipeline.

Drawing from real-world experience, the workshop instructor will guide participants through the exact coding exercise used in their recent interview process for a Generative AI Data Scientist position at a leading tech company. This approach provides an unparalleled opportunity to:

  1. Gain practical, industry-relevant skills: Learn how to construct a RAG pipeline from scratch, mirroring the challenges faced in actual job interviews and real-world applications.

  2. Understand evaluation techniques: Master the art of assessing and tuning your RAG system using ragas, a critical skill for both landing a job and excelling in the field.

  3. Peek behind the interview curtain: Get insider insights into what top companies are looking for in Generative AI talent, helping you prepare for your own career advancement.

  4. Engage in a fun, collaborative environment: Work alongside peers to solve problems, share ideas, and build your professional network.

  5. Bridge the gap between theory and practice: Apply your knowledge to a concrete, real-world scenario that goes beyond textbook examples.

Workshop Highlights

Throughout the workshop, participants will:

  • Set up a RAG pipeline using popular open-source tools and libraries
  • Learn best practices for data preparation and indexing
  • Implement and fine-tune retrieval mechanisms
  • Integrate retrieved information with large language models
  • Develop robust evaluation metrics to assess pipeline performance
  • Troubleshoot common issues and optimize system efficiency
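
To give a flavor of the steps listed above, here is a minimal sketch using only the Python standard library. It is not the workshop code: the toy corpus, the bag-of-words "embedding", and the names `retrieve`, `build_prompt`, and `hit_rate` are all illustrative assumptions standing in for an embedding model, a vector store such as ChromaDB, an LLM such as Gemini, and an evaluation library such as ragas.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; a real pipeline would load and chunk actual documents.
DOCS = {
    "d1": "ChromaDB is a vector database that stores embeddings for retrieval.",
    "d2": "Gemini is a large language model that generates text from a prompt.",
    "d3": "Ragas provides metrics for evaluating RAG pipelines.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing step: embed every document once, up front.
INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}


def retrieve(question, k=1):
    """Return the ids of the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda d: cosine(query, INDEX[d]), reverse=True)
    return ranked[:k]


def build_prompt(question, doc_ids):
    """Augmentation step: place the retrieved text in the prompt for the LLM."""
    context = "\n".join(DOCS[d] for d in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def hit_rate(eval_set, k=1):
    """Fraction of questions whose expected document appears in the top k."""
    hits = sum(expected in retrieve(question, k) for question, expected in eval_set)
    return hits / len(eval_set)


# Hypothetical evaluation set: (question, id of the document that answers it).
EVAL_SET = [
    ("Which vector database stores embeddings?", "d1"),
    ("Which large language model generates text?", "d2"),
    ("What provides metrics for evaluating pipelines?", "d3"),
]
```

Calling `build_prompt(q, retrieve(q))` produces the text that would be sent to the language model, and `hit_rate(EVAL_SET)` gives a simple retrieval score. Hit rate is used here only as an easy-to-read placeholder for the richer metrics an evaluation library provides.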

Whether you’re a seasoned data scientist looking to pivot into Generative AI, a student preparing for future job prospects, or simply an enthusiast eager to explore this cutting-edge technology, this workshop offers invaluable experience that sets it apart from traditional conference sessions.

Don’t miss this chance to enhance your skills, boost your resume, and gain a competitive edge in the job market. Join us for an engaging, practical, and potentially career-changing workshop that will equip you with the tools to build, evaluate, and showcase your own RAG pipeline – a key to unlocking exciting opportunities in the world of Generative AI.

Technologies we will use: Python, Jupyter Notebook (Colab), Google Cloud Platform, Gemini, LangChain, ChromaDB, and more!

And just in case you were wondering, yes I did get the job!

Notes

Ideally, this would be at least two sessions in terms of time (i.e., 2 × 45 min = 90 min). Having done this type of workshop in the past, I think it is hard for attendees to grasp the concepts within a 45-minute period. You tend to spend a lot of time up front getting everyone set up, and then you don’t have enough time left to dive into the code. If that isn’t possible, I can make it work with 45 minutes, but I think the more time we can give the attendees, the better.

I have given similar code lab talks with GDG Ann Arbor, and we were able to get “credits” from Google that covered the costs of the workshop for participants. I am hoping to do the same here, since the technology we will be using (GCP, Gemini, Colab) is not free.