In this video, we'll build a Retrieval Augmented Generation (RAG) pipeline from scratch and run it locally.
There are frameworks for this, such as LangChain and LlamaIndex; however, building from scratch means you'll understand every part of the puzzle.
Specifically, we'll build NutriChat, a RAG pipeline that lets someone ask questions of a 1,200-page nutrition textbook PDF.
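Here's a tiny, runnable sketch of the overall retrieve -> augment -> generate flow we'll build. All names and data below are illustrative stand-ins, not the video's actual code (the video swaps in real embedding-based search and a real local LLM):

```python
# Toy RAG flow: retrieve -> augment -> generate.
# Everything here is an illustrative stand-in, not the video's actual code.

CHUNKS = [
    "Protein is made up of amino acids.",
    "Vitamin C is found in citrus fruits such as oranges.",
]

def retrieve(query: str, top_k: int = 1) -> list[str]:
    # Stand-in retrieval: rank chunks by word overlap with the query.
    # The video replaces this with embedding-based semantic search.
    words = set(query.lower().split())
    ranked = sorted(CHUNKS,
                    key=lambda c: len(words & set(c.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def generate(prompt: str) -> str:
    # Stand-in generation: the video replaces this with a local LLM.
    return f"(a local LLM would answer here, given: {prompt!r})"

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))                  # Retrieval
    prompt = f"Context:\n{context}\n\nQuestion: {query}"  # Augmentation
    return generate(prompt)                               # Generation

print(rag_answer("What is protein made of?"))
```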
Code on GitHub - [ Link ]
Whiteboard - [ Link ]
Be sure to check out NVIDIA GTC, NVIDIA's GPU Technology Conference, running March 18-21. It's free to attend virtually! That's what I'm doing.
Sign up for GTC24 here: [ Link ]
Other links:
Download Nutrify (take a photo of food and learn about it) - [ Link ]
Learn AI/ML (beginner-friendly course) - [ Link ]
Learn TensorFlow - [ Link ]
Learn PyTorch - [ Link ]
AI/ML courses/books I recommend - [ Link ]
Read my novel Charlie Walks - [ Link ]
Connect elsewhere:
Web - [ Link ]
Twitter - [ Link ]
Twitch - [ Link ]
ArXiv channel (past streams) - [ Link ]
Get email updates on my work - [ Link ]
Timestamps:
0:00 - Intro/NVIDIA GTC
2:25 - Part 0: Resources and overview
8:33 - Part 1: What is RAG? Why RAG? Why locally?
12:26 - Why RAG?
19:31 - What can RAG be used for?
26:08 - Why run locally?
30:26 - Part 2: What we're going to build
40:40 - Original Retrieval Augmented Generation paper
46:04 - Part 3: Importing and processing a PDF document
48:29 - Code starts! Importing a PDF and making it readable
1:17:09 - Part 4: Preprocessing our text into chunks (text splitting)
1:28:27 - Chunking our sentences together
1:56:38 - Part 5: Embedding creation
1:58:15 - Incredible embeddings resource by Vicki Boykis
1:58:49 - Open-source embedding models
2:00:00 - Creating our embedding model
2:09:02 - Creating embeddings on CPU vs GPU
2:25:14 - Part 6: Retrieval (semantic search; see the code sketch below the timestamps)
2:36:57 - Troubleshooting np.fromstring (live!)
2:41:41 - Creating a small semantic search pipeline
2:56:25 - Showcasing the speed of vector search on GPUs
3:20:02 - Part 7: Similarity measures between embedding vectors explained
3:35:58 - Part 8: Functionizing our semantic search pipeline (retrieval)
3:46:49 - Part 9: Getting a Large Language Model (LLM) to run locally
3:47:51 - Which LLM should I use?
4:03:18 - Loading an LLM locally
4:30:48 - Part 10: Generating text with our local LLM
4:49:09 - Part 11: Augmenting our prompt with context items
5:16:04 - Part 12: Functionizing our text generation step (putting it all together)
5:33:57 - Summary and potential extensions
5:36:00 - Leave a comment if there are any other videos you'd like to see!
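For a taste of the retrieval step covered in Parts 5-7, here's a minimal semantic search sketch built on the sentence-transformers library. The model name, example passages, and query are my own illustrative assumptions, not the exact setup from the video:

```python
# Minimal semantic search sketch (Parts 5-7).
# Assumptions: sentence-transformers is installed; the model choice and
# example passages are illustrative, not the video's exact setup.
import torch
from sentence_transformers import SentenceTransformer, util

# Stand-ins for the chunked textbook passages.
chunks = [
    "Protein is made up of amino acids.",
    "Vitamin C is found in citrus fruits such as oranges.",
    "Fibre aids digestion and is found in whole grains.",
]

model = SentenceTransformer("all-mpnet-base-v2")  # runs on CPU or GPU
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

query = "What are proteins made of?"
query_embedding = model.encode(query, convert_to_tensor=True)

# Score the query against every chunk with dot-product similarity,
# then keep the top results.
scores = util.dot_score(query_embedding, chunk_embeddings)[0]
top = torch.topk(scores, k=2)
for score, idx in zip(top.values, top.indices):
    print(f"{score.item():.3f} | {chunks[int(idx)]}")
```

For models that output normalised embeddings (as the all-* series of sentence-transformers models does), the dot product matches cosine similarity; Part 7 of the video compares similarity measures in more depth.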
#ai #rag #GTC24