The total amount of digital data generated worldwide is increasing at a rapid rate. Simultaneously, approximately 80% (and growing) of this newly generated data is unstructured data – data that does not conform to a table- or object-based model.
Examples of unstructured data include text, images, protein structures, geospatial information, and IoT data streams. Despite this growth, the vast majority of companies and organizations have no way to store and analyze these increasingly large quantities of unstructured data. Embeddings – high-dimensional, dense vectors which represent the semantic content of unstructured data – can remedy this.
💼 Learn to build LLM-powered apps in just 40 hours with our Large Language Models bootcamp: [ Link ]
In this tutorial, we’ll introduce embeddings and vector search from both an ML- and application-level perspective. We will:
- Give a high-level overview of embeddings and discuss best practices around embedding generation and usage.
- Build two systems: semantic text search and reverse image search.
- See how we can put our application into production using Milvus, the world’s most popular open-source vector database.
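To give a flavor of what vector search means, here is a minimal sketch in plain Python. It is not the code from the talk and does not use Milvus: the three-dimensional vectors are made-up stand-ins for real embeddings, and the names `cosine_similarity`, `search`, and `index` are hypothetical. It ranks stored items by cosine similarity to a query vector using a brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, index, top_k=2):
    """Brute-force search: score every stored vector, return the top_k keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:top_k]]

# Toy "embeddings" (assumed values, for illustration only).
index = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "tax form": [0.0, 0.2, 0.9],
}

query = [1.0, 0.2, 0.0]
results = search(query, index)
print(results)
```

A scan like this compares the query against every stored vector, so it only suits small collections. A vector database is typically used once the collection grows too large for that.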
--
Table of Contents:
02:02 – Unstructured data and embeddings
06:45 – Vector search overview
13:40 – Demo time
38:18 – Real-world use cases
About the Speaker: Frank Liu
Frank Liu is the Director of Operations & ML Architect at Zilliz, where he serves as a maintainer for the Towhee open-source project. Prior to Zilliz, Frank co-founded Orion Innovations, an ML-powered indoor positioning startup based in Shanghai, and worked as an ML engineer at Yahoo in San Francisco.
#vectorsearch #embeddings #vectordatabase #futureofdataandai