Working with Data Ingestion
The Data Ingestion section of Kosmoy Studio provides the tools you need to transform your unstructured data into a format suitable for use in your AI applications. Currently, the focus is on vectorization, a crucial step in building Retrieval-Augmented Generation (RAG) systems.

Key Features:
- Vector Pipelines: Create and manage pipelines that transform unstructured data (PDFs and Office files) from your object stores into vector embeddings stored in your vector databases.
- Support for RAG: Prepare your data for use in RAG applications, enabling your AI Assistants to retrieve relevant information from your knowledge base and generate more accurate and context-aware responses.

A Vector Pipeline performs the following steps:
- Source Data: Read PDF and Office files from a specified folder within a registered Object Store.
- Chunking: Divide the documents into smaller, manageable chunks based on a defined strategy.
- Vectorization: Generate vector embeddings for each chunk using a selected embeddings model.
- Storage: Store the generated vectors in a target collection within a registered Vector Database.

This section covers:
- Understanding Vector Pipelines and their role in RAG.
- Creating and configuring Vector Pipelines.
- Managing existing Vector Pipelines.
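
To make the pipeline stages described earlier (source data, chunking, vectorization, storage) more concrete, here is a minimal Python sketch of the same flow. It is an illustration only and does not use or represent the Kosmoy Studio API. Every name in it (`chunk_text`, `toy_embed`, `run_pipeline`, `VectorRecord`) is hypothetical. It assumes the text has already been extracted from the PDF or Office files, uses fixed-size character chunking with overlap as one possible chunking strategy, substitutes a toy hashed bag-of-words function for a real embeddings model, and uses a plain Python list in place of a vector database collection.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class VectorRecord:
    """One stored entry: a chunk of text plus its embedding (hypothetical shape)."""
    doc_id: str
    chunk_index: int
    text: str
    vector: list[float]


def chunk_text(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Chunking: fixed-size character windows that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(chunk: str, dim: int = 8) -> list[float]:
    """Vectorization stand-in: hash each token into one of `dim` buckets.

    A real pipeline would call an embeddings model here; this toy function
    only makes the example deterministic and dependency-free.
    """
    vec = [0.0] * dim
    for token in chunk.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def run_pipeline(
    documents: dict[str, str],
    collection: list[VectorRecord],
    size: int = 200,
    overlap: int = 20,
) -> int:
    """Run source -> chunk -> embed -> store; return the number of records added.

    `documents` maps a file name to its already-extracted text (standing in for
    files read from an object store folder). `collection` stands in for a
    target collection in a vector database.
    """
    added = 0
    for doc_id, text in documents.items():
        for index, chunk in enumerate(chunk_text(text, size, overlap)):
            collection.append(VectorRecord(doc_id, index, chunk, toy_embed(chunk)))
            added += 1
    return added


if __name__ == "__main__":
    docs = {"handbook.pdf": "Vector pipelines turn documents into embeddings. " * 10}
    target_collection: list[VectorRecord] = []
    count = run_pipeline(docs, target_collection, size=120, overlap=20)
    print(f"stored {count} chunks from {len(docs)} document(s)")
```

The overlap between neighboring chunks is a common design choice: it reduces the chance that a sentence relevant to a query is split across a chunk boundary and lost at retrieval time. In practice, the chunk size, overlap, and embeddings model are the settings you would tune when configuring a pipeline.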