Working with Data Ingestion
The Data Ingestion section of Kosmoy Studio provides the tools you need to transform your unstructured data into a format suitable for use in your AI applications. Currently, the focus is on vectorization, a crucial step in building Retrieval Augmented Generation (RAG) systems.

Key Features:
- Vector Pipelines: Create and manage pipelines that transform unstructured data (PDFs and Office files) from your object stores into vector embeddings stored in your vector databases.
- Support for RAG: Prepare your data for use in RAG applications, enabling your AI Assistants to retrieve relevant information from your knowledge base and generate more accurate and context-aware responses.

A Vector Pipeline performs the following steps:
- Source Data: Read PDF and Office files from a specified folder within a registered Object Store.
- Chunking: Divide the documents into smaller, manageable chunks based on a defined strategy.
- Vectorization: Generate vector embeddings for each chunk using a selected embeddings model.
- Storage: Store the generated vectors in a target collection within a registered Vector Database.

This section covers:
- Understanding Vector Pipelines and their role in RAG.
- Creating and configuring Vector Pipelines.
- Managing existing Vector Pipelines.
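
To make the source data, chunking, vectorization, and storage steps listed above more concrete, here is a minimal, self-contained Python sketch of the same flow. It is an illustration only and does not use the Kosmoy Studio API: the names `chunk_text`, `toy_embed`, `run_pipeline`, and `VectorRecord` are hypothetical, the documents are assumed to be already extracted to plain text, fixed-size chunking with overlap is just one possible chunking strategy, the hashed bag-of-words function stands in for a real embeddings model, and a Python list stands in for a collection in a vector database.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class VectorRecord:
    """One stored entry: a chunk of a document plus its embedding (hypothetical shape)."""
    doc_name: str
    chunk_index: int
    text: str
    vector: list[float]


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size character chunking with overlap (one possible strategy)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(chunk: str, dim: int = 8) -> list[float]:
    """Stand-in for an embeddings model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def run_pipeline(documents: dict[str, str], collection: list[VectorRecord],
                 size: int = 200, overlap: int = 50) -> int:
    """Chunk, embed, and store every document; return the number of vectors written."""
    written = 0
    for name, text in documents.items():                          # source data
        for idx, chunk in enumerate(chunk_text(text, size, overlap)):  # chunking
            vector = toy_embed(chunk)                             # vectorization
            collection.append(VectorRecord(name, idx, chunk, vector))  # storage
            written += 1
    return written


if __name__ == "__main__":
    # Assumption: text has already been extracted from the PDF and Office files.
    docs = {"handbook.pdf": "Vector pipelines turn documents into embeddings. " * 10}
    store: list[VectorRecord] = []
    count = run_pipeline(docs, store)
    print(f"stored {count} vectors of dimension {len(store[0].vector)}")
```

In a real pipeline, the chunk size, overlap, embeddings model, and target collection correspond to the settings chosen when configuring the Vector Pipeline; the overlap keeps sentences that straddle a chunk boundary retrievable from either side.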