Retrieval-Augmented Generation (RAG) systems are becoming the backbone of modern AI applications, enabling more accurate, context-aware responses by combining language models with your own custom data. If you’ve been exploring ways to set up a local, production-ready RAG stack, RagBits might be exactly what you’re looking for.
In this post, we’ll walk through how to install and use RagBits, an open-source Python toolbox designed for building RAG applications. We’ll pair it with Ollama, a popular tool for running local large language models. Whether you’re looking to run multi-agent workflows, do document search, or spin up AI-powered assistants, RagBits offers an end-to-end, developer-friendly pipeline.
What Is RagBits?
RagBits is a lightweight, open-source toolkit aimed at making RAG application development fast and flexible. It supports:
- 100+ LLMs through integration with LiteLLM
- Pydantic-based type-safe schema validation
- Built-in observability, testing, and monitoring
- Parsing of 20+ document formats
- Ray-based parallel processing for large-scale ingestion
- Compatibility with chat UI interfaces and custom deployment flows
You can seamlessly swap between embedding models, language models, and vector stores—all while using local resources.
Below is a complete quickstart: installation, setup, and running your first GenAI + RAG app.
Requirements
Ragbits is compatible with:
- Python 3.9+
- Local or remote LLMs via LiteLLM
- Optional: GPU/CPU acceleration for better performance
- Optional: Ollama for running local models
Make sure to install and configure your embedding and language models (such as `text-embedding-3-small` and `gpt-4.1-nano`) using LiteLLM or Ollama before use.
1. Installation
Install the full Ragbits stack:
pip install ragbits
Alternatively, install just what you need:
pip install ragbits-core ragbits-document-search ragbits-chat
2. Define and Run an LLM Prompt
Create a simple script like `qa_prompt.py`:
```python
import asyncio

from pydantic import BaseModel

from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt

# Typed input and output schemas for the prompt.
class QuestionAnswerPromptInput(BaseModel):
    question: str

class QuestionAnswerPromptOutput(BaseModel):
    answer: str

# The prompt class ties the schemas to Jinja-templated system/user messages.
class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, QuestionAnswerPromptOutput]):
    system_prompt = "You are a question answering agent."
    user_prompt = "Question: {{ question }}"

# Hosted model via LiteLLM; see the Ollama note after this step for a fully local setup.
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)

async def main():
    prompt = QuestionAnswerPrompt(QuestionAnswerPromptInput(question="What is RAG in AI?"))
    response = await llm.generate(prompt)
    print(response.answer)

if __name__ == "__main__":
    asyncio.run(main())
```
Run it:
python qa_prompt.py
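The script above talks to a hosted OpenAI model. Since this post pairs RagBits with Ollama, here is a minimal sketch of pointing the same classes at local models through LiteLLM's `ollama/` provider prefix. The model names below are only examples; pull whichever models you like with `ollama pull` and make sure the Ollama server is running (it listens on `http://localhost:11434` by default).

```python
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM

# Example local chat model served by Ollama (assumes `ollama pull llama3.2` has been run).
local_llm = LiteLLM(model_name="ollama/llama3.2")

# Example local embedding model served by Ollama (assumes `ollama pull nomic-embed-text`).
local_embedder = LiteLLMEmbedder(model_name="ollama/nomic-embed-text")
```

Substituting these for the LLM above and the embedder in the next step keeps the whole pipeline on your machine.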
3. Ingest & Search a Document
Next, ingest a PDF straight from the web and run a semantic search over it:

```python
import asyncio

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# Embed chunks with LiteLLM and keep the vectors in memory (no external database needed).
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)

async def run():
    # Download, parse, chunk, and embed the "Attention Is All You Need" paper.
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What is the transformer model?")
    print(result)

if __name__ == "__main__":
    asyncio.run(run())
```
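Each text element returned by `search()` exposes the retrieved content through its `text_representation` attribute, which is exactly what the RAG prompt in the next step consumes. If you want to eyeball what retrieval found, a small hedged addition does the trick:

```python
# Add this inside run(), right after the ingest call, to preview what retrieval returns.
results = await document_search.search("What is the transformer model?")
for element in results:
    print(element.text_representation[:200])  # first 200 characters of the chunk
    print("---")
```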
4. Create a RAG Pipeline (Prompt + Context)
This step combines the previous two: retrieve context for a question, then feed it to the model through a context-aware prompt.

```python
import asyncio

from pydantic import BaseModel

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# Same building blocks as in steps 2 and 3.
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)

class RAGInput(BaseModel):
    question: str
    context: list[str]

class RAGPrompt(Prompt[RAGInput, str]):
    system_prompt = "You are a QA agent. Use the provided context to answer."
    user_prompt = """
Question: {{ question }}
Context:
{% for item in context %}{{ item }}{% endfor %}
"""

async def run_rag():
    question = "What are the key findings in the paper?"
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search(question)
    prompt = RAGPrompt(RAGInput(
        question=question,
        context=[r.text_representation for r in result],
    ))
    response = await llm.generate(prompt)
    print(response)

asyncio.run(run_rag())
```
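Retrieval may return more chunks than you want to pack into a single prompt. A small tweak, assuming `search()` returns an ordered list as used above, keeps only the top few hits:

```python
# Plain list slicing to cap how much context goes into the prompt; tune top_k to taste.
top_k = 5
prompt = RAGPrompt(RAGInput(
    question=question,
    context=[r.text_representation for r in result[:top_k]],
))
```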
5. Optional: Launch Chat UI
Tie everything together behind RagBits' chat interface:

```python
from ragbits.chat.api import RagbitsAPI
from ragbits.chat.interface import ChatInterface
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# RAGPrompt and RAGInput are the prompt classes defined in step 4.

class MyChat(ChatInterface):
    async def setup(self):
        # Build the retrieval stack and the LLM once, when the app starts.
        self.embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
        self.vector_store = InMemoryVectorStore(embedder=self.embedder)
        self.document_search = DocumentSearch(vector_store=self.vector_store)
        self.llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
        await self.document_search.ingest("web://https://arxiv.org/pdf/1706.03762")

    async def chat(self, message, history=None, context=None):
        # Retrieve context for the user's message and stream the answer back.
        result = await self.document_search.search(message)
        prompt = RAGPrompt(RAGInput(
            question=message,
            context=[r.text_representation for r in result],
        ))
        async for chunk in self.llm.generate_streaming(prompt):
            yield self.create_text_response(chunk)

if __name__ == "__main__":
    RagbitsAPI(MyChat).run()
```
This starts a full-stack chatbot backed by your document.
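Save the script under any name you like (say `chat_app.py`, a filename chosen here purely for illustration) and run it the same way as the earlier examples:

python chat_app.py

RagbitsAPI serves the chat app locally; check the console output for the address to open in your browser.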
6. Scaffold a New Project (Optional)
Use the official starter template:
uvx create-ragbits-app
Why This Matters
The ability to run fast, modular, and observable RAG systems locally means:
- You own your data end to end
- You can build private chatbots and assistants
- You cut latency by avoiding external API calls
- It's a great fit for prototyping enterprise apps (support, internal docs, agents)
And thanks to RagBits’ integration with Pydantic, Ray, and Ollama, it’s not just another hobby tool—it’s production-ready.
Final Thoughts
RagBits stands out in the RAG space because it makes developer experience a priority. The design is modular, observability is built in, and it plays well with Ollama for running local models and LiteLLM for unified model access.
If you’ve been looking for a way to run your own mini ChatGPT grounded in your own documents, this is one of the most approachable ways to get started.