How to Build Powerful RAG Applications Locally with RagBits

Retrieval-Augmented Generation (RAG) systems are becoming the backbone of modern AI applications, enabling more accurate, context-aware responses by combining language models with your own custom data. If you’ve been exploring ways to set up a local, production-ready RAG stack, RagBits might be exactly what you’re looking for.

In this post, we’ll walk through how to install and use RagBits, an open-source Python toolbox designed for building RAG applications. We’ll pair it with Ollama, a popular tool for running local large language models. Whether you’re looking to run multi-agent workflows, do document search, or spin up AI-powered assistants, RagBits offers an end-to-end, developer-friendly pipeline.


What Is RagBits?

RagBits is a lightweight, open-source toolkit aimed at making RAG application development fast and flexible. It supports:

  • 100+ LLMs through its LiteLLM integration
  • Pydantic-based type-safe schema validation
  • Built-in observability, testing, and monitoring
  • Parsing of 20+ document formats
  • Ray-based parallel processing for large-scale ingestion
  • Compatibility with chat UIs and custom deployment flows

You can seamlessly swap between embedding models, language models, and vector stores—all while using local resources.
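
For example, switching from a hosted model to a local one served by Ollama is a one-line change. Here's a minimal sketch of swapping components; it assumes a running Ollama server with the llama3.2 and nomic-embed-text models pulled (example names only), and relies on LiteLLM routing any "ollama/..." model name to the local server:

from ragbits.core.llms import LiteLLM
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore

# Hosted setup: OpenAI models routed through LiteLLM.
llm = LiteLLM(model_name="gpt-4.1-nano")
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")

# Local setup: the "ollama/" prefix sends calls to your local Ollama server.
local_llm = LiteLLM(model_name="ollama/llama3.2")
local_embedder = LiteLLMEmbedder(model_name="ollama/nomic-embed-text")

# The rest of the pipeline is unchanged: the vector store just takes an embedder.
vector_store = InMemoryVectorStore(embedder=local_embedder)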


Quickstart Guide

The rest of this post is a hands-on quickstart: installing Ragbits, setting it up, and running your first GenAI + RAG app.

Requirements

Ragbits is compatible with:

  • Python 3.10+
  • Local or remote LLMs via LiteLLM
  • Optional: a GPU for faster local inference
  • Optional: Ollama for running local models

Make sure the embedding and language models you plan to use (like text-embedding-3-small and gpt-4.1-nano) are set up through LiteLLM or Ollama before you start.
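
As a concrete example, a typical pre-flight setup might look like this (hypothetical key and model choices; adjust to your provider):

# Hosted models: LiteLLM picks up provider keys from the environment
export OPENAI_API_KEY="sk-..."

# Local models: pull them into Ollama first
ollama pull llama3.2
ollama pull nomic-embed-text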


1. Installation

Install the full Ragbits stack:

pip install ragbits

Alternatively, install just what you need:

pip install ragbits-core ragbits-document-search ragbits-chat

2. Define and Run an LLM Prompt

Create a simple script like qa_prompt.py:

import asyncio
from pydantic import BaseModel
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt

# Typed input/output schemas: Pydantic validates what goes in and out.
class QuestionAnswerPromptInput(BaseModel):
    question: str

class QuestionAnswerPromptOutput(BaseModel):
    answer: str

# The prompt pairs the schemas with Jinja-templated system/user messages.
class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, QuestionAnswerPromptOutput]):
    system_prompt = "You are a question answering agent."
    user_prompt = "Question: {{ question }}"

# use_structured_output=True makes the LLM return the typed output model.
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)

async def main():
    prompt = QuestionAnswerPrompt(QuestionAnswerPromptInput(question="What is RAG in AI?"))
    response = await llm.generate(prompt)
    print(response.answer)

if __name__ == "__main__":
    asyncio.run(main())

Run it:

python qa_prompt.py
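
If you'd rather keep this step fully local, the same script can run against Ollama; only the model line changes. A sketch, assuming you've pulled llama3.2 (structured-output support varies by model and provider, so you may need to drop that flag and parse plain text):

from ragbits.core.llms import LiteLLM

# Same prompt classes as above; only the model name swaps.
llm = LiteLLM(model_name="ollama/llama3.2", use_structured_output=True)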

3. Ingest & Search a Document

Create a second script (doc_search.py, say):

import asyncio
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# The embedder turns text into vectors; the in-memory store holds them for the session.
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)

async def run():
    # The web:// prefix tells Ragbits to fetch and parse the remote PDF.
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What is the transformer model?")
    print(result)

if __name__ == "__main__":
    asyncio.run(run())
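
Note that search() returns the retrieved elements themselves, not a final answer; each element exposes the matched chunk as text_representation (the next step relies on this). To inspect the hits one per line, you could rewrite run() like so:

async def run():
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What is the transformer model?")
    # Print each retrieved chunk's text on its own line.
    for element in result:
        print(element.text_representation)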

4. Create a RAG Pipeline (Prompt + Context)

This script closes the loop: retrieve relevant chunks, then feed them to the model as context. It reuses the same components as steps 2 and 3, so it is self-contained:

import asyncio
from pydantic import BaseModel
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# Same retrieval stack and model as in the earlier steps.
embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)
llm = LiteLLM(model_name="gpt-4.1-nano")

class RAGInput(BaseModel):
    question: str
    context: list[str]

# A plain-string output type: the model answers in free text.
class RAGPrompt(Prompt[RAGInput, str]):
    system_prompt = "You are a QA agent. Use the provided context to answer."
    user_prompt = """
    Question: {{ question }}
    Context:
    {% for item in context %}{{ item }}{% endfor %}
    """

async def run_rag():
    question = "What are the key findings in the paper?"
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search(question)

    # Pass the retrieved chunks to the prompt as context.
    prompt = RAGPrompt(RAGInput(
        question=question,
        context=[r.text_representation for r in result],
    ))

    response = await llm.generate(prompt)
    print(response)

if __name__ == "__main__":
    asyncio.run(run_rag())
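
For long answers you may prefer to stream tokens as they arrive. The LLM interface also exposes generate_streaming (the chat UI in the next step uses it as well); here's a sketch of the same pipeline with a streamed final answer:

async def run_rag_streaming():
    question = "What are the key findings in the paper?"
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search(question)
    prompt = RAGPrompt(RAGInput(
        question=question,
        context=[r.text_representation for r in result],
    ))
    # Print tokens as they arrive instead of waiting for the full response.
    async for chunk in llm.generate_streaming(prompt):
        print(chunk, end="", flush=True)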

5. Optional: Launch Chat UI

from ragbits.chat.api import RagbitsAPI
from ragbits.chat.interface import ChatInterface
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

# RAGPrompt and RAGInput are the classes from step 4
# (assuming you saved them in a module such as rag_pipeline.py).
from rag_pipeline import RAGInput, RAGPrompt

class MyChat(ChatInterface):
    async def setup(self):
        # Build the same retrieval stack as before, once, at startup.
        self.embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
        self.vector_store = InMemoryVectorStore(embedder=self.embedder)
        self.document_search = DocumentSearch(vector_store=self.vector_store)
        self.llm = LiteLLM(model_name="gpt-4.1-nano")

        await self.document_search.ingest("web://https://arxiv.org/pdf/1706.03762")

    async def chat(self, message, history=None, context=None):
        # Retrieve context for the incoming message, then stream the answer back.
        result = await self.document_search.search(message)
        prompt = RAGPrompt(RAGInput(
            question=message,
            context=[r.text_representation for r in result],
        ))
        async for chunk in self.llm.generate_streaming(prompt):
            yield self.create_text_response(chunk)

if __name__ == "__main__":
    RagbitsAPI(MyChat).run()

This starts a full-stack chatbot, complete with a web UI, backed by your document; check the console output for the local address it serves on.


6. Scaffold a New Project (Optional)

Use the official starter template:

uvx create-ragbits-app

Why This Matters

The ability to run fast, modular, and observable RAG systems locally means:

  • You own your data end to end
  • You can build private chatbots and assistants
  • You reduce latency by cutting out remote API calls
  • Great for prototyping enterprise apps (support, internal docs, agents)

And thanks to RagBits’ integrations with Pydantic, Ray, and Ollama, it’s not just another hobby tool: it’s built for production use.


Final Thoughts

RagBits stands out in the RAG space because it makes developer experience a priority. The design is modular, observability is built in, and it plays well with other local-first AI tools like Ollama and LiteLLM.

If you’ve been looking for a way to run your own mini ChatGPT grounded in your documents, this is one of the most approachable ways to get started.
