Building a Minimal RAG Model for Question Answering

The Mini-RAG project is an implementation of Retrieval-Augmented Generation (RAG) for question-answering applications. To deepen my understanding of Python and AI models, I followed the tutorial Mini-RAG – From Notebooks to Production by Abu Bakr Soliman, and I refactored parts of the source code to simplify it and align it with my own perspective. In this guide, I will walk you through setting up and deploying Mini-RAG, detailing its structure, its API endpoints, and the additional improvements I made to enhance its functionality, such as a CI/CD pipeline.

I used a local Ollama setup running with Docker Compose. You can check the repository here for the Ollama deployment, and below is a simple diagram of the project.

You can check the source code here; it is fully commented.

Installation

Step 1: Create a Virtual Environment

Clone the GitHub repository from the above link, and create a virtual environment to isolate the dependencies using the following commands:

virtualenv mini-rag
source mini-rag/bin/activate

Step 2: Set Up Environment Variables

Copy the example environment variables file and update the necessary variables:

cp .env.example .env

Edit the .env file to include your custom configuration.
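To give a sense of how these variables are consumed, the application loads them with pydantic_settings (see the Configuration item in the project structure below). The following is a minimal sketch of that mechanism; the variable names are illustrative assumptions, so check .env.example for the real ones.

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Illustrative variable names only; .env.example defines the real schema.
    APP_NAME: str = "mini-rag"
    MONGODB_URL: str = "mongodb://localhost:27017"
    GENERATION_MODEL_ID: str = "llama3.1"

    # Read values from the .env file created in this step.
    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")

def get_settings() -> Settings:
    return Settings()

if __name__ == "__main__":
    print(get_settings().APP_NAME)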

Step 3: Install Dependencies

Install the required Python packages:

pip install -r requirements.txt

Step 4: Run the FastAPI Server

Start the application using Uvicorn:

uvicorn main:app --reload --host 0.0.0.0 --port 5000

Now, you are ready to test the API at http://localhost:5000/api/v1.
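As a quick sanity check, you can hit the base endpoint from Python. This is just a minimal smoke test, assuming the server is running locally on port 5000 as configured above and that the endpoint returns JSON.

import requests

# Minimal smoke test against the base information endpoint.
response = requests.get("http://localhost:5000/api/v1")
print(response.status_code, response.json())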

Docker Setup

To containerize the application, follow these steps:

  • Install Docker and Docker Compose.
  • From the root directory, configure environment variables using the following commands:
cd docker
cp .env.example .env

Update the .env file as needed.

  • Start the Docker containers:
docker compose up -d

Project Structure

The Mini-RAG application is organized as follows:

  • Entry Point
    • Purpose: The entry point of the application.
    • File: src/main.py
  • Assets
    • Purpose: Storing the application assets.
    • Directory: src/assets
  • Controllers
    • Purpose: Handling the main functions of the application.
    • Directory: src/controllers
  • Configuration
    • Purpose: Handling the application configuration.
    • Directory: src/helpers
    • Environment variables: src/.env file
    • Method: pydantic_settings
  • Models
    • Purpose: Handling the data models, such as database schemas and enumerations.
    • Directory: src/models
  • Routes
    • Purpose: Handling the different routes of the application, such as the upload, process, and NLP routes.
    • Directory: src/routes
  • Database Operations
    • Purpose: Handling the implementation of database logic such as creating or deleting data chunks.
    • Directory: src/services
  • LLM Operations
    • Purpose
      • Providing an interface for different LLM providers, such as OpenAI and Cohere, that implements setting the generation and embedding models and the other required methods (see the interface sketch after this list).
      • Providing an interface for vector databases, such as Qdrant, that implements the different database operations.
    • Directory: src/stores
  • Logging
    • Purpose: Handling the logging implementation across the application.
    • Directory: src/utils
  • Dependencies
    • Purpose: Containing the packages required by the application.
    • File: src/requirements.txt
  • Dockerizing the Application
    • Purpose: Providing the Dockerfile for the application.
    • File: src/Dockerfile
  • Docker Compose
    • Purpose: Providing the Docker Compose file for the application, including the API, MongoDB, Mongo Express, and Qdrant.
    • Directory: docker
    • Directory: docker
  • CI/CD Pipeline
    • Purpose
      • GitHub Actions workflow for code linting.
        • File: .github/workflows/pylint.yml
      • Building and deploying the application.
        • File: src/Jenkinsfile
      • Code analysis using SonarQube.
        • File: src/Jenkinsfile-SQ
  • VS Code Extensions
    • Purpose: Recommending extensions such as Pylint.
    • File: .vscode/extensions.json
  • VS Code Settings
    • Purpose: Adding some Pylint and VS Code settings.
    • File: .vscode/settings.json
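To illustrate the provider-interface idea from the LLM Operations item above, here is a minimal sketch of what such an abstract base class can look like. The method names and signatures are assumptions made for illustration, not the project's exact interface.

from abc import ABC, abstractmethod

class LLMInterface(ABC):
    """Contract that concrete LLM providers (e.g. OpenAI, Cohere) implement."""

    @abstractmethod
    def set_generation_model(self, model_id: str) -> None:
        """Select the model used for text generation."""

    @abstractmethod
    def set_embedding_model(self, model_id: str, embedding_size: int) -> None:
        """Select the model used to embed text chunks."""

    @abstractmethod
    def generate_text(self, prompt: str, max_output_tokens: int = 512, temperature: float = 0.1) -> str:
        """Generate an answer for the given prompt."""

    @abstractmethod
    def embed_text(self, text: str) -> list[float]:
        """Return the embedding vector for a piece of text."""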

API Endpoints

Mini-RAG exposes several endpoints to perform various operations:

  • Base Information
    • Purpose: An informative endpoint that returns basic information about the API.
    • Route: /api/v1
  • File Upload
    • Purpose: Uploads a new file to a specific project and returns the file ID.
    • Route: /api/v1/data/upload/{project_id}
  • Process One File
    • Purpose: Processes a file in a specific project using its ID and returns the number of inserted chunks.
    • Route: /api/v1/data/process/{project_id}
  • Process All Files
    • Purpose: Processes all files in a specific project and returns the number of inserted chunks and the number of processed files.
    • Route: /api/v1/data/processall/{project_id}
  • Insert Chunks into the Vector DB
    • Purpose: Inserts the chunks created for a specific project into the Qdrant vector DB and returns the number of inserted items.
    • Route: /api/v1/nlp/index/push/{project_id}
  • Index Information
    • Purpose: Retrieves the index information for a specific project from the Qdrant vector DB.
    • Route: /api/v1/nlp/index/info/{project_id}
  • Search the Vector DB
    • Purpose: Performs a semantic search in the Qdrant vector DB for a specific project and returns the results with their scores.
    • Route: /api/v1/nlp/index/search/{project_id}
  • Answer a Question Using the RAG Approach
    • Purpose: Retrieves the relevant documents for a specific project from the vector DB collection and uses a language model to generate an answer based on them (see the example requests after this list).
    • Route: /api/v1/nlp/index/answer/{project_id}
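The sketch below walks through a typical upload, process, index, and answer flow against these endpoints from Python. The HTTP methods, the multipart field name, the request body fields, and the sample file path are assumptions made for illustration; check the interactive docs of the running server for the exact request schemas.

import requests

BASE_URL = "http://localhost:5000/api/v1"
PROJECT_ID = "1"

# 1. Upload a document to the project (the multipart field name "file" is an assumption).
with open("docs/sample.pdf", "rb") as f:
    print(requests.post(f"{BASE_URL}/data/upload/{PROJECT_ID}", files={"file": f}).json())

# 2. Process all uploaded files into chunks.
print(requests.post(f"{BASE_URL}/data/processall/{PROJECT_ID}").json())

# 3. Push the created chunks into the Qdrant index.
print(requests.post(f"{BASE_URL}/nlp/index/push/{PROJECT_ID}").json())

# 4. Ask a question against the indexed project (the "text" field name is an assumption).
answer = requests.post(
    f"{BASE_URL}/nlp/index/answer/{PROJECT_ID}",
    json={"text": "What is this document about?"},
)
print(answer.json())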

Pre-Commit Checks

To ensure code quality and security, integrate TruffleHog pre-commit hooks:

  • Install the pre-commit package and set up the Git hooks:
pip install pre-commit
pre-commit install
  • Add files or directories to exclude in the exclude.txt file.
  • Test the configuration locally:
pre-commit run --all-files

CI/CD Pipeline

  • GitHub Actions workflow for code linting.
    • File: .github/workflows/pylint.yml
  • Building and deploying the application.
    • File: src/Jenkinsfile
  • Code analysis using SonarQube.
    • File: src/Jenkinsfile-SQ

Enhancements

  • Adding a CI/CD pipeline, including static analysis using SonarQube.
  • Using MinIO for uploads (I think it would be better to use NFS).
  • Using a UUID to generate a unique file ID (a minimal sketch follows this list).
  • Using a Qdrant container instead of a regular directory.
  • Refactoring the source code to align with my perspective.
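For the UUID-based file IDs mentioned above, the idea is simply to derive the stored filename from a random UUID instead of the user-supplied name. Here is a minimal sketch; the helper name and the extension handling are assumptions, not the project's exact code.

import os
import uuid

def generate_file_id(original_name: str) -> str:
    """Build a collision-free filename while keeping the original extension."""
    _, extension = os.path.splitext(original_name)
    return f"{uuid.uuid4().hex}{extension}"

print(generate_file_id("report.pdf"))  # e.g. '3f2b8c0e9a6f4d21b7c5a1e2d3f4a5b6.pdf'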
