Building a Minimal RAG Model for Question Answering – Mahmoud Anwer

The Mini-RAG project is an implementation of the Retrieval-Augmented Generation (RAG) approach for question-answering applications. To deepen my understanding of Python and AI models, I followed the tutorial Mini-RAG – From Notebooks to Production by Abu Bakr Soliman, and I refactored parts of the source code to simplify it and match my own preferences. In this guide, I will walk you through setting up and deploying Mini-RAG, covering its structure, its API endpoints, and the improvements I added, such as a CI/CD pipeline.

I ran Ollama locally using Docker Compose. You can check the repository here for the Ollama deployment, and below is a simple diagram of the project.

You can check the source code here; it is fully commented.

Installation

Step 1: Create a Virtual Environment

Clone the GitHub repository from the above link, and create a virtual environment to isolate the dependencies using the following commands:

virtualenv mini-rag
source mini-rag/bin/activate

Step 2: Set Up Environment Variables

Copy the example environment file and update the necessary variables:

cp .env.example .env

Edit the .env file to include your custom configuration.
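
To make the role of the .env file concrete, here is a minimal standard-library sketch that parses KEY=VALUE lines into a dictionary. The project itself loads its settings through pydantic_settings, so this parser is only an illustration of the file format, and the variable names in the sample (APP_NAME, FILE_MAX_SIZE_MB) are hypothetical rather than the keys found in .env.example.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


# Hypothetical sample content; the real keys live in .env.example.
SAMPLE_ENV = """
# application settings (illustrative names)
APP_NAME="mini-rag"
FILE_MAX_SIZE_MB=10
"""
```

Calling parse_env(SAMPLE_ENV) yields a dictionary of strings; converting values such as FILE_MAX_SIZE_MB to integers is the kind of validation that pydantic_settings handles in the actual project.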

Step 3: Install Dependencies

Install the required Python packages:

pip install -r requirements.txt

Step 4: Run the FastAPI Server

Start the application using Uvicorn:

uvicorn main:app --reload --host 0.0.0.0 --port 5000

Now, you are ready to test the API via http://localhost:5000/api/v1
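
A quick way to confirm the server is up is to call the base endpoint from Python. The sketch below uses only the standard library. It assumes the server from Step 4 is listening on localhost:5000 and that the base endpoint answers a GET request with a JSON body, which is an assumption on my side; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Host and port taken from the uvicorn command above.
BASE_URL = "http://localhost:5000/api/v1"


def endpoint(path: str = "", base_url: str = BASE_URL) -> str:
    """Join the API base URL with a relative path such as 'data/upload/1'."""
    base = base_url.rstrip("/")
    path = path.strip("/")
    return f"{base}/{path}" if path else base


def get_json(url: str, timeout: float = 5.0):
    """Send a GET request and decode the body, assuming it is JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, get_json(endpoint()) fetches the base information, and endpoint("data/upload/1") builds the upload URL for a project whose ID is 1.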

Docker Setup

To containerize the application, follow these steps:

Install Docker and Docker Compose.
From the root directory, configure environment variables using the following commands:

cd docker
cp .env.example .env

Update the .env file as needed.

Start the Docker containers:

docker compose up -d
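
After the containers start, the services may need a few seconds before they accept connections. The following sketch waits for a TCP port to open. The function name is hypothetical, and the ports mentioned in the note after the code are assumptions that may not match what the compose file maps.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # service not ready yet; retry shortly
    return False
```

For example, wait_for_port("localhost", 6333) would wait for Qdrant on its usual default port and wait_for_port("localhost", 27017) for MongoDB; check the docker directory for the ports actually published.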

Project Structure

The Mini-RAG application is organized as follows:

Entry Point
- Purpose: The entry point of the application.
- File: src/main.py
Assets
- Purpose: Storing the application assets.
- Directory: src/assets
Controllers
- Purpose: Handling the main functions of the application.
- Directory: src/controllers
Configuration
- Purpose: Handling the application configuration.
- Directory: src/helpers
- Environment variables: src/.env file
- Method: pydantic_settings
Models
- Purpose: Handling the data models, such as database schemas and enumerations.
- Directory: src/models
Routes
- Purpose: Handling the different routes of the application such as upload, process and nlp routes.
- Directory: src/routes
Database Operations
- Purpose: Handling the implementation of database logic such as creating or deleting data chunks.
- Directory: src/services
LLM Operations
- Purpose
  - Defining an interface for the different LLM providers, such as OpenAI and Cohere, that covers setting the generation and embedding models and the other required methods.
  - Defining an interface for vector databases, such as Qdrant, that covers the different database operations.
- Directory: src/stores
Logging
- Purpose: Handling the logging implementation across the application.
- Directory: src/utils
Dependencies
- Purpose: Containing the packages required by the application.
- Directory: src/requirements.txt
Dockerizing the application
- Purpose: Providing a Dockerfile for the application.
- File: src/Dockerfile
Docker Compose
- Purpose: Providing a Docker Compose file for the application, including the API, MongoDB, MongoExpress and Qdrant.
- Directory: docker
CI/CD Pipeline
- Purpose
  - GitHub Actions workflow for code linting.
    - File: .github/workflows/pylint.yml
  - Building and deploying the application.
    - File: src/Jenkinsfile
  - Code analysis using SonarQube.
    - File: src/Jenkinsfile-SQ
VS Code extensions
- Purpose: Recommended extensions such as pylint.
- File: .vscode/extensions.json
VS Code settings
- Purpose: Adding some pylint and VS Code settings.
- File: .vscode/settings.json
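
The LLM Operations entry above describes a common interface that each provider implements. A minimal sketch of that idea using an abstract base class follows. All class and method names are hypothetical and chosen for illustration, so the repository's actual interface may differ, and the EchoProvider stand-in makes no real model calls.

```python
from abc import ABC, abstractmethod
from typing import List


class LLMInterface(ABC):
    """Contract every provider class is expected to fulfil (illustrative)."""

    @abstractmethod
    def set_generation_model(self, model_id: str) -> None: ...

    @abstractmethod
    def set_embedding_model(self, model_id: str, embedding_size: int) -> None: ...

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...

    @abstractmethod
    def embed_text(self, text: str) -> List[float]: ...


class EchoProvider(LLMInterface):
    """Toy provider used only to show how the interface is implemented."""

    def __init__(self) -> None:
        self.generation_model_id = ""
        self.embedding_model_id = ""
        self.embedding_size = 0

    def set_generation_model(self, model_id: str) -> None:
        self.generation_model_id = model_id

    def set_embedding_model(self, model_id: str, embedding_size: int) -> None:
        self.embedding_model_id = model_id
        self.embedding_size = embedding_size

    def generate_text(self, prompt: str) -> str:
        # A real provider would call its SDK or HTTP API here.
        return f"[{self.generation_model_id}] {prompt}"

    def embed_text(self, text: str) -> List[float]:
        # Deterministic fake vector: character codes, padded or cut to size.
        codes = [float(ord(ch)) for ch in text[: self.embedding_size]]
        return codes + [0.0] * (self.embedding_size - len(codes))
```

Because the rest of the application depends only on the abstract class, swapping one provider for another becomes a configuration change rather than a code change, which is the main reason for this layout.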

API Endpoints

Mini-RAG exposes several endpoints to perform various operations:

Base Information
- Purpose: Acts as an informative endpoint that returns some information about the API.
- Route: /api/v1
File Upload
- Purpose: Uploads a new file to a specific project and returns the file ID.
- Route: /api/v1/data/upload/{project_id}
Process One File
- Purpose: Processes a file in a specific project using its ID and returns the number of inserted chunks.
- Route: /api/v1/data/process/{project_id}
Process All Files
- Purpose: Processes all files in a specific project and returns the number of inserted chunks and the number of processed files.
- Route: /api/v1/data/processall/{project_id}
Insert Chunks into Vector DB
- Purpose: Inserts the chunks created for a specific project into the Qdrant vector DB and returns the number of inserted items.
- Route: /api/v1/nlp/index/push/{project_id}
Index Information
- Purpose: Retrieves the index information for a specific project from the Qdrant vector DB.
- Route: /api/v1/nlp/index/info/{project_id}
Search into the Vector DB
- Purpose: Performs a semantic search in the Qdrant vector DB for a specific project and returns the results with their scores.
- Route: /api/v1/nlp/index/search/{project_id}
Answer a question using the RAG approach
- Purpose: Retrieves relevant documents for a specific project from the vector DB collection and uses a language model to generate an answer based on the retrieved documents.
- Route: /api/v1/nlp/index/answer/{project_id}
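
The search and answer endpoints follow the usual retrieve-then-generate pattern: rank the stored chunks by similarity to the question, then pass the best ones to a language model as context. The sketch below illustrates that flow in plain Python with cosine similarity over toy vectors. It is not the project's code: in Mini-RAG the search is performed by Qdrant and the embeddings come from the configured provider, and the function names, prompt wording and sample data are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    limit: int = 2,
) -> List[Tuple[str, float]]:
    """Return the `limit` best (text, score) pairs, highest score first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def build_prompt(question: str, hits: List[Tuple[str, float]]) -> str:
    """Place the retrieved chunk texts before the question as context."""
    context = "\n".join(f"- {text}" for text, _ in hits)
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"


# Toy two-dimensional "embeddings" standing in for real model output.
CHUNKS = [
    ("Qdrant stores vectors.", [1.0, 0.0]),
    ("MongoDB stores chunks.", [0.0, 1.0]),
    ("FastAPI serves the API.", [0.7, 0.7]),
]
```

Here search([1.0, 0.1], CHUNKS) ranks the Qdrant chunk first and the FastAPI chunk second, and build_prompt turns those hits into the text that would be sent to the generation model.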

Pre-Commit Checks

To ensure code quality and security, integrate TruffleHog pre-commit hooks:

Install the pre-commit package and set up the Git hooks:

pip install pre-commit
pre-commit install

Add files or directories to exclude in the exclude.txt file.
Test the configuration locally:

pre-commit run --all-files

CI/CD Pipeline

GitHub Actions workflow for code linting.
- File: .github/workflows/pylint.yml
Building and deploying the application.
- File: src/Jenkinsfile
Code analysis using SonarQube.
- File: src/Jenkinsfile-SQ

Enhancements

Adding a CI/CD pipeline including static analysis using SonarQube.
Using MinIO for uploads (I think it would be better to use NFS).
Using UUID to generate a unique file ID.
Using Qdrant container instead of using a regular directory.
Refactoring the source code to align with my perspective.
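
For the UUID-based file ID listed above, one way to derive a unique stored name is to combine a random UUID with the original file extension. This is a sketch of the idea rather than the repository's implementation; the function name and the naming scheme are assumptions.

```python
import uuid
from pathlib import Path


def generate_file_id(original_name: str) -> str:
    """Return a unique ID that keeps the original extension in lowercase."""
    suffix = Path(original_name).suffix.lower()
    # uuid4 is random, so repeated uploads of the same file get distinct IDs.
    return f"{uuid.uuid4().hex}{suffix}"
```

Keeping the extension makes it easier to pick the right loader when the file is processed later, while the random part avoids collisions between uploads that share a name.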

References

For a more detailed walkthrough, refer to the Mini-RAG video series: Mini-RAG – From Notebooks to Production

Ollama: https://ollama.com/
Qdrant: https://qdrant.tech/documentation/
FastAPI: https://fastapi.tiangolo.com/
MongoDB: https://www.mongodb.com/