The Mini-RAG project is an implementation of the Retrieval-Augmented Generation (RAG) approach for question-answering applications. To deepen my understanding of Python and AI models, I followed the tutorial Mini-RAG – From Notebooks to Production by Abu Bakr Soliman, and I refactored parts of the source code to simplify it and fit my own perspective. In this guide, I will walk you through setting up and deploying Mini-RAG, detailing its structure, its API endpoints, and the improvements I made, such as a CI/CD pipeline.
I used a local setup of Ollama using Docker Compose. You can check the repository here for the Ollama deployment, and below is a simple diagram of the project.

You can check the source code here; it is fully commented.
Installation
Step 1: Create a Virtual Environment
Clone the GitHub repository from the above link, and create a virtual environment to isolate the dependencies using the following commands:
virtualenv mini-rag
source mini-rag/bin/activate
Step 2: Set Up Environment Variables
Copy the example environment variables file and update the necessary variables:
cp .env.example .env
Edit the .env file to include your custom configuration.
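To make the format of the .env file concrete, here is a minimal sketch of how KEY=VALUE lines can be read. The project itself loads its configuration through pydantic_settings; the parser below is only a standard-library stand-in for illustration, and the variable names (`APP_NAME`, `FILE_MAX_SIZE_MB`) are hypothetical, not taken from the project's .env.example.

```python
import tempfile
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def demo() -> dict[str, str]:
    """Write a throwaway .env file and parse it back."""
    # Hypothetical variable names, used only for this example.
    sample = '# example only\nAPP_NAME="mini-rag"\nFILE_MAX_SIZE_MB=10\n'
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        env_path.write_text(sample, encoding="utf-8")
        return parse_env_file(env_path)
```

Note that every value comes back as a string; a settings library such as pydantic_settings additionally converts values to typed fields, which this sketch does not attempt.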
Step 3: Install Dependencies
Install the required Python packages:
pip install -r requirements.txt
Step 4: Run the FastAPI Server
Start the application using Uvicorn:
uvicorn main:app --reload --host 0.0.0.0 --port 5000
Now, you are ready to test the API via http://localhost:5000/api/v1
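As a quick smoke test, the base endpoint can be called from Python using only the standard library. This is a rough sketch under two assumptions that the text above does not confirm: that the base route answers a plain GET request, and that the response body is JSON. The helper names `build_url` and `fetch_base_info` are mine, not part of the project.

```python
import json
import urllib.request

# Assumption: the server from Step 4 is listening on this host and port.
BASE_URL = "http://localhost:5000/api/v1"


def build_url(base_url: str, *parts: str) -> str:
    """Join path segments onto the base URL without doubling slashes."""
    segments = [base_url.rstrip("/")] + [p.strip("/") for p in parts if p]
    return "/".join(segments)


def fetch_base_info(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET the base endpoint and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(build_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs the Uvicorn server to be running):
#     print(fetch_base_info())
```

The same `build_url` helper can assemble the project-specific routes listed later, for example `build_url(BASE_URL, "data", "upload", "1")`.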
Docker Setup
To containerize the application, follow these steps:
- Install Docker and Docker Compose.
- From the root directory, configure environment variables using the following commands:
cd docker
cp .env.example .env
Update the .env file as needed.
- Start the Docker containers:
docker compose up -d
Project Structure
The Mini-RAG application is organized as follows:
- Entry Point
  - Purpose: The entry point of the application.
  - File: src/main.py
- Assets
  - Purpose: Storing the application assets.
  - Directory: src/assets
- Controllers
  - Purpose: Handling the main functions of the application.
  - Directory: src/controllers
- Configuration
  - Purpose: Handling the application configuration.
  - Directory: src/helpers
  - Environment variables: src/.env file
  - Method: pydantic_settings
- Models
  - Purpose: Handling the data models, such as database schemas and enumerations.
  - Directory: src/models
- Routes
  - Purpose: Handling the different routes of the application, such as the upload, process, and nlp routes.
  - Directory: src/routes
- Database Operations
  - Purpose: Handling the implementation of database logic, such as creating or deleting data chunks.
  - Directory: src/services
- LLM Operations
  - Purpose
    - Creating an interface for different LLM providers, such as OpenAI and Cohere, that covers setting the generation and embedding models and the other required methods.
    - Creating an interface for vector databases, such as Qdrant, that implements the different database operations.
  - Directory: src/stores
- Logging
  - Purpose: Handling the logging implementation across the application.
  - Directory: src/utils
- Dependencies
  - Purpose: Containing the packages required by the application.
  - File: src/requirements.txt
- Dockerizing the application
  - Purpose: Providing a Dockerfile for the application.
  - File: src/Dockerfile
- Docker Compose
  - Purpose: Providing a Docker Compose file for the application, including the API, MongoDB, MongoExpress, and Qdrant.
  - Directory: docker
- CI/CD Pipeline
  - Purpose
    - GitHub Actions workflows for code linting.
      - File: .github/workflows/pylint.yml
    - Building and deploying the application.
      - File: src/Jenkinsfile
    - Code analysis using SonarQube.
      - File: src/Jenkinsfile-SQ
- VS Code extensions
  - Purpose: Recommended extensions such as pylint.
  - File: .vscode/extensions.json
- VS Code settings
  - Purpose: Adding some pylint and VS Code settings.
  - File: .vscode/settings.json
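The provider interface described under LLM Operations can be sketched with an abstract base class. This is a minimal illustration of the pattern, not the code in src/stores: the class and method names (`LLMInterface`, `EchoProvider`, `make_provider`, and so on) are hypothetical, and the toy provider calls no real API.

```python
from abc import ABC, abstractmethod
from typing import Optional


class LLMInterface(ABC):
    """Contract that every LLM provider is expected to satisfy (hypothetical)."""

    @abstractmethod
    def set_generation_model(self, model_id: str) -> None: ...

    @abstractmethod
    def set_embedding_model(self, model_id: str, embedding_size: int) -> None: ...

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...

    @abstractmethod
    def embed_text(self, text: str) -> list[float]: ...


class EchoProvider(LLMInterface):
    """Toy provider that only demonstrates the contract."""

    def __init__(self) -> None:
        self.generation_model_id: Optional[str] = None
        self.embedding_model_id: Optional[str] = None
        self.embedding_size = 0

    def set_generation_model(self, model_id: str) -> None:
        self.generation_model_id = model_id

    def set_embedding_model(self, model_id: str, embedding_size: int) -> None:
        self.embedding_model_id = model_id
        self.embedding_size = embedding_size

    def generate_text(self, prompt: str) -> str:
        if self.generation_model_id is None:
            raise RuntimeError("generation model was not set")
        return f"[{self.generation_model_id}] {prompt}"

    def embed_text(self, text: str) -> list[float]:
        if self.embedding_model_id is None:
            raise RuntimeError("embedding model was not set")
        # Deterministic fake vector: character codes folded into fixed buckets.
        vector = [0.0] * self.embedding_size
        for index, char in enumerate(text):
            vector[index % self.embedding_size] += float(ord(char))
        return vector


def make_provider(name: str) -> LLMInterface:
    """Tiny factory that maps a provider name to its class."""
    providers = {"echo": EchoProvider}
    if name not in providers:
        raise ValueError(f"Unknown provider: {name}")
    return providers[name]()
```

With this shape, the rest of the application depends only on `LLMInterface`, so swapping one provider for another becomes a configuration change rather than a code change. The same idea applies to the vector database interface.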
API Endpoints
Mini-RAG exposes several endpoints to perform various operations:
- Base Information
  - Purpose: An informative endpoint that returns some information about the API.
  - Route: /api/v1
- File Upload
  - Purpose: Uploads a new file to a specific project and returns the file ID.
  - Route: /api/v1/data/upload/{project_id}
- Process One File
  - Purpose: Processes a file in a specific project using its ID and returns the number of inserted chunks.
  - Route: /api/v1/data/process/{project_id}
- Process All Files
  - Purpose: Processes all files in a specific project and returns the number of inserted chunks and the number of processed files.
  - Route: /api/v1/data/processall/{project_id}
- Insert Chunks into Vector DB
  - Purpose: Inserts the chunks created for a specific project into the vector DB (Qdrant) and returns the number of inserted items.
  - Route: /api/v1/nlp/index/push/{project_id}
- Index Information
  - Purpose: Retrieves the index information for a specific project from the vector DB (Qdrant).
  - Route: /api/v1/nlp/index/info/{project_id}
- Search into the Vector DB
  - Purpose: Performs a semantic search in the vector DB (Qdrant) for a specific project and returns the results with their scores.
  - Route: /api/v1/nlp/index/search/{project_id}
- Answer a question using the RAG approach
  - Purpose: Retrieves relevant documents for a specific project from the vector DB collection and uses a language model to generate an answer based on the retrieved documents.
  - Route: /api/v1/nlp/index/answer/{project_id}
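The endpoints above follow the usual RAG sequence: split files into chunks, index the chunks, search for the ones closest to a question, and build a prompt from the hits. The sketch below walks through that sequence with toy stand-ins so it runs without any services. It assumes character-based chunking with overlap and uses word counts with cosine similarity in place of a real embedding model and Qdrant, so it shows the flow only and is not the project's implementation. All function names are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real setup calls a model)."""
    words = (word.strip(".,;:!?()\"'") for word in text.lower().split())
    return Counter(word for word in words if word)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(question: str, chunks: list[str], limit: int = 2) -> list[tuple[float, str]]:
    """Return the `limit` chunks most similar to the question, best first."""
    query = embed(question)
    scored = [(cosine(query, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


def build_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Assemble the prompt that would be sent to the generation model."""
    context = "\n".join(f"- {chunk}" for _, chunk in hits)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

In the real application each stage sits behind its own route, so a project can be re-indexed or queried without repeating the earlier stages.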
Pre-Commit Checks
To ensure code quality and security, integrate TruffleHog pre-commit hooks:
- Install the pre-commit package:
pip install pre-commit
pre-commit install
- Add files or directories to exclude in the exclude.txt file.
- Test the configuration locally:
pre-commit run --all-files
CI/CD Pipeline
- GitHub Actions workflows for code linting.
  - File: .github/workflows/pylint.yml
- Building and deploying the application.
  - File: src/Jenkinsfile
- Code analysis using SonarQube.
  - File: src/Jenkinsfile-SQ
Enhancements
- Adding a CI/CD pipeline, including static analysis using SonarQube.
- Using MinIO for uploads (I think it would be better to use NFS).
- Using UUID to generate a unique file ID.
- Using a Qdrant container instead of a regular directory.
- Refactoring the source code to align with my perspective.
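The UUID-based file ID mentioned above can be illustrated in a few lines. This is a sketch of the idea rather than the project's code: the function name `generate_file_id` is hypothetical, and keeping the original file extension is my assumption.

```python
import uuid
from pathlib import PurePosixPath


def generate_file_id(original_name: str) -> str:
    """Return a unique file ID that keeps the original extension (assumed)."""
    suffix = PurePosixPath(original_name).suffix.lower()
    # uuid4 is random, so IDs do not depend on the uploaded file name.
    return f"{uuid.uuid4().hex}{suffix}"
```

Because the ID never contains the user-supplied name, two uploads of the same file cannot overwrite each other, and unusual characters in the original name cannot leak into storage paths.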
References
- For a more detailed walkthrough, refer to the Mini-RAG video series: Mini-RAG – From Notebooks to Production
- Ollama: https://ollama.com/
- Qdrant: https://qdrant.tech/documentation/
- FastAPI: https://fastapi.tiangolo.com/
- MongoDB: https://www.mongodb.com/