Minimal RAG PDF Reader

Introduction

This repository contains a beginner's implementation of a very basic Retrieval-Augmented Generation (RAG) application for PDFs. It is meant as a simple exercise with RAG, but it also demonstrates my attempt at a minimal RAG implementation that runs locally, with no API calls during execution.

Setup

This document is mainly meant for personal use, so it does not explain in detail how to set up this repository. Those familiar with git and Python should already know these procedures.

Cloning the repo:

git clone <this_url> && \
cd minimal_rag_pdf

Starting the environment:

python -m venv .venv && \
source .venv/bin/activate

Upgrading pip:

python -m pip install --upgrade pip

Resolving a CUDA/GCC version mismatch:

Building llama-cpp-python with CUDA support can run into a compiler version mismatch on some systems. Please refer to the llama-cpp-python documentation.

The following environment variables and installation flags are what got it working on my personal machine. Depending on your system, compilation can take a while:

CC=gcc-14 CXX=g++-14 \
CUDACXX=/opt/cuda/bin/nvcc \
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-14" \
python -m pip install llama-cpp-python --no-cache-dir --force-reinstall

Installing requirements:

python -m pip install -r requirements.txt

Environment variables:

cp env.sample .env

Note that if you don't use exactly the same LLM, embedding model, and PDF that I used, this application will not work until you update the corresponding values in the .env file.
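Since a wrong value in .env is the most likely reason for a failed first run, a small pre-flight check can help. The following is a minimal sketch, not code from this repository: it parses simple KEY=VALUE lines and reports which entries do not point to an existing file. The variable names in EXAMPLE_KEYS are hypothetical; check env.sample for the real ones.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_files(env, keys, root):
    """Return the keys that are unset or do not name an existing file under root."""
    missing = []
    for key in keys:
        value = env.get(key, "")
        if not value or not (Path(root) / value).is_file():
            missing.append(key)
    return missing


# Hypothetical variable names, used only for illustration.
EXAMPLE_KEYS = ["LLM_MODEL_PATH", "EMBEDDING_MODEL_PATH", "PDF_PATH"]
```

Assuming the values are paths relative to the repository root, calling missing_files(parse_env(Path(".env").read_text()), EXAMPLE_KEYS, ".") would list the entries that still need attention.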

Downloading the models

You can use more powerful models than the ones I used, but if you just want to run what I tried, the instructions are below. Please note that I have a very low-end GPU and CPU, so I could only use LLMs with a low parameter count.

Head over to Hugging Face and download the bartowski/Qwen2.5-Coder-7B-Instruct-abliterated-GGUF LLM and the CompendiumLabs/bge-small-en-v1.5-q8_0 embedding model.

Note that these models should be placed in the models folder. If it doesn't exist, go ahead and make it:

mkdir models

If you used a different LLM and/or embedding model, make sure to change the name(s) in the .env file.

Finding PDFs

I made this script just as a novelty, and currently it only reads a single PDF as data for the RAG. To replicate what I did exactly, use the Linux Essentials Study Guide from LPI, which is what I fed the RAG. Any PDF that you want to use should be placed in the documents folder. Again, if it doesn't exist, go ahead and make it:

mkdir documents

And if you used a different PDF document, make sure to change the name in the .env file.

Running the application

python main.py

The first time you run the application, it populates the SQLite database with the embeddings, so just let it do its thing. After that initial population of the database, it should run much faster (especially with GPU acceleration).
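The populate-once behavior can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: it uses only the standard library, stores each embedding as a float32 BLOB in a plain SQLite table, and ranks rows with a brute-force cosine search as a stand-in for sqlite-vec. The table name, the function names, and the embed callable are all hypothetical.

```python
import math
import sqlite3
import struct


def pack(vector):
    """Serialize a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack(blob):
    """Inverse of pack(): float32 bytes back to a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def populate_if_empty(db, chunks, embed):
    """Embed and store chunks only if the table is empty.

    Returns the number of rows inserted, which is 0 on later runs.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"
    )
    (count,) = db.execute("SELECT COUNT(*) FROM chunks").fetchone()
    if count:
        return 0  # already populated, skip the slow embedding step
    rows = [(text, pack(embed(text))) for text in chunks]
    db.executemany("INSERT INTO chunks (text, embedding) VALUES (?, ?)", rows)
    db.commit()
    return len(rows)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(db, query_vector, k=3):
    """Brute-force nearest neighbours; sqlite-vec would do this inside SQL."""
    rows = db.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vector, unpack(blob)), text) for text, blob in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

The second call to populate_if_empty on the same database returns 0 without calling embed at all, which is why later runs start faster than the first one.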

Notes/Disclaimer

It's worth noting that this is a very basic RAG application. It uses sqlite-vec instead of ChromaDB, purely as an exploration of alternatives. It doesn't use LangChain or LlamaIndex, and it doesn't pull in a bunch of APIs. It does use llama.cpp via llama-cpp-python to load the LLM and embedding models, as well as pypdf to read the PDF file.
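Without a framework, the glue between those libraries is mostly plain string handling. The sketch below shows two such pieces under my own assumptions: splitting extracted PDF text into overlapping chunks before embedding, and folding retrieved chunks into a prompt for the LLM. The chunk size, overlap, prompt wording, and function names are illustrative and are not taken from this repository.

```python
def split_into_chunks(text, size=500, overlap=50):
    """Split text into windows of `size` characters.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the chunks.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the previous window has already reached the end of the text.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + size] for start in range(0, last_start, step)]


def build_prompt(question, retrieved_chunks):
    """Combine retrieved chunks and the user question into a single prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}"
        for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In a pipeline like this one, the text passed to split_into_chunks would come from pypdf, each chunk would be embedded and stored, and the output of build_prompt would be sent to the model loaded through llama-cpp-python.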

This project is not meant to be used commercially; it is purely educational.
