Minimal RAG PDF Reader
Introduction
This repository contains a beginner implementation of a very basic Retrieval-Augmented Generation (RAG) LLM application for PDFs. It is meant as a simple exercise with RAG, but it also demonstrates my attempt at a minimal RAG implementation that runs locally, without any API usage during execution.
Setup
This document is mainly meant for personal use, so there will not be extensive explanation or instruction for how to set up this repository. Those familiar with git and Python should already be well versed in these procedures.
Cloning the repo:
git clone <this_url> && \
cd minimal_rag_pdf
Starting the environment:
python -m venv .venv && \
source .venv/bin/activate
Upgrading pip:
python -m pip install --upgrade pip
Resolving a CUDA/GCC version mismatch:
Building llama-cpp-python with CUDA support can fail when the system's GCC version does not match what the installed CUDA toolkit expects. Please refer to the llama-cpp-python documentation.
The following environment variables and install flags are what got it working on my personal machine. Note that, depending on your system, compilation can take a while:
CC=gcc-14 CXX=g++-14 \
CUDACXX=/opt/cuda/bin/nvcc \
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-14" \
python -m pip install llama-cpp-python --no-cache-dir --force-reinstall
Installing requirements:
python -m pip install -r requirements.txt
Environment variables:
cp env.sample .env
Note that unless you use the exact same LLM, embedding model, and PDF that I used, this application will not work until you update the corresponding values in the .env file.
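As a minimal sketch of how those settings could be checked before running anything, the snippet below parses a .env-style file with the standard library only and reports any referenced file that is missing. The variable names (LLM_MODEL, EMBEDDING_MODEL, PDF_FILE) are hypothetical placeholders, not the names this repository actually uses; check env.sample for the real ones.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_files(env: dict[str, str], layout: dict[str, str],
                  root: Path = Path(".")) -> list[str]:
    """Return the expected paths (root/folder/name) that are not files on disk."""
    missing = []
    for key, folder in layout.items():
        name = env.get(key, "")
        path = root / folder / name
        if not name or not path.is_file():
            missing.append(str(path))
    return missing


# Hypothetical variable names mapped to the folder each file should live in.
LAYOUT = {
    "LLM_MODEL": "models",
    "EMBEDDING_MODEL": "models",
    "PDF_FILE": "documents",
}

if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    for path in missing_files(env, LAYOUT):
        print(f"missing: {path}")
```

Running a check like this once after editing .env catches a misspelled model or PDF file name before the (slow) model load starts.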
Downloading the models
You can use more powerful models than the ones I used, but if you just want to run what I tried, follow the instructions in this section. Please note that I have a very low-end GPU and CPU, so I could only use LLMs with low parameter counts.
Head over to Hugging Face and download the bartowski/Qwen2.5-Coder-7B-Instruct-abliterated-GGUF LLM and the CompendiumLabs/bge-small-en-v1.5-q8_0 embedding model.
Note that these models should be placed in the models folder. If it doesn't
exist, go ahead and make it:
mkdir models
And if you used a different LLM model and/or embedding model, make sure to
change the name(s) in the .env file.
Finding PDFs
I made this script just as a novelty, and currently it only reads a single PDF as data for the RAG. If you want to replicate exactly what I did, I fed the RAG the Linux Essentials Study Guide from LPI. Any PDF that you want to use should be placed in the documents folder. Again, if it doesn't exist, go ahead and make it:
mkdir documents
And if you used a different PDF document, make sure to change the name in the
.env file.
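For orientation, here is a rough sketch of how a single PDF can be turned into text chunks ready for embedding. It uses pypdf's PdfReader and extract_text, which this project relies on for reading the PDF; the fixed-size character chunking, the function names, and the default sizes are assumptions for illustration and may differ from what main.py does.

```python
def read_pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page in the PDF."""
    # Imported lazily so chunk_text can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size].strip()
        if piece:
            chunks.append(piece)
        if start + size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


if __name__ == "__main__":
    demo = "The quick brown fox jumps over the lazy dog. " * 5
    for i, chunk in enumerate(chunk_text(demo, size=80, overlap=10)):
        print(i, repr(chunk))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.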
Running the application
python main.py
The first time you run the application, it will populate the SQLite database with the vectorized embeddings, so just let it do its thing. After that initial population, subsequent runs should start much faster (especially with GPU acceleration).
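The populate-once behavior can be sketched roughly as follows. This is not the project's code: it uses the standard-library sqlite3 module with a plain BLOB column as a stand-in for sqlite-vec, and the table name, the populate_once helper, and the embed callable (which in the real application would be backed by the embedding model) are all hypothetical.

```python
import sqlite3
import struct
from typing import Callable, Sequence


def pack(vector: Sequence[float]) -> bytes:
    """Serialize a vector as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def populate_once(conn: sqlite3.Connection, chunks: Sequence[str],
                  embed: Callable[[str], Sequence[float]]) -> int:
    """Embed and store chunks only if the table is empty; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()
    if count:
        return 0  # already populated on an earlier run, skip the slow embedding step
    rows = [(text, pack(embed(text))) for text in chunks]
    with conn:  # commit all inserts as one transaction
        conn.executemany("INSERT INTO chunks (text, embedding) VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    # Toy embedding for demonstration only; a real run would call the embedding model.
    def toy_embed(text: str) -> list[float]:
        return [float(len(text)), float(text.count(" "))]

    conn = sqlite3.connect(":memory:")
    print(populate_once(conn, ["first chunk", "second chunk"], toy_embed))
    print(populate_once(conn, ["first chunk", "second chunk"], toy_embed))
```

Because the expensive embedding step only runs when the table is empty, every run after the first goes straight to querying.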
Notes/Disclaimer
It's worth noting that this is a very basic RAG application. It uses sqlite-vec instead of ChromaDB, purely as an exploration of alternatives. It doesn't use LangChain or LlamaIndex, or bring in a bunch of APIs. It does use llama.cpp, via llama-cpp-python, to load the LLM and embedding models, and pypdf to read the PDF file.
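To make the retrieval half of RAG concrete, here is a small dependency-free sketch: rank stored chunks by cosine similarity to the query embedding, then paste the best matches into the prompt. It is an illustration under assumptions, not this project's code. In the actual application the nearest-neighbor search is handled by sqlite-vec and the vectors come from the embedding model; the function names and the prompt wording below are made up.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          store: Sequence[tuple[str, Sequence[float]]], k: int = 3) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: Sequence[str]) -> str:
    """Assemble a prompt that asks the LLM to answer from the retrieved chunks."""
    joined = "\n\n".join(contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Hand-written toy vectors standing in for real embeddings.
    store = [
        ("ls lists directory contents.", [1.0, 0.0]),
        ("chmod changes file permissions.", [0.0, 1.0]),
        ("Files and directories have permissions.", [0.7, 0.7]),
    ]
    contexts = top_k([0.1, 1.0], store, k=2)
    print(build_prompt("How do I change permissions?", contexts))
```

The resulting prompt string is what would then be handed to the local LLM for generation.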
This project is not meant to be utilized in any commercial way, but is purely educational in purpose.