
# Minimal RAG PDF Reader

## Introduction

This repository contains a beginner implementation of a very basic
Retrieval-Augmented Generation (RAG) LLM application for PDFs. It is meant as a
simple exercise with RAG, and also as my attempt at a minimal RAG implementation
that runs entirely locally, without calling external APIs at runtime.
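At its core, RAG embeds the question, retrieves the stored document chunks whose
embeddings are most similar to it, and prepends those chunks to the prompt sent
to the LLM. The plain-Python sketch below illustrates that loop with cosine
similarity over toy 3-dimensional vectors. In this project the vectors would come
from the embedding model, and the helper names `top_k_chunks` and `build_prompt`
are hypothetical, not taken from `main.py`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=2):
    """Hypothetical helper: return the k chunk texts most similar to the query.

    `chunks` is a list of (text, embedding) pairs.
    """
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Hypothetical helper: augment the question with the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy "embeddings" standing in for real embedding-model output.
chunks = [
    ("ls lists directory contents.", [0.9, 0.1, 0.0]),
    ("chmod changes file permissions.", [0.1, 0.9, 0.0]),
    ("The kernel manages hardware.", [0.0, 0.1, 0.9]),
]
question_vec = [0.8, 0.2, 0.0]
context = top_k_chunks(question_vec, chunks, k=1)
print(build_prompt("How do I list files?", context))
```

The resulting prompt string is what would be handed to the LLM; the model itself
never searches the PDF, it only sees the retrieved excerpts.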

## Setup

This document is mainly meant for personal use, so it does not include extensive
explanations or instructions for setting up this repository. Anyone familiar
with git and Python should already be comfortable with these procedures.

**Cloning the repo:**

```sh
git clone <this_url> && \
cd minimal_rag_pdf
```

**Starting the environment:**

```sh
python -m venv .venv && \
source .venv/bin/activate
```

**Upgrading pip:**

```sh
python -m pip install --upgrade pip
```

**Fixing a CUDA/GCC version mismatch:**

Building `llama-cpp-python` with CUDA support can fail on some systems because of
a version mismatch between the CUDA compiler and the host GCC. Please refer to
the `llama-cpp-python`
[documentation](https://llama-cpp-python.readthedocs.io/en/latest/) for details.

The following environment variable declarations, followed by an installation
with the appropriate flags, are what got it working on my machine. Depending on
your system, the compilation can take a while:

```sh
CC=gcc-14 CXX=g++-14 \
CUDACXX=/opt/cuda/bin/nvcc \
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-14" \
python -m pip install llama-cpp-python --no-cache-dir --force-reinstall
```

**Installing requirements:**

```sh
python -m pip install -r requirements.txt
```

**Environment variables:**

```sh
cp env.sample .env
```

Note that if you don't use exactly the same LLM, embedding model, and PDF that I
used, the application will not work until you update the corresponding file
names in `.env`.
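The `.env` file is a list of `KEY=VALUE` lines from which the application reads
the model and PDF file names. A minimal standard-library sketch of that lookup is
shown below. The key names `LLM_MODEL` and `PDF_FILE` used here are hypothetical
placeholders (check `env.sample` for the real ones), and the project itself may
well use a library such as python-dotenv instead.

```python
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. LLM_MODEL="model.gguf"
        values[key.strip()] = value.strip().strip("\"'")
    return values
```

For example, `load_env().get("LLM_MODEL")` would return whatever file name you
set for that (hypothetical) key, which is why a renamed model file must also be
renamed in `.env`.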

## Downloading the models

You can use more powerful models than the ones I used if you so choose, but if
you just want to reproduce my setup, the instructions are below. Note that I have
a very low-end GPU and CPU, so I could only use small, low-parameter LLMs.

Head over to [HuggingFace](https://huggingface.co/) and download the
[bartowski/Qwen2.5-Coder-7B-Instruct-abliterated-GGUF](https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-abliterated-GGUF/blob/main/Qwen2.5-Coder-7B-Instruct-abliterated-Q4_K_L.gguf)
LLM (the `Q4_K_L` quantization) and the
[CompendiumLabs/bge-small-en-v1.5-q8_0](https://huggingface.co/CompendiumLabs/bge-small-en-v1.5-gguf/blob/main/bge-small-en-v1.5-q8_0.gguf)
embedding model.

Note that these models should be placed in the `models` folder. If it doesn't
exist, go ahead and make it:

```sh
mkdir models
```

And if you used a different LLM model and/or embedding model, make sure to
change the name(s) in the `.env` file.
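Before running the application, it can help to confirm that the GGUF files really
are in `models/` under the expected names. A small sketch follows; the helper
`missing_files` is hypothetical and not part of this repository, and the file
names are taken from the download links above.

```python
from pathlib import Path


def missing_files(folder: str, names: list[str]) -> list[str]:
    """Hypothetical helper: return the names not present as files inside `folder`."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


# File names from the download links above; adjust if you picked other models.
missing = missing_files("models", [
    "Qwen2.5-Coder-7B-Instruct-abliterated-Q4_K_L.gguf",
    "bge-small-en-v1.5-q8_0.gguf",
])
if missing:
    print("Missing from models/:", ", ".join(missing))
else:
    print("All model files found.")
```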

## Finding PDFs

I made this script just as a novelty, and currently it only reads a single PDF
as data for the RAG. If you want to replicate my setup exactly, I used the Linux
Essentials Study Guide from
[LPI](https://learning.lpi.org/en/learning-materials/010-160/). Any PDF that you
do want to use should be placed in the `documents` folder. Again, if it doesn't
exist, go ahead and make it:

```sh
mkdir documents
```

And if you used a different PDF document, make sure to change the name in the
`.env` file.
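Because the whole PDF serves as the RAG's data, its extracted text has to be
split into pieces small enough to embed and to fit in a prompt. A common
approach, shown here as an assumption rather than what `main.py` necessarily
does, is fixed-size character windows with some overlap, so that text near a
boundary appears in two neighboring chunks. The function name and default sizes
below are hypothetical.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Hypothetical helper: split text into `size`-char windows overlapping by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + size >= len(text):  # this window already reached the end
            break
    return chunks


print(chunk_text("0123456789", size=4, overlap=1))  # ['0123', '3456', '6789']
```

With pypdf, the input text could be gathered with something like
`"\n".join(page.extract_text() for page in PdfReader(path).pages)` before
chunking.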

## Running the application

```sh
python main.py
```

The first time you run the application, it populates the SQLite database with
the embeddings of the PDF's text, so just let it do its thing. After that
initial population of the database, it should run much faster (especially with
GPU acceleration).
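The speed-up after the first run comes from caching: embeddings are computed once
and stored, and later runs only read them back. The pattern is sketched below
with the standard-library `sqlite3` module. As a simplification, vectors are
stored as JSON text in an ordinary table rather than in a sqlite-vec table, and
`embed` is a hypothetical stand-in for the embedding model.

```python
import json
import sqlite3
from typing import Callable


def ensure_embeddings(conn: sqlite3.Connection, chunks: list[str],
                      embed: Callable[[str], list[float]]) -> int:
    """Embed and store chunks only if the table is empty; return how many were embedded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()
    if count > 0:
        return 0  # Populated on an earlier run: skip the slow embedding step.
    rows = [(text, json.dumps(embed(text))) for text in chunks]
    conn.executemany("INSERT INTO chunks (text, embedding) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # a file path would persist between runs
fake_embed = lambda s: [float(len(s))]  # hypothetical stand-in for the model
print(ensure_embeddings(conn, ["a", "bb"], fake_embed))  # 2
print(ensure_embeddings(conn, ["a", "bb"], fake_embed))  # 0
```

In a sketch like this, the cache is keyed only on "table is non-empty", so
switching to a different PDF would also require clearing the stored rows.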

## Notes/Disclaimer

It's worth noting that this is a very basic RAG application. It uses
[sqlite-vec](https://www.sqlite.ai/sqlite-vector) instead of ChromaDB purely as
an exploration of alternatives. It doesn't use LangChain or LlamaIndex or pull in
a bunch of APIs. It does use [llama.cpp](https://llama-cpp.com/) via
[llama-cpp-python](https://llama-cpp-python.readthedocs.io/en/latest/) to load
the LLM and embedding models, and
[pypdf](https://pypdf.readthedocs.io/en/stable/) to read the PDF file.
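SQLite vector extensions such as sqlite-vec typically store embeddings as compact
BLOBs of little-endian 32-bit floats (the `sqlite_vec` Python package provides a
`serialize_float32` helper for this). The sketch below shows that encoding with
the standard `struct` module; `to_blob` and `from_blob` are hypothetical names.
At 4 bytes per dimension, a 384-dimensional bge-small vector takes 1,536 bytes.

```python
import struct


def to_blob(vector: list[float]) -> bytes:
    """Hypothetical helper: pack floats as little-endian float32, 4 bytes each."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_blob(blob: bytes) -> list[float]:
    """Hypothetical helper: unpack a float32 BLOB back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


vec = [0.5, -1.0, 2.25]
blob = to_blob(vec)
print(len(blob))        # 12
print(from_blob(blob))  # [0.5, -1.0, 2.25]
```

Values that are not exactly representable in 32 bits (such as `0.1`) come back
slightly rounded, which is harmless for similarity search.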

This project is not meant to be utilized in any commercial way, but is purely
educational in purpose.