# Minimal RAG PDF Reader

## Introduction

This repository contains a beginner's implementation of a very basic
Retrieval-Augmented Generation (RAG) pipeline for PDFs. It is meant as a simple
exercise with RAG, and it also documents my attempt at a minimal implementation
that runs locally, without calling any external APIs during execution.

## Setup

This document is mainly meant for personal use, so it does not include
extensive instructions for setting up this repository. Those familiar with Git
and Python should already know these procedures.

**Cloning the repo:**

```sh
git clone <this_url> && \
cd minimal_rag_pdf
```

**Starting the environment:**

```sh
python -m venv .venv && \
source .venv/bin/activate
```

**Upgrading pip:**

```sh
python -m pip install --upgrade pip
```

**Fixing the CUDA/GCC version mismatch:**

There may be a version mismatch between CUDA and GCC when setting up
`llama-cpp-python` on some systems. Please refer to its
[documentation](https://llama-cpp-python.readthedocs.io/en/latest/).

The following environment variables and installation flags are what got it
working on my personal machine. Note that, depending on your system, compiling
can take a while:

```sh
# Adjust the compiler version and the CUDA path to match your own system.
CC=gcc-14 CXX=g++-14 \
CUDACXX=/opt/cuda/bin/nvcc \
CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/gcc-14" \
python -m pip install llama-cpp-python --no-cache-dir --force-reinstall
```

**Installing requirements:**

```sh
python -m pip install -r requirements.txt
```

**Environment variables:**

```sh
cp env.sample .env
```

Note that unless you use exactly the same LLM, embedding model, and PDF that I
did, the application will not work until you update the corresponding values in
the `.env` file.

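
Since a mismatched file name in `.env` is the most likely reason for the
application to fail, a small pre-flight check can help. The following is only a
sketch in Python: the keys `LLM_MODEL`, `EMBEDDING_MODEL`, and `PDF_FILE` are
hypothetical placeholders, so replace them with whatever names `env.sample`
actually defines. It assumes the `models` and `documents` folders described in
the sections below.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_files(env: dict[str, str], root: Path) -> list[str]:
    """Return the keys whose configured file does not exist under root."""
    # Hypothetical key -> folder mapping; adjust it to match your env.sample.
    folders = {
        "LLM_MODEL": "models",
        "EMBEDDING_MODEL": "models",
        "PDF_FILE": "documents",
    }
    return [
        key
        for key, folder in folders.items()
        if not (root / folder / env.get(key, "")).is_file()
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    for key in missing_files(env, Path(".")):
        print(f"{key}: file not found, check the value in .env")
```

Running this from the repository root prints one line per configured file that
cannot be found, and nothing if everything is in place.
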
## Downloading the models

You can use more powerful models than the ones I used, but if you just want to
run what I tried, follow the instructions here. Please note that I have a very
low-end GPU and CPU, so I could only use LLMs with a low parameter count.

Head over to [Hugging Face](https://huggingface.co/) and download the
[bartowski/Qwen2.5-Coder-7B-Instruct-abliterated-GGUF](https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-abliterated-GGUF/blob/main/Qwen2.5-Coder-7B-Instruct-abliterated-Q4_K_L.gguf)
LLM and the
[CompendiumLabs/bge-small-en-v1.5-q8_0](https://huggingface.co/CompendiumLabs/bge-small-en-v1.5-gguf/blob/main/bge-small-en-v1.5-q8_0.gguf)
embedding model.

These models should be placed in the `models` folder. If it doesn't exist, go
ahead and create it:

```sh
mkdir models
```

If you used a different LLM and/or embedding model, make sure to change the
name(s) in the `.env` file.

## Finding PDFs

I made this script just as a novelty, and it currently reads only a single PDF
as data for the RAG pipeline. To replicate exactly what I did, use the Linux
Essentials study guide from
[LPI](https://learning.lpi.org/en/learning-materials/010-160/). Any PDF that you
want to use should be placed in the `documents` folder. Again, if it doesn't
exist, go ahead and create it:

```sh
mkdir documents
```

If you used a different PDF document, make sure to change the name in the
`.env` file.

## Running the application

```sh
python main.py
```

The first time you run the application, it populates the SQLite database with
the vector embeddings, so just let it finish. After that initial population,
it should run much faster (especially with GPU acceleration).

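
The populate-once behavior described above can be sketched roughly as follows.
This is an illustration rather than the code in `main.py`: it uses Python's
built-in `sqlite3` module with a plain BLOB column, and `fake_embed`, the
`chunks` table, and `populate_once` are hypothetical names. The real
application would call the embedding model instead of `fake_embed`.

```python
import sqlite3
import struct


def fake_embed(text: str) -> list[float]:
    """Placeholder embedding; a real run would call the embedding model."""
    return [float(len(text)), float(text.count(" "))]


def populate_once(conn: sqlite3.Connection, chunks: list[str], embed=fake_embed) -> int:
    """Embed and store chunks only if the table is empty; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()
    if count:
        return 0  # already populated: skip the slow embedding step
    rows = []
    for chunk in chunks:
        vector = embed(chunk)
        # Pack the floats into bytes so they fit in a BLOB column.
        rows.append((chunk, struct.pack(f"{len(vector)}f", *vector)))
    conn.executemany("INSERT INTO chunks (text, embedding) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(populate_once(conn, ["first chunk", "second chunk"]))  # first run embeds
    print(populate_once(conn, ["first chunk", "second chunk"]))  # second run skips
```

The second call returns `0` because the table already has rows, which is why
later runs skip the expensive embedding pass.
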
## Notes/Disclaimer

It's worth noting that this is a very basic RAG application. It uses
[sqlite-vec](https://www.sqlite.ai/sqlite-vector) instead of ChromaDB, purely as
an exploration of alternatives. It doesn't use LangChain or LlamaIndex, or
bring in a bunch of APIs. It does use [llama.cpp](https://llama-cpp.com/)
via [llama-cpp-python](https://llama-cpp-python.readthedocs.io/en/latest/) to
load the LLM and embedding models, as well as
[pypdf](https://pypdf.readthedocs.io/en/stable/) to read the PDF file.

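
For readers new to RAG, the retrieval step can be sketched in plain Python.
This is a toy illustration, not this project's code: the vectors are written by
hand instead of coming from the embedding model, the similarity search is done
in Python rather than in the database, and `cosine_similarity`, `top_k`, and
`build_prompt` are hypothetical helper names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hand-written toy vectors standing in for real embeddings.
    stored = [
        ("chunk about files", [1.0, 0.0]),
        ("chunk about users", [0.0, 1.0]),
        ("chunk about directories", [0.9, 0.1]),
    ]
    chunks = top_k([1.0, 0.0], stored, k=2)
    print(build_prompt("How do I list files?", chunks))
```

The retrieved chunks are pasted into the prompt so that the LLM answers from
the PDF's content rather than from its own training data alone.
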
This project is not meant to be used commercially in any way; it is purely
educational in purpose.