Python to R RAG Conversion Guide

This document explains how the Python LangChain-based RAG script was converted to R using the ragnar and ellmer packages.

Key Package Mappings

| Python (module / class) | R (package::function) | Purpose |
|---|---|---|
| `langchain_community.document_loaders` | `ragnar::read_as_markdown()` | Document loading |
| `langchain_experimental.text_splitter.SemanticChunker` | `ragnar::markdown_chunk()` | Chunking |
| `langchain_openai.embeddings.OpenAIEmbeddings` | `ragnar::embed_openai()` | Embeddings |
| `langchain_core.vectorstores.InMemoryVectorStore` | `ragnar::ragnar_store_create()` | Vector storage |
| `langchain_openai.ChatOpenAI` | `ellmer::chat_openai()` | Chat interface |
| `chatlas.ChatOllama` | `ellmer::chat_ollama()` | Local Ollama models |

Main Differences

1. Document Loading

Python:

```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(pdf_path)
pages = [page.page_content for page in loader.lazy_load()]
```

R:

```r
library(ragnar)

pdf_markdown <- read_as_markdown(pdf_path)
```

2. Chunking

Python:

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

text_splitter = SemanticChunker(OpenAIEmbeddings(model="text-embedding-3-large"))
chunks = text_splitter.create_documents(pages)
```

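R:

```r
chunks <- markdown_chunk(pdf_markdown)
```

Note: `markdown_chunk()` is not embedding-based like `SemanticChunker`. It aims for a target chunk size and places boundaries at markdown structure (such as headings and paragraphs) instead of comparing sentence embeddings, so chunking no longer requires embedding calls.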
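To make the "target size plus overlap" idea concrete, here is a deliberately simplified sketch of fixed-size character chunking with overlap. It is not ragnar's algorithm: `chunk_fixed()` is a hypothetical helper that ignores markdown structure entirely, whereas `markdown_chunk()` also adjusts boundaries to the document's structure.

```r
# Hypothetical helper: split `text` into chunks of at most `size` characters,
# each starting `size - overlap` characters after the previous one, so that
# neighbouring chunks share `overlap` characters. The last chunk always
# reaches the end of the text.
chunk_fixed <- function(text, size = 20, overlap = 5) {
  stopifnot(size > overlap, overlap >= 0)
  n <- nchar(text)
  if (n == 0) return(character(0))
  step <- size - overlap
  starts <- seq(1, max(n - overlap, 1), by = step)
  substring(text, starts, pmin(starts + size - 1, n))
}

chunk_fixed("abcdefghij", size = 4, overlap = 1)
# returns c("abcd", "defg", "ghij")
```

The overlap means a sentence cut at one boundary still appears intact in a neighbouring chunk, which is the same motivation behind the overlap setting in real chunkers.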

3. Vector Store Creation

Python:

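```python
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

texts = [chunk.page_content for chunk in chunks]
vectorstore = InMemoryVectorStore.from_texts(
    texts,
    embedding=OpenAIEmbeddings(model="text-embedding-3-large")
)
```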

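R:

```r
# Create a DuckDB-backed store; `embed` is called to embed chunk text.
# The `\(x)` lambda shorthand requires R >= 4.1.
store <- ragnar_store_create(
  "store.duckdb",
  embed = \(x) embed_openai(x, model = "text-embedding-3-large")
)
ragnar_store_insert(store, chunks)  # embed and store the chunks
ragnar_store_build_index(store)     # build the search indexes before retrieval
```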

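Note: ragnar keeps chunks and their embeddings in a DuckDB database. Because the store above is created at the file path `store.duckdb`, it persists on disk between sessions, unlike LangChain's `InMemoryVectorStore`.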
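Under the hood, vector similarity search (VSS) ranks chunks by how close their embeddings are to the query embedding. The sketch below shows that step in base R using cosine similarity, a common choice; the toy two-dimensional embeddings and the helper names are hypothetical, and ragnar performs the real search (with its own metric and indexing) inside DuckDB.

```r
# Cosine similarity between a query vector and each row of an embedding matrix.
cosine_sim <- function(mat, query) {
  as.vector(mat %*% query) / (sqrt(rowSums(mat^2)) * sqrt(sum(query^2)))
}

# Indices of the k rows most similar to the query, best match first.
top_k_similar <- function(mat, query, k = 3) {
  sims <- cosine_sim(mat, query)
  head(order(sims, decreasing = TRUE), k)
}

# Hypothetical 2-d "embeddings" for four chunks, one chunk per row.
emb <- rbind(c(1, 0), c(0.8, 0.6), c(0, 1), c(-1, 0))
top_k_similar(emb, c(1, 0), k = 2)
# returns 1 2
```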

4. Retrieval

Python:

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"score_threshold": 0.7, "k": 3}
)
retrieved_documents = retriever.invoke(user_query)
```

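R:

```r
retrieved_chunks <- ragnar_retrieve(store, user_query, top_k = 3)
```

Note: `ragnar_retrieve()` combines vector similarity search (VSS) with BM25 full-text search. The R call above sets no similarity threshold, so only the top-3 limit carries over from the Python version. (In LangChain, `score_threshold` is honored when `search_type="similarity_score_threshold"`.)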
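If you want to mimic the Python `score_threshold=0.7, k=3` behaviour, one option is to filter scores yourself after retrieval. This is a sketch under assumptions: `filter_by_threshold()` is a hypothetical helper, the scores are made-up similarities where higher is better, and you would need to check which score or distance column your retrieval results actually carry (for a distance, flip the comparison).

```r
# Keep at most k items whose similarity is >= threshold, best first.
# Returns the positions of the kept items in `scores`.
filter_by_threshold <- function(scores, threshold = 0.7, k = 3) {
  keep <- which(scores >= threshold)
  head(keep[order(scores[keep], decreasing = TRUE)], k)
}

scores <- c(0.91, 0.42, 0.75, 0.88, 0.70)
filter_by_threshold(scores)
# returns 1 4 3
```

Note that a threshold can return fewer than `k` results, including none, so downstream prompt-building code should handle an empty result.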

5. Chat Interface

Python:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_prompt = ChatPromptTemplate.from_template(prompt_template)
chain = structured_prompt | llm | StrOutputParser()
response = chain.invoke({"retrieved_documents": docs, "user_query": query})
```

R:

```r
library(ellmer)

chat <- chat_openai(
  model = "gpt-4o",
  system_prompt = "Your system prompt here"
)

# Option 1: manual retrieval. `formatted_prompt` holds the retrieved
# chunk text combined with `user_query`, assembled by your own code.
response <- chat$chat(formatted_prompt)

# Option 2: register a retrieval tool so the model searches the store on demand
ragnar_register_tool_retrieve(chat, store, top_k = 3)
response <- chat$chat(user_query)
```
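The manual-retrieval path calls `chat$chat(formatted_prompt)` without showing how `formatted_prompt` is built. A minimal sketch, assuming you already have the retrieved chunk texts as a character vector; `build_rag_prompt()` and its wording are hypothetical stand-ins for the Python `prompt_template` with `{retrieved_documents}` and `{user_query}` placeholders.

```r
# Hypothetical helper: number each retrieved chunk, join them into a context
# block, and append the user's question.
build_rag_prompt <- function(chunk_texts, user_query) {
  context <- paste(sprintf("[%d] %s", seq_along(chunk_texts), chunk_texts),
                   collapse = "\n\n")
  paste0(
    "Answer the question using only the context below.\n\n",
    "Context:\n", context, "\n\n",
    "Question: ", user_query
  )
}

cat(build_rag_prompt(c("DuckDB stores the vectors.", "BM25 ranks by term match."),
                     "Where are vectors stored?"))
```

Assuming the retrieval result exposes the chunk text in a `text` column, usage would look like `formatted_prompt <- build_rag_prompt(retrieved_chunks$text, user_query)`. ellmer also provides an `interpolate()` helper for `{{ }}`-style templates, which may be a closer match to `ChatPromptTemplate`.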

6. Text Cleaning Functions

Python:

```python
import re

def clean_hyphenated_linebreaks(text):
    # Rejoin words split across lines with a hyphen, e.g. "exam-\nple" -> "example"
    return re.sub(r'(\w+)-\n(\w+)', r'\1\2', text)

def fix_ligatures(text):
    ligature_map = {'ﬁ': 'fi', 'ﬂ': 'fl'}  # ... further ligatures
    for ligature, replacement in ligature_map.items():
        text = text.replace(ligature, replacement)
    return text
```

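R:

```r
library(stringr)

# Rejoin words split across lines with a hyphen, e.g. "exam-\nple" -> "example"
clean_hyphenated_linebreaks <- function(text) {
  str_replace_all(text, "(\\w+)-\\n(\\w+)", "\\1\\2")
}

# Replace typographic ligatures with their plain-letter equivalents
fix_ligatures <- function(text) {
  ligature_map <- c(
    "\ufb01" = "fi",  # ﬁ
    "\ufb02" = "fl"   # ﬂ
    # ... further ligatures
  )
  for (i in seq_along(ligature_map)) {
    # fixed() matches the ligature literally rather than as a regex
    text <- str_replace_all(text, fixed(names(ligature_map)[i]), ligature_map[[i]])
  }
  text
}
```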
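As an alternative to maintaining `ligature_map` by hand, Unicode NFKC normalization applies compatibility decompositions that map presentation-form ligatures such as U+FB01 (ﬁ) and U+FB02 (ﬂ) to plain letters. A sketch using `stringi::stri_trans_nfkc()` (stringi is already installed as a dependency of stringr); `fix_ligatures_nfkc()` is a hypothetical name.

```r
library(stringi)

# NFKC normalization rewrites compatibility characters, including
# ligatures like U+FB01 and U+FB02, into their plain equivalents.
fix_ligatures_nfkc <- function(text) {
  stri_trans_nfkc(text)
}

fix_ligatures_nfkc("\ufb01nd the \ufb02ow")
# returns "find the flow"
```

The trade-off is scope: NFKC also rewrites other compatibility characters (for example superscript digits and full-width letters), which may or may not be desirable for your documents, whereas the explicit map touches only the ligatures you list.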

Key Advantages of ragnar + ellmer

- Integrated workflow: ragnar covers the RAG pipeline end to end, from document conversion and chunking to storage and retrieval
- Persistent storage: uses DuckDB for efficient, persistent vector storage
- Hybrid search: combines VSS and BM25 automatically
- Tool integration: `ragnar_register_tool_retrieve()` lets the model retrieve context on demand
- Tidyverse-friendly: works well with dplyr/tidyverse workflows

Environment Setup

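Make sure your OpenAI API key is set, for example in your `.Renviron` file:

```
# In your .Renviron file (edit with usethis::edit_r_environ())
OPENAI_API_KEY="your-api-key-here"
```

Restart R after editing `.Renviron` so the new value is picked up.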
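To fail early with a clear message instead of an authentication error deep inside a chat call, the script could check for the key before doing any work. A minimal sketch; `check_openai_key()` is a hypothetical helper, not part of ragnar or ellmer.

```r
# Hypothetical helper: stop with a clear message if OPENAI_API_KEY is unset or empty.
check_openai_key <- function() {
  if (!nzchar(Sys.getenv("OPENAI_API_KEY"))) {
    stop("OPENAI_API_KEY is not set; add it to .Renviron and restart R.",
         call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling `check_openai_key()` at the top of the script costs nothing and makes a missing key obvious.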

Installation

```r
install.packages("ragnar")
install.packages("ellmer")
install.packages("stringr")
```

Running the Script

```r
source("main.R")
```
