PDFChat - Intelligent Document Conversation Assistant

PDFChat is an advanced document interaction tool that transforms the way users engage with PDF documents. By leveraging the power of LLMs and vector databases, it enables natural conversation-style interactions with document content, making information retrieval more intuitive and efficient.

🌟 Features

Interactive Chat Interface: Natural conversation-style interaction with your documents
Multi-Document Support: Process and analyze multiple PDF files simultaneously
Intelligent Context Retention: Maintains conversation history for more coherent interactions
Real-time Processing: Dynamic document processing with progress indicators
User-Friendly Interface: Clean, intuitive Streamlit-based UI
Vector-Based Search: Efficient document querying using FAISS vector storage

🔧 Technical Architecture

The application is built with several key components:

Document Processing Pipeline:
- PDF text extraction using PyPDF2
- Text chunking with RecursiveCharacterTextSplitter
- Vector embedding generation using OpenAI Embeddings
- Vector storage using FAISS
Conversation Chain:
- GPT-4 language model integration
- Conversation buffer memory for context retention
- ConversationalRetrievalChain for managing Q&A flow
User Interface:
- Streamlit-based frontend
- File upload functionality
- Real-time chat interface
- Processing status indicators

Add the following to your .env file:

OPENAI_API_KEY=your_openai_api_key_here

🎮 Usage

You can use the application in two ways:
- Upload custom PDF documents using the sidebar
- Use the quick access feature to analyze the pre-configured CV
After processing the documents, interact with the content through the chat interface

💡 Code Structure and Examples

PDF Text Extraction

Extracts text from PDF files

def get_pdf_text(pdf_docs):
    text = ""
    if isinstance(pdf_docs, str):  # For file paths
        try:
            pdf_reader = PdfReader(pdf_docs)
            for page in pdf_reader.pages:
                text += page.extract_text()
        except Exception as e:
            st.error(f"Error reading PDF file: {str(e)}")
    else:  # For uploaded files
        for pdf in pdf_docs:
            try:
                pdf_reader = PdfReader(pdf)
                for page in pdf_reader.pages:
                    text += page.extract_text()
            except Exception as e:
                st.error(f"Error reading PDF file: {str(e)}")
    return text

Text Chunking

Splits text into manageable chunks with overlap

def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(
        separators=[" ", ",", "\n"],
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks

Vector Store Creation

Creates and manages vector embeddings

def get_vectorstore(chunks, openai_api_key):
    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
    vectorstore = FAISS.from_texts(texts=chunks, embedding=embeddings)
    return vectorstore

Conversation Chain Setup

Sets up the conversation pipeline

def get_conversation_chain(vectorstore):
    llm = ChatOpenAI(model='gpt-4')
    memory = ConversationBufferMemory(
        memory_key='chat_history',
        return_messages=True
    )
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
        max_tokens_limit=4000
    )
    return conversation_chain

Chat Handler

Processes user inputs and manages chat history

def handle_user_question(prompt):
    response = st.session_state.conversation({'question': prompt})
    st.session_state.chat_history = response['chat_history']

    for i, message in enumerate(st.session_state.chat_history):
        if i % 2 == 0:
            with st.chat_message('user'):
                st.markdown(message.content)
        else:
            with st.chat_message('assistant'):
                st.markdown(message.content)

Complete Application Flow Example

# Usage in Streamlit app
if st.button('Process'):
    with st.spinner('Processing...'):
        # Get pdf text
        raw_text = get_pdf_text(pdf_docs)

        # Break into chunks
        text_chunks = get_text_chunks(raw_text)

        # Vector Store
        vectorstore = get_vectorstore(text_chunks, openai_api_key)

        # Create Conversation Chain
        st.session_state.conversation = get_conversation_chain(vectorstore)
        st.success("Documents processed successfully!")

⚠️ Important Notes

The application requires a valid OpenAI API key
Large PDF files may take longer to process
The chat history is maintained only for the current session
The model uses GPT-4 for optimal performance

🔒 Security Considerations

API keys are managed through environment variables
Document processing is done locally
No permanent storage of uploaded documents
Session-based conversation history

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
README.md		README.md
app.py		app.py
billy_koech_cv.pdf		billy_koech_cv.pdf
htmlTemplates.py		htmlTemplates.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDFChat - Intelligent Document Conversation Assistant

🌟 Features

🔧 Technical Architecture

🎮 Usage

💡 Code Structure and Examples

PDF Text Extraction

Text Chunking

Vector Store Creation

Conversation Chain Setup

Chat Handler

Complete Application Flow Example

⚠️ Important Notes

🔒 Security Considerations

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PDFChat - Intelligent Document Conversation Assistant

🌟 Features

🔧 Technical Architecture

🎮 Usage

💡 Code Structure and Examples

PDF Text Extraction

Text Chunking

Vector Store Creation

Conversation Chain Setup

Chat Handler

Complete Application Flow Example

⚠️ Important Notes

🔒 Security Considerations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages