
📝 DocCraft: AI-Powered Document Formatter

Transform your unformatted documents into professionally styled documents using the power of AI. DocCraft learns formatting patterns from your reference documents and applies them intelligently to your raw content.

🚀 Features

  • AI-Powered Style Learning: Uses Google Gemini AI to understand and replicate document formatting patterns
  • Intelligent Block Classification: Automatically categorizes text blocks (headings, paragraphs, lists, etc.)
  • Seamless DOCX Processing: Works with Microsoft Word documents (.docx format)
  • User-Friendly Interface: Built with Streamlit for an intuitive web-based experience
  • Whole-Document Processing: Format an entire document in seconds
  • Style Preservation: Maintains font sizes, bold, italic, underline, and other formatting attributes

🛠️ How It Works

  1. Upload Raw Document: Provide your unformatted DOCX file
  2. Upload Reference Document: Provide a formatted DOCX file as a style template
  3. AI Analysis: DocCraft analyzes the formatting patterns in your reference document
  4. Smart Application: The AI applies similar formatting to your raw document
  5. Download Result: Get your professionally formatted document instantly
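
The flow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DocCraft's actual code: the block labels, the classify_block heuristic (a stand-in for the Gemini classification step), and the style dictionaries are all hypothetical.

```python
from collections import Counter, defaultdict


def classify_block(text):
    """Hypothetical stand-in for the AI classifier: label one text block."""
    stripped = text.strip()
    if stripped.startswith(("-", "*", "•")):
        return "list_item"
    if len(stripped) <= 60 and not stripped.endswith((".", ":", ";")):
        return "heading"
    return "paragraph"


def learn_styles(reference_blocks):
    """Pick the most common style seen for each block type in the reference."""
    seen = defaultdict(Counter)
    for text, style in reference_blocks:
        # Styles are dicts; freeze them into sorted tuples so they can be counted.
        seen[classify_block(text)][tuple(sorted(style.items()))] += 1
    return {label: dict(counts.most_common(1)[0][0]) for label, counts in seen.items()}


def apply_styles(raw_blocks, styles, default=None):
    """Pair each raw block with the learned style for its block type."""
    default = default or {}
    return [(text, styles.get(classify_block(text), default)) for text in raw_blocks]


# Assumed toy data: (text, style) pairs extracted from a reference document.
reference = [
    ("Quarterly Report", {"size": 16, "bold": True}),
    ("Revenue grew steadily across all regions this quarter.", {"size": 11, "bold": False}),
    ("Outlook", {"size": 16, "bold": True}),
]
styles = learn_styles(reference)
formatted = apply_styles(
    ["Summary", "Costs fell slightly compared with last year."], styles
)
```

Here the short line "Summary" picks up the heading style learned from the reference, and the full sentence picks up the paragraph style. In the real tool the labels would come from the model rather than a length heuristic.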

📋 Prerequisites

  • Python 3.13+
  • Google Gemini API key
  • Google Cloud Service Account (optional, for enhanced features)

⚡ Quick Start

1. Clone the Repository

git clone <repository-url>
cd DocCraft

2. Install Dependencies

pip install -r requirements.txt

3. Set Up Environment Variables

Create a .env file in the project root:

export GEMINI_API_KEY=your_gemini_api_key_here
export GOOGLE_SERVICE_ACCOUNT_JSON='{"type": "service_account", ...}'

4. Run the Application

streamlit run app.py

5. Open Your Browser

Navigate to http://localhost:8501 to start using DocCraft!

🔧 Installation

Using pip

pip install -e .

Dependencies

  • streamlit - Web interface
  • python-docx - DOCX file processing
  • langchain - AI framework
  • langchain-google-genai - Google Gemini integration
  • google-api-python-client - Google API client
  • python-dotenv - Environment variable management
  • watchdog - File monitoring

📖 Usage Examples

Basic Usage

from doccraft import DocCraft

# Initialize DocCraft
formatter = DocCraft(api_key="your_gemini_key")

# Format a document
formatted_doc = formatter.format_document(
    raw_file="unformatted.docx",
    reference_file="template.docx"
)

# Save the result
formatted_doc.save("formatted_output.docx")

Web Interface

  1. Start the Streamlit app: streamlit run app.py
  2. Upload your raw DOCX file
  3. Upload your reference/template DOCX file
  4. Click "Format and Download DOCX"
  5. Download your formatted document

🎯 Use Cases

  • Academic Papers: Apply consistent formatting to research documents
  • Business Reports: Maintain corporate style guidelines across documents
  • Legal Documents: Ensure uniform formatting for legal briefs and contracts
  • Technical Documentation: Standardize formatting for manuals and guides
  • Content Migration: Convert documents between different style formats

⚙️ Configuration

Environment Variables

Variable                     Description                        Required
GEMINI_API_KEY               Your Google Gemini API key         Yes
GOOGLE_SERVICE_ACCOUNT_JSON  Google Cloud service account JSON  Optional
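
As a rough sketch of how these two variables might be read and validated at startup (the load_config helper and the keys of the dictionary it returns are assumptions for illustration, not DocCraft's API):

```python
import json
import os


def load_config(env=None):
    """Read the two documented variables from an environment mapping."""
    env = os.environ if env is None else env

    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # Mirrors the "API Key Not Found" case in Troubleshooting.
        raise RuntimeError("Gemini API key not found: set GEMINI_API_KEY")

    raw_account = env.get("GOOGLE_SERVICE_ACCOUNT_JSON", "").strip()
    # The service account is optional; parse the JSON only when it is present.
    service_account = json.loads(raw_account) if raw_account else None

    return {"api_key": api_key, "service_account": service_account}
```

If the values live in a .env file, calling load_dotenv() from python-dotenv before load_config() copies them into os.environ first.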

Supported Formats

  • Input: Microsoft Word (.docx)
  • Output: Microsoft Word (.docx)
  • Styling: Font size, bold, italic, underline, colors, alignment

🔒 API Rate Limits

DocCraft includes built-in rate limiting and retry logic for the Google Gemini API:

  • Automatic retry on rate limit exceeded
  • 60-second backoff on resource exhaustion
  • Optimized prompt engineering to minimize API calls
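
The retry behavior described above could look roughly like the following. This is a sketch, not the project's implementation: RateLimitError is a hypothetical placeholder for whatever exception the Gemini client raises on rate limiting or resource exhaustion, and call_with_retry and its parameters are illustrative names.

```python
import time


class RateLimitError(Exception):
    """Hypothetical placeholder for the client's rate-limit error."""


def call_with_retry(func, max_attempts=3, backoff_seconds=60.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait backoff_seconds, then try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_seconds)
```

Passing sleep as a parameter keeps the 60-second wait out of tests: a test can supply a stub that records the requested delays instead of actually sleeping.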

🚨 Troubleshooting

Common Issues

API Key Not Found

Error: Gemini API key not found
Solution: Ensure GEMINI_API_KEY is set in your environment variables

File Upload Issues

Error: Could not process DOCX file
Solution: Ensure the file is a valid .docx format (not .doc)
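
A quick way to check an upload before processing: a .docx file is a ZIP archive that contains a word/document.xml entry, while a legacy .doc file is not a ZIP archive at all. The helper below is a hypothetical pre-check using only the standard library, not part of DocCraft itself.

```python
import io
import zipfile


def looks_like_docx(data):
    """Return True if data is a ZIP archive containing word/document.xml."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return "word/document.xml" in archive.namelist()
    except zipfile.BadZipFile:
        # Legacy .doc files and other non-ZIP uploads end up here.
        return False
```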

Memory Issues with Large Documents

Solution: Break large documents into smaller sections
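
One way to break a large document into sections is to group its paragraphs under a character budget. The function name and the 8000-character default below are assumptions chosen for illustration; the source does not state a suitable limit.

```python
def split_into_sections(paragraphs, max_chars=8000):
    """Group paragraphs into sections of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in its own section.
    """
    sections, current, size = [], [], 0
    for paragraph in paragraphs:
        # Start a new section when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > max_chars:
            sections.append(current)
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        sections.append(current)
    return sections
```

Each section can then be formatted separately and the results concatenated in order.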

🛣️ Roadmap

  • Support for PDF files
  • Advanced style customization
  • Bulk document processing
  • Integration with Google Docs
  • Custom style templates
  • API endpoint for programmatic access

🤝 Contributing

This is a proprietary project. For collaboration opportunities or feature requests, please contact the project maintainer.

📄 License

This project is proprietary software. See LICENSE file for details.

🆘 Support

For support, feature requests, or licensing inquiries, please contact the project maintainer.

🏆 Why DocCraft?

  • Time-Saving: Format documents in seconds, not hours
  • Consistency: Ensure uniform styling across all documents
  • AI-Powered: Leverage cutting-edge AI for intelligent formatting
  • Professional: Create polished, professional-looking documents
  • Easy to Use: No technical expertise required

Made with ❤️ for document formatting excellence
