This project converts PDF files into audiobooks with synchronized subtitles in .vtt format. It uses FastAPI for the backend and Microsoft's Edge TTS for text-to-speech conversion.
- Extracts text from PDF files.
- Converts extracted text into high-quality audio files (.mp3).
- Generates subtitle files (.vtt) for the audio to provide synchronized captions.
- Supports asynchronous processing for efficient and fast performance.
- Automatically cleans up temporary files after processing.
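To make the subtitle output more concrete, here is a minimal sketch of how timed cues map onto WebVTT text. It is not the project's code: the helper names `format_timestamp` and `build_vtt` and the `(start, end, text)` cue shape are assumptions made for illustration, and the real timings would come from the TTS engine.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues: List[Tuple[float, float, str]]) -> str:
    """Build WebVTT file content from (start, end, text) cues (hypothetical shape)."""
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_vtt([(0.0, 1.5, "Chapter one."), (1.5, 4.0, "It was a quiet morning.")]))
```

A player that loads the .mp3 together with a file in this shape can highlight each cue while the matching audio plays.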
- Clone the repository:
    git clone https://github.com/your-repo/Text2Audio-Subtitles.git
    cd Text2Audio-Subtitles
- Install the required dependencies:
    pip install -r requirements.txt
- Start the FastAPI server:
    python app.py
- Open your browser and navigate to http://127.0.0.1:8000/docs to access the Swagger UI.
- Use the /convert_to_audiobook/ endpoint to upload a PDF file. The server will process the file and generate the audiobook (.mp3) and subtitle (.vtt) files.
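The upload can also be scripted instead of going through the Swagger UI. The sketch below uses only the standard library and rests on assumptions this README does not confirm: that the endpoint accepts a POST with a multipart form, and that the form field is named `file`. Check the schema shown at /docs and adjust `field` if it differs. The helper names `build_multipart` and `upload_pdf` are illustrative.

```python
import os
import uuid
import urllib.request
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file multipart form."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str,
               url: str = "http://127.0.0.1:8000/convert_to_audiobook/",
               field: str = "file") -> bytes:
    """POST a PDF to the running server and return the raw response body.

    The HTTP method and the field name are assumptions, not taken from the project.
    """
    with open(path, "rb") as fh:
        body, header = build_multipart(field, os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `upload_pdf("book.pdf")` would send the file and return whatever the endpoint responds with; the response format is not documented here, so it is returned unparsed.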
- The generated .mp3 and .vtt files will be saved in the audiobooks directory.
- Temporary files (e.g., uploaded PDFs) will be automatically deleted after processing.
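One common way to guarantee that kind of cleanup is a try/finally around the conversion step. The following is a minimal sketch under assumptions, not the project's implementation: `process_upload` and the `convert` callback are hypothetical names, and the callback stands in for whatever produces the audiobook from a PDF path.

```python
import os
import tempfile
from typing import Callable


def process_upload(pdf_bytes: bytes, convert: Callable[[str], str]) -> str:
    """Write the upload to a temporary PDF, run `convert` on it, then delete it."""
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pdf_bytes)
        # `convert` receives the temp path and returns the output path.
        return convert(tmp_path)
    finally:
        # Runs on success and on error, so the uploaded PDF never lingers on disk.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Because the removal sits in `finally`, a failed conversion still leaves no temporary PDF behind, while the generated outputs are untouched.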
- Python 3.8 or higher
- Dependencies listed in requirements.txt
- FastAPI for providing a modern web framework.
- Microsoft Edge TTS for text-to-speech capabilities.
- PyMuPDF (fitz) for PDF text extraction.