
Retrieval: Check GRIN Transfer for downloading material (Google Books) #3

@liseli

Description


Why is this use case interesting for this application? It lets us test alternative ways of retrieving datasets.

Harvard created a data pipeline (https://github.com/institutional/institutional-books-1-pipeline) and an associated tool, GRIN Transfer, for obtaining materials from Google Books (https://www.institutional.org/posts/grin-transfer).

We do not have access to Google Books, but we could implement the same approach by accessing the Hugging Face dataset instead.
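A minimal sketch of that Hugging Face route, assuming the dataset ID `institutional/institutional-books-1.0` and a `train` split (both are assumptions to verify on the Hub; the dataset may also require accepting its terms and logging in first). It streams a few records and reports each field's type, which helps answer what the output looks like before downloading everything. The helper names `take`, `describe`, and `stream_books` are hypothetical.

```python
"""Peek at the first few records of the Institutional Books dataset on Hugging Face.

Assumptions (not confirmed by this issue): the dataset ID is
"institutional/institutional-books-1.0", it exposes a "train" split, and access
has been granted on the Hub (it may be gated, so log in with `huggingface-cli login`).
"""
from itertools import islice
from typing import Any, Iterable, Iterator


def take(records: Iterable[dict[str, Any]], n: int) -> list[dict[str, Any]]:
    """Return at most n records without materializing the whole stream."""
    return list(islice(records, n))


def describe(record: dict[str, Any]) -> dict[str, str]:
    """Map each field name to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}


def stream_books(
    dataset_id: str = "institutional/institutional-books-1.0",
) -> Iterator[dict[str, Any]]:
    """Yield records one at a time from the (assumed) Hugging Face dataset."""
    # Imported lazily so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    # streaming=True iterates over records instead of downloading every shard first.
    return iter(load_dataset(dataset_id, split="train", streaming=True))
```

For example, `for book in take(stream_books(), 3): print(describe(book))` prints the field layout of the first three books; this needs network access and the `datasets` package. Streaming keeps the first experiment cheap, since the full collection is large.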

  • GRIN Transfer: Download books
    • See how well it works
    • What does the output look like?
    • How long does the OCR cleanup process take?
    • Does it make sense to use FastAPI here, or does LangChain provide a data pipeline for accessing this resource?
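To get a first number for the OCR cleanup question, a timing harness like the following could wrap whatever cleanup step ends up being used. `clean_ocr_page` is a hypothetical placeholder (it only rejoins words hyphenated across line breaks and collapses whitespace), not the cleanup performed by the Harvard pipeline; `time_cleanup` is likewise a hypothetical name.

```python
"""Time a per-page text cleanup step over a list of OCR pages.

`clean_ocr_page` is a hypothetical stand-in, not the Institutional Books
pipeline's OCR cleanup; pass the real step as `cleanup` to measure it.
"""
import re
import time
from typing import Callable, Sequence


def clean_ocr_page(text: str) -> str:
    """Hypothetical cleanup: rejoin hyphenated line breaks and collapse whitespace."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return re.sub(r"\s+", " ", text).strip()


def time_cleanup(
    pages: Sequence[str], cleanup: Callable[[str], str] = clean_ocr_page
) -> tuple[list[str], float]:
    """Run cleanup on every page; return the cleaned pages and mean seconds per page."""
    start = time.perf_counter()
    cleaned = [cleanup(page) for page in pages]
    elapsed = time.perf_counter() - start
    # Report 0.0 for an empty input instead of dividing by zero.
    per_page = elapsed / len(pages) if pages else 0.0
    return cleaned, per_page
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to short intervals; averaging per page makes results comparable across books of different lengths.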
