NotebookLM Source-Pack Integration Plan

Planning document for making NotebookLM Source Pack Builder an optional Memory Stack component without making Google/NotebookLM authentication mandatory.

Goal

NotebookLM Automation remains its own project/board. Hermes Memory Stack should package only the safe integration surface:

  • an opt-in installer path;
  • public docs explaining when to enable it;
  • script-only cron/update guidance;
  • strict cookie/auth boundaries;
  • compact llmwiki inventory update rules.

The default Memory Stack install must continue to work without Google login, NotebookLM, browser cookies, or any external API key.

Component boundary

Memory Stack owns:

  • optional installer prompts and helper-script placement;
  • safe example config for public documentation roots;
  • cron installer wrapper for already-authenticated users;
  • docs, safety policy, and uninstall behavior;
  • routing metadata sync into llmwiki.

The independent NotebookLM Automation / Source Pack Builder project owns:

  • independent project GitHub repo: https://github.com/yelixir-dev/notebooklm-source-pack-builder.git;
  • builder installation/update from GitHub, never from a private local working tree;
  • crawling and extraction implementation;
  • bundle generation and manifest logic;
  • NotebookLM upload/deploy/refresh CLI behavior;
  • tests for source-pack builder internals;
  • source-specific pack recipes.

Memory Stack should call the builder as a dependency or copied helper only after the builder has a stable CLI contract. It should not embed private notebooks, local source packs, Google auth state, or Yorha-specific corpus data.

Proposed installer path

The current NotebookLM install prompt remains off by default:

Install optional NotebookLM CLI support? [y/N]

Add a second-level prompt only if NotebookLM support was selected:

Install optional NotebookLM source-pack helper? [y/N]

If selected, the installer should:

  1. check https://github.com/yelixir-dev/notebooklm-source-pack-builder.git on the selected ref (main by default);
  2. install or update nlm-pack with uv tool install "git+...@main" --force when the GitHub HEAD has changed or the CLI is missing;
  3. copy a safe example config to ~/.hermes/notebooklm-source-packs/examples/ or keep it in the repo under config/;
  4. copy a cron wrapper script to ~/.hermes/scripts/;
  5. print login and verification instructions instead of attempting to type credentials;
  6. activate upload/cron only if notebooklm auth check --test --json succeeds;
  7. otherwise leave scripts installed but inactive and print the exact post-login command.
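
The auth gate in steps 5 to 7 could be sketched as follows. This is a minimal sketch, not the installer's actual code: the function names `auth_ok` and `plan_activation` and the returned dictionary shape are hypothetical, and it assumes the auth check signals success through exit code 0. Only the `notebooklm auth check --test --json` command comes from this plan.

```python
import subprocess

# Command taken from the plan; everything else here is illustrative.
AUTH_CHECK = ["notebooklm", "auth", "check", "--test", "--json"]


def auth_ok(run=subprocess.run):
    """Return True only if the auth check exits with status 0 (assumed contract)."""
    try:
        result = run(AUTH_CHECK, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        # CLI missing or hung: treat as unauthenticated and never prompt.
        return False
    return result.returncode == 0


def plan_activation(run=subprocess.run):
    """Decide whether upload/cron may be activated, without touching credentials."""
    if auth_ok(run):
        return {"activate": True, "message": ""}
    return {
        "activate": False,
        "message": (
            "Helper scripts were installed but left inactive. "
            "After logging in, verify with: " + " ".join(AUTH_CHECK)
        ),
    }
```

The `run` parameter exists so the no-auth and authenticated paths can both be exercised with a mock, which matches the smoke tests listed in the backlog.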

Suggested public repo files:

config/notebooklm-source-pack.example.yaml
scripts/install-notebooklm-source-pack-sync.sh

Suggested local runtime layout:

~/.notebooklm-source-packs/<pack>/
  config.yaml
  manifest.json
  history.jsonl
  bundles/
  snapshots/
~/.hermes/scripts/notebooklm_source_pack_<pack>_refresh.sh

These runtime files must stay outside the public repository. As a safeguard, the repo .gitignore also ignores local .notebooklm-source-packs/ and notebooklm-source-packs/ directories, in case a maintainer creates a pack scratch area inside the checkout by mistake.

CLI contract expected from Source Pack Builder

Memory Stack should depend on the GitHub-published builder, not Yorha's internal working tree. The default install/update source is:

uv tool install "git+https://github.com/yelixir-dev/notebooklm-source-pack-builder.git@main" --force

The stack-owned installer and cron wrapper should compare GitHub HEAD with a local recorded SHA under ~/.hermes/notebooklm-source-packs/; when GitHub changes, update nlm-pack before running source-pack sync/refresh.
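
One way to implement the HEAD comparison is sketched below, under stated assumptions: it reads the remote SHA with `git ls-remote` and compares it to a SHA stored in a plain text file. The helper names and the location and format of the SHA file are hypothetical; the plan only says the SHA is recorded under ~/.hermes/notebooklm-source-packs/.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/yelixir-dev/notebooklm-source-pack-builder.git"


def parse_ls_remote(output, ref="refs/heads/main"):
    """Extract the SHA for `ref` from `git ls-remote` output ("<sha>\t<ref>" per line)."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def remote_head(url=REPO_URL, ref="refs/heads/main"):
    """Ask GitHub for the current SHA of `ref` without cloning."""
    out = subprocess.run(
        ["git", "ls-remote", url, ref],
        capture_output=True, text=True, check=True, timeout=60,
    ).stdout
    return parse_ls_remote(out, ref)


def needs_update(remote_sha, sha_file):
    """True when the remote SHA is known and differs from the recorded one."""
    if remote_sha is None:
        return False  # remote unreachable or ref missing: keep the installed builder
    sha_file = Path(sha_file)
    recorded = sha_file.read_text().strip() if sha_file.exists() else ""
    return recorded != remote_sha


def record_sha(remote_sha, sha_file):
    """Persist the SHA after a successful install/update."""
    sha_file = Path(sha_file)
    sha_file.parent.mkdir(parents=True, exist_ok=True)
    sha_file.write_text(remote_sha + "\n")
```

A caller would run the `uv tool install ... --force` command only when `needs_update(...)` is true or `nlm-pack` is not on PATH, and call `record_sha` afterwards.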

After installation/update, Memory Stack should depend on a small, stable command surface:

nlm-pack sync <pack-name> --config <config.yaml> --no-upload --json
nlm-pack deploy <pack-name> --config <config.yaml> --json
nlm-pack refresh <pack-name> --config <config.yaml> --versioned --upload --json

Minimum behavior required before Memory Stack packages it:

  • uses explicit --notebook <id> / stored notebook_id; does not rely on shared notebooklm use state;
  • keeps manifests secret-free: URL, title, hash, source IDs, timestamps, status only;
  • bundles many small pages into Markdown sources to avoid NotebookLM source-count churn;
  • supports --no-upload local dry-run mode;
  • supports versioned non-destructive refresh for docs where old commands matter;
  • exits quietly on no-op refresh so Hermes script-only cron can stay silent.
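
The secret-free manifest requirement lends itself to an allowlist check. The sketch below is illustrative only: the key names (`url`, `title`, `hash`, `source_ids`, `timestamps`, `status`) are assumed spellings of the fields listed above, and the manifest is assumed to be a list of per-source dictionaries, neither of which the builder has confirmed.

```python
# Assumed key names for the fields the plan allows in a manifest entry.
ALLOWED_ENTRY_KEYS = frozenset(
    {"url", "title", "hash", "source_ids", "timestamps", "status"}
)


def disallowed_keys(manifest_entries):
    """Return {entry_index: [unexpected keys]} for entries outside the allowlist."""
    problems = {}
    for index, entry in enumerate(manifest_entries):
        extra = sorted(set(entry) - ALLOWED_ENTRY_KEYS)
        if extra:
            problems[index] = extra
    return problems
```

An allowlist fails closed: a new field such as a cookie or token is flagged until someone deliberately adds it, which is safer than trying to enumerate every secret-looking name.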

Example source-pack config shape

name: hermes-agent-docs
title: Hermes Agent Docs
root_url: https://hermes-agent.nousresearch.com/docs
scope:
  same_domain: true
  path_prefixes:
    - /docs
  max_pages: 100
  max_depth: 4
  exclude:
    - "*/api/private/*"
notebooklm:
  notebook_id: null
  retention: versioned
  warn_source_count: 45
llmwiki_inventory:
  enabled: true
  target: ~/wiki/_meta/notebooklm-inventory.md
schedule:
  enabled: false
  cron: "0 9 * * 1"
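
To illustrate how the scope block might be applied during discovery, here is a minimal sketch. The function `in_scope` is hypothetical, and it assumes that `path_prefixes` are matched against the URL path and that `exclude` entries are shell-style globs matched against the path as well; the builder may define these semantics differently.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def in_scope(url, root_url, scope):
    """Return True if `url` passes the same_domain, path_prefixes and exclude rules."""
    target, root = urlparse(url), urlparse(root_url)
    if scope.get("same_domain", True) and target.netloc != root.netloc:
        return False
    prefixes = scope.get("path_prefixes") or []
    # Plain prefix match; note that "/docs" would also admit "/docs-old".
    if prefixes and not any(target.path.startswith(p) for p in prefixes):
        return False
    # Exclude patterns are treated as case-sensitive globs on the path (assumption).
    if any(fnmatchcase(target.path, pat) for pat in scope.get("exclude") or []):
        return False
    return True
```

With the example config above, a page under /docs on the same host is accepted, while another host, a path outside /docs, or anything matching "*/api/private/*" is rejected. `max_pages` and `max_depth` are crawl budgets and would be enforced by the crawler loop rather than by this per-URL check.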

Cron/update strategy

Use Hermes script-only cron (no_agent=True) for refresh checks. The script owns the exact output:

  • empty stdout on no material change;
  • concise stdout only when a new snapshot/source was added or attention is needed;
  • non-zero exit on real failure so Hermes alerts the user.
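
The output contract above can be expressed as a small pure function. This is a sketch under assumptions: the `result` dictionary and its keys (`error`, `changed`, `pack`, `warnings`) are hypothetical and do not describe the real `nlm-pack --json` output.

```python
def cron_report(result):
    """Map a refresh result to (stdout, stderr, exit_code) per the cron contract."""
    if result.get("error"):
        # Real failure: non-zero exit so Hermes alerts the user.
        return "", "source-pack refresh failed: " + str(result["error"]), 1
    lines = []
    if result.get("changed"):
        lines.append("new snapshot added for " + result.get("pack", "unknown pack"))
    # Attention items (for example, nearing the source limit) are also printed.
    lines.extend(result.get("warnings", []))
    # An empty string means the cron run stays silent.
    return "\n".join(lines), "", 0
```

Keeping the decision in one place makes the silent no-op path easy to test independently of the CLI.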

Recommended wrapper behavior:

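#!/usr/bin/env bash
# Per-pack refresh wrapper for Hermes script-only cron.
set -euo pipefail
# cron starts with a minimal PATH; add common uv tool and Homebrew locations.
export PATH="$HOME/.local/bin:/opt/homebrew/bin:/usr/local/bin:$PATH"
# Pack name comes from the first argument; an installer could hard-code it per pack.
PACK="${1:?usage: $0 <pack-name>}"
nlm-pack refresh "$PACK" \
  --config "$HOME/.notebooklm-source-packs/$PACK/config.yaml" \
  --versioned \
  --upload \
  --json
# Reached only if the refresh succeeded (set -e aborts on failure).
python3 "$HOME/.hermes/scripts/sync_notebooklm_inventory.py"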

Refresh policy:

  1. discover docs with llms-full.txt, llms.txt, sitemap, static sidebar/nav extraction, then bounded crawl;
  2. extract clean Markdown with a stable extractor;
  3. normalize non-content noise before hashing;
  4. compare content hashes against manifest.json / history.jsonl;
  5. if unchanged, no-op quietly;
  6. if changed, write a dated snapshot and upload the new bundle;
  7. warn near source limits and suggest archive notebooks;
  8. run NotebookLM inventory sync after successful upload.
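
Steps 3 to 5 (normalize, hash, compare) might look like the sketch below. What counts as "non-content noise" is an assumption here: the example only normalizes line endings, trailing whitespace, and runs of blank lines. The mapping of URL to stored hash is likewise a simplified stand-in for manifest.json.

```python
import hashlib
import re


def normalize(markdown):
    """Strip cosmetic noise so formatting-only changes do not trigger a refresh."""
    lines = markdown.replace("\r\n", "\n").split("\n")
    text = "\n".join(re.sub(r"[ \t]+$", "", line) for line in lines)
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip() + "\n"


def content_hash(markdown):
    """SHA-256 of the normalized Markdown body."""
    return hashlib.sha256(normalize(markdown).encode("utf-8")).hexdigest()


def changed_pages(pages, manifest_hashes):
    """Return URLs whose normalized hash differs from, or is missing in, the manifest."""
    return sorted(
        url for url, body in pages.items()
        if manifest_hashes.get(url) != content_hash(body)
    )
```

An empty result from `changed_pages` corresponds to the quiet no-op in step 5; a non-empty one leads to the snapshot and upload in step 6.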

Safety boundaries

Hard rules:

  • Never commit or copy Google cookies, storage_state.json, browser profiles, OAuth tokens, API keys, private notebooks, or local source bodies into this repo.
  • Never make NotebookLM, Google login, browser automation, or source-pack refresh part of the default install path.
  • Never upload private/local files to Google-hosted NotebookLM unless the user explicitly asks for that specific corpus.
  • Treat notebooklm-py as unofficial consumer automation that can break or rate-limit.
  • Keep source-pack configs public-safe by default: public docs URL roots only, no credentials.
  • Uninstall should remove stack-owned helper scripts/cron registrations only; it should not delete NotebookLM accounts, notebooks, cookies, or unrelated source-pack data unless explicitly requested.
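
The uninstall rule can be kept narrow by selecting files strictly by the wrapper naming pattern from the runtime layout. The sketch below is a hypothetical helper, not the stack's uninstaller: it covers wrapper files only (cron deregistration is out of scope) and defaults to a dry run.

```python
from pathlib import Path

# Naming pattern from the suggested runtime layout; anything else is left alone.
WRAPPER_GLOB = "notebooklm_source_pack_*_refresh.sh"


def stack_owned_wrappers(scripts_dir):
    """List only the per-pack refresh wrappers that the stack installed."""
    return sorted(p for p in Path(scripts_dir).glob(WRAPPER_GLOB) if p.is_file())


def uninstall_wrappers(scripts_dir, dry_run=True):
    """Remove stack-owned wrappers and return their names; never touches other files."""
    targets = stack_owned_wrappers(scripts_dir)
    if not dry_run:
        for path in targets:
            path.unlink()
    return [p.name for p in targets]
```

Because the glob is not recursive and matches a single file-name pattern, cookies, storage_state.json, pack data directories, and unrelated scripts in the same directory are never candidates for deletion.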

llmwiki inventory update rules

The llmwiki inventory is a routing map, not a mirror. It may store:

  • notebook title;
  • notebook ID;
  • source count;
  • source title, URL/type/status;
  • short routing summary and keywords;
  • last sync time.

It must not store:

  • Google cookies or auth state;
  • full NotebookLM source bodies;
  • private local file contents;
  • secrets or personal memory.

After every successful deploy/refresh, run the existing inventory sync so future agents can discover which NotebookLM notebook to query. Source-grounded answers should still query NotebookLM directly after routing.

Implementation card backlog

  1. Add docs and config examples for source-pack integration.
  2. Add installer prompt and copy/install script path, still default-off.
  3. Add source-pack cron installer wrapper with auth check and no-agent semantics.
  4. Update uninstall to remove only stack-owned source-pack cron wrappers.
  5. Add smoke tests: shell syntax, example config parse, no-auth path, authenticated path mocked.
  6. Run reviewer pass before merge, with special attention to auth/cookie boundaries.