AI router status strip + HF token wiring + LiteRT/LEAP/share-intent fixes by biafra23 · Pull Request #12 · biafra23/URLVault

biafra23 · 2026-04-29T10:04:20Z

Summary

Started as a drop-in replacement for the AI-router debug Toast and grew, while running the demo on real devices, into a series of related fixes for the local-AI flow. All commits are independent and could be merged separately, but they share enough state (router events, MainActivity wiring, the AddEditBookmark flow) that splitting into multiple PRs would just mean repeated context for reviewers.

1. AI router status strip (replaces debug Toast)

The Toast spammed on every generateXxx call and obscured the bookmark form so the user couldn't interact while it was visible. Replaced with a slim auto-hiding strip pinned to the bottom of the screen:

shows <action>: <model>… with a spinner while inference runs,
shows <action> via <model> — 1.23 s on completion,
shows <action>: no model ready (<reason>) on failure,
auto-hides 3.5 s after Completed, 5 s after NoProvider; stays put while Running.

LocalModelRouter now wraps each provider call with monotonic-clock timing (System.nanoTime()) and emits a new RouteEvent.Completed { durationMs, success } from a try/finally so the strip never gets stuck on Running, even when the provider throws (notably CancellationException, which runCatching re-raises). The _events SharedFlow uses BufferOverflow.DROP_OLDEST so a slow consumer never silently drops the latest event and leaves the UI inconsistent.

The strip lives as the last child of the outer Column with the active screen wrapped in a weighted Box, so when AnimatedVisibility expands the strip from 0-height the screen above is pushed up — the Save / Cancel button row stays visible and tappable. Nav bar inset is consumed once on the outer Column via navigationBarsPadding(), so neither the screen's BottomAppBar nor the strip applies its own gesture-pill padding (which previously caused the BottomAppBar's box to grow visibly when the strip slid in).

DEBUG-only, matching the gating of the previous Toast.

2. Gated-model downloads now work — 401 fix + token UI + CI/local plumbing

Two real problems made every Gemma / FunctionGemma download fail with HTTP 401:

No way to enter a token. LocalModelPreferences.loadHfToken() and the authTokenProvider lambda in DI both existed, but no UI ever wrote a token, so the downloader's Authorization header was always absent.
Authorization leaked to the CDN. Hugging Face 302s gated downloads to a pre-signed cas-bridge.xethub.hf.co URL, which rejects the extra Authorization: Bearer … header with 401. The downloader was unconditionally re-attaching the Bearer token on every redirect hop.

Fixes:

ModelDownloadManager tracks the original request host and only attaches Authorization while still on it. Cross-origin redirects drop the header (matches browser/curl behaviour). Same-host check applied to both openWithRedirects and discoverTotalBytes.
New HuggingFaceTokenRow in SettingsScreen lets the user paste a token. Stored in EncryptedSharedPreferences via the existing saveHfToken. Row surfaces three states (Saved, Using token bundled with this build, Required).
Build-time fallback baked into BuildConfig.HF_TOKEN_DEFAULT. Three sources, in precedence order: user-entered → HF_TOKEN env var → hfToken property in local.properties (gitignored). Non-token characters are filtered at build time so the generated string literal is always safe. LocalModelPreferences.loadHfToken() falls back to BuildConfig when nothing user-set; loadUserHfToken() / hasBuildTimeHfToken() accessors let the UI distinguish the two for honest copy.
CI: .github/workflows/build.yml and release.yml now read HF_TOKEN from a GitHub repo secret. Builds still succeed with the secret absent.
README: new "Hugging Face Token (gated models)" section explaining all three sources, the per-repo licence acknowledgement step, the security caveat about reverse-engineering the APK, and the cross-origin auth scrubbing.

3. LiteRT-LM runtime fixes

Two device-discovered issues, both surfaced via adb logcat:

Text-only bundles failed on every backend. EngineConfig was passing visionBackend = backend and audioBackend = backend, telling the SDK to enable those modalities for every load. Text-only bundles (FunctionGemma 270M, Gemma 3 270M, Qwen3 0.6B…) lack a TF_LITE_VISION_ENCODER section, and initialize() then threw NOT_FOUND: TF_LITE_VISION_ENCODER not found in the model. on every backend in the NPU → GPU → CPU walk. Fix: pass null for both. None of today's catalog entries are multi-modal; revisit per-entry when a real Gemma 4 E2B multi-modal bundle gets wired up.
Pixel 7a / Tensor G2: "Can not find OpenCL library on this device". GPU Engine.initialize() succeeds, but Session.generateContent then throws because the LiteRT-LM Top-K sampler dlopens OpenCL even on the WebGPU path, and Tensor doesn't ship OpenCL drivers. Self-heal in runCollect: catch the failure, blocklist the broken backend for the rest of the process, reload the same model on the next candidate (typically CPU here), retry generation once. Mutex held throughout. Subsequent calls skip the broken backend entirely. Recovery detection is narrow today (only matches opencl); widen as new device-specific failures show up.

4. LEAP runtime fixes

Degenerate sampler output. Observed for a page where LFM2-Extract had little to extract from, the bridge returned {"description":":\",\",\",\",\",\","..."} — JSON shape valid (the grammar constraint did its job), value garbage (the sampler picked whatever satisfies the grammar when cornered). New looksDegenerate heuristic checks letter ratio, distinct-char count, and minimum length; rejected output throws and the UI surfaces "AI description generation failed" instead of persisting :","",… as the bookmark's description. Applied to both description and title (tags pass through length/charset filters that would already catch this).
Strengthened description prompt. Explicitly requires natural-language sentences and forbids punctuation-only output; offers the exact canonical fallback No summary available. when the supplied text doesn't have enough to extract, so the model has a way out instead of being cornered by the grammar. Detection is the safety net; the prompt change is what we hope reduces the failure rate.

5. Bookmark-flow behaviour fixes

Tags fire even when description fails. Previously tags were chained only on description-success; with the new degenerate-rejection landing on the failure branch (and observed: same URL where LEAP's description sampler went sideways produced perfectly clean tags), the user was getting nothing. Tags are an independent extraction; they now run from URL+title alone on description error.
Share-intent re-triggers AI when aiCoreEnabled flips on after a startup race. When a URL is shared, MainActivity composes with aiCoreEnabled briefly false because anyProviderReady is computed via produceState on Dispatchers.Default and hasn't finished by the time LaunchedEffect(prefilledUrl) fires. That sent the first trigger down the legacy regex-extraction branch, and the single-key LaunchedEffect never re-ran when AI later flipped on. Re-keyed on aiCoreEnabled and forced re-trigger on flip; the legacy result-handling effects already gate their applies on if (!aiCoreEnabled), so racing legacy results are discarded harmlessly when AI takes over.

6. Debug provenance tag

DEBUG builds now append a synthetic last tag of the form <sdk>:<model>:<duration> to every successful generateTags call, so a glance at the saved bookmark tells you which SDK / model variant produced those tags and how long it took:

aicore:gemini-nano-active:1.23s
llama:lfm2-1.2b-q4_k_m:4.56s
leap:lfm2-1.2b-extract:2.34s
liteRt:qwen3-0.6b:850ms

ms under 1 s, two-decimal seconds above. Locale-independent (no String.format). Stripped at the router level in release so synced Bitwarden bookmarks never carry the marker into production. Only affects the bookmark-form path — ModelComparisonRunner is unaffected (its UI already labels each row).

7. UI polish

Icons.Default.Sync and Icons.Default.Settings instead of 🔄 / ⚙️ emoji in BookmarkListScreen. Both gain proper contentDescription for a11y. Required compose.materialIconsExtended in shared/commonMain (R8 strips unused).
Comparison runtime label says LiteRT-LM instead of the leaked enum constant MEDIAPIPE. Same mapping as the existing runtimeSectionLabel in Settings. The enum constant kept for now (renaming touches every provider class and the persisted catalog format).
System back gesture/button now mirrors the in-screen back arrows: Comparison → Settings, Settings/AddEdit → List. Disabled on List (root) so the OS default still closes the app. Without the explicit BackHandler, the Activity finished and the user was dropped to the home screen, which was reported as "back exits the app".

Test plan

Status strip

Suggest tags / share a URL — strip slides up showing tags: <model>… with spinner; on completion switches to tags via <model> — N ms/s and fades after 3.5 s.
Save / Cancel buttons stay visible and tappable while the strip is showing; the BottomAppBar's box does not grow when the strip slides in.
Cancel a generation mid-flight (e.g. backgrounding the app) — strip transitions out of Running into a Completed (failed) state instead of hanging.

HF token / gated downloads

Add a valid hf_… token in Settings → Local AI Models → "Hugging Face token". Verify it persists across an app restart (saved row reads Saved: hf_xxx…XXXX).
Acknowledge the licence on https://huggingface.co/litert-community/Gemma3-1B-IT (and the other two listed in README) on the HF web UI.
Hit Download on Google Gemma 3 1B IT (LiteRT-LM, int4). Confirm the request to huggingface.co carries Bearer auth, the redirect to cas-bridge.xethub.hf.co does not, and the file lands in Ready state with a .complete sidecar.
Build with HF_TOKEN=… set in the env: BuildConfig.HF_TOKEN_DEFAULT should be the token string. Without the env var: empty string. With hfToken=… in local.properties: same as env.
Ungated control: download Alibaba Qwen3 0.6B (LiteRT-LM) with no token configured — should still succeed.

LiteRT-LM

Load any text-only LiteRT bundle (FunctionGemma 270M, Gemma 3 270M, Qwen3 0.6B). Should no longer fail with NOT_FOUND: TF_LITE_VISION_ENCODER.
On Pixel Tensor (e.g. Pixel 7a): first generation logs Recovering from GPU runtime failure (… Can not find OpenCL library …) — blocklisting and reloading on next backend, then succeeds on CPU. Subsequent generations skip GPU directly.

LEAP

Generate description for a thin / extraction-hostile URL. Either real prose appears, or the description field shows an inline error and tags still populate. Logcat shows description rejected as degenerate for the rejection path.
Generate tags on the same URL — should always populate even when description fails.

Provenance tag

Saved bookmark has one extra tag of the form <sdk>:<model>:<duration>. Release builds (manually assembleRelease) do not include it.

UI polish

Top app bar shows the standard Material Sync and Settings icons (not emoji). TalkBack reads "Sync with Bitwarden" and "Settings".
Comparison row labels read AICore / llama.cpp / Leap / LiteRT-LM (no MEDIAPIPE).
System back from Settings → returns to bookmark list. From Comparison → returns to Settings. From AddEdit → returns to bookmark list. From bookmark list → closes the app.

The Toast spammed the screen on every generateXxx call and obscured the bookmark form so the user couldn't interact while it was visible. Replace it with a slim Material strip pinned to the bottom of the screen that: - shows "<action>: <model>…" with a spinner while inference runs - shows "<action> via <model> — 1.23 s" on completion - shows "<action>: no model ready (<reason>)" on failure - auto-hides 3.5 s after Completed, 5 s after NoProvider To time the call, LocalModelRouter now wraps each provider call with a wall-clock measurement and emits a new RouteEvent.Completed with durationMs + success. Picked is still emitted up front for the in-flight state. Layout: the strip lives as the last child of the outer Column with a weighted Box wrapping the active screen, so when AnimatedVisibility expands the strip from 0-height the screen above is pushed up — the Save / Cancel button row stays visible and responsive. The outer Column consumes the navigation bar inset once via navigationBarsPadding(), so neither the screen's BottomAppBar nor the strip applies its own gesture-pill padding (which previously caused the BottomAppBar's own box to grow when the strip slid in). DEBUG-only, matching the gating of the previous Toast.

gemini-code-assist

Code Review

This pull request replaces debug Toast notifications with a custom, auto-hiding AiActivityStatusLine component to provide non-obstructive feedback on AI operations. The LocalModelRouter was refactored to track and emit execution durations via a new Completed event, and the MainActivity layout was updated to accommodate the new status strip. Feedback suggests switching from System.currentTimeMillis() to System.nanoTime() in the timing logic to ensure more accurate and monotonic duration measurements.

gemini-code-assist · 2026-04-29T10:10:20Z

+        pick: PickResult,
+        block: () -> Result<T>,
+    ): Result<T> {
+        val t0 = System.currentTimeMillis()


Using System.currentTimeMillis() for measuring elapsed time is susceptible to system clock changes. Consider using System.nanoTime() for more accurate and monotonic duration measurements.

Suggested change

val t0 = System.currentTimeMillis()

val t0 = System.nanoTime()

Addressed in 48a8e57 — switched to System.nanoTime() (and divide by 1_000_000 for ms).

gemini-code-assist · 2026-04-29T10:10:20Z

+                readiness = pick.readiness,
+                providerId = provider.id,
+                providerName = provider.displayName,
+                durationMs = System.currentTimeMillis() - t0,


Convert the nanosecond difference to milliseconds correctly after using System.nanoTime().

Suggested change

durationMs = System.currentTimeMillis() - t0,

durationMs = (System.nanoTime() - t0) / 1_000_000,

Done in 48a8e57.

Copilot

Pull request overview

Replaces the DEBUG-only AI-router Toast with a slim bottom “status strip” UI, and extends LocalModelRouter event reporting to include per-call completion timing and success/failure so the UI can show which provider served a request and how long it took.

Changes:

Added AiActivityStatusLine + AiActivityState in shared UI to render an animated bottom status strip.
Extended LocalModelRouter with a new RouteEvent.Completed and refactored emission logic around Picked / None / Completed.
Updated MainActivity to consume router events, manage auto-hide timing, and adjust layout/insets so the strip pushes content up instead of obscuring bottom actions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
`shared/src/commonMain/kotlin/com/jaeckel/urlvault/ui/AiActivityStatusLine.kt`	Introduces the status strip composable and its UI state model.
`androidApp/src/main/kotlin/com/jaeckel/urlvault/android/ai/LocalModelRouter.kt`	Adds `Completed` route events and timing around `generateTags/Description/Title`.
`androidApp/src/main/kotlin/com/jaeckel/urlvault/android/MainActivity.kt`	Replaces Toast with the new strip, adds auto-hide behavior, and adjusts layout/insets.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-29T10:12:13Z

+        action: String,
+        provider: LocalModelProvider,
+        pick: PickResult,
+        block: () -> Result<T>,


runTimed takes a non-suspending block: () -> Result<T>, but all LocalModelProvider.generateXxx() methods are suspend. As written, the calls like { provider.generateTags(...) } won't type-check (a suspending lambda can't be passed to a non-suspending parameter). Make block a suspend () -> Result<T> (and likely crossinline) so runTimed can wrap suspending provider calls.

Suggested change

block: () -> Result<T>,

crossinline block: suspend () -> Result<T>,

The current form is intentional: runTimed is suspend inline, which is exactly the construct that lets a non-suspending lambda parameter call a suspending function — the lambda body is inlined into this suspend function's body and runs in a suspending context. The existing code compiles and runs (the PR has been exercised end-to-end). Adding crossinline suspend would also be valid but would force a different call shape; since neither correctness nor non-local returns motivate the change, leaving as-is.

Copilot · 2026-04-29T10:12:14Z

+        val t0 = System.currentTimeMillis()
+        val result = block()
+        _events.tryEmit(
+            RouteEvent.Completed(
+                action = action,
+                activeIds = pick.activeIds,
+                readiness = pick.readiness,
+                providerId = provider.id,
+                providerName = provider.displayName,
+                durationMs = System.currentTimeMillis() - t0,
+                success = result.isSuccess,
+            ),
+        )
+        return result


RouteEvent.Completed is documented as firing after the provider call "returns or throws", but runTimed only emits after block() returns normally. If the provider implementation throws before returning a Result, no Completed event will be emitted (leaving the UI in Running). Either update the KDoc to match current behavior or wrap the call in try/catch/finally so Completed is always emitted (setting success=false and converting exceptions into Result.failure if needed).

Good catch. Fixed in 48a8e57 — wrapped in try/finally so Completed is always emitted, including on CancellationException (which runCatching re-raises). Status strip no longer gets stuck in Running on cancel.

Copilot · 2026-04-29T10:12:14Z

+        val t0 = System.currentTimeMillis()
+        val result = block()
+        _events.tryEmit(
+            RouteEvent.Completed(
+                action = action,
+                activeIds = pick.activeIds,
+                readiness = pick.readiness,
+                providerId = provider.id,
+                providerName = provider.displayName,
+                durationMs = System.currentTimeMillis() - t0,
+                success = result.isSuccess,


runTimed uses System.currentTimeMillis() to compute durations; this is not monotonic and can jump if the system clock changes (NTP/manual), producing negative or incorrect durations. Since this is Android-only code, prefer a monotonic source like SystemClock.elapsedRealtime() (or Kotlin TimeSource.Monotonic) for durationMs.

Addressed in 48a8e57 with System.nanoTime(). SystemClock.elapsedRealtime() and TimeSource.Monotonic would also work; nanoTime keeps the dependency surface smallest.

Copilot · 2026-04-29T10:12:14Z

+    private fun emitPicked(action: String, provider: LocalModelProvider, pick: PickResult) {
+        _events.tryEmit(
+            RouteEvent.Picked(
+                action = action,
+                activeIds = pick.activeIds,
+                readiness = pick.readiness,
+                providerId = provider.id,
+                providerName = provider.displayName,
+                reason = pick.reason,
+            ),
+        )
+    }


_events.tryEmit(...) return values are ignored. With two events per call now (Picked + Completed), the SharedFlow's buffer can overflow more easily; if Completed is dropped, the debug status line may remain stuck in Running. Consider using the suspending emit(...), increasing buffer, or configuring onBufferOverflow = DROP_OLDEST so UI state reliably progresses to Completed/None.

Fixed in 48a8e57 — _events now uses BufferOverflow.DROP_OLDEST. tryEmit always succeeds, the consumer eventually catches up to current state, and a slow collector can no longer leave the strip stuck mid-call.

Copilot · 2026-04-29T10:12:15Z

+ * slides away. Place it at the bottom of a Box that wraps your screen
+ * content; it will draw above everything else when [state] is not Hidden.


The KDoc says to place this strip "at the bottom of a Box" and that it "will draw above everything else", but in this PR it's intentionally used as the last child in a Column (so it pushes content up instead of overlaying it). Update the documentation to reflect the intended/actual usage so callers don't place it in an overlaying Box and reintroduce the obscuring behavior.

Suggested change

* slides away. Place it at the bottom of a Box that wraps your screen

* content; it will draw above everything else when [state] is not Hidden.

* slides away. Add it as the last child in your screen layout (for example,

* at the bottom of a `Column`) so it takes up layout space and pushes content

* up rather than overlaying it. Do not place it in an overlaying `Box` if

* you want to avoid obscuring content when [state] is not Hidden.

Right, the KDoc was stale relative to the layout that actually shipped. Rewritten in 48a8e57 to describe the in-flow Column placement and explicitly warn against an overlaying Box (which would reintroduce the original Toast-obscures-buttons problem).

Three connected changes for downloading gated LiteRT-LM / Gemma bundles: 1. Stop sending Authorization on cross-origin redirects in ModelDownloadManager. HF 302s gated downloads to a pre-signed cas-bridge.xethub.hf.co URL that already authenticates via query-string token; an extra Bearer header makes the CDN return 401. Track the original host and only attach the header while the request is on it. 2. Add a Settings UI for the HF token (HuggingFaceTokenRow in SettingsScreen). The infrastructure (LocalModelPreferences.saveHfToken / loadHfToken, EncryptedSharedPreferences) already existed but no screen ever wrote the value. The row also surfaces a build-time fallback so users of CI builds don't need to paste anything. 3. Bake an optional default into the APK from `HF_TOKEN`: - androidApp/build.gradle.kts reads HF_TOKEN env var or the `hfToken` property in <repo>/local.properties (gitignored) and emits BuildConfig.HF_TOKEN_DEFAULT, with a non-token-character filter so the generated string literal is always safe. - LocalModelPreferences.loadHfToken() now falls back to the BuildConfig field when the user hasn't saved one. - build.yml and release.yml workflows pass HF_TOKEN through from a GitHub repo secret. Both still build with the secret absent. Plus two unrelated polish items folded in: - BookmarkListScreen: replace the 🔄 / ⚙️ emoji in the top app bar with Icons.Default.Sync / Icons.Default.Settings (added compose.materialIconsExtended to shared/commonMain). - README: new "Hugging Face Token (gated models)" section explaining the user-entered, env-var, and local.properties paths plus the one-time per-repo licence acknowledgement step.

…OLDEST - runTimed now uses System.nanoTime() instead of System.currentTimeMillis() so wall-clock jumps (NTP / manual changes) can't produce negative or inflated durationMs. Pointed out by Gemini and Copilot independently. - runTimed wraps the call in try/finally so RouteEvent.Completed is always emitted, including when block() throws (notably CancellationException, which runCatching re-raises). Without this, a cancelled call left the status strip stuck in "Running…" until the next event fired. Matches the KDoc claim that Completed fires after the call "returns or throws". - _events now uses BufferOverflow.DROP_OLDEST. With Picked + Completed pairs per call, a slow / backgrounded collector could overflow the 16-slot buffer; tryEmit dropping the latest event would leave the UI stuck. With DROP_OLDEST, tryEmit always succeeds and the consumer eventually catches up to current state. - AiActivityStatusLine KDoc rewritten to match the actual usage in MainActivity (last child of a Column, content above given weight(1f)) rather than the original "in an overlaying Box" advice that would reintroduce the obscuring behaviour the strip was meant to fix. The Copilot suggestion to make `block: suspend () -> Result<T>` (and crossinline) was checked but not applied — `runTimed` is `suspend inline` already, which is exactly the construct that lets a non-suspending lambda parameter call suspending provider methods (the body is inlined into the suspend caller). The existing form compiles and runs correctly.

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-29T10:56:35Z

      - name: Build Android debug APK
+        env:
+          # Optional Hugging Face read-token baked into the APK at build
+          # time; needed to download gated LiteRT-LM / Gemma model bundles
+          # without the user pasting one into Settings. Repository secret;
+          # build still succeeds if absent (token defaults to empty).
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: ./gradlew :androidApp:assembleDebug --no-daemon



build.yml injects HF_TOKEN into the debug build and then uploads the resulting APK as a GitHub Actions artifact. If the repo secret is configured, that artifact will contain the token (and artifacts are typically downloadable by anyone with access to the workflow run), which is an easy unintended disclosure path. Consider only setting HF_TOKEN for trusted/internal builds (e.g., tag/release workflow) or gating this env var + artifact upload behind github.ref == 'refs/heads/main' / a manual workflow dispatch.

Copilot · 2026-04-29T10:56:36Z

+                action = action,
+                activeIds = pick.activeIds,
+                readiness = pick.readiness,
+                reason = pick.reason,
+            ),
+        )
+    }
+
+    /**
+     * Times [block] and emits a [RouteEvent.Completed] regardless of how it
+     * exits — normal `Result` (success or failure), or thrown exception
+     * (notably coroutine cancellation, which `runCatching` re-raises). Without
+     * the try/finally, a cancellation would leave the UI strip stuck in
+     * "Running…" forever.
+     *
+     * `inline` is what lets the non-suspending `block` parameter actually call
+     * suspending provider methods — the lambda body is inlined into this


runTimed takes a non-suspending block: () -> Result<T>, but it's invoked with provider.generateXxx(...) which are suspend functions on LocalModelProvider. This won't compile; make the lambda suspend () -> Result<T> (and call it accordingly). Also consider emitting RouteEvent.Completed in a try/finally so exceptions don’t skip the completion event, and use a monotonic clock (e.g., SystemClock.elapsedRealtime()) for duration measurement instead of currentTimeMillis().

Copilot · 2026-04-29T10:56:36Z

+            var draft by remember(token) { mutableStateOf(token) }
+            OutlinedTextField(
+                value = draft,
+                onValueChange = { draft = it },
+                label = { Text("hf_… (read access)") },
+                singleLine = true,
+                modifier = Modifier.fillMaxWidth(),


The Hugging Face token is currently shown in clear text when the row is expanded. Since it’s a credential, it should be masked by default (e.g., PasswordVisualTransformation) with an explicit show/hide toggle, similar to the Bitwarden master password field above, to reduce accidental disclosure (screenshots, shoulder surfing, screen recording).

Suggested change

var draft by remember(token) { mutableStateOf(token) }

OutlinedTextField(

value = draft,

onValueChange = { draft = it },

label = { Text("hf_… (read access)") },

singleLine = true,

modifier = Modifier.fillMaxWidth(),

var draft by remember(token) { mutableStateOf(token) }

var showToken by remember { mutableStateOf(false) }

OutlinedTextField(

value = draft,

onValueChange = { draft = it },

label = { Text("hf_… (read access)") },

singleLine = true,

modifier = Modifier.fillMaxWidth(),

visualTransformation = if (showToken) {

VisualTransformation.None

} else {

PasswordVisualTransformation()

},

keyboardOptions = KeyboardOptions(keyboardType = KeyboardType.Password),

trailingIcon = {

IconButton(onClick = { showToken = !showToken }) {

Text(if (showToken) "Hide" else "Show")

}

},

Copilot · 2026-04-29T10:56:36Z

        }.apply()
    }

+    /** True iff the APK was built with a non-empty `HF_TOKEN` env var. */


KDoc for hasBuildTimeHfToken() says it’s true only when the APK was built with a non-empty HF_TOKEN env var, but the build logic also sources hfToken from local.properties. Update the comment to reflect that this checks whether BuildConfig.HF_TOKEN_DEFAULT is non-empty regardless of source.

Suggested change

/** True iff the APK was built with a non-empty `HF_TOKEN` env var. */

/** True iff [BuildConfig.HF_TOKEN_DEFAULT] is non-empty, regardless of source. */

Copilot · 2026-04-29T10:56:37Z

+ * Slim auto-hiding strip rendered at the bottom of the app. Designed as the
+ * non-obstructive replacement for the debug Toast spam: a single line that
+ * slides up while AI work is in flight, then briefly shows the timing, then
+ * slides away.
+ *
+ * Add it as the **last child of your screen's Column** (with the screen


The KDoc says to place this status line at the bottom of a Box and that it “will draw above everything else”, but the PR description/MainActivity placement uses it as the last child of a Column so it pushes content upward instead of overlaying it. Update the documentation to match the intended/actual layout behavior so future callers don’t accidentally reintroduce overlap.

Every entry in ModelCatalog is text-only, but the bridge was passing `visionBackend = backend` and `audioBackend = backend` to EngineConfig. The SDK then probes for vision/audio sections in the .litertlm bundle during initialize() and aborts text-only models with: Failed to create engine: NOT_FOUND: TF_LITE_VISION_ENCODER not found in the model. Affected every text-only LiteRT-LM model — FunctionGemma 270M, Gemma 3 270M, Qwen3 0.6B, etc. — on every backend (NPU → GPU → CPU all fail the same way), so load() always threw. Set visionBackend / audioBackend to null. When a true multi-modal Gemma 4 E2B bundle lands later, set them per-entry instead of bridge- wide.

Observed in the wild: for a page where LFM2-Extract had little to extract from, the bridge returned {"description":":\",\",\",\",\",\",\",\",\",\",\",\",..."} JSON shape is valid (the grammar constraint did its job) but the value is degenerate punctuation repeated until the maxLength budget runs out — the sampler picks whatever satisfies the grammar when the model has nothing meaningful to say. Two defences: 1. `looksDegenerate` heuristic — checks letter ratio, distinct-char count, and minimum length. Rejected output throws and the UI surfaces "AI description generation failed" instead of persisting `:","",...` as the bookmark's description. Applied to both description and title (tags pass through length / charset filters that would already catch this). 2. Strengthened the description prompt to: - explicitly require natural-language sentences and forbid punctuation-only output; - give an exact canonical fallback ("No summary available.") for when the supplied text doesn't have enough to extract, so the model has a way out instead of being cornered by the grammar. Note that detection is the safety net; the prompt change is what we hope reduces the failure rate. The pre-existing comment already warned about this exact mode — it had regressed for thin pages.

The description-success LaunchedEffect previously chained tag generation only on success. With degenerate-output rejection now landing on the failure branch (and observed: same URL where LEAP's description sampler went sideways produced perfectly clean tags), falling back to "no tags either" leaves the user with nothing. Tags are an independent extraction and don't need the description as context. On description error, kick them off from URL + title alone.

When a URL is shared, MainActivity composes with aiCoreEnabled briefly false because `anyProviderReady` is computed via produceState on Dispatchers.Default and hasn't finished by the time LaunchedEffect(prefilledUrl) fires in AddEditBookmarkScreen. That sent the first trigger down the legacy metadata-extraction branch, and the single-key LaunchedEffect never re-ran when aiCoreEnabled later flipped to true — leaving the user with regex-extracted tags and no AI output. Two prior fixes contributed to widening the race window: - tightening produceState keys (removed downloadStates) so the read is deferred to the first true state change rather than every tick, - moving the readiness probe off the main thread. Both correct individually; together they make the startup gap visible. Add aiCoreEnabled to the LaunchedEffect keys and pass force=true so the second invocation bypasses the aiTriggeredForUrl dedup. The legacy result-handling LaunchedEffects already gate their applies on `if (!aiCoreEnabled)`, so once AI takes over the racing legacy results are discarded harmlessly.

Adds a synthetic last tag of the form `dbg:<provider-id>@<HH:mm:ss>` to every successful generateTags call in DEBUG builds, so a glance at the saved bookmark tells you which SDK / model variant produced those tags and when — useful when comparing LEAP vs llama.cpp vs LiteRT vs AICore live during the talk demo. Examples in saved bookmarks: dbg:leap:lfm2-1.2b-extract@14:13:14 dbg:litertlm:qwen3-0.6b@14:14:02 dbg:mlkit:gemini-nano-active@14:14:48 Stripped at the router level in release builds so synced Bitwarden entries don't carry `dbg:…` tags into production. Only affects the bookmark form path (BookmarkViewModel → router → provider). The ModelComparisonRunner calls providers directly and is unaffected — its UI already labels each row by provider.

Was: dbg:leap:lfm2-1.2b-extract@14:13:14 (wall-clock time of generation) Now: dbg:leap@2.34s (how long generation took) Short SDK names map ModelRuntime to: ML_KIT -> aicore LLAMA_CPP -> llama LEAP -> leap MEDIAPIPE -> liteRt Duration formatting: ms under 1s, two-decimal seconds above. Hand- rolled rather than String.format to avoid host-locale dependence.

The ProviderResultCard rendered `result.runtime.name`, leaking the enum constant. `ModelRuntime.MEDIAPIPE` is a historical leftover from when the LiteRT-LM bundle was loaded via MediaPipe-LLM; the runtime today is LiteRT-LM, so display it as such. Added a `runtimeLabel()` helper that mirrors the same mapping already in SettingsScreen.

Without a BackHandler, the Activity finishes on system back and the user is dropped out to the home screen — not what tapping the in-screen back arrow does. Mirror the same per-screen routing as the arrow buttons: Comparison -> Settings, Settings/AddEdit -> List. Disabled on List so the OS default (close the app) still applies at the root.

Symptom on Pixel 7a (Tensor G2): the GPU backend's `Engine.initialize()` succeeds fine, then the first `Session.generateContent` call throws com.google.ai.edge.litertlm.LiteRtLmJniException: Failed to generate content: UNKNOWN: Can not find OpenCL library on this device Cause: LiteRT-LM's Top-K sampler dlopens OpenCL even when the engine runs through WebGPU, and Tensor doesn't ship OpenCL drivers. The sampler-factory log shows it falling back to a statically-linked OpenCL sampler, which then fails the same way. Generation is fatal; load was happy. Fix: catch the failure in runCollect, blocklist the broken backend for the lifetime of the process, reload the same model on the next candidate (NPU → GPU → CPU strategy means we land on CPU here), and retry generation once. We hold the mutex throughout so no other caller can race in mid-recovery. Refactored: - `load()` body extracted into `loadInternalLocked()` so recovery can re-enter it without dropping the mutex, - `runCollect()` now wraps `runCollectOnce()` with the catch+retry, - `runtimeBlockedBackends` set is filtered out at the start of `loadInternalLocked()` and surfaces a clear "all backends blocked" error if we exhaust them. Detection is narrow today (`isRecoverableRuntimeError` only matches 'opencl'); widen as new device-specific failures show up.

Previous OpenCL self-heal (b8cc107) closed the broken GPU engine and reloaded on the next backend in the same mutex hold, then retried the failed generate call. Engine.close() doesn't release the GPU pipeline's native memory synchronously, so the in-flight reload of the CPU engine held both pipelines in RAM briefly. Pixel 7a / Tensor G2 peaked at 5957197824 B (~5.96 GB), the LMK reaped the app: am_pss : [12767, ..., 5957197824, ...] killinfo: [12767, ...] Process com.jaeckel.urlvault (pid 12767) has died: fg TOP User saw it as "LiteRt now crashes the app" with no FATAL exception in logcat — classic LMK signature. New behaviour: catch the OpenCL-style runtime failure, blocklist the broken backend for the session, release the engine, **fail this call**. The very next entry-point call (provider.generateXxx → bridge.load) sees engine == null and runs a fresh load through the strategy with the blocklist applied — single-pipeline peak, no concurrent two-engine window. UX is one failed generation followed by a working one.

The SDK auto-selects the Top-K sampler based on engine backend; there's no public API knob to override it. With Backend.GPU, the SDK pulls the WebGPU/OpenCL sampler chain, which on Pixel Tensor (7a / 8 / 9) all ends at OpenCL — including the static fallback baked into liblitertlm_jni.so. Generation throws `Can not find OpenCL library on this device` even when the engine itself is happily running through WebGPU. Deprioritising GPU below CPU in the strategy makes the *first* call on Tensor land directly on `Backend.CPU`, which uses the CPU sampler and works. The previous order (NPU → GPU → CPU) made the first call always fail before recovery kicked in (and earlier in this branch, recovery itself OOMed the process at peak memory). Cost: on devices where GPU sampling works (older Snapdragon w/ proper OpenCL drivers), we miss the GPU speedup. Acceptable — GPU is a hypothetical performance win for the model graph; correctness on widely- deployed Pixel hardware is more important. The runtime self-heal in runCollect still catches the OpenCL error if a user forces GPU via a custom strategy.

Two changes after running on a Pixel 10 Pro Fold and seeing 70 s tag generation, with no easy way to tell whether GPU acceleration was even in play: 1. Restore the original NPU → GPU → CPU strategy. The earlier reorder (3300669) was an over-correction for the Pixel 7a OpenCL bug. With the OOM-free recovery now in place (3f75958, blocklist + throw, no inline retry) the trade-off goes the other way: G2 takes one failed first call before settling on CPU; G5 / 10 Pro Fold lands on GPU directly and gets the speedup. Always-CPU was strictly worse for the device class with working OpenCL/WebGPU. 2. Surface the actual loaded backend in the debug provenance tag for LiteRT-LM. The saved bookmark now carries liteRt[GPU]:gemma-3-1b-it-int4:2.34s instead of liteRt:gemma-3-1b-it-int4:70.20s so "is this NPU/GPU/CPU?" is answerable at a glance, no logcat needed. AICore / llama.cpp / Leap don't expose a comparable backend distinction in this app, so the suffix only fires for LiteRT. Plumbing: LiteRtLmNativeBridge gains `currentBackendLabel()` (default null), LiteRtLmSdkBridge returns its `currentBackend` field, LiteRtLmModelProvider exposes a public method that calls into the bridge, and the router casts to LiteRtLmModelProvider to read it when building the tag.

Subsequent shares of the same URL produced two debug provenance tags on the saved bookmark, with different durations — proof of two real generation runs, not a duplicate state callback. Cause: the share-intent LaunchedEffect was keyed on `(prefilledUrl, aiCoreEnabled)` and called `triggerAiForUrl(force=true)`. The `force=true` was needed for the startup race (legacy first, AI second when the readiness probe settled) but it bypassed the URL dedup *unconditionally* — any unrelated recomposition that flipped `aiCoreEnabled` (or any other state Compose decided to invalidate the effect on) re-fired the whole AI flow. Description ran twice → each description-success chained tag generation → two `router.generateTags` calls → two debug tags. Fix: track the *mode* the URL was last triggered in (`"ai"` / `"legacy"`) alongside the URL itself, and dedup on the pair. Same URL + same mode = no-op. Legacy → AI is a real mode change and falls through (preserves the startup-race fix). The share-intent effect drops `force = true`; the dedup now does the right thing on its own. Other call sites that explicitly clear `aiTriggeredForUrl = null` to allow a re-trigger continue to work — null != targetUrl short-circuits the first AND clause, so the mode comparison never gates them.

Earlier (3300669 → 49a004e) I bounced this back and forth: deprioritised GPU after the Pixel 7a / Tensor G2 OpenCL failure, then re-prioritised it on the assumption that newer Tensor (G5 / Pixel 10 Pro Fold) might have proper OpenCL drivers. Turns out it doesn't — the user reports the same `Can not find OpenCL library on this device` on the Fold, so the LiteRT-LM 0.10.x GPU sampler is broken on every Pixel Tensor we've tested. Final strategy (and the right one until/unless we get a working-OpenCL device to test on): NPU → CPU → GPU. NPU rarely works (vendor libs aren't packaged in the LiteRT-LM AAR for any Pixel we've seen) and falls through to CPU, which has a CPU sampler and works correctly. GPU stays in the list as a last resort for hypothetical future devices, gated behind the runtime self-heal that blocklists it on the OpenCL error if anyone forces GPU via a custom strategy. Saves ~5–10 s of cold-start time on every first generation on the user's two Pixel devices.

biafra23 requested a review from Copilot April 29, 2026 10:08

Copilot started reviewing on behalf of biafra23 April 29, 2026 10:08 View session

gemini-code-assist Bot reviewed Apr 29, 2026

View reviewed changes

Copilot AI reviewed Apr 29, 2026

View reviewed changes

biafra23 changed the title ~~AI router status strip replaces debug Toast~~ AI router status strip + HF token wiring + gated-model 401 fix Apr 29, 2026

biafra23 requested a review from Copilot April 29, 2026 10:50

Copilot started reviewing on behalf of biafra23 April 29, 2026 10:51 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

biafra23 added 11 commits April 29, 2026 13:01

Drop dbg: prefix from debug provenance tag

527f0fa

Include model name in debug provenance tag

7e91f44

biafra23 changed the title ~~AI router status strip + HF token wiring + gated-model 401 fix~~ AI router status strip + HF token wiring + LiteRT/LEAP/share-intent fixes Apr 29, 2026

biafra23 added 5 commits April 29, 2026 15:13

biafra23 mentioned this pull request Apr 29, 2026

Skip LLM for description when the page already provides one #13

Open

5 tasks

	val t0 = System.currentTimeMillis()
	val t0 = System.nanoTime()

	durationMs = System.currentTimeMillis() - t0,
	durationMs = (System.nanoTime() - t0) / 1_000_000,

	block: () -> Result<T>,
	crossinline block: suspend () -> Result<T>,

		* slides away. Place it at the bottom of a Box that wraps your screen
		* content; it will draw above everything else when [state] is not Hidden.

- * slides away. Place it at the bottom of a Box that wraps your screen
- * content; it will draw above everything else when [state] is not Hidden.
+ * slides away. Add it as the last child in your screen layout (for example,
+ * at the bottom of a `Column`) so it takes up layout space and pushes content
+ * up rather than overlaying it. Do not place it in an overlaying `Box` if
+ * you want to avoid obscuring content when [state] is not Hidden.

	/** True iff the APK was built with a non-empty `HF_TOKEN` env var. */
	/** True iff [BuildConfig.HF_TOKEN_DEFAULT] is non-empty, regardless of source. */

Conversation

biafra23 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. AI router status strip (replaces debug Toast)

2. Gated-model downloads now work — 401 fix + token UI + CI/local plumbing

3. LiteRT-LM runtime fixes

4. LEAP runtime fixes

5. Bookmark-flow behaviour fixes

6. Debug provenance tag

7. UI polish

Test plan

Status strip

HF token / gated downloads

LiteRT-LM

LEAP

Provenance tag

UI polish

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

biafra23 commented Apr 29, 2026 •

edited

Loading