poc: Implement the specs + sample + AI #290

Draft
nkhanh44 wants to merge 10 commits into develop from feature/ai-generation-migration

@nkhanh44

Note: for a release PR, append this parameter ?template=release_template.md to the current URL to apply the release PR
template, e.g. {Github PR URL}?template=release_template.md

--

  • Close #

What happened 👀

Provide a description of the changes this pull request brings to the codebase. Additionally, when the pull request is still being worked on, a checklist of the planned changes is welcome to track progress.

Insight 📝

Describe in detail why this solution is the most appropriate, which solution you tried but did not go with, and how to test the changes. References to relevant documentation are welcome as well.

Proof Of Work 📹

Show us the implementation: screenshots, GIFs, etc.

@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4805e4c-4bc5-4e3f-800a-efd52018a1ac

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

nkhanh44 force-pushed the feature/ai-generation-migration branch from 39a546f to 8c28567 on March 19, 2026 04:12
nkhanh44 force-pushed the feature/ai-generation-migration branch from 8c28567 to c22ea09 on March 19, 2026 04:21
nkhanh44 added 2 commits April 7, 2026 22:20
Cross-model benchmark + breakage-detection benchmark showed:
- All models (Haiku, Sonnet, Codex, bash) produce correct substitutions
- generate.sh ships a latent Podfile sed bug (deletes GCC_PREPROCESSOR_DEFINITIONS
  line but not its array contents → invalid Ruby). validation_test.dart never
  caught it. flutter build ios catches it immediately.
- AI models (Haiku, Sonnet) handle the same transformation correctly because
  they understand Ruby structure. The validation suite was written to catch
  AI-level substitution misses, not sed-level structural errors.

Changes:
- Delete specs/generate.sh (bash generator, was never committed)
- Delete specs/validation/validate.sh + validation_test.dart (570 lines)
- Rewrite .github/workflows/test.yml: native Flutter tooling on sample/,
  macOS runner, JDK 17 pin, flavor-aware apk + ios builds
- Simplify specs/generation-prompt.md: no inline bash, self-check via grep,
  structural guidance for Podfile conditional
- README: drop validate_all.sh + 8 nonexistent spec file references,
  add verification commands and JDK 17 requirement note
- .gitignore: add output/ for AI-generated project outputs

Net: -670 lines. System is now sample/ + 2 markdown specs + native CI gate.
No custom Dart validation, no bash generator.
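
The Podfile failure mode described in this commit can be illustrated with a minimal sketch. The Podfile fragment below is hypothetical (the real Podfile lines are not quoted in this PR), and the helper names are made up for illustration. `delete_matching_lines` imitates a line-oriented `sed '/PATTERN/d'`, while `delete_statement` keeps dropping lines until the brackets opened on the matching line are closed. `brackets_balanced` is only a crude stand-in for "still valid Ruby".

```python
# Hypothetical Podfile fragment; the real file contents are an assumption.
PODFILE_FRAGMENT = """\
config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [
  '$(inherited)',
  'PERMISSION_CAMERA=1',
]
config.build_settings['ENABLE_BITCODE'] = 'NO'
"""


def delete_matching_lines(text, needle):
    """Line-oriented delete, comparable to sed '/needle/d'."""
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if needle not in line)


def delete_statement(text, needle):
    """Delete the matching line plus continuation lines until [ ] balance."""
    kept = []
    depth = 0
    skipping = False
    for line in text.splitlines(keepends=True):
        if not skipping and needle in line:
            skipping = True
            depth = 0
        if skipping:
            depth += line.count("[") - line.count("]")
            if depth <= 0:
                skipping = False  # statement fully consumed
            continue
        kept.append(line)
    return "".join(kept)


def brackets_balanced(text):
    """Crude structural check: every ] must close an earlier [."""
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


if __name__ == "__main__":
    needle = "GCC_PREPROCESSOR_DEFINITIONS"
    print(brackets_balanced(delete_matching_lines(PODFILE_FRAGMENT, needle)))
    print(brackets_balanced(delete_statement(PODFILE_FRAGMENT, needle)))
```

The line-oriented version leaves the array items and the closing `]` behind, which is the kind of structural damage a substitution-focused test would not notice but a native build would.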
nkhanh44 force-pushed the feature/ai-generation-migration branch from c8b7996 to 8e92641 on April 21, 2026 13:27
nkhanh44 and others added 7 commits April 22, 2026 15:12
…hain

sample/'s Android stack (Gradle 7.5 + legacy Flutter plugin loader + AGP 7.3.0) is
incompatible with the modern Flutter plugin ecosystem. `flutter build apk` fails with
"compileSdkVersion is not specified" because package_info_plus ^9.0.0 expects the
declarative Flutter Gradle plugin to expose flutter.compileSdkVersion to library
subprojects, which the legacy apply-from-gradle pattern does not do.

Minimal toolchain migration to unblock builds:
- Gradle wrapper 7.5 → 7.6.3 (JDK 17 compatible, supports declarative plugins)
- settings.gradle: legacy apply-from pattern → declarative plugins block
  (AGP 7.4.2, Kotlin 1.8.22)
- build.gradle: removed obsolete buildscript block (classpath deps now in
  settings.gradle plugins)
- app/build.gradle: apply-plugin → plugins {} block at top of file; dropped
  explicit kotlin-stdlib-jdk7 dep (bundled by kotlin plugin)
- minSdk 23 → 24 (flutter_secure_storage requirement)
- package_info_plus ^9.0.0 → ^8.3.0 (v9.x requires an even newer Flutter Gradle
  plugin interface than the declarative migration provides; v8.3.0 works)

Verified: flutter build apk --debug --flavor staging succeeds; APK installs and
launches on Pixel 7 emulator. flutter build ios --debug --no-codesign --flavor
staging also succeeds (Runner.app produced).

AGP 8.x modernization deferred — hit unmigrated-plugin namespace issues
(flutter_config) that would require a full dependency audit. Out of scope for
unblocking builds.

- proposal-ai-generation-migration.md: full Notion-ready proposal with 3-round benchmark results, recommendation matrix (Opus/Sonnet/Haiku), and twin failure exhibits (Haiku Podfile + Sonnet Android XML).
- experiments-log.md: per-experiment record across 6 experiments and 18 total runs, source of truth for the proposal's claims.

Replaces the earlier, outdated proposal (March), which referenced the deleted validation suite.

Reproducibility kit for the 3-round benchmark documented in docs/experiments-log.md:

- Round 1 (standard params): setup-bench.sh, teardown-bench.sh, verify-all.sh, run_benchmark.sh
- Round 2 (Opus edge cases): setup-edge-bench.sh, teardown-edge-bench.sh, verify-edge.sh
- Round 3 (multi-model edge cases): setup-models-bench.sh, teardown-models-bench.sh, verify-models.sh
- Canonical prompt pinned in benchmark-prompt.md

Each setup script creates per-case git worktrees with parameter blocks pre-injected; teardown collects outputs and restores memory; verify runs the full pipeline including flutter build apk + flutter build ios. REPO_ROOT resolution updated to work from the new scripts/benchmark/ location.
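
As a rough illustration of the per-case worktree setup described above, the sketch below builds the `git worktree` commands for a list of benchmark cases. It is not the content of the actual setup scripts: the `worktrees/` directory, the `bench/` branch prefix, the parameter names, and the `key: value` parameter-block format are all assumptions made for the example. The functions only return argument lists, so nothing is executed until the caller passes them to `subprocess.run`.

```python
def worktree_add_commands(repo_root, base_ref, case_names):
    """Build one `git worktree add` command per benchmark case."""
    commands = []
    for name in case_names:
        worktree_path = f"{repo_root}/worktrees/{name}"  # hypothetical layout
        commands.append([
            "git", "-C", repo_root, "worktree", "add",
            "-b", f"bench/{name}",  # hypothetical branch naming
            worktree_path, base_ref,
        ])
    return commands


def worktree_remove_commands(repo_root, case_names):
    """Build the matching teardown commands."""
    return [
        ["git", "-C", repo_root, "worktree", "remove", "--force",
         f"{repo_root}/worktrees/{name}"]
        for name in case_names
    ]


def render_parameter_block(params):
    """Render parameters as `key: value` lines (format is an assumption)."""
    return "".join(f"{key}: {value}\n" for key, value in params.items())


if __name__ == "__main__":
    for cmd in worktree_add_commands("/tmp/repo", "HEAD", ["case-1", "case-2"]):
        print(" ".join(cmd))
    print(render_parameter_block({"project_name": "demo", "app_name": "Demo"}))
```

Keeping command construction separate from execution makes the setup and teardown steps easy to inspect or dry-run before touching the repository.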
Proposal (docs/proposal-ai-generation-migration.md):
- Trim from 460 → 178 lines while keeping load-bearing arguments
- Fix arithmetic: Haiku stage pass rate (28/35), N=15 runs with full build gate
- Add asymmetry framing ("used a few times a year, maintained every few weeks")
- Verify "42 Generate bundle commits" with git log
- Restructure scorecard, recommendation, and harness section
- Add per-format escaping callout for app_name special characters
- Add API cost-per-generation detail
- Add "Likely questions" section (CI, vendor compat, sample/ failure)
- Replace internal .md links with GitHub URLs on feature/ai-generation-migration
- Update verify command block: cd <project> + macOS-only marker

Generation prompt (specs/generation-prompt.md):
- Group substitution table by purpose (identifiers / display / metadata / codegen)
- Replace literal version line with pattern-based rule (resilient to sample/ bumps)
- Add per-format escaping section for app_name (Dart/XML/Ruby/pbxproj/MD)
- Inline architecture invariants (layer deps, naming conventions) — domain pure Dart rule added
- Add <string>sample</string> and 1.14.0 to self-check grep
- Replace personal example values with placeholder syntax (<your_project_name>)
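
The self-check mentioned in this list could look roughly like the sketch below: walk a generated project and report any line that still contains a leftover template token. The two tokens come from the commit message; the function name and the choice to skip unreadable or binary files are assumptions, and the actual prompt uses grep rather than Python.

```python
import os

# Tokens named in the commit message as signs of an incomplete substitution.
LEFTOVER_TOKENS = ["<string>sample</string>", "1.14.0"]


def find_leftovers(project_dir, tokens=LEFTOVER_TOKENS):
    """Return (path, line_number, token) for every leftover token found."""
    hits = []
    for root, _dirs, files in os.walk(project_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for line_number, line in enumerate(handle, start=1):
                        for token in tokens:
                            if token in line:
                                hits.append((path, line_number, token))
            except (UnicodeDecodeError, OSError):
                continue  # assumption: binary or unreadable files are skipped
    return hits


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, line_number, token in find_leftovers(target):
        print(f"{path}:{line_number}: leftover {token!r}")
```

An empty result means none of the listed tokens survived generation; it says nothing about tokens that are not in the list.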

Architecture rules (specs/architecture-rules.md):
- Removed; useful content inlined into generation-prompt.md, the rest was redundant with sample/

- Title: "Migrate ... from Mason to AI" → "Drop Mason — Use sample/ + AI to Generate Projects" (active verb, names both halves of the new system)
- Add "Maintenance effort, task by task" table — concrete scenarios (Flutter SDK bump, dep bump, architectural switch, bug fix, onboarding)
- Verify block: add cd <your_project> + macOS-only marker
- Remove "Likely questions" section (CI/vendor questions answered by experiments-log + readme)
- Remove API cost callout (premature without confirmed pricing)
- Replace "no Mustache wrappers" jargon with concrete "flutter run sample/"
- Replace internal .md links with GitHub URLs on feature/ai-generation-migration

Clarifies what's NOT in this proposal (harness, additional sample variants, Claude Code skills integration) so reviewers evaluate the right thing.

Re-ran all 5 Opus 4.7 cases capturing Anthropic's /cost accounting
before/after each session. Measured $1.99/run avg on API list price
(range $0.81-$3.70, ~2.4M tokens/run); $0 marginal on Max/Team
subscriptions; ~2% of 5h window per run on Max 5x.

Build pipeline: 35/35 gates pass (5 cases x 7 stages incl. native
apk+ios builds). The apostrophe edge case passed cleanly; this is the case
Sonnet 4.6 had previously failed on (Android XML escape).
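
To make the apostrophe edge case concrete, here is a minimal sketch of why one `app_name` value needs different escaping per target format. The helper names are hypothetical and the rules cover only the common characters (they are not a complete escaper for any of these formats): Android string resources need XML entities plus backslash-escaped quotes, while single-quoted Dart and Ruby literals need only the backslash and quote handled (plus `$` in Dart, which would otherwise start an interpolation).

```python
def escape_android_xml(value):
    """Escape a value for an Android strings.xml entry (common cases only)."""
    value = value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    value = value.replace("\\", "\\\\")  # backslashes first, then quotes
    return value.replace("'", "\\'").replace('"', '\\"')


def escape_dart_single_quoted(value):
    """Escape a value for a single-quoted Dart string literal."""
    value = value.replace("\\", "\\\\").replace("$", "\\$")
    return value.replace("'", "\\'")


def escape_ruby_single_quoted(value):
    """Escape a value for a single-quoted Ruby string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


if __name__ == "__main__":
    app_name = "Bob's Shop"  # hypothetical app_name containing an apostrophe
    print(escape_android_xml(app_name))
    print(escape_dart_single_quoted(app_name))
    print(escape_ruby_single_quoted(app_name))
```

An unescaped apostrophe in `strings.xml` is rejected by the Android resource compiler, which is why this case is only caught by a real build gate.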

Adds reproducibility tooling under scripts/benchmark/:
- setup-opus-bench.sh: spin up 5 worktrees with preset parameters
- verify-opus-tokens.sh: full build pipeline + token extraction
- extract-tokens.py: dedup'd JSONL token sum with correct 1h/5m
  cache pricing split per Anthropic API docs
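
The deduplicated token sum and the 1h/5m cache split can be sketched as follows. This is not the contents of extract-tokens.py: the flat record layout (`id` plus `usage`) is an assumption, the usage field names are assumed to follow Anthropic's API usage object, and the three cache multipliers are assumptions that should be checked against the current pricing documentation. Prices are passed in by the caller, so no dollar figures are hard-coded.

```python
import json

# Assumed multipliers relative to the base input price; verify before use.
CACHE_WRITE_5M_MULTIPLIER = 1.25
CACHE_WRITE_1H_MULTIPLIER = 2.0
CACHE_READ_MULTIPLIER = 0.1


def sum_usage(jsonl_lines):
    """Sum token usage across JSONL records, counting each message id once."""
    seen_ids = set()
    totals = {"input": 0, "output": 0, "cache_read": 0,
              "cache_write_5m": 0, "cache_write_1h": 0}
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        usage = record.get("usage")
        if usage is None:
            continue
        message_id = record.get("id")
        if message_id is not None:
            if message_id in seen_ids:
                continue  # same message logged more than once
            seen_ids.add(message_id)
        cache_creation = usage.get("cache_creation", {})
        totals["input"] += usage.get("input_tokens", 0)
        totals["output"] += usage.get("output_tokens", 0)
        totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
        totals["cache_write_5m"] += cache_creation.get("ephemeral_5m_input_tokens", 0)
        totals["cache_write_1h"] += cache_creation.get("ephemeral_1h_input_tokens", 0)
    return totals


def cost_usd(totals, input_price_per_mtok, output_price_per_mtok):
    """Estimate cost from token totals and caller-supplied list prices."""
    weighted_input = (
        totals["input"]
        + totals["cache_read"] * CACHE_READ_MULTIPLIER
        + totals["cache_write_5m"] * CACHE_WRITE_5M_MULTIPLIER
        + totals["cache_write_1h"] * CACHE_WRITE_1H_MULTIPLIER
    )
    return (weighted_input * input_price_per_mtok
            + totals["output"] * output_price_per_mtok) / 1_000_000
```

Deduplicating by message id matters because a session log may record the same response more than once, which would otherwise inflate the per-run token count.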

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>