
🔭 AI Watchtower

AI Watchtower — interactive honeycomb tech radar

Interactive tech radar for AI-augmented software engineering. Spot what matters, track what you've read, build your own template for a career path, project, or mission.

🚀 Run locally

Requires Node.js ≥ 22.12.0

git clone https://github.com/fdelbrayelle/ai-watchtower.git
cd ai-watchtower/web
npm install
npm run dev        # → http://localhost:4321

npm run dev and npm run build automatically re-extract all resources from this README — add a link here and it appears in the app on next run.

☁️ Deploy to Vercel

Import the repo in Vercel, set Root Directory to web, and deploy. Every push to main rebuilds and redeploys automatically.


Software engineering was never just about writing code — and the agentic era makes that clearer than ever. Architecture, product thinking, code review, testing strategy, technical writing: these skills now define the engineer's value more than keystrokes ever did.

The Software Engineer is becoming a Product Engineer. When agents handle execution, the engineer's critical value shifts to the decisions surrounding the code: upstream (what to build, why, for whom, with what constraints) and downstream (is it correct, secure, maintainable, observable?). This is governance and judgment — scoping requirements, choosing trade-offs, validating outputs, and owning outcomes end to end. The title may stay the same, but the job description is now that of a product engineer in the broadest sense.

Vibe Coding vs. AI-Augmented Software Engineering — Vibe coding means describing what you want in natural language and letting the AI generate the result with minimal oversight — fast, creative, great for prototypes and throwaway scripts. AI-augmented software engineering is the opposite mindset: the engineer stays in the driver's seat, using AI to accelerate exploration, drafting, and iteration while retaining full responsibility for architecture, correctness, and maintainability. This radar focuses on the latter. The goal is not to remove the engineer from the loop, but to make the loop faster and the engineer more effective.

AI will transform jobs — and create new ones. Yes, AI will destroy certain jobs. But more precisely, it will transform them — and create entirely new roles that don't exist yet, just as the smartphone revolution created "mobile developer," "growth hacker," and "UX researcher" — jobs no one imagined in 2001. This is Schumpeterian growth in action: innovation destroys the old to make room for the new. Joseph Schumpeter called it creative destruction — the engine of capitalism where obsolete industries, skills, and roles are continuously replaced by more productive ones. AI prompt engineers, agent orchestrators, AI auditors, and synthetic data curators are early examples. The net effect on employment depends on how fast we adapt, retrain, and build the new ecosystem.

But we're not there yet — real-world constraints slow the revolution:

  • Energy: Training and running frontier models demands staggering compute power. Each major AI datacenter requires near-dedicated nuclear plant capacity — a direct collision course with climate and energy crises.
  • Adoption is still niche: As of 2025, only ~23% of U.S. adults have used ChatGPT, and 65% of organizations report using generative AI regularly — but intensive, agentic usage remains a tiny fraction of 8+ billion humans. Early adopters are not the norm. 84% of humanity has never used AI. This chart (February 2026 data) shows 8.1 billion people as dots — each dot represents 3.2 million humans. The grey fills almost the entire frame. If you've ever used ChatGPT, even once, you're among the 16% who've tried AI at all. If you pay $20/month for it, you're in the top 0.3%. If you use AI for coding, you're in the top 0.04%.
  • High-potential sectors lag behind: The legal sector, despite being one of the most automatable knowledge domains, reports only ~35% of lawyers using AI in practice (ABA 2024 TechReport). Medicine, education, and government show similar gaps.
  • Regulation divergence: Europe is regulating aggressively with the EU AI Act, which risks constraining innovation. Meanwhile, the US and China are racing toward AGI with lighter guardrails — creating a global asymmetry in AI capability and deployment.

This is a curated tech radar for AI-augmented software engineering. Tools, frameworks, protocols, methodologies, and best practices — one place to track what matters when AI writes the code and you own everything around it.

📌 = Unread


🎯 What to Focus On Now

With 80%+ of code now AI-generated, the engineer's value shifts from writing code to shaping what gets built, how it holds together, and whether it works.

Inputs — What you shape before the agent writes code:

  • Product Thinking — Own the "what" and "why" before the agent writes the "how"
  • Software Architecture — The "how": system design, boundaries, and trade-offs that agents can't decide alone

Outputs — What you verify after the agent writes code:

Transverse — Skills that apply across the entire lifecycle:

⚠️ Bottlenecks — Where the pipeline stalls:

  • Upstream: Product must feed the backlog with clear business needs and prioritized requests — without this, agents spin on low-value work. FOMO-driven adoption ("competitors are shipping faster") compounds the problem by flooding the pipeline with half-baked specs.
  • Downstream: The human review layer can't scale at the same pace as AI output. Code review and QA fatigue set in fast. It's hard to say "stop" to agentic work at end of day. Constant context switching erodes focus, developers lose meaning in the work, and the risk of burnout becomes real. Mario Zechner makes the case for slowing the fuck down — autonomous agents create brittle systems with compounding errors; keep humans in control of architecture, use agents only for scoped, evaluable tasks.

The radar below tracks the tools and practices for each of these areas.


💡 Product Thinking

Own the "what" and "why" before the agent writes the "how".

The Product Manager Role

The PM is the bridge between Business (company objectives), UX/Design (user needs), and Technology (feasibility). Not a decision dictator — an alignment enabler who ensures the team builds the right thing, for the right user, at the right time.

Core missions:

  • Discovery — Understand user problems via interviews, data analysis, and competitive research
  • Strategy — Define the product vision and prioritize for maximum impact
  • Delivery — Partner with devs and designers to ship concrete features
  • Analysis — Track KPIs post-launch and adjust course

Key deliverables by phase:

Strategy & Vision

| Deliverable | Purpose |
| --- | --- |
| Product Vision Board | Product intent, target audience, and value proposition |
| Product Roadmap | Macro view (often quarterly) of upcoming features and themes |
| KPI Dashboard | Track performance (retention, conversion, etc.) |

Discovery & Design

| Deliverable | Purpose |
| --- | --- |
| Personas | Profiles of target users and their pain points |
| PRD (Product Requirements Document) | The "Why" and "What" of a feature before development starts |
| User Journey / Story Map | Map of the user's path through the product |

Delivery

| Deliverable | Purpose |
| --- | --- |
| Backlog | Ordered list of all remaining tasks and features |
| User Stories | "As a [user], I want [action] so that [benefit]" |
| Release Notes | Internal/external communication on what shipped |

The PM never works alone — wireframes involve the Product Designer, feasibility involves the Lead Tech. The PM's job is to keep the whole coherent.


πŸ—οΈ Software Architecture

The "how" that shapes what the agent builds β€” system design, boundaries, and trade-offs that can't be delegated to a prompt.

Data Engineering & Science

Roadmaps, machine learning, and data career paths.

AI is the umbrella — not the model. Artificial Intelligence encompasses Machine Learning (ML), which encompasses Deep Learning (DL), which encompasses the specific model architectures we use today: SLMs (Small Language Models), LLMs (Large Language Models), vision models, etc. LLMs are built on the attention mechanism introduced in Attention Is All You Need (Vaswani et al., 2017), which uses learned weights to let the model focus on relevant parts of the input — the foundation of the Transformer architecture. Agents don't replace any of these layers — they orchestrate them, chaining models, tools, and memory into goal-driven workflows. Understanding this hierarchy matters: not every problem needs a frontier LLM, and not every AI system is an agent.

  • πŸ“Œ πŸ“š AI Engineering (book) β€” Chip Huyen β€” Building production AI-powered applications with foundation models: evaluation, RAG, fine-tuning, and deployment
  • πŸ“š Fundamentals of Data Engineering (book) β€” Joe Reis, Matt Housley β€” Data pipelines, storage, ingestion, orchestration, and the data engineering lifecycle
  • πŸ“š Machine Learning avec Scikit-Learn (book) β€” AurΓ©lien GΓ©ron β€” Hands-on ML with Scikit-Learn
  • πŸ“š Deep Learning avec Keras et TensorFlow (book) β€” AurΓ©lien GΓ©ron β€” Deep learning with Keras and TensorFlow

Roadmaps

Basic Maths for AI

Understanding AI under the hood requires two pillars: linear algebra and probability/statistics.

Linear algebra is the language of data. Every dataset is a matrix, every feature is a vector, and every model transformation (rotation, scaling, projection) is a matrix operation.

A vector is a list of numbers representing a point or direction in space. In AI, vectors are everywhere: a word embedding like [0.2, -0.5, 0.8] places a word in a 3D semantic space. Similar words end up as nearby vectors — "king" and "queen" are close, "king" and "banana" are far. This is how models understand meaning: not through definitions, but through geometric proximity. Real embeddings use hundreds of dimensions (e.g., OpenAI's text-embedding-3-small produces 1536-dimensional vectors), but the principle is the same. The dot product of two vectors measures their alignment: high dot product = similar direction = similar meaning. This is the core operation behind cosine similarity in vector search (RAG, recommendation systems) and attention scores in transformers. Vector addition enables analogies: the classic king - man + woman ≈ queen works because semantic relationships are encoded as directional offsets in vector space.
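
The dot product and cosine similarity described above are a few lines of NumPy. A minimal sketch with made-up 3-D vectors (real embeddings have hundreds of dimensions; the values here are illustrative, not from any real model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Alignment of two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D "embeddings" (invented values for illustration)
king   = np.array([0.9, 0.8, 0.1])
queen  = np.array([0.9, 0.7, 0.2])
banana = np.array([0.1, 0.2, 0.9])

sim_royal = cosine_similarity(king, queen)    # nearby in vector space
sim_fruit = cosine_similarity(king, banana)   # far apart
```

Vector search in a RAG pipeline is this same comparison, run between a query embedding and every stored document embedding.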

Key concepts beyond vectors: matrix multiplication (the core of neural network forward passes — each layer is a matrix multiply + activation), eigenvalues/eigenvectors (behind PCA dimensionality reduction), and tensor operations (multi-dimensional arrays powering deep learning frameworks like PyTorch and TensorFlow). Example: when a transformer model computes attention scores, it's performing softmax(QK^T / √d) × V — pure matrix math.
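
That attention formula can be written out directly in NumPy. A single-head, batch-free sketch with random matrices, not a production implementation:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # how much each query attends to each key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights          # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8 dimensions each
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)   # out: (4, 8), w rows sum to 1
```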

Probability & statistics drive how models learn and predict. Key concepts: Bayes' theorem (the foundation of updating beliefs with evidence — spam filters, medical diagnosis), probability distributions (normal, Bernoulli, softmax outputs), conditional probability (P(A|B) — "given this input, what's the likely output?"), maximum likelihood estimation (how models fit parameters to data), loss functions and gradient descent (cross-entropy, MSE — measuring and minimizing prediction error). Example: a language model predicting the next token is outputting a probability distribution over the entire vocabulary, trained by minimizing cross-entropy loss.
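
To make the next-token example concrete, here is a toy prediction over a 5-word vocabulary (the logit values are invented for illustration): softmax turns raw logits into a probability distribution, and cross-entropy is the negative log-probability the model assigned to the true token.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

# Toy logits over a tiny vocabulary (made-up numbers)
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([1.0, 3.0, 0.5, 0.2, 2.0])

probs = softmax(logits)          # a distribution over the whole vocabulary
target = vocab.index("cat")      # suppose the true next token is "cat"
loss = -np.log(probs[target])    # cross-entropy for this single prediction
```

Training lowers this loss by nudging the logits so the true token gets more probability mass; the loss is zero only if the model puts probability 1.0 on the correct token.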

Learning

  • Clean & Analyze Your Dataset β€” OpenClassrooms data cleaning course
  • Tools: Jupyter Notebook, Kaggle, Hugging Face, Matplotlib, NumPy, Pandas

✍️ Code Generation / Writing

AI writes 80%+ of the code, but the software engineer still adds direct value on the remaining ~20% written by hand — judgment calls, edge cases, glue code, and craft that agents miss.

Language Ecosystems

AI-era tooling and best practices for Java and Python.

AI for Java

Spring AI, LangChain4J, and the Java AI ecosystem.

Python Ecosystem

Python fundamentals, frameworks, and best practices for the AI-era developer.

Core Python
Web Frameworks

Software Craftsmanship

AI accelerates output, but craft still matters. Build agents or skills specialized in proven engineering disciplines to keep quality high at scale.

  • TDD (Test-Driven Development) — Create agents that write failing tests first, then generate the minimal code to pass. The red-green-refactor loop works even better when the agent handles the boilerplate and you review the design.
  • BDD (Behavior-Driven Development) — Use skills that generate Gherkin scenarios from user stories, then wire them to step definitions. Keeps acceptance criteria executable and traceable.
  • DDD (Domain-Driven Design) — Encode bounded contexts, aggregates, and ubiquitous language in project instructions so agents produce code that respects domain boundaries instead of creating a big ball of mud.
  • Clean Architecture — Enforce hexagonal / ports-and-adapters patterns through CLAUDE.md rules or custom agents that validate dependency direction (domain → application → infrastructure, never the reverse).
  • Other patterns — Onion Architecture, CQRS, Event Sourcing — codify these as agent constraints or review skills so generated code stays structurally sound.
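
As a small illustration of the TDD loop above, a sketch in pytest style (`slugify` is a hypothetical example function, not something from the radar): the test is written first and fails red, then the minimal implementation turns it green.

```python
import re

# Red: the test comes first, against a function that does not exist yet.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# Green: the minimal implementation that makes the test pass.
def slugify(text: str) -> str:
    """Lowercase the text, drop punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

test_slugify()  # passes once the implementation exists
```

In an agentic setup the agent writes both halves, but in that order; you review the test first, because it encodes the design decision.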

🤖 Agentic Orchestration

Designing, chaining, and supervising AI agents — platforms, protocols, and tools. Apply the KISS principle relentlessly: don't scatter across dozens of tools, frameworks, and methodologies. Pick a minimal, proven stack and think simple. The best agent architecture is the one you can reason about, debug, and explain — not the one with the most moving parts.

Key Concepts

Agent: The full system that receives a goal, reasons about it, uses tools, checks results, and loops until done. It combines an LLM with tool access, memory, and control flow.

LLM / Model: The reasoning engine inside the agent. It decides what to do next, but by itself it only generates text. Examples: Claude Opus 4.6, GPT 5.4, Gemini 2.5 Pro.

Tools: The actions available to the agent — read files, edit code, run commands, search the web, call APIs, etc. Tools are what let an agent act on the world instead of just talking about it.

Skills: Reusable playbooks that tell the agent how to handle a class of tasks well, often by combining tools in a structured way (e.g., a "commit" skill that stages, commits, and pushes).

Subagents: Specialized helper agents called by the main agent for focused tasks. They work in isolated contexts, then return results. Useful for parallelizing work or keeping the main context window clean.

Memory: Persistent context that guides future sessions:

  • CLAUDE.md / project instructions: human-written rules, conventions, architecture decisions
  • Project/local memory: repo-specific context (what's in progress, what was decided)
  • User/global memory (e.g., ~/.claude/): personal defaults across all projects

These usually encode: What (facts, rules, conventions), Why (rationale, constraints), and How (architecture, workflows, patterns).

Hooks: Shell commands that fire automatically in response to agent events (before/after tool calls, on notifications, etc.). They let you enforce rules, run linters, trigger builds, or inject context — without the agent needing to know about them.
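
For example, a hook can run a linter after every file edit. A sketch of a PostToolUse hook in .claude/settings.json (the `npm run lint` command and the exact matcher are assumptions to adapt to your project):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint" }
        ]
      }
    ]
  }
}
```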

Human in the loop: The human gives goals, answers questions, approves risky actions, reviews outputs, and redirects the agent when needed. The agent proposes; the human disposes.

Plan mode: A read-only phase where the agent explores the codebase, understands the problem, and proposes a plan before making changes. Reduces wasted work and misaligned edits.

Typical agentic flow:

  1. Explore β€” read code, search, understand context
  2. Plan β€” propose an approach
  3. Execute β€” make changes, run commands
  4. Verify β€” run tests, check results
  5. Get human feedback β€” review, approve, or redirect
  6. Iterate if needed
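
The six steps above can be sketched as a loop. Everything here (`explore`, `call_llm`, `run_tests`, `ask_human`) is a hypothetical stub standing in for a real model, harness, and reviewer; the point is the control flow, not the implementations:

```python
# Hypothetical stubs: a real harness would call a model, a shell, and a human.
def explore(goal): return f"relevant files and context for {goal!r}"
def call_llm(prompt): return f"draft produced for: {prompt[:40]}..."
def run_tests(changes): return True, "all tests pass"
def ask_human(plan, changes, report): return "approve"

def agent_loop(goal: str, max_iterations: int = 5) -> str:
    context = explore(goal)                              # 1. Explore
    plan = call_llm(f"plan {goal} given {context}")      # 2. Plan
    for _ in range(max_iterations):
        changes = call_llm(f"execute {plan}")            # 3. Execute
        ok, report = run_tests(changes)                  # 4. Verify
        verdict = ask_human(plan, changes, report)       # 5. Human feedback
        if ok and verdict == "approve":
            return changes
        plan = call_llm(f"revise {plan}: {report} / {verdict}")  # 6. Iterate
    raise RuntimeError("iteration budget exhausted; escalate to a human")

result = agent_loop("fix the flaky login test")
```

Note where the human sits: inside the loop as a gate, not outside it as an afterthought.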

Maturity Levels

AI adoption maturity model for development teams — adapted from Dan Shapiro's framework. Useful for locating where a team stands, anticipating its trajectory, and making deliberate choices rather than reacting to hype or management pressure.

  • Level 1 — Autocomplete (~2023): AI suggests completions in the developer's immediate context. The developer stays in control. Where most organizations started, back in the early GitHub Copilot days.
  • Level 2 — Coding assistants (~2024): AI executes multi-step tasks across files and tools — Claude Code, Cursor, Windsurf.
  • Level 3 — Autonomous dev agents (~2025): AI handles the full cycle, from backlog ticket to deployment. The human defines requirements and validates outputs — supervised engineering. Most organizations are crossing this threshold now.
  • Level 4 — Collaborative agent networks (~2026): Multiple specialized agents work together on design, code, tests, and deployment. Humans orchestrate. Typical usage with BMAD, BEADS, LIZA. Very few organizations have genuinely reached this level.
  • Level 5 — Software factory (~2028?): Organizations describe desired business outcomes, and entire systems emerge from agent collaboration. Humans focus on strategy and product vision. Still largely theoretical, but perhaps a closer horizon than we think.

Between level 2 and level 3, something fundamental shifts: the developer stops being the one who builds and becomes the one who verifies. This changes the nature of the craft — which skills matter, where responsibility moves, and what new risks emerge.

Where is your organization today — and can it move to the next level?

  • AI Codebase Maturity Model — Framework for assessing how ready a codebase is for AI-augmented development: structure, testability, documentation, and automation readiness 📌 Unread

Agents & Frameworks

Protocols

MCP (Model Context Protocol)

The open standard for connecting AI models to external tools and data sources.

RAG

  • RAG is Dead, Long Live RAG β€” Rather than being killed by larger context windows, RAG has evolved into a sophisticated system that makes intelligent, conditional decisions about whether and how to retrieve information
  • 🎬 Is RAG Still Needed?

Vector Databases

Methodologies

  • Agentic SDLC Handbook β€” Practical handbook for applying AI agents across the full software development lifecycle πŸ“Œ Unread
  • BMAD Method β€” Breakthrough Method for Agile AI Development πŸ“Œ Unread
  • Beads β€” AI coding assistant framework by Steve Yegge πŸ“Œ Unread
  • VibeKanban β€” AI-native project management πŸ“Œ Unread
  • Get Shit Done β€” Pragmatic AI development methodology πŸ“Œ Unread

Harness Engineering

The harness is the scaffolding that wraps a model and turns it into an agent: it controls the execution loop, routes tool calls, enforces permissions, manages context windows, and handles retries and escalation. Harness Engineering is the discipline of designing, operating, and optimizing that layer — distinct from prompt engineering (what you say) or model selection (which model you use). As agents grow more autonomous and run at scale, the harness becomes the main lever for reliability, cost control, and safety. The concept of an AI factory extends this further: a harness-driven pipeline where agents are orchestrated like industrial processes, with defined inputs, outputs, quality gates, and throughput metrics.

Product as a Service

Managed agent offerings where the execution infrastructure, scheduling, and lifecycle management are handled by the vendor.

  • Managed Agents β€” Anthropic's approach to building and operating agents at scale πŸ“Œ Unread
  • Dispatch β€” Anthropic's multi-agent orchestration platform πŸ“Œ Unread
  • Multica β€” Managed multi-agent platform for running and orchestrating AI agents at scale πŸ“Œ Unread

Orchestration

Frameworks for composing, routing, and coordinating multiple agents or tool calls.

  • OpenClaw β€” Open-source AI agent framework
  • Agno β€” Open-source Python framework for building, deploying, and managing secure multi-agent AI systems πŸ“Œ Unread
  • NanoClaw β€” Lightweight agent runtime
  • NemoClaw β€” NVIDIA's agent framework

Harness Tools

Tools that operate at the harness layer itself: controlling the execution loop, parallelizing sessions, and managing agent lifecycles.

  • Emdash β€” Desktop app to run multiple AI coding agents in parallel, each in an isolated Git worktree, with issue tracker integration and built-in diff/commit UI πŸ“Œ Unread
  • Paperclip β€” Orchestrate multiple Claude Code sessions/agents in parallel πŸ“Œ Unread

Claude Code

Best practices, monitoring, and plugins for Claude Code.

Mastery Levels

Six levels of Claude Code usage, from basic prompting to fully autonomous systems — 🎬 FR video:

  • Level 1 — Prompt: Use Claude Code as a terminal-based ChatGPT. Ask questions, get answers. No project context.
  • Level 2 — Planner: Add a CLAUDE.md with project context. Claude understands the codebase and plans before acting.
  • Level 3 — Context: Leverage memory, conventions, and project files. Claude works with persistent, structured knowledge.
  • Level 4 — Tools: Connect MCP servers, bash commands, and external integrations. Claude acts on the world.
  • Level 5 — Multi-Agent: Orchestrate subagents for parallel, specialized work. Claude delegates and coordinates.
  • Level 6 — Autonomous: 24/7 systems where agents run unsupervised, triggered by events, without human in the loop.

Learn

Tools

  • πŸ“Œ claude-desktop-debian β€” Unofficial Claude Desktop app support for Debian-based Linux distributions
  • Claude Swarm Monitor β€” Monitor Claude Code swarms
  • Claude Octopus β€” Multi-agent orchestrator coordinating Claude, Codex, and Gemini CLIs πŸ“Œ Unread
  • CC Workflow Studio β€” Claude Code observability
  • Ralph Claude Code β€” Claude Code assistant πŸ“Œ Unread
  • ExitBox β€” Security sandbox for Claude Code
  • AI-RSK β€” Security gate for AI-generated code, blocks builds until vulnerabilities are fixed
  • claude-statusline β€” Configure Claude Code's status line to show usage limits, current directory, and git info πŸ“Œ Unread

Tips

  • Claude Code Tips β€” Practical tips collection πŸ“Œ Unread
  • Prefer Skills or CLI over MCP when possible β€” it is usually cheaper in tokens.
  • Run /compact around 60–70% context usage. Run /clear around 80–90%, or start a fresh session.
  • Check the current memory state with /memory (auto-memory and auto-dream can be enabled there).
  • Start a new session for a new topic. Do not keep piling unrelated work into one chat.
  • Use /loop for periodic reminders or cron-like tasks. Example: /loop 20m run "echo kindly reminder to look 20 seconds at 20 meters to save your view"
  • Resume a previous session with /resume or claude --resume.
  • Use /btw to chat with Claude Code while it is working.
  • Use Ctrl + G to edit your prompt in your default editor (EDITOR and VISUAL env vars must be set in ~/.bashrc or ~/.zshrc).
  • Switch Plan Mode to Accept Edits with Shift + Tab.
  • Check usage with /usage.
  • For parallel work, use Git worktrees: run parallel sessions with claude --worktree feature-auth.
  • Sandboxes: Claude Code can run in sandboxed environments for isolation and security. This is the safer alternative to --dangerously-skip-permissions or full auto mode β€” use sandboxes when you need unattended execution without bypassing permission checks.
  • Remote Control: Use the Remote Control API to programmatically interact with Claude Code sessions β€” send messages, monitor state, and build custom integrations on top of running instances. πŸ“Œ Unread
  • Advisor Strategy: Use /advisor to invoke a stronger reviewer model mid-session β€” it sees your full conversation history and can catch mistakes, suggest better approaches, or validate your plan before you commit to it.

Plugins

  • Code Review β€” Anthropic's official code review plugin
  • Code Simplifier β€” Anthropic's official code simplification plugin
  • Frontend Design β€” Anthropic's official frontend design plugin
  • Ralph Loop β€” Anthropic's official loop/iteration plugin
  • Context7 β€” Up-to-date docs and code examples for any library, pulled straight into your prompt
  • Superpowers β€” Agentic skills framework & software development methodology
  • Hookify β€” Official plugin to manage Claude Code hooks visually
  • MemPalace β€” Local-first AI memory system: stores conversations verbatim, organizes them spatially for high-accuracy retrieval πŸ“Œ Unread
  • Oh My Claude Code β€” Plugin to orchestrate Claude Code
  • Codex β€” OpenAI Codex CLI plugin for Claude Code πŸ“Œ Unread
  • UI/UX Pro Max Skill β€” UI/UX design skill for Claude Code πŸ“Œ Unread
  • Paperasse β€” Skills for French administrative paperwork ("paperasse") πŸ“Œ Unread

Code Assistants & AI Editors

IDEs, copilots, and AI-powered coding tools.

  • Best AI Code Editors (2025) β€” Comprehensive comparison
  • Claude AI β€” Anthropic's AI assistant
  • Cursor β€” AI-first code editor
  • Continue β€” Open-source AI code assistant
  • Continue + Ollama β€” Running Continue with local models
  • Supermaven β€” Fast AI code completion
  • DevoxxGenie β€” AI plugin for IntelliJ IDEA
  • Junie β€” JetBrains' AI coding agent
  • Lovable β€” AI-powered full-stack app builder
  • Mammouth AI β€” AI coding assistant
  • Kimi Code β€” Moonshot AI's coding assistant πŸ“Œ Unread
  • OpenCode β€” Open-source AI coding platform πŸ“Œ Unread
  • OpenCode Worktree β€” Worktree support (alternative: claude --worktree feature-auth) πŸ“Œ Unread
  • OCX β€” Extends OpenCode capabilities πŸ“Œ Unread

UX/UI Design

AI-powered design-to-code tools and collaborative design platforms.

  • Claude Design β€” Anthropic Labs' design tool πŸ“Œ Unread
  • Figma to Code β€” Convert Figma designs to code
  • Google Stitch β€” Google's AI-powered design-to-code tool πŸ“Œ Unread
  • Paper β€” Collaborative design tool for building interfaces πŸ“Œ Unread
  • getdesign.md β€” Aggregates design system docs and patterns from top brands (Stripe, Figma, Apple…) for rapid AI-assisted UI development πŸ“Œ Unread

Generative AI Patterns & Learning

Architecture patterns, training resources, and foundational learning.

JEPA & World Models

Current LLMs master syntax but lack the common sense and physical intuition a 4-year-old has from experiencing the world — what Moravec's paradox captures: trivial for children, algorithmically hard for machines. LLMs memorize statistical patterns; children build world models.

JEPA (Joint Embedding Predictive Architecture), proposed by Yann LeCun, is a framework for learning like biological intelligence. Instead of predicting raw pixels or tokens, JEPA predicts in representation space — abstract representations of how the world evolves. This sidesteps the intractability of pixel-level prediction (the world is too chaotic) and focuses on underlying structure. Learning is mostly self-supervised — watching hours of video and sensory data, like humans do — not from labeled text.

The goal is a shift from generative AI that recites to planning AI that understands and acts: world models that anticipate "if I take action A in situation B, I get result C"; System 2 reasoning that imagines and evaluates multiple futures before acting; hierarchical abstraction that combines long-horizon goals (get to the airport) with micro-decisions (take a step, raise an arm); and objective-driven control guided by cost minimization within strict safety guardrails. LeCun's bet: this will happen in open, collaborative ecosystems — not closed labs.

Why it matters for engineers: if world models succeed, future AI may reason about cause and effect, plan multi-step actions, and generalize from far less data — closing the gap between "has read everything" and "understands anything."

Energy-Based Models

Energy-Based Models (EBMs) are an alternative framework where the model learns to assign low energy to correct configurations and high energy to incorrect ones — instead of predicting the next token, the model scores how "right" a given state of the world is. EBMs can capture complex dependencies without requiring explicit probability normalization, making them more flexible than standard generative models.

Both AGI and energy-based models will be especially transformative for physical agentics — i.e., robotics. This is where Moravec's paradox becomes relevant: tasks that are trivial for humans (walking, grasping, navigating a room) are incredibly hard for machines, while tasks that are hard for humans (chess, calculus, code generation) are comparatively easy for AI. World models and EBMs aim to close this gap by giving machines an intuitive understanding of physics.
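
A toy illustration of the scoring idea (the quadratic energy function here is an arbitrary assumption for the sketch): instead of outputting y directly, the model assigns an energy to every (x, y) configuration, and inference picks the candidate with the lowest energy.

```python
import numpy as np

def energy(x: np.ndarray, y: np.ndarray) -> float:
    """Toy energy: low when (x, y) are compatible, high when they are not."""
    return float(np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])                      # the observed state
candidates = [np.array([5.0, 5.0]),           # incompatible: high energy
              np.array([1.1, 2.1]),           # compatible: low energy
              np.array([-3.0, 0.0])]          # incompatible: high energy

# Inference = search for the lowest-energy configuration, not sampling a token
best = min(candidates, key=lambda y: energy(x, y))
```

Real EBMs learn the energy function from data and search far larger configuration spaces, but the inference-as-minimization pattern is the same.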

Scaling & Moore's Law for AI

Just as Moore's Law predicted exponential growth in transistor density, a similar dynamic applies to AI: models double in capability on roughly predictable timelines through scaling laws — more compute, more data, and better architectures yield predictably better performance. At the current trajectory, models are expected to multiply their capabilities enough to completely replace pure execution tasks by ~2030, while judgment, governance, and creative direction remain human territory for longer.

Developer Tooling & Infrastructure

Docker, terminals, browser automation, and other tools for AI-augmented workflows.

Docker & Infrastructure

  • Docker Model Runner β€” Run AI models directly in Docker
  • Portless β€” Replaces port numbers with stable, named .localhost URLs for local development β€” automatic HTTPS, no port juggling πŸ“Œ Unread

Terminal Tools

  • Warp β€” AI-powered terminal
  • Ghostty β€” Fast, feature-rich, GPU-accelerated terminal emulator with platform-native UI πŸ“Œ Unread
  • Zellij β€” Modern terminal workspace (Rust)
  • tmux β€” Classic terminal multiplexer

Browser Automation & Misc

  • Scrapling β€” AI-adapted web scraping
  • Trigger.dev β€” Background jobs and workflow automation
  • Computer Use (Anthropic) β€” Let Claude control a computer β€” click, type, navigate, and take screenshots πŸ“Œ Unread
  • Perplexity Computer β€” Perplexity's computer-using agent for browser tasks πŸ“Œ Unread
  • Operator (OpenAI) β€” OpenAI's web-browsing agent that autonomously completes multi-step tasks (shopping, form filling, booking) inside a browser πŸ“Œ Unread
  • Agent Browser β€” Browser automation CLI for AI agents

AI Native Landscape

Overview of the AI-native development ecosystem.

  • AI Native Dev Landscape β€” Interactive landscape of AI-native tools
  • AI Native Applications β‰  Chatbot Wrappers β€” Building an "AI native" application isn't about bolting a chatbot or a GPT-powered feature onto an existing product. It means rethinking the product from the ground up around AI capabilities: the UX adapts to probabilistic outputs instead of deterministic flows, the data model is designed for embeddings and retrieval, the architecture assumes agents as first-class actors, and the value proposition simply couldn't exist without AI at its core. A chatbot skin on a CRUD app is AI-adjacent, not AI-native. The same applies to the landscape itself: AI-native ecosystems replace entire categories (CI, observability, testing, IDEs) with tools that are built around AI reasoning β€” not traditional tools with an AI add-on.
  • Design for agent users, not just human users β€” Until 2022, every product was designed exclusively for human users. Today, agents are users too β€” they call your APIs, read your documentation, navigate your interfaces. If your system isn't legible to agents (structured data, clear semantics, machine-readable endpoints), you're designing for half the audience.

12-Factor AI Native

Inspired by the 12-Factor App methodology for cloud-native applications, imagine the equivalent principles for AI-native applications. See also 12-Factor Agents (⭐ 19k) β€” a complementary set of 12 implementation-level principles for building production-ready LLM agents (own your prompts, own your context window, stateless reducer pattern, etc.).

  1. Prompt as Code β€” Prompts are versioned, reviewed, and deployed like source code
  2. Model Portability β€” No hard coupling to a single model provider; swap models without rewriting the app
  3. Context as Config β€” Context (system prompts, RAG sources, memory) is injected, not hardcoded
  4. Stateless Inference β€” Each request is self-contained; session state lives outside the model call
  5. Explicit Token Budget β€” Token usage is a first-class resource with limits, monitoring, and optimization
  6. Observability by Default β€” Every LLM call is traced, logged, and measurable (latency, cost, quality)
  7. Graceful Degradation β€” Fallback chains across models/providers; the app survives an outage or rate limit
  8. Eval-Driven Development β€” Automated evals replace unit tests for non-deterministic AI behavior
  9. Human-in-the-Loop Boundaries β€” Clearly defined gates where human review is required vs. autonomous
  10. Guardrails as Infrastructure β€” Safety, compliance, and content filters are infra concerns, not afterthoughts
  11. Disposable Agents β€” Agents are ephemeral and reproducible; no precious long-running state
  12. Cost-Aware Routing β€” Route to the cheapest model/tool that meets the quality bar (CLI > MCP > RAG > full context)
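Factors 2, 7, and 12 compose naturally: if models are interchangeable and requests are stateless, a router can try the cheapest option first and degrade gracefully on failure. A minimal sketch β€” the provider stubs, names, and prices are illustrative, not real APIs or quotes:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float          # USD, illustrative figures only
    call: Callable[[str], str]         # returns a completion or raises

def route(prompt: str, models: list[Model]) -> str:
    """Try the cheapest model first (Factor 12); fall back on failure (Factor 7)."""
    for model in sorted(models, key=lambda m: m.cost_per_1k_tokens):
        try:
            return model.call(prompt)
        except RuntimeError:
            continue  # outage or rate limit: degrade to the next provider
    raise RuntimeError("all providers failed")

def flaky(prompt: str) -> str:         # stands in for a rate-limited provider
    raise RuntimeError("rate limited")

def stable(prompt: str) -> str:        # stands in for a working provider
    return f"answer to: {prompt}"

models = [Model("frontier", 0.015, stable), Model("small", 0.001, flaky)]
print(route("2+2?", models))  # β†’ answer to: 2+2?
```

Because each call is self-contained (Factor 4), the fallback is invisible to the caller β€” only cost and latency change.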

Psychology, Culture & AI

Thought pieces on how AI is reshaping developer culture and the software industry.

Theory

  • Cognitive Surrender β€” Psychologists' term for immediately deferring to an AI without engaging System 1 or System 2 thinking β€” a "System 0". A CRT study found 50% of participants consulted AI right away, 87% adopted its answer, and those who did were more confident (77% vs 65%) despite missing the point of the question. πŸ“š Shaw et al. (2026). Thinkingβ€”Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender.
  • Cognitive Biases β€” Humans have them, and so do AI agents β€” biases in training data, prompt framing, and model architecture create systematic blind spots that mirror (and amplify) human cognitive biases.
  • The Great Wounds to Human (and Developer) Ego β€” Science has systematically dismantled human centralism: Copernicus (we're not the center of the universe), Darwin (we're animals, not divine creations), Freud (we're not masters of our own minds), and now AI (intelligence and creativity can be replicated by machines). The same lesson applies to developers: you are not your code. See The 10 Commandments of Egoless Programming.
  • Brooks' Law in the AI era β€” "Adding manpower to a late software project makes it later" (Fred Brooks, 1975). The same applies to AI agents: spinning up more agents on a complex task doesn't linearly speed things up. Each new agent increases coordination overhead, context-sharing costs, and the risk of conflicting changes β€” just like adding people to a team mid-project.
  • Jevons Paradox & the AI explosion β€” When AI makes coding dramatically cheaper and faster, we don't write less code β€” we write far more. Just as cheaper coal in the 19th century led to more coal consumption, not less, cheaper software production leads to an explosion of software, features, and technical debt. Efficiency gains get reinvested into ever-expanding scope. Β· β–Ά video
  • Dunbar's number for AI agents β€” Dunbar's number (~150) describes the cognitive limit of relationships a person can maintain. In AI-augmented teams, a similar limit emerges: there's a ceiling to how many agents, tools, and AI-mediated workflows a developer can effectively orchestrate before losing situational awareness and coherent decision-making.
  • Conway's Law & AI systems β€” "Organizations design systems that mirror their own communication structure." AI systems are no exception: they often reproduce the organizational flaws, communication silos, and structural blind spots of the companies that build them.
  • Murphy's Law & black-box AI β€” "Anything that can go wrong will go wrong." Because AI operates as a black box, if there is a hidden way for a model to fail or hallucinate, it eventually will β€” and the opacity makes it harder to predict when.
  • Goodhart's Law & metric-driven AI β€” "When a measure becomes a target, it ceases to be a good measure." Give an AI a specific metric to optimize (clicks, engagement, conversion) and it may ignore human ethics or intent to make that number go up β€” gaming the metric at the expense of the goal.
  • Peter Principle & AI overpromotion β€” "People rise to their level of incompetence." We risk "promoting" AI to high-stakes roles (legal decisions, medical diagnosis, autonomous weapons) that exceed its actual understanding and competence β€” confusing fluent output with genuine expertise.
  • Dunning-Kruger Effect & AI overconfidence β€” AI models often deliver incorrect answers with extreme confidence, and users with limited domain knowledge can't tell the difference. The result: humans overestimate the machine's true intelligence, and the machine has no mechanism to signal its own uncertainty.

πŸ“ Technical Writing

Specs, prompts, and docs are the new source code β€” prompt-driven, spec-driven, and context-driven development.

From Prompt Engineering to Context Engineering

Prompt engineering β€” crafting individual instructions to steer a model β€” was the first lever developers pulled. It still matters, but it is no longer enough. Context engineering is the broader discipline: deliberately shaping everything the model sees at inference time β€” the system prompt, retrieved documents, conversation history, tool outputs, memory summaries, and structural formatting. The goal is to give the model exactly the right information, in the right form, at the right moment, so it can reason well without guessing or hallucinating.

Core techniques:

  • Retrieval-Augmented Generation (RAG) β€” pull in relevant documents or facts at query time rather than baking knowledge into the model. Evolved from one-shot fixed pipelines (RAG, 2020-2023) β†’ agent-decided multi-hop retrieval (Agentic RAG, 2023-2024) β†’ agent-built context from scattered sources across databases, filesystems, and memory (Agentic Search / Context Engineering, 2025+).
  • Memory management β€” decide what to keep, compress, or forget across turns to stay within context limits without losing continuity.
  • Structured context injection β€” use XML tags, JSON schemas, or delimiters to separate instructions, facts, and examples so the model can parse them reliably.
  • Few-shot priming β€” embed representative examples directly in the context to steer style, format, and reasoning patterns.
  • Tool-result framing β€” shape how tool outputs are presented back to the model to maximize signal and minimize noise.
  • Context compression β€” summarize long histories or large documents before inserting them, cutting token spend while preserving meaning.

Relation to the Inference Economy: context engineering is inseparable from cost. Every token in the context window is billed; bloated or poorly structured context inflates cost and degrades quality (more noise, more distraction for the model). Tight, well-engineered context reduces latency, lowers spend, and often improves output β€” making context engineering one of the highest-ROI optimizations in any production AI system. See the Inference Economy section for complementary techniques.


πŸ’° Inference Economy

Save tokens, use simple scripts or local SLMs when a frontier model isn't needed. Optimize cost, latency, and routing across models.

  • πŸŽ₯ Token Rationing in the Inference Economy (πŸ‡«πŸ‡· video) β€” Whether tokens will cost less or more in the future remains an open question
  • Use English prompts β€” LLMs are predominantly trained on English data, so English prompts yield better instruction-following and reasoning. Non-English languages also tokenize less efficiently (e.g. French, Hindi, Arabic often use 1.5–3Γ— more tokens for the same meaning), directly inflating cost and latency
  • CLI is cheaper than MCP β€” CLI tool calls have less token overhead than MCP protocol exchanges; prefer CLI/skills when possible for lower inference cost
  • Good RAG beats large context stuffing β€” A well-tuned RAG pipeline retrieving only what's needed can outperform naively filling a 1M-token context window, both in cost and in result quality (less noise, more relevant context)
  • Stateful agents beat stateless ones for long tasks β€” Stateless LLM calls re-send the full context every turn; stateful agents (e.g. with KV cache, persistent memory, or session continuity) pay that cost once and reuse it, yielding lower token spend and latency at scale
  • Script or batch over per-prompt repetition β€” If you find yourself asking the same thing repeatedly, or need many similar outputs (e.g. translating a list, generating N variants, processing a dataset), write a script or generate outside Claude Code entirely. Interactive prompting has per-message startup cost, no parallelism, and burns session tokens. A script runs once, is reproducible, and scales.
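The "script over repetition" point can be made concrete. The sketch below stubs the model call so it runs offline; `call_llm` is a placeholder for whatever API client you actually use, not a real library:

```python
# One reproducible script instead of N interactive prompts: no per-message
# startup cost, trivially re-runnable, and the output is easy to diff.

import json

def call_llm(prompt: str) -> str:      # stand-in for a real model API
    return prompt.upper()              # stub: pretend the model answered

items = ["bonjour", "merci", "au revoir"]
template = "Translate to English: {}"

results = {item: call_llm(template.format(item)) for item in items}
print(json.dumps(results, indent=2, ensure_ascii=False))
```

Swapping the stub for a real client (and adding concurrency) is the natural next step once the loop is out of the chat window.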

Token Optimization

  • Claude Mem β€” Cross-session memory plugin for Claude Code; persists context across conversations to avoid re-explaining it each time
  • RTK β€” Input token reduction tool (standalone Rust binary, zero dependencies): filters and compresses Claude Code's tool call outputs before they re-enter context. To upgrade after install, rerun curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh, then run rtk init -g to activate hook-based usage, and verify with rtk gain
  • caveman β€” Output token reduction skill: cuts LLM output tokens ~65% by making Claude respond in terse caveman-style speech while maintaining technical accuracy πŸ“Œ Unread
  • code-review-graph β€” Local knowledge graph for Claude Code; persistent codebase map so Claude reads only what matters β€” 6.8Γ— fewer tokens on reviews, up to 49Γ— on daily tasks
  • Claudette β€” Token reduction via MCP
  • Serena β€” Language-server-powered code intelligence MCP, gives agents precise context to save tokens πŸ“Œ Unread
  • TOON β€” Token-Oriented Object Notation β€” compact encoding that cuts ~40% tokens vs JSON for LLM payloads
  • Context Mode β€” Sandboxes raw output into SQLite instead of context window β€” 98% context reduction on logs and GitHub data πŸ“Œ Unread
  • Claude Token Optimizer β€” Setup prompts that optimize any project's docs and context β€” 90% token savings πŸ“Œ Unread
  • Token Optimizer β€” Finds invisible ghost tokens eating context quality; diagnoses and fixes context decay πŸ“Œ Unread
  • Token Optimizer MCP β€” Adds aggressive caching and compression to MCP tool responses β€” 95%+ token reduction πŸ“Œ Unread
  • Claude Context β€” Zilliz hybrid vector search MCP; makes entire codebase the context for 40% less cost πŸ“Œ Unread
  • Claude Token Efficient β€” Drop-in CLAUDE.md file enforcing strict terseness with zero code changes πŸ“Œ Unread
  • Token Savior β€” Symbol-based code navigation MCP with persistent memory β€” 97% reduction on code navigation πŸ“Œ Unread

Usage & Cost Tracking

  • Opcode β€” Track AI spending and usage across tools
  • Oh My Hi β€” Visual dashboard that parses Claude Code harness config and usage data into an interactive HTML analytics interface πŸ“Œ Unread
  • ccusage β€” Track Claude Code token usage and costs across sessions, with per-project and per-model breakdowns πŸ“Œ Unread

Claude Code Token Hygiene

  • Usage limit β‰  length limit (source): Usage limits are your conversation budget β€” how many messages you can send before a cooldown; determined by conversation length, complexity, features used, and model choice; shared across all Claude surfaces (claude.ai, Claude Code, Claude Desktop); resets on a scheduled basis. Length limits are Claude's context window (200K tokens standard, 500K on some Enterprise plans) β€” how much information Claude can hold in one chat; resets by starting a new conversation or via automatic context summarization. Don't confuse a length limit ("conversation too long") with a usage limit ("rate limited").
  • 5-hour sessions: Claude usage/session limits reset every 5 hours (official Anthropic source: About Claude's Pro Plan Usage and About Claude's Max Plan Usage). Start your first session early (~7 am) β€” if the limit hits, you can take lunch around noon and start a fresh 5-hour session for the afternoon.
  • Startup overhead: Each claude invocation consumes tokens just to initialize/load context. You can verify this with /context. Use /insights to get a breakdown of token usage by category (tools, system prompt, conversation) β€” helps identify what's burning the most tokens in a session.
  • Repo switching cost: Working across many repositories increases token usage due to repeated context loading and memory/context switching.
  • Reasoning level: Avoid unnecessarily high thinking/reasoning levels when a simpler mode is enough; the default high effort mode already gives the best quality/cost balance across most tasks, so reserve anything above it for genuinely hard problems.
  • Model choice + /plan: In Claude Code, using Sonnet instead of Opus can save a lot of tokens when the task does not need the stronger model. Use /model opusplan to automatically use Opus 4.6 only during plan mode and fall back to Sonnet 4.6 for execution (docs). Always use /plan for large tasks (e.g. implementing a feature) where some research is needed β€” it focuses the session before burning execution tokens.
  • Surface separation: Avoid mixing the same work between Claude in the browser and Claude Code, since usage is shared and context has to be rebuilt.
  • Worktree overhead: Worktrees can also increase token consumption because each parallel branch/session may maintain separate context.
  • 1 subject = 1 session β€” /clear vs /compact: Switch topic β†’ /clear (wipes history entirely, best when context is irrelevant noise). Use /compact to summarize and compress mid-task when history is growing but you need continuity.
  • Pin files with @./: When you know which files Claude must touch, reference them directly (e.g. @./src/foo.ts) β€” avoids costly file-search tool calls.
  • No Shakespearean prompts: Speak to LLMs directly. Bad: "Can you please analyse why this junit test XxxTest failed, then try to fix it" β†’ Good: "scope: unit test, goal: must succeed, file: @./src/test/XxxTest.java"

Local & Offline Models

Run open-weight models on your own hardware for data privacy, lower latency, and offline work. No data leaves your machine.

Hardware requirements β€” the bottleneck is always memory (RAM or VRAM), not CPU/GPU speed. A rough rule: a quantized (Q4) model needs ~0.6 GB per billion parameters. A dedicated GPU is ideal but not required β€” modern Macs with unified memory (M-series) are excellent for this.

Model size | Minimum RAM/VRAM | Runs on
1–3B      | 4 GB             | Any laptop
7–8B      | 8 GB             | Most laptops (M1/M2 Mac, mid-range GPU)
14–27B    | 16–24 GB        | High-end laptop or desktop GPU (RTX 3090/4090, M3 Max)
70B+       | 48+ GB           | Multi-GPU workstation or Mac Studio/Pro
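The ~0.6 GB per billion parameters rule above is easy to turn into a quick estimator. The 20% overhead margin for KV cache and runtime here is an assumption for illustration, not a measured figure:

```python
# Rule-of-thumb memory estimate for a Q4-quantized model:
# ~0.6 GB per billion parameters, plus a margin for KV cache and runtime.

def q4_memory_gb(params_billion: float, overhead: float = 1.2) -> float:
    """Estimated RAM/VRAM in GB for a Q4 model of the given size."""
    return round(params_billion * 0.6 * overhead, 1)

for size in (3, 8, 27, 70):
    print(f"{size}B -> ~{q4_memory_gb(size)} GB")
```

The estimates line up with the table: an 8B model fits comfortably in 8 GB, a 27B model needs a 16–24 GB class machine, and 70B+ is workstation territory.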
  • Gemma 4 (Google DeepMind, open weights) β€” Multimodal model family, 1B to 27B. Gemma 4 27B needs ~16 GB RAM (Q4). Setup: ollama pull gemma4 then ollama run gemma4
  • Qwen (Alibaba, Apache 2.0 open source) β€” Strong multilingual model family, 0.5B to 235B. Qwen3 8B needs ~6 GB RAM (Q4). Setup: ollama pull qwen3 then ollama run qwen3
  • Ollama β€” The standard runtime for running local models; one command to pull and serve any supported model (ollama serve starts a local OpenAI-compatible API on localhost:11434). To use a local model with Claude Code: ollama launch claude --model qwen2.5-coder:14b (needs ~10 GB RAM; swap model name for any Ollama-supported model)
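Once `ollama serve` is running, any OpenAI-compatible client can talk to it with only the standard library. A sketch β€” it assumes the server is up on the default port and the model has been pulled, so the actual network call is left commented out:

```python
import json
import urllib.request

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a request for Ollama's OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("qwen3", "Say hello in one word.")
# With a live server, uncomment to send the request:
# resp = urllib.request.urlopen(req)
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the endpoint speaks the OpenAI wire format, the same payload works against any of the routing layers listed below by changing only the URL.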

Multi-LLM Access & Routing

  • LiteLLM β€” Unified API for 100+ LLMs
  • OpenRouter β€” LLM routing and access
  • 1min AI β€” Multi-model AI access platform
  • LLMFit β€” Find which models & providers run on your hardware πŸ“Œ Unread

πŸ” Black Box Debug & Observability

You can't debug what you can't see β€” instrument what agents produce.

  • AI Agent Observability β€” Weights & Biases guide to agent observability

  • Langfuse β€” Open-source LLM engineering platform for tracing, prompt management, and evaluation. Instruments your LLM calls with traces, spans, and scores so you can debug failures, measure quality, and track costs across every model call in production β€” the standard observability stack for teams building on LLMs.

  • Entire β€” Git-native AI session recorder. Operates via Git hooks (post-commit) to automatically capture the full context of every agent run (transcript, prompts, tool calls, token usage, file edits) as checkpoints stored on a dedicated entire/checkpoints/v1 branch.

    Storage layout: Project config lives in a .entire/ hidden folder at the repo root (settings.json is version-controlled and shared with the team; settings.local.json is gitignored for local overrides) β€” but session data is not stored there. It lives on a separate entire/checkpoints/v1 Git branch (both local and remote), organized as sharded JSON files (entire/checkpoints/v1/<2-char-shard>/<remaining-id>/metadata.json).

    Per-commit metadata: Each Git commit gets an Entire-Checkpoint trailer linking back to the session that produced it, and an Entire-Attribution trailer showing the agent-vs-human line split β€” including token usage metrics (input, output, cache reads/writes, API call counts) so teams can track AI cost per commit. This turns the AI "black box" into an auditable record: run entire explain on any commit to replay why code was written, not just what changed.

    ⚠️ Treat prompts as code β€” since all your prompts are stored and can be pushed to the remote alongside the checkpoints branch, never paste secrets, credentials, or sensitive data into the agent conversation.

    βš–οΈ Auto-push trade-off:

    • push_sessions: true (default) β€” full team audit trail, PR context on entire.io, cross-team observability; but the entire/checkpoints/v1 branch grows fast (~2–4 GB/year for a 10-dev team on a large repo), bloating every git clone and CI fetch
    • push_sessions: false β€” loses most team value (no shared history, no web dashboard, no PR context); degrades to a personal local journal
    • Middle ground: use checkpoint_remote to redirect checkpoints to a separate private repo, keeping the main repo clean while preserving the full audit trail

    Local-first (data stays in your repo), open-source under MIT.
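    The middle-ground setup above might look like this in .entire/settings.json β€” push_sessions and checkpoint_remote are the keys named in this section, but the exact schema may differ and the remote URL is a placeholder:

    ```json
    {
      "push_sessions": true,
      "checkpoint_remote": "git@example.com:team/entire-checkpoints.git"
    }
    ```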


🧹 Technical Debt Management

AI writes fast, but someone has to maintain it.


πŸ‘οΈ Code Review

The last line of defense is now the main job.


πŸ§ͺ QA & Testing Strategy

If you didn't write it, you'd better know how to break it.


πŸ“£ Self Marketing

Build visibility on LinkedIn, Twitter/X, Slack, and beyond β€” your work won't speak for itself.

  • Personal Branding for Devs β€” freeCodeCamp handbook on developer personal branding
  • Your LinkedIn CV Won't Be Enough β€” A long list of hard skills and job titles on LinkedIn is becoming table stakes. AI-powered recruiting agents and talent-sourcing bots are already crawling GitHub repositories, analyzing commit history, PR quality, and actual contributions to assess what engineers truly deliver β€” not what they claim. The same applies beyond GitHub: agents will scrape your blog posts for depth of thought, your Stack Overflow answers for expertise signals, your open-source contributions for collaboration patterns, and your conference talks for communication skills. The implication: your real, observable output across platforms becomes your resume. Polished profiles without substance will be filtered out by the same AI that generates them. Invest in visible, verifiable outcomes β€” meaningful commits, well-crafted technical writing, thoughtful code reviews, and genuine community contributions β€” because that's what the crawlers will judge you on.

🌐 GEO / LLMO

Generative Engine Optimization β€” making your content discoverable by AI models.

  • Moz β€” Reference SEO resource (guides, tools, blog) β€” strong traditional SEO foundations remain essential for GEO, since AI agents still rely on well-structured, crawlable, authoritative content to surface answers
  • What is GEO/LLMO? β€” Introduction to Generative Engine Optimization
  • 8 On-Page SEO Tips for LLM/GEO β€” Practical optimization tips

βš–οΈ Legal, Compliance & Governance

GDPR, AI Act, licensing β€” the rules AI can't learn on its own.


πŸ”’ Cybersecurity

AI-generated code is only as secure as the reviewer.

About

AI writes the code. You own everything around it.
