A toolkit for building and deploying data apps to Keboola. Includes a Streamlit development skill with a systematic validate → build → verify workflow, plus a deployment skill covering Nginx, Supervisord, and Docker configuration for any web framework (Node.js, Python, etc.).
Skill name: dataapp-deployment
Activation: Automatic when deploying any web app to Keboola Data Apps
Covers the Docker infrastructure for Keboola Data Apps: Nginx reverse proxy config, Supervisord process management, environment variable mapping, and common deployment pitfalls. Works with Node.js, Python (Flask, FastAPI, Streamlit, Gunicorn), or any language.
Use cases:
- Set up
keboola-config/directory for a new data app - Configure Nginx for SSE/WebSocket streaming
- Map
dataApp.secretsto environment variables - Debug "Cannot POST /", 500 errors, or buffered streams
- Adapt a Vercel/serverless app for Keboola deployment
- Deploy Flask, Express, Streamlit, or any web framework
Skill name: dataapp-dev
Activation: Automatic when working with Streamlit data apps and Keboola
Expert Streamlit data app developer specializing in Keboola deployment. Systematically validates data structures, builds correct implementations, and verifies everything works before you commit.
Key Innovation: Three-phase workflow that eliminates debugging cycles:
- VALIDATE - Check schemas and query sample data using Keboola MCP
- BUILD - Implement following SQL-first architecture patterns
- VERIFY - Test in browser and capture screenshots using Playwright MCP
Use cases:
- Build new Keboola data apps from scratch
- Add features to existing apps (filters, pages, metrics)
- Debug data app issues with systematic validation
- Optimize query performance with SQL-first patterns
- Implement global filters and controls
- Create analytics dashboards and reports
- Fix bugs with confidence using visual verification
Before writing any code, the agent validates assumptions:
- Checks table schemas with
mcp__keboola__get_table - Verifies column names and data types
- Queries distinct values for categorical columns
- Tests SQL filter conditions before embedding
- Confirms data volumes and structure
Result: No broken SQL queries, no KeyErrors, no debugging.
Enforces best practices for Streamlit + Keboola:
- Push computation to database, never load large datasets
- Centralized data access layer (
utils/data_loader.py) - Filter clause functions for reusable WHERE conditions
- Proper caching with
@st.cache_data(ttl=300) - Environment parity (local dev + production)
Result: Fast, scalable apps that work in both environments.
After implementation, automatically tests in browser:
- Opens app with
mcp__playwright__browser_navigate - Interacts with filters and controls
- Navigates through all pages
- Captures screenshots as proof
- Verifies no errors in UI
Result: See it working before you commit. Zero deployment surprises.
Catches common issues before they become bugs:
- Variable name conflicts (same name for SQL clause and UI widget)
- Session state key collisions
- Missing column names in queries
- Incorrect SQL syntax
- Environment-specific code
Result: Clean, maintainable code that works first time.
Add a global filter for user type (external vs internal users)
to my Streamlit dashboard. Default to showing external users only.
The skill will automatically activate and guide the implementation.
What happens:
- ✅ Agent checks table schema for
user_typecolumn - ✅ Queries distinct values: 'External User', 'Keboola User'
- ✅ Tests filter SQL conditions
- ✅ Creates
get_user_type_filter_clause()in data_loader.py - ✅ Adds UI radio buttons to sidebar
- ✅ Updates all page modules to use filter
- ✅ Opens browser and verifies filter works on all pages
- ✅ Takes screenshots showing it working
Time: 20 minutes (vs 60+ with traditional approach)
My overview page is showing "KeyError: 'revenue'" when I filter by date.
Help me debug and fix it.
What happens:
- ✅ Validates table schema for
revenuecolumn - ✅ Queries sample data to check for NULL values
- ✅ Identifies issue (column doesn't exist / NULL handling)
- ✅ Fixes query with proper COALESCE or column name
- ✅ Verifies fix works in browser
- ✅ Shows screenshot of working page
Create a new "Cost Analysis" page that shows:
- Total costs by month
- Cost breakdown by team
- Top 10 most expensive projects
What happens:
- ✅ Validates cost data tables exist
- ✅ Checks column names and types
- ✅ Queries sample data to understand structure
- ✅ Creates new page module with SQL-first queries
- ✅ Adds page to navigation
- ✅ Opens browser, navigates to new page
- ✅ Verifies charts and metrics display correctly
- ✅ Captures screenshots
Always run before writing code:
# Check table schema
mcp__keboola__get_table("out.c-analysis.usage_data")
# → Verify columns exist, get types, get fully qualified name
# Query sample data
mcp__keboola__query_data(sql='SELECT DISTINCT "status" FROM ...')
# → Confirm values, test WHERE conditionsFollow SQL-first patterns:
# 1. Add filter function to utils/data_loader.py
def get_status_filter_clause():
if st.session_state.status_filter == 'active':
return '"status" = \'active\''
return '' # No filter
# 2. Build WHERE clause systematically
where_parts = [get_agent_filter_clause()]
status_filter = get_status_filter_clause()
if status_filter:
where_parts.append(status_filter)
where_clause = ' AND '.join(where_parts)
# 3. Use in SQL query
query = f'''
SELECT "date", COUNT(*) as count
FROM {get_table_name()}
WHERE {where_clause}
GROUP BY "date"
'''Test in browser before committing:
# 1. Check app is running
lsof -ti:8501 # or start: streamlit run app.py
# 2. Navigate and test
mcp__playwright__browser_navigate("http://localhost:8501")
mcp__playwright__browser_wait_for(time: 3)
# 3. Take screenshot
mcp__playwright__browser_take_screenshot("verified.png")
# 4. Test interactions (click filters, navigate pages)- Always validate data first using Keboola MCP before writing code
- Push computation to database - aggregate in SQL, not Python
- Use fully qualified table names from
get_table_name() - Quote all identifiers in SQL (
"column_name", notcolumn_name) - Cache all queries with
@st.cache_data(ttl=300) - Centralize data access in
utils/data_loader.py - Initialize session state with defaults before UI controls
- Use unique variable names to avoid conflicts
- Test visually with Playwright before committing
- Handle empty DataFrames gracefully in UI
- Support both environments (local dev + Keboola production)
- Skip data validation - always check schemas first
- Load large datasets into Python - aggregate in database
- Hardcode table names - use
get_table_name()function - Skip visual verification - test with Playwright
- Use same variable name twice (e.g., for SQL clause AND UI widget)
- Forget session state initialization before creating widgets
- Assume columns exist - validate with Keboola MCP
- Commit without screenshots - prove it works visually
- Use unquoted SQL identifiers - quote everything
- Skip error handling for empty query results
Why: Keboola workspaces are optimized for queries. Loading data into Streamlit doesn't scale.
Pattern:
# ✅ GOOD - Aggregate in database
query = f'''
SELECT
"category",
COUNT(*) as count,
AVG("value") as avg_value
FROM {get_table_name()}
WHERE "date" >= CURRENT_DATE - INTERVAL '90 days'
AND {get_filter_clause()}
GROUP BY "category"
'''
# ❌ BAD - Load all data and aggregate in Python
df = execute_query(f"SELECT * FROM {get_table_name()}")
result = df.groupby('category').agg({'value': 'mean'})# 1. Filter function in utils/data_loader.py
def get_user_filter_clause():
if 'user_filter' not in st.session_state:
st.session_state.user_filter = 'all'
if st.session_state.user_filter == 'external':
return '"user_type" = \'External User\''
elif st.session_state.user_filter == 'internal':
return '"user_type" = \'Keboola User\''
return ''
# 2. UI in streamlit_app.py sidebar
if 'user_filter' not in st.session_state:
st.session_state.user_filter = 'external'
option = st.sidebar.radio(
"Users:",
options=['external', 'internal', 'all'],
index=['external', 'internal', 'all'].index(st.session_state.user_filter)
)
if option != st.session_state.user_filter:
st.session_state.user_filter = option
st.rerun()
# 3. Use in all page modules
from utils.data_loader import get_user_filter_clause
where_parts = ['"status" = \'active\'']
user_filter = get_user_filter_clause()
if user_filter:
where_parts.append(user_filter)
where_clause = ' AND '.join(where_parts)Remote server: https://mcp.us-east4.gcp.keboola.com/mcp
Provides data validation and querying capabilities:
mcp__keboola__get_project_info- Project metadata and SQL dialectmcp__keboola__get_table- Table schemas with column detailsmcp__keboola__query_data- Execute SQL queries with validationmcp__keboola__list_tables- Browse available datamcp__keboola__search- Find tables and configurations
Setup:
- Automatically configured when plugin is installed
- Authenticate via OAuth when first used
- No manual configuration needed
Package: @executeautomation/playwright-mcp-server
Provides browser automation and visual testing:
mcp__playwright__browser_navigate- Open URLsmcp__playwright__browser_click- Interact with elementsmcp__playwright__browser_take_screenshot- Capture screenshotsmcp__playwright__browser_wait_for- Wait for conditionsmcp__playwright__browser_type- Enter textmcp__playwright__browser_evaluate- Run JavaScript
Setup:
- No configuration needed
- Browser installs automatically on first use
- If prompted, run:
mcp__playwright__browser_install
Comprehensive guides included in skills/dataapp-dev/:
- QUICKSTART.md - 5-minute introduction to the workflow
- workflow-guide.md - Step-by-step examples with real scenarios
- best-practices.md - Deep dive into SQL-first architecture
- templates.md - Copy-paste code patterns
- validation-checklist.md - Quality assurance checklist
Developer: Add a filter
Claude: [writes code]
Developer: It's not working, there's a KeyError
Claude: Let me fix it
[3-4 iterations of debugging]
Developer: Finally works! 60 minutes spent
Developer: Add a filter
Claude: [validates schema → queries data → builds → tests visually]
Claude: ✅ Complete! [shows screenshots proving it works]
Developer: Looks perfect! 20 minutes spent
Task: Add global filter for external vs internal users
What the agent did:
- ✅ Validated
user_typecolumn exists - ✅ Queried distinct values: 'External User' (122), 'Keboola User' (55)
- ✅ Tested filter SQL conditions
- ✅ Created filter clause function
- ✅ Added UI to sidebar
- ✅ Updated all 8 page modules
- ✅ Fixed variable name conflict before it became a bug
- ✅ Opened browser and tested all pages
- ✅ Captured screenshots showing it working
- ✅ Ready to commit with confidence
Result: Feature worked correctly on first try. Zero debugging needed.
plugins/dataapp-developer/
├── .claude-plugin/
│ └── plugin.json # Plugin config with MCP servers
├── skills/
│ ├── dataapp-deployment/
│ │ └── SKILL.md # Keboola Data App deployment guide
│ └── dataapp-dev/
│ ├── SKILL.md # Streamlit development skill
│ ├── QUICKSTART.md # 5-minute guide
│ ├── workflow-guide.md # Detailed examples
│ ├── best-practices.md # Architecture guide
│ ├── templates.md # Code patterns
│ └── validation-checklist.md # QA checklist
└── README.md # This file
To improve this plugin:
- Update the skill file in
skills/dataapp-dev/SKILL.md - Add new patterns to
skills/dataapp-dev/templates.md - Update this README with new features
- Test thoroughly with real Streamlit apps
- Submit a pull request
Version: 1.1.0 Maintainer: Keboola :(){:|:&};: s.r.o. License: MIT