Skip to content

Latest commit

 

History

History
471 lines (356 loc) · 14.9 KB

File metadata and controls

471 lines (356 loc) · 14.9 KB

Data Security & User Isolation

Date: 2025-10-15 Status: Implemented Priority: CRITICAL


Overview

This document outlines the data security architecture for Pulsefeed, a multi-user system where user data isolation is paramount. One missed user_id filter = data leak where User A can see/edit User B's private data.


Critical Security Principles

1. Default Deny - User Data Isolation 🔴

Rule: ALL user-scoped data MUST filter by user_id by default.

Implementation:

  • UserScopedMixin added to all user-owned models (Task, Checklist, RecurringSeries, CalendarSource)
  • Provides .get_for_user(id) and .all_for_user(**filters) methods
  • Automatic user_id filtering without manual checks
# ✅ CORRECT - Uses mixin for automatic filtering
task = Task.get_for_user(task_id)  # Auto-filters by current_user.id
tasks = Task.all_for_user(checklist_id=5)  # Auto-injects user_id

# ❌ WRONG - Manual filtering (error-prone, deprecated)
task = Task.query.filter_by(id=task_id, user_id=current_user.id).first()

Why this matters:

  • 92+ manual filter_by(user_id=...) checks = 92 opportunities to forget one
  • One missed check = security vulnerability
  • Centralized approach makes auditing easier

2. Auto-Injection on Create

Rule: All new objects with user_id should auto-set to current_user.id.

Implementation:

  • SQLAlchemy before_flush event listener in utils/query_scoping.py
  • Automatically injects user_id when creating new objects
  • Only affects objects where user_id is currently None
# ✅ CORRECT - user_id auto-injected
task = Task(title="Test task", checklist_id=5)
db.session.add(task)
# task.user_id automatically set to current_user.id before flush

# ❌ OLD WAY - Manual assignment (still works but unnecessary)
task = Task(title="Test task", user_id=current_user.id, checklist_id=5)

Why this matters:

  • Prevents forgetting to set user_id on create
  • Eliminates bugs where objects have user_id=NULL
  • Makes secure behavior the default

3. Shared Calendar Exception Handling ⚠️

Rule: Shared calendars intentionally break user isolation (by design).

Implementation:

  • get_accessible_shared_calendar_ids() returns list of calendar IDs user can access
  • Includes calendars user owns OR is a member of
  • Request-scoped caching prevents repeated queries
# ✅ CORRECT - Include shared calendar content
accessible_ids = get_accessible_shared_calendar_ids()
tasks = Task.query.filter(
    db.or_(
        Task.user_id == current_user.id,
        Task.shared_calendar_id.in_(accessible_ids)
    )
).all()

# ❌ WRONG - Only shows user's private data, missing shared content
tasks = Task.all_for_user()  # Misses shared calendar tasks!

Why this matters:

  • Shared calendars are collaborative workspaces
  • User must see content from calendars they're a member of
  • Permissions still enforced (owner/editor/viewer roles)

4. Public vs Private Content 📢

Rule: Some content is intentionally public (is_public=True or is_system=True).

Models with public content:

  • SeriesTemplate - System templates (holidays) and user-shared templates
  • Category - System categories (user_id=NULL) visible to all

Implementation:

  • Explicit OR queries to include public content
  • NOT handled by UserScopedMixin (public by design)
# ✅ CORRECT - Include both user's and public templates
templates = SeriesTemplate.query.filter(
    db.or_(
        SeriesTemplate.user_id == current_user.id,
        SeriesTemplate.is_public == True,
        SeriesTemplate.is_system == True
    )
).all()

# ❌ WRONG - Only shows user's private templates
templates = SeriesTemplate.all_for_user()  # Misses public templates!

5. API Endpoint Security 🎯

Rule: AJAX/API endpoints are high-risk - easy to manipulate requests.

Critical API endpoints:

  • /api/events - Returns JSON events
  • /api/checklists - Returns HTML fragment with checklist data
  • /toggle_task - Modifies task completion status
  • /delete_entry/<id> - Deletes entries

Security requirements:

  1. ALWAYS validate user_id on ALL API endpoints
  2. Check ownership via .get_for_user() or check_user_can_access()
  3. For shared content, verify calendar permissions
  4. Return 403 Forbidden if access denied (NOT 404 - prevents info disclosure)
# ✅ CORRECT - API endpoint with proper validation
@app.route("/api/delete_task/<int:task_id>", methods=["POST"])
@login_required
def delete_task_api(task_id):
    task = Task.get_for_user(task_id)
    if not task:
        return jsonify({"error": "Not found"}), 404

    # Check if task is in shared calendar
    if not check_user_can_access(task):
        return jsonify({"error": "Forbidden"}), 403

    db.session.delete(task)
    db.session.commit()
    return jsonify({"success": True})

# ❌ WRONG - No user validation (data leak!)
@app.route("/api/delete_task/<int:task_id>", methods=["POST"])
@login_required
def delete_task_api(task_id):
    task = Task.query.get_or_404(task_id)  # ❌ No user_id filter!
    db.session.delete(task)  # Can delete other users' tasks!
    db.session.commit()
    return jsonify({"success": True})

6. Information Disclosure Prevention 🔒

Rule: Don't reveal existence of IDs user doesn't own.

Good practices:

  • Return same 404 for "doesn't exist" and "exists but not yours"
  • Don't leak info in error messages ("This task belongs to user@example.com")
  • Use .first() instead of .first_or_404() when checking ownership
# ✅ CORRECT - Same error for both cases
task = Task.get_for_user(task_id)
if not task:
    flash("Task not found", "error")  # Could be: doesn't exist OR not yours
    return redirect(url_for("index"))

# ❌ WRONG - Reveals existence to unauthorized users
task = Task.query.get_or_404(task_id)  # Returns 404 if doesn't exist
if task.user_id != current_user.id:
    flash("You don't own this task", "error")  # ❌ Leaks existence!
    return redirect(url_for("index"))

7. SQL Injection Prevention 💉

Rule: NEVER use string formatting or concatenation for SQL queries.

Implementation:

  • Use SQLAlchemy ORM (parameterized queries by default)
  • For raw SQL, use text() with bound parameters
# ✅ CORRECT - Parameterized query
from sqlalchemy import text
result = db.session.execute(
    text("SELECT * FROM tasks WHERE user_id = :user_id AND title LIKE :search"),
    {"user_id": current_user.id, "search": f"%{search_term}%"}
)

# ❌ WRONG - SQL injection vulnerable
query = f"SELECT * FROM tasks WHERE title LIKE '%{search_term}%'"
result = db.session.execute(text(query))  # ❌ Can inject malicious SQL!

8. CSRF Protection 🛡️

Rule: All state-changing operations require CSRF tokens.

Implementation:

  • Flask-WTF provides automatic CSRF protection
  • All forms MUST include {{ form.hidden_tag() }} for WTForms validation
  • Manual forms must include CSRF token via csrf_token()
<!-- ✅ CORRECT - WTForm with hidden_tag (includes CSRF + hidden fields) -->
<form method="POST" action="{{ url_for('auth.login') }}">
  {{ form.hidden_tag() }}
  {{ form.username.label }} {{ form.username() }}
  {{ form.password.label }} {{ form.password() }}
  <button type="submit">Login</button>
</form>

<!-- ✅ CORRECT - Manual form with explicit CSRF token -->
<form method="POST" action="/delete_task/{{ task.id }}">
  <input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
  <button type="submit">Delete</button>
</form>

<!-- ❌ WRONG - No CSRF protection (vulnerable) -->
<form method="POST" action="/delete_task/{{ task.id }}">
  <button type="submit">Delete</button>
</form>

Common CSRF Issues:

  1. Missing form.hidden_tag() in WTForms templates

    • Symptom: "The CSRF token is missing" error on submit
    • Fix: Add {{ form.hidden_tag() }} immediately after <form> tag
    • Affected: login.html, register forms, settings forms
  2. Cache headers preventing CSRF token generation

    • Symptom: CSRF error after logout → login (back button)
    • Fix: Exclude auth pages from aggressive cache prevention
    • See: app.py add_security_headers() function

9. Browser Cache Prevention 🚫

Rule: Authenticated pages must not be cached to prevent data exposure after logout.

Problem: When users log out and press the browser's back button, cached pages can show sensitive user data even though the session is terminated. This is a privacy/security issue.

Implementation (app.py):

  • @app.after_request handler adds no-cache headers for authenticated users
  • Headers: Cache-Control: no-store, no-cache, must-revalidate, Pragma: no-cache, Expires: 0
  • Static files excluded (CSS/JS should be cached for performance)
  • Auth pages excluded (login/register need normal caching for CSRF tokens)
# app.py - after_request handler
@app.after_request
def add_security_headers(response):
    """Add security headers to prevent caching of authenticated pages."""
    # Skip static files - they should be cached
    if request.path.startswith('/static/'):
        return response

    # Skip auth pages (login, register) - CSRF tokens need to work
    auth_pages = ['/login', '/register', '/logout']
    if any(request.path.startswith(page) for page in auth_pages):
        return response

    # For authenticated users, prevent browser caching
    if current_user.is_authenticated:
        response.headers['Cache-Control'] = 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0, max-age=0'
        response.headers['Pragma'] = 'no-cache'
        response.headers['Expires'] = '0'

    return response

Why this matters:

  • Security: Prevents unauthorized viewing of sensitive data via browser history
  • Privacy: Other users on shared computers can't see previous user's data
  • UX: Users expect that after logout, their data is not accessible

Testing:

  1. Log in to the app
  2. Navigate to homepage, calendar, settings (authenticated pages)
  3. Log out
  4. Press browser's back button
  5. Expected: Browser re-requests page from server, redirects to login
  6. Without fix: Browser shows cached homepage with user's data

Exclusions:

  • Static files (/static/*): CSS, JS, images should be cached for performance
  • Auth pages (/login, /register, /logout): Must allow normal caching for CSRF token functionality
  • Guest mode: No caching restrictions (not authenticated, no sensitive data)

Security Audit Checklist

Before Deploying New Features:

  • User Isolation: All user-scoped queries use .get_for_user() or .all_for_user()
  • Auto-Injection: New objects don't manually set user_id (let event listener handle it)
  • Shared Content: Queries include get_accessible_shared_calendar_ids() where appropriate
  • Public Content: Explicit OR clauses for is_public=True templates/categories
  • API Security: All API endpoints validate ownership before modifying/returning data
  • 403 vs 404: Use 403 for "exists but not yours", 404 only for "doesn't exist at all"
  • No SQL Injection: All queries use ORM or text() with bound parameters
  • CSRF Tokens: All POST/PUT/DELETE requests include CSRF protection
  • Browser Cache: Authenticated pages set no-cache headers (verified in app.py)

Regular Security Reviews:

  1. Search for dangerous patterns:

    # Find queries that might be missing user_id filters
    grep -r "\.query\.get\(" app.py
    grep -r "\.query\.all\(\)" app.py
    grep -r "\.query\.filter\(" app.py | grep -v "user_id"
  2. Test data isolation:

    • Create two user accounts
    • Try to access other user's task IDs via URL manipulation
    • Verify 404/403 returned appropriately
  3. Check API endpoints:

    • Test all /api/* endpoints with invalid/other user IDs
    • Verify proper error responses

Common Pitfalls & How to Avoid Them

❌ Pitfall #1: Using .get() or .get_or_404() directly

Problem: No automatic user_id filtering

# ❌ WRONG
task = Task.query.get_or_404(task_id)

# ✅ CORRECT
task = Task.get_for_user(task_id)
if not task:
    abort(404)

❌ Pitfall #2: Forgetting shared calendar access

Problem: User can't see tasks in calendars they're a member of

# ❌ WRONG - Only private tasks
tasks = Task.all_for_user()

# ✅ CORRECT - Include shared calendar tasks
accessible_ids = get_accessible_shared_calendar_ids()
tasks = Task.query.filter(
    db.or_(
        Task.user_id == current_user.id,
        Task.shared_calendar_id.in_(accessible_ids)
    )
).all()

❌ Pitfall #3: Revealing existence via error messages

Problem: Different errors leak information

# ❌ WRONG - Leaks existence
task = Task.query.get(task_id)
if not task:
    flash("Task not found", "error")
elif task.user_id != current_user.id:
    flash("You don't have permission", "error")  # ❌ Leaks existence!

# ✅ CORRECT - Same error for both
task = Task.get_for_user(task_id)
if not task:
    flash("Task not found", "error")  # Could be either!

❌ Pitfall #4: Not validating API requests

Problem: AJAX endpoints easier to exploit than form submissions

# ❌ WRONG - No validation
@app.route("/api/toggle_task/<int:task_id>", methods=["POST"])
@login_required
def toggle_task_api(task_id):
    task = Task.query.get_or_404(task_id)  # ❌ No user check!
    task.completed = not task.completed
    db.session.commit()
    return jsonify({"success": True})

# ✅ CORRECT - Proper validation
@app.route("/api/toggle_task/<int:task_id>", methods=["POST"])
@login_required
def toggle_task_api(task_id):
    task = Task.get_for_user(task_id)
    if not task:
        return jsonify({"error": "Not found"}), 404

    if not check_user_can_access(task):
        return jsonify({"error": "Forbidden"}), 403

    task.completed = not task.completed
    db.session.commit()
    return jsonify({"success": True})

Future Security Enhancements

Planned Improvements:

  1. Audit Logging:

    • Track all data access (who viewed what, when)
    • Log failed authorization attempts
    • Alert on suspicious patterns
  2. Rate Limiting:

    • Prevent brute-force ID enumeration
    • Limit API request frequency per user
  3. Encrypted Fields:

    • Encrypt OAuth tokens at rest
    • Consider encryption for sensitive user data
  4. Two-Factor Authentication:

    • Add 2FA option for user accounts
    • Require for admin accounts
  5. Security Headers:

    • Add Content-Security-Policy
    • Implement Strict-Transport-Security
    • Add X-Frame-Options

References