Complex Algorithm Documentation

This document explains the current and intended behavior for complex algorithmic areas in TeachLink Backend. It focuses on search, experimentation decision-making, and gamification scoring.

Search Algorithm

Current Status

  • src/search/search.service.ts currently contains a placeholder implementation.
  • src/search/elasticsearch/elasticsearch.service.ts sets up Elasticsearch index mappings for course data and analytics tracking.

Future Design Intent

  • User queries should be translated into an Elasticsearch query that combines:
    • match / multi_match text search on title, description, content, and instructorName
    • search_as_you_type autocomplete support for partial title matching
    • category, level, language, instructor, price, and status filters
    • custom sort options such as relevance, rating, views, and createdAt
  • Search results should be scored by relevance and optionally boosted by popularity signals such as views, enrollments, and rating.
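
Since the search service is still a placeholder, the following is only a minimal sketch of how the intended query could be assembled as a plain Elasticsearch request body. The `CourseSearchParams` shape, the `buildSearchRequest` name, the `title^2` boost, the `"published"` status value, and the choice of `views` as the popularity signal are all assumptions made for illustration.

```typescript
// Hypothetical input shape; field names follow the list above, the rest is assumed.
interface CourseSearchParams {
  query: string;
  category?: string;
  level?: string;
  minPrice?: number;
  maxPrice?: number;
  sort?: "relevance" | "rating" | "views" | "createdAt";
  page: number;
  limit: number;
}

type Json = Record<string, unknown>;

function buildSearchRequest(params: CourseSearchParams): Json {
  // Filters run in filter context: they restrict results without affecting the score.
  const filter: Json[] = [{ term: { status: "published" } }]; // assumed status value
  if (params.category) filter.push({ term: { category: params.category } });
  if (params.level) filter.push({ term: { level: params.level } });
  if (params.minPrice !== undefined || params.maxPrice !== undefined) {
    const range: Json = {};
    if (params.minPrice !== undefined) range.gte = params.minPrice;
    if (params.maxPrice !== undefined) range.lte = params.maxPrice;
    filter.push({ range: { price: range } });
  }

  const textQuery: Json = {
    bool: {
      must: [
        {
          multi_match: {
            query: params.query,
            // "^2" weights title matches more heavily (assumed boost).
            fields: ["title^2", "description", "content", "instructorName"],
          },
        },
      ],
      filter,
    },
  };

  // Relevance sorts by score alone; other options sort by the field, then by score.
  const sort =
    !params.sort || params.sort === "relevance"
      ? ["_score"]
      : [{ [params.sort]: "desc" }, "_score"];

  return {
    query: {
      function_score: {
        query: textQuery,
        // Adds log(1 + views) to the text relevance score as a popularity boost.
        field_value_factor: { field: "views", modifier: "log1p", missing: 0 },
        boost_mode: "sum",
      },
    },
    sort,
    from: (params.page - 1) * params.limit,
    size: params.limit,
  };
}
```

Keeping the filters in `bool.filter` rather than `must` lets Elasticsearch cache them and keeps them out of relevance scoring, which matches the split between matching and ranking described above.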

Decision Logic

  • The primary decision in search is whether a result matches the query and filters.
  • A stable search cache key is generated by hashing the serialized query, filters, sort, page, and limit.
  • Search state should preserve paging and filter selections across repeated requests.
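
One way to make the cache key stable is to canonicalize the input before hashing it, so that property order does not change the result. The sketch below is illustrative only: `SearchCacheInput`, `canonicalize`, and `searchCacheKey` are hypothetical names, and the small FNV-1a hash is used purely to keep the example dependency-free. A cryptographic hash such as SHA-256 would be the more typical choice in a real service.

```typescript
// Hypothetical input shape mirroring the fields named above.
interface SearchCacheInput {
  query: string;
  filters: Record<string, unknown>;
  sort: string;
  page: number;
  limit: number;
}

// Recursively sort object keys so logically equal inputs serialize identically.
// Array order is preserved, so ["a", "b"] and ["b", "a"] still differ.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) {
      sorted[key] = canonicalize(obj[key]);
    }
    return sorted;
  }
  return value;
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

function searchCacheKey(input: SearchCacheInput): string {
  return "search:" + fnv1a(JSON.stringify(canonicalize(input)));
}
```

With this approach, two requests that differ only in the order of their filter properties map to the same key, while any change to the query, filters, sort, page, or limit produces a different serialized payload.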

Edge Cases

  • Empty or whitespace-only queries should return a safe default or empty result set.
  • Invalid filter values should be ignored or normalized rather than failing the request.
  • Pagination values below 1 or excessively large limits should be clamped to safe bounds.
  • Autocomplete should return partial matches from the search_as_you_type field without exposing unpublished or archived content.
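
The input handling described above could look roughly like the following. This is a sketch under stated assumptions: the helper names, the default and maximum limits, and the allowed level values are hypothetical and not taken from the codebase.

```typescript
const DEFAULT_LIMIT = 20; // assumed default page size
const MAX_LIMIT = 100; // assumed upper bound
const ALLOWED_LEVELS = ["beginner", "intermediate", "advanced"]; // hypothetical values

// Returns a trimmed query, or null when the input is empty or whitespace-only.
function normalizeQuery(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Clamps paging values instead of rejecting the request.
function clampPagination(page: unknown, limit: unknown): { page: number; limit: number } {
  const p = Number(page);
  const l = Number(limit);
  return {
    page: Number.isInteger(p) && p >= 1 ? p : 1,
    limit: Number.isInteger(l) && l >= 1 ? Math.min(l, MAX_LIMIT) : DEFAULT_LIMIT,
  };
}

// Normalizes a known filter value and drops anything unrecognized.
function normalizeLevel(raw: unknown): string | undefined {
  const value = typeof raw === "string" ? raw.trim().toLowerCase() : "";
  return ALLOWED_LEVELS.indexOf(value) !== -1 ? value : undefined;
}
```

A caller would treat a `null` query as the signal to return the safe default or empty result set, and would omit any filter that normalizes to `undefined`.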

Expected Test Cases

  • Query with matching text returns results including course title and description.
  • Filtered query excludes non-matching categories and price ranges.
  • Autocomplete returns suggestions for partial titles.
  • Cache key generation remains consistent for logically equivalent filter sets.

Performance Characteristics

  • Elasticsearch index mappings use keyword fields for exact matching and text fields for full-text search.
  • search_as_you_type is optimized for prefix suggestions.
  • Query execution should avoid loading large result sets into application memory; use pagination and from/size or search_after for deep paging.
  • Analytics indexing should be asynchronous to avoid adding latency to search requests.
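
For deep paging, `search_after` continues from the sort values of the last hit on the previous page instead of skipping rows with `from`. The sketch below only builds the request body. `nextPageRequest` and the `courseId` tiebreaker field are hypothetical; the tiebreaker is included because `search_after` needs a sort order that is unique per document to page reliably.

```typescript
type SortValue = string | number | null;

// Only the part of a search hit that paging needs.
interface PagedHit {
  sort?: SortValue[];
}

function nextPageRequest(
  baseQuery: Record<string, unknown>,
  size: number,
  lastHit?: PagedHit,
): Record<string, unknown> {
  const request: Record<string, unknown> = {
    query: baseQuery,
    size,
    // "courseId" is an assumed unique keyword field used as a tiebreaker.
    sort: [{ createdAt: "desc" }, { courseId: "asc" }],
  };
  // The first page has no cursor; later pages resume after the last hit's sort values.
  if (lastHit && lastHit.sort) request.search_after = lastHit.sort;
  return request;
}
```

Because each page is defined by a cursor rather than an offset, the cost of fetching page N does not grow with N, which is the reason to prefer it over `from`/`size` once offsets become large.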

Experimentation / Recommendation Decision Logic

Core Behavior

The automated decision pipeline in src/ab-testing/automation/automated-decision.service.ts follows these steps:

  1. Validate the experiment is running.
  2. Evaluate whether the experiment has met the duration threshold.
  3. Calculate statistical significance for variants via StatisticalAnalysisService.
  4. Determine a winner by:
    • comparing each non-control variant against the control;
    • ensuring variant metrics meet a minimum sample size;
    • requiring at least one statistically significant metric;
    • applying an effect size threshold for business impact;
    • using the highest overall performance score as the selection tie-breaker.
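
The steps above can be sketched as a pure function over precomputed variant summaries. This is not the service's actual code: the type names, the `selectWinner` function, and the assumption that a positive effect size means the variant outperforms the control are all illustrative.

```typescript
interface DecisionCriteria {
  minimumSampleSize: number;
  effectSizeThreshold: number;
  durationThreshold: number; // days
}

interface VariantSummary {
  id: string;
  isControl: boolean;
  sampleSize: number;
  hasSignificantMetric: boolean; // assumed to be precomputed against the control
  effectSize: number; // assumed: Cohen's d versus the control, positive = better
  performanceScore: number;
}

interface ExperimentSnapshot {
  state: "running" | "paused" | "completed";
  daysRunning: number;
  variants: VariantSummary[];
}

function selectWinner(
  experiment: ExperimentSnapshot,
  criteria: DecisionCriteria,
): VariantSummary | null {
  // Steps 1 and 2: only running experiments past the duration threshold qualify.
  if (experiment.state !== "running") return null;
  if (experiment.daysRunning < criteria.durationThreshold) return null;

  // Without a control there is nothing to compare against.
  const control = experiment.variants.find((v) => v.isControl);
  if (!control) return null;

  // Steps 3 and 4: keep variants that pass sample size, significance, and effect size.
  const candidates = experiment.variants.filter(
    (v) =>
      !v.isControl &&
      v.sampleSize >= criteria.minimumSampleSize &&
      v.hasSignificantMetric &&
      v.effectSize >= criteria.effectSizeThreshold,
  );
  if (candidates.length === 0) return null;

  // Tie-breaker: the highest overall performance score wins.
  return candidates.reduce((best, v) =>
    v.performanceScore > best.performanceScore ? v : best,
  );
}
```

Returning `null` for every ineligible case keeps the "no winner" outcomes uniform, so a caller can map them to whatever result shape the service uses.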

Key Criteria

  • confidenceLevel: Confidence required for statistical tests.
  • minimumSampleSize: Minimum number of observations before considering a winner.
  • effectSizeThreshold: Minimum Cohen's d value for a meaningful difference.
  • durationThreshold: Minimum days an experiment must run before selecting a winner.

Edge Cases

  • No control variant: the service returns null and avoids selecting a winner.
  • Variants with insufficient sample size are skipped.
  • If none of the variants meet significance and effect size thresholds, no winner is selected.
  • Experiments that are not running are treated as ineligible for decision-making.

Expected Test Cases

  • autoSelectWinner returns no_winner when duration is below threshold.
  • autoSelectWinner returns no_winner when statistical significance is not met.
  • autoSelectWinner selects the correct variant when one has sufficient sample size, significant metrics, and the largest performance score.
  • getDecisionRecommendations returns accurate readiness guidance and winner candidate hints.
  • autoAllocateTraffic divides traffic proportionally across variants when scores are available.
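
As an illustration of the proportional split expected in the last test case, the sketch below converts per-variant scores into traffic percentages. It is not the implementation of autoAllocateTraffic. The `allocateTraffic` name, the treatment of negative scores as zero, and the equal-split fallback when no score is positive are assumptions.

```typescript
// Maps variant id -> score to variant id -> percentage of traffic (0 to 100).
function allocateTraffic(scores: Record<string, number>): Record<string, number> {
  const ids = Object.keys(scores);
  // Negative scores are treated as zero so they cannot produce negative shares (assumption).
  const total = ids.reduce((sum, id) => sum + Math.max(scores[id], 0), 0);

  const allocation: Record<string, number> = {};
  for (const id of ids) {
    allocation[id] =
      total > 0
        ? (Math.max(scores[id], 0) / total) * 100
        : 100 / ids.length; // assumed fallback: equal split when no score is positive
  }
  return allocation;
}
```

For example, scores of 1 and 3 yield a 25/75 split, and the percentages always sum to 100 up to floating-point rounding.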

Statistical Calculations

In src/ab-testing/analysis/statistical-analysis.service.ts:

  • Metric statistics compute standard error, confidence intervals, and p-values.
  • compareMetrics uses pooled standard error and a z-test with a critical value based on the requested confidence level.
  • calculateCohensD estimates effect size using pooled standard deviation.
  • interpretEffectSize maps Cohen's d to negligible/small/medium/large categories.
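
The calculations listed above can be sketched for two sample means as follows. This is a simplified illustration, not the service's code: the function names and the `SampleStats` shape are hypothetical, the critical values are the standard two-sided z-values for the listed confidence levels, and the effect size cut-offs (0.2, 0.5, 0.8) are the conventional ones, which may differ from what interpretEffectSize actually uses.

```typescript
interface SampleStats {
  mean: number;
  stdDev: number;
  n: number;
}

// Pooled standard deviation of two samples (requires n1 + n2 > 2).
function pooledStdDev(a: SampleStats, b: SampleStats): number {
  const numerator = (a.n - 1) * a.stdDev ** 2 + (b.n - 1) * b.stdDev ** 2;
  return Math.sqrt(numerator / (a.n + b.n - 2));
}

// Cohen's d: mean difference expressed in pooled standard deviations.
function cohensD(control: SampleStats, variant: SampleStats): number {
  return (variant.mean - control.mean) / pooledStdDev(control, variant);
}

// z statistic using the pooled standard error of the mean difference.
function zStatistic(control: SampleStats, variant: SampleStats): number {
  const standardError =
    pooledStdDev(control, variant) * Math.sqrt(1 / control.n + 1 / variant.n);
  return (variant.mean - control.mean) / standardError;
}

// Two-sided critical values for common confidence levels.
const CRITICAL_Z: Record<string, number> = { "0.9": 1.645, "0.95": 1.96, "0.99": 2.576 };

function isSignificant(z: number, confidenceLevel: number): boolean {
  const critical = CRITICAL_Z[String(confidenceLevel)];
  if (critical === undefined) {
    throw new Error("Unsupported confidence level: " + confidenceLevel);
  }
  return Math.abs(z) >= critical;
}

// Conventional cut-offs; the service's actual boundaries are an assumption here.
function interpretEffectSize(d: number): "negligible" | "small" | "medium" | "large" {
  const magnitude = Math.abs(d);
  if (magnitude < 0.2) return "negligible";
  if (magnitude < 0.5) return "small";
  if (magnitude < 0.8) return "medium";
  return "large";
}
```

Significance and effect size answer different questions: with large samples a tiny difference can be statistically significant, which is why the decision pipeline also applies an effect size threshold. Both samples having zero variance would make the pooled standard deviation zero, so a real implementation would need to guard that division.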

Performance Characteristics

  • Statistical analysis loads metric data through repositories and may require multiple queries per experiment.
  • Avoid repeated metric loads in hot paths by caching variant metrics per experiment when possible.
  • The current implementation performs a linear scan of variants and metrics, which is acceptable for small experiment sizes but should be optimized for larger experiments.

Gamification Algorithms

Points and Progression

src/gamification/points/points.service.ts currently implements:

  • point transaction creation for every user activity
  • progress updates for totalPoints and xp
  • level progression at every 1000 XP

Decision Logic

  • Each point addition increments both totalPoints and xp.
  • The current level is recalculated as Math.floor(xp / 1000) + 1.
  • A level-up event is planned but currently marked as TODO.

Edge Cases

  • Negative or zero point values should be validated before update.
  • New users without existing UserProgress are initialized with a default progression state.
  • Large point jumps may cross multiple levels in one update; the current logic supports this by recomputing level from XP.
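
The progression rules and edge cases above can be sketched as a pure function. The level formula is the one stated in this document. The `applyPoints` name, the `UserProgress` shape, the default starting state, and the choice to reject non-positive or non-integer values are assumptions.

```typescript
interface UserProgress {
  totalPoints: number;
  xp: number;
  level: number;
}

const XP_PER_LEVEL = 1000;

// Assumed starting state for users without an existing progress record.
function defaultProgress(): UserProgress {
  return { totalPoints: 0, xp: 0, level: 1 };
}

function applyPoints(
  current: UserProgress | undefined,
  points: number,
): { progress: UserProgress; levelsGained: number } {
  // Validation policy is an assumption: reject anything that is not a positive integer.
  if (!Number.isInteger(points) || points <= 0) {
    throw new RangeError("points must be a positive integer");
  }
  const base = current === undefined ? defaultProgress() : current;
  const xp = base.xp + points;
  // Recomputing from total XP handles jumps across several levels in one update.
  const level = Math.floor(xp / XP_PER_LEVEL) + 1;
  return {
    progress: { totalPoints: base.totalPoints + points, xp, level },
    levelsGained: level - base.level,
  };
}
```

The `levelsGained` value is one possible hook for the planned level-up event: a value greater than zero indicates that at least one level boundary was crossed.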

Leaderboard Logic

src/gamification/leaderboards/leaderboards.service.ts currently:

  • orders users by totalPoints descending
  • retrieves the top limit players
  • computes a user rank by scanning the ordered progress list

Expected Test Cases

  • Adding points correctly updates user xp, totalPoints, and level.
  • New users receive an initialized progress record.
  • Leaderboard ranking orders users by descending points.
  • User rank is accurate and returns null for missing users.

Performance Characteristics

  • Leaderboard ranking uses an in-memory rank calculation (findIndex), which is O(n).
  • To scale, replace getUserRank with a database query that counts users with higher scores or uses a materialized rank field.
  • getTopPlayers should always rely on database-side ordering and use take to cap the result size, so large user sets are never loaded in full.
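
The difference between the scan-based rank and a count-based rank can be illustrated in memory. Both functions below are sketches with hypothetical names. In a database, the count-based version would correspond to a query along the lines of `SELECT COUNT(*) + 1 ... WHERE total_points > :points`, where the table and column names are assumptions.

```typescript
interface ProgressRow {
  userId: string;
  totalPoints: number;
}

// Mirrors the scan described above: sort everything, then find the user's position.
function rankByScan(rows: ProgressRow[], userId: string): number | null {
  const ordered = rows.slice().sort((a, b) => b.totalPoints - a.totalPoints);
  const index = ordered.findIndex((row) => row.userId === userId);
  return index === -1 ? null : index + 1;
}

// Count-based alternative: rank = 1 + number of users with strictly more points.
function rankByCount(rows: ProgressRow[], userId: string): number | null {
  const user = rows.find((row) => row.userId === userId);
  if (!user) return null;
  return rows.filter((row) => row.totalPoints > user.totalPoints).length + 1;
}
```

The two approaches agree when all scores are distinct. With ties, the count-based rank gives tied users the same rank, while the scan-based rank depends on the order in which tied rows happen to be sorted.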

Documentation Links

  • Search architecture: src/search/search.service.ts
  • Elasticsearch mapping: src/search/elasticsearch/elasticsearch.service.ts
  • Experiment decision logic: src/ab-testing/automation/automated-decision.service.ts
  • Statistical analysis: src/ab-testing/analysis/statistical-analysis.service.ts
  • Gamification scoring: src/gamification/points/points.service.ts
  • Leaderboard ranking: src/gamification/leaderboards/leaderboards.service.ts