Complex Algorithm Documentation

This document explains the current and intended behavior for complex algorithmic areas in TeachLink Backend. It focuses on search, experimentation decision-making, and gamification scoring.

Search Algorithm

Current Status

src/search/search.service.ts currently contains a placeholder implementation.
src/search/elasticsearch/elasticsearch.service.ts sets up Elasticsearch index mappings for course data and analytics tracking.

Future Design Intent

User queries should be translated into an Elasticsearch query that combines:
- match / multi_match text search on title, description, content, and instructorName
- search_as_you_type autocomplete support for partial title matching
- category, level, language, instructor, price, and status filters
- custom sort options such as relevance, rating, views, and createdAt
Search results should be scored by relevance and optionally boosted by popularity signals such as views, enrollments, and rating.
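
As a rough illustration of the intended query shape, the sketch below assembles an Elasticsearch request body from user input. It is not the implementation in search.service.ts, which is still a placeholder. The SearchParams shape, the buildSearchBody name, the field boosts, and the 'published' status value are all assumptions made for the example; only the Elasticsearch query DSL keys (bool, multi_match, term, range, function_score, field_value_factor) are standard.

```typescript
// Hypothetical request shape; the real DTO may differ.
interface SearchParams {
  query: string;
  category?: string;
  level?: string;
  minPrice?: number;
  maxPrice?: number;
  sort?: 'relevance' | 'rating' | 'views' | 'createdAt';
}

type Json = Record<string, unknown>;

function buildSearchBody(params: SearchParams): Json {
  // Exact-match and range filters live in bool.filter so they do not affect scoring.
  // 'published' is an assumed status value.
  const filter: Json[] = [{ term: { status: 'published' } }];
  if (params.category) filter.push({ term: { category: params.category } });
  if (params.level) filter.push({ term: { level: params.level } });
  if (params.minPrice !== undefined || params.maxPrice !== undefined) {
    const range: Json = {};
    if (params.minPrice !== undefined) range.gte = params.minPrice;
    if (params.maxPrice !== undefined) range.lte = params.maxPrice;
    filter.push({ range: { price: range } });
  }

  const textQuery: Json = {
    bool: {
      must: [
        {
          multi_match: {
            query: params.query,
            // The ^ boosts are illustrative weights, not values from the codebase.
            fields: ['title^3', 'description^2', 'content', 'instructorName'],
          },
        },
      ],
      filter,
    },
  };

  // Popularity boost: add log10(1 + views) to the text relevance score.
  const query: Json = {
    function_score: {
      query: textQuery,
      field_value_factor: { field: 'views', modifier: 'log1p', missing: 0 },
      boost_mode: 'sum',
    },
  };

  const sortField = params.sort ?? 'relevance';
  const sort: unknown[] =
    sortField === 'relevance' ? ['_score'] : [{ [sortField]: 'desc' }, '_score'];

  return { query, sort };
}
```

Using boost_mode 'sum' rather than 'multiply' keeps courses with zero views from being scored at zero; whether popularity should add to or scale relevance is a design choice the document leaves open.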

Decision Logic

The primary decision in search is whether a result matches the query and filters.
A stable search cache key is generated by hashing a canonical serialization of the query, filters, sort, page, and limit, so that logically equivalent requests map to the same key.
Search state should preserve paging and filter selections across repeated requests.
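
One way to obtain a stable key is to sort object keys before serializing, so that property order does not change the hash. The sketch below assumes Node's built-in crypto module; the SearchCacheInput shape, the buildSearchCacheKey name, the SHA-256 choice, and the "search:" prefix are illustrative assumptions rather than the service's actual behavior.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical input shape for the example.
interface SearchCacheInput {
  query: string;
  filters: Record<string, unknown>;
  sort: string;
  page: number;
  limit: number;
}

// Recursively sort object keys and drop undefined values so that
// logically equal inputs serialize to the same string.
// Array order is preserved because it may be meaningful.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) {
      if (obj[key] !== undefined) out[key] = canonicalize(obj[key]);
    }
    return out;
  }
  return value;
}

function buildSearchCacheKey(input: SearchCacheInput): string {
  const normalized = { ...input, query: input.query.trim() };
  const serialized = JSON.stringify(canonicalize(normalized));
  const digest = createHash('sha256').update(serialized).digest('hex');
  return `search:${digest}`;
}
```

With this approach, filters supplied as { category: 'math', level: 'beginner' } and { level: 'beginner', category: 'math' } produce the same key, while a different page or limit produces a different one.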

Edge Cases

Empty or whitespace-only queries should return a safe default or empty result set.
Invalid filter values should be ignored or normalized rather than failing the request.
Pagination values below 1 or excessively large limits should be clamped to protected defaults.
Autocomplete should return partial matches from the search_as_you_type field without exposing unpublished or archived content.
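
The normalization rules above could be sketched as small pure helpers applied before a query is built. The default page, default limit, maximum limit, and the list of allowed levels are placeholder values chosen for the example, and the helper names are hypothetical.

```typescript
// Placeholder limits; the real protected defaults are not specified in this document.
const DEFAULT_PAGE = 1;
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

// Hypothetical set of valid level values.
const ALLOWED_LEVELS = ['beginner', 'intermediate', 'advanced'] as const;
type Level = (typeof ALLOWED_LEVELS)[number];

function toPositiveInt(value: unknown, fallback: number): number {
  return typeof value === 'number' && Number.isFinite(value) && value >= 1
    ? Math.floor(value)
    : fallback;
}

// Clamp paging values instead of rejecting the request.
function clampPagination(page?: number, limit?: number): { page: number; limit: number } {
  return {
    page: toPositiveInt(page, DEFAULT_PAGE),
    limit: Math.min(toPositiveInt(limit, DEFAULT_LIMIT), MAX_LIMIT),
  };
}

// Returns null for empty or whitespace-only input so the caller can
// short-circuit to a default or empty result set.
function normalizeQuery(raw: string | undefined): string | null {
  const trimmed = (raw ?? '').trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Invalid filter values are dropped (undefined) rather than failing the request.
function normalizeLevel(raw: unknown): Level | undefined {
  if (typeof raw !== 'string') return undefined;
  const lowered = raw.trim().toLowerCase();
  return (ALLOWED_LEVELS as readonly string[]).includes(lowered)
    ? (lowered as Level)
    : undefined;
}
```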

Expected Test Cases

Query with matching text returns results including course title and description.
Filtered query excludes non-matching categories and price ranges.
Autocomplete returns suggestions for partial titles.
Cache key generation remains consistent for logically equivalent filter sets.

Performance Characteristics

Elasticsearch index mappings use keyword fields for exact matching and text fields for full-text search.
search_as_you_type is optimized for prefix suggestions.
Query execution should avoid loading large result sets into application memory: paginate with from/size for shallow pages and use search_after for deep paging, since from + size is capped by index.max_result_window (10,000 by default).
Analytics indexing should be asynchronous to avoid adding latency to search requests.
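
For deep paging, a search_after request reuses the sort values of the last hit from the previous page. The following is a minimal sketch of building such request bodies; the buildPageBody and nextCursor names and the courseId tiebreaker field are assumptions for illustration.

```typescript
type Json = Record<string, unknown>;
type SortValues = Array<string | number>;

// Builds a page request. search_after requires a deterministic sort, so a
// unique tiebreaker field is added ('courseId' is a hypothetical field name).
function buildPageBody(query: Json, size: number, lastSort?: SortValues): Json {
  const body: Json = {
    size,
    query,
    sort: [{ createdAt: 'desc' }, { courseId: 'asc' }],
  };
  if (lastSort !== undefined) body.search_after = lastSort;
  return body;
}

// Each hit in a sorted response carries a `sort` array; the last one
// becomes the cursor for the next page.
function nextCursor(hits: Array<{ sort?: SortValues }>): SortValues | undefined {
  return hits[hits.length - 1]?.sort;
}
```

The first page omits search_after; each later page passes the cursor returned by nextCursor for the previous page.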

Experimentation / Recommendation Decision Logic

Core Behavior

The automated decision pipeline in src/ab-testing/automation/automated-decision.service.ts follows these steps:

1. Validate the experiment is running.
2. Evaluate whether the experiment has met the duration threshold.
3. Calculate statistical significance for variants via StatisticalAnalysisService.
4. Determine a winner by:
   - comparing each non-control variant against the control;
   - ensuring variant metrics meet a minimum sample size;
   - requiring at least one statistically significant metric;
   - applying an effect size threshold for business impact;
   - using the highest overall performance score as the selection tie-breaker.
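
The steps above can be sketched as a single pure function. This is an illustration under stated assumptions, not the code in automated-decision.service.ts: the type names, the selectWinner name, and the data shapes are hypothetical, and the sketch assumes that a positive effect size means the variant beat the control and that one metric must satisfy both the significance and effect size checks.

```typescript
// Hypothetical shapes for the example.
interface DecisionCriteria {
  minimumSampleSize: number;
  effectSizeThreshold: number;
  durationThreshold: number; // days
}

interface MetricComparison {
  isSignificant: boolean; // result of the test against control
  effectSize: number;     // Cohen's d relative to control
}

interface VariantSummary {
  id: string;
  isControl: boolean;
  sampleSize: number;
  comparisons: MetricComparison[];
  performanceScore: number;
}

interface ExperimentSnapshot {
  status: string;
  daysRunning: number;
  variants: VariantSummary[];
}

function selectWinner(
  experiment: ExperimentSnapshot,
  criteria: DecisionCriteria,
): VariantSummary | null {
  // Steps 1 and 2: the experiment must be running and old enough.
  if (experiment.status !== 'running') return null;
  if (experiment.daysRunning < criteria.durationThreshold) return null;

  // Without a control there is nothing to compare against.
  const hasControl = experiment.variants.some((v) => v.isControl);
  if (!hasControl) return null;

  // Steps 3 and 4: keep variants with enough data and at least one
  // metric that is both significant and large enough to matter.
  const candidates = experiment.variants.filter(
    (v) =>
      !v.isControl &&
      v.sampleSize >= criteria.minimumSampleSize &&
      v.comparisons.some(
        (c) => c.isSignificant && c.effectSize >= criteria.effectSizeThreshold,
      ),
  );
  if (candidates.length === 0) return null;

  // Tie-breaker: highest overall performance score.
  return candidates.reduce((best, v) =>
    v.performanceScore > best.performanceScore ? v : best,
  );
}
```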

Key Criteria

confidenceLevel: Confidence required for statistical tests.
minimumSampleSize: Minimum number of observations before considering a winner.
effectSizeThreshold: Minimum Cohen's d value for a meaningful difference.
durationThreshold: Minimum days an experiment must run before selecting a winner.

Edge Cases

No control variant: the service returns null and avoids selecting a winner.
Variants with insufficient sample size are skipped.
If none of the variants meet significance and effect size thresholds, no winner is selected.
Experiments that are not running are treated as ineligible for decision-making.

Expected Test Cases

autoSelectWinner returns no_winner when duration is below threshold.
autoSelectWinner returns no_winner when statistical significance is not met.
autoSelectWinner selects the correct variant when one has sufficient sample size, significant metrics, and the largest performance score.
getDecisionRecommendations returns accurate readiness guidance and winner candidate hints.
autoAllocateTraffic divides traffic proportionally across variants when scores are available.
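
To make the proportional allocation case concrete, here is one possible interpretation as a pure function. The allocateTraffic name, the percentage output, the clamping of negative scores to zero, and the equal-split fallback are assumptions; the actual autoAllocateTraffic behavior should be checked against the service.

```typescript
// Returns a percentage share per variant id, proportional to its score.
function allocateTraffic(scores: Record<string, number>): Record<string, number> {
  const ids = Object.keys(scores);
  if (ids.length === 0) return {};

  // Assumption: negative scores contribute no traffic.
  const weights = ids.map((id) => Math.max(scores[id] ?? 0, 0));
  const total = weights.reduce((sum, w) => sum + w, 0);

  const allocation: Record<string, number> = {};
  ids.forEach((id, i) => {
    // Assumption: fall back to an equal split when no score is positive.
    allocation[id] = total > 0 ? ((weights[i] ?? 0) / total) * 100 : 100 / ids.length;
  });
  return allocation;
}
```

For example, scores of 1 and 3 yield shares of 25 and 75 percent.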

Statistical Calculations

In src/ab-testing/analysis/statistical-analysis.service.ts:

Metric statistics compute standard error, confidence intervals, and p-values.
compareMetrics uses pooled standard error and a z-test with a critical value based on the requested confidence level.
calculateCohensD estimates effect size using pooled standard deviation.
interpretEffectSize maps Cohen's d to negligible/small/medium/large categories.
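
The calculations listed above can be illustrated with standard formulas. The sketch below is not the code in statistical-analysis.service.ts: the SampleStats shape is hypothetical, the effect size cutoffs (0.2, 0.5, 0.8) are the conventional ones and may differ from the service's, the critical values are a small lookup for common confidence levels, and the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation of erf.

```typescript
interface SampleStats {
  mean: number;
  stdDev: number;
  n: number;
}

// Pooled standard deviation of two samples.
function pooledStdDev(a: SampleStats, b: SampleStats): number {
  const numerator = (a.n - 1) * a.stdDev ** 2 + (b.n - 1) * b.stdDev ** 2;
  return Math.sqrt(numerator / (a.n + b.n - 2));
}

// Cohen's d: mean difference in units of pooled standard deviation.
function calculateCohensD(control: SampleStats, variant: SampleStats): number {
  return (variant.mean - control.mean) / pooledStdDev(control, variant);
}

// Conventional thresholds; assumed, not taken from the service.
function interpretEffectSize(d: number): 'negligible' | 'small' | 'medium' | 'large' {
  const abs = Math.abs(d);
  if (abs < 0.2) return 'negligible';
  if (abs < 0.5) return 'small';
  if (abs < 0.8) return 'medium';
  return 'large';
}

// Standard normal CDF via an erf approximation (absolute error around 1.5e-7).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided critical z values for common confidence levels.
const Z_CRITICAL: Record<string, number> = { '0.9': 1.645, '0.95': 1.96, '0.99': 2.576 };

function compareMetrics(control: SampleStats, variant: SampleStats, confidenceLevel = 0.95) {
  const critical = Z_CRITICAL[String(confidenceLevel)];
  if (critical === undefined) {
    throw new Error(`Unsupported confidence level: ${confidenceLevel}`);
  }
  // Pooled standard error of the difference in means.
  const standardError = pooledStdDev(control, variant) * Math.sqrt(1 / control.n + 1 / variant.n);
  const z = (variant.mean - control.mean) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { z, pValue, isSignificant: Math.abs(z) > critical };
}
```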

Performance Characteristics

Statistical analysis is data-driven and may require multiple repository queries.
Avoid repeated metric loads in hot paths by caching variant metrics per experiment when possible.
The current implementation performs a linear scan of variants and metrics, which is acceptable for small experiment sizes but should be optimized for larger experiments.

Gamification Algorithms

Points and Progression

src/gamification/points/points.service.ts currently implements:

- point transaction creation for every user activity
- progress updates for totalPoints and xp
- level progression at every 1000 XP

Decision Logic

Each point addition increments both totalPoints and xp.
The current level is recalculated as Math.floor(xp / 1000) + 1.
A level-up event is planned but currently marked as TODO.

Edge Cases

Point values should be validated before any update; negative or zero values should be rejected or handled explicitly rather than silently applied.
New users without existing UserProgress are initialized with a default progression state.
Large point jumps may cross multiple levels in one update; the current logic supports this by recomputing level from XP.
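
The decision logic and edge cases above can be sketched with an in-memory store standing in for the repository. The level formula comes from this document; the UserProgress fields beyond totalPoints, xp, and level, the addPoints signature, the default state, and the choice to reject non-positive or non-integer values are assumptions for illustration.

```typescript
interface UserProgress {
  userId: string;
  totalPoints: number;
  xp: number;
  level: number;
}

const XP_PER_LEVEL = 1000;

// Level is derived from XP, so multi-level jumps need no special handling.
function levelForXp(xp: number): number {
  return Math.floor(xp / XP_PER_LEVEL) + 1;
}

function addPoints(
  store: Map<string, UserProgress>,
  userId: string,
  points: number,
): { progress: UserProgress; levelsGained: number } {
  // Assumption: only positive integer point values are accepted.
  if (!Number.isInteger(points) || points <= 0) {
    throw new RangeError(`points must be a positive integer, got ${points}`);
  }

  // New users start from an assumed default progression state.
  const current: UserProgress =
    store.get(userId) ?? { userId, totalPoints: 0, xp: 0, level: 1 };

  const xp = current.xp + points;
  const updated: UserProgress = {
    userId,
    totalPoints: current.totalPoints + points,
    xp,
    level: levelForXp(xp),
  };
  store.set(userId, updated);

  // A positive levelsGained is where a level-up event could be emitted.
  return { progress: updated, levelsGained: updated.level - current.level };
}
```

For example, a new user who receives 2500 points ends at level 3, having gained two levels in one update.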

Leaderboard Logic

src/gamification/leaderboards/leaderboards.service.ts currently:

- orders users by totalPoints descending
- retrieves the top limit players
- computes a user rank by scanning the ordered progress list
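
The behavior described above can be modeled over a plain array to make the ranking rule explicit. This is a simplified stand-in for the repository-backed service: the ProgressRow shape is hypothetical and sorting happens in memory here only to keep the example self-contained.

```typescript
interface ProgressRow {
  userId: string;
  totalPoints: number;
}

// Returns a new array ordered by totalPoints, highest first.
function orderByPoints(rows: ProgressRow[]): ProgressRow[] {
  return [...rows].sort((a, b) => b.totalPoints - a.totalPoints);
}

function getTopPlayers(rows: ProgressRow[], limit: number): ProgressRow[] {
  return orderByPoints(rows).slice(0, limit);
}

// Rank is the 1-based position in the ordered list, or null if the user is absent.
function getUserRank(rows: ProgressRow[], userId: string): number | null {
  const index = orderByPoints(rows).findIndex((row) => row.userId === userId);
  return index === -1 ? null : index + 1;
}
```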

Expected Test Cases

Adding points correctly updates user xp, totalPoints, and level.
New users receive an initialized progress record.
Leaderboard ranking orders users by descending points.
User rank is accurate and returns null for missing users.

Performance Characteristics

Leaderboard ranking uses an in-memory rank calculation (findIndex), which is O(n).
To scale, replace getUserRank with a database query that counts users with higher scores or uses a materialized rank field.
getTopPlayers should always rely on database ordering and a take (limit) clause so that only the requested rows are loaded, even for large user sets.
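
The count-based alternative to findIndex can be expressed as "one plus the number of users with a strictly higher score". The sketch below shows the rule over an array; in a database the same idea would be a single COUNT query with a WHERE clause on the score column (the exact table and column names are not given in this document). The rankByCount name and ProgressRow shape are hypothetical.

```typescript
interface ProgressRow {
  userId: string;
  totalPoints: number;
}

// Rank = 1 + number of users with strictly more points.
// Users with equal points share the same rank under this rule.
function rankByCount(rows: ProgressRow[], userId: string): number | null {
  const target = rows.find((row) => row.userId === userId);
  if (target === undefined) return null;
  const higher = rows.filter((row) => row.totalPoints > target.totalPoints).length;
  return higher + 1;
}
```

Note that this treats ties differently from a position-based scan: two users with the same score receive the same rank here, whereas findIndex assigns them consecutive positions.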

Documentation Links

Search architecture: src/search/search.service.ts
Elasticsearch mapping: src/search/elasticsearch/elasticsearch.service.ts
Experiment decision logic: src/ab-testing/automation/automated-decision.service.ts
Statistical analysis: src/ab-testing/analysis/statistical-analysis.service.ts
Gamification scoring: src/gamification/points/points.service.ts
Leaderboard ranking: src/gamification/leaderboards/leaderboards.service.ts
