This repo processes and analyzes human- and AI-generated essays produced during a 5-day "Thinking about Thinking" intensive.
It includes scripts for:
- Chunking essays into logical steps
- Cleaning chunks into bullet points
- Embedding cleaned chunks with a sentence-embedding model
- Analyzing trajectories with several metrics
Essay data is organized in two directories:
- FINAL/ - Human-written essays (7 essays: students A-G)
- META/ - AI-generated essays from blind prompts (7 essays: students A-G)
Each essay is a markdown file named student-{ID}-{source}-{title}.md where:
- {ID} is the student identifier (A through G)
- {source} is either FINAL (human) or META (AI)
- {title} is the essay topic
The essays were produced during a 5-day "Thinking about Thinking" intensive. On Day 2, each student captured their essay intentions in a "blind prompt." On Day 5, that frozen prompt was submitted to an LLM with a standardizing metaprompt, producing an AI-generated essay from the same seed. Students also produced their own essays through 5 days of structured thinking.
Processed data outputs are stored in:
- output/data/chunks/ - Essays segmented into argumentative chunks
- output/data/embeddings/ - Semantic embeddings for each chunk
- output/data/metrics/ - Computed trajectory metrics (CSV and JSON)
The analysis pipeline consists of five main steps:
1. Chunking (01_chunk.py)
Segments each essay into sequential argumentative units using the Anthropic Batch API with Claude Sonnet 4.5. Each chunk represents one argumentative move. The chunking preserves exact original text without paraphrasing.
Output: JSON files in output/data/chunks/ with format {student}-{source}.json
2. Chunk Cleaning (01_02_clean_chunks_batch.py)
Cleans and simplifies chunks into bullet points using the Anthropic Batch API. This optional but recommended step condenses verbose chunks into concise bullet-point summaries while preserving semantic content, which improves embedding quality.
Output: Updated JSON files in output/data/chunks/ with added cleaned_chunks field
3. Embedding (02_embed.py)
Generates semantic embeddings for each chunk using the sentence-transformers library with the all-MiniLM-L6-v2 model (384 dimensions). Embeddings are normalized for cosine similarity calculations.
Output: JSON files in output/data/embeddings/ containing chunks and their embedding vectors
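The key property of this step is that embeddings are unit-normalized, so cosine similarity reduces to a dot product. A minimal sketch of that property, using random vectors as a stand-in for real model output (the actual pipeline calls `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`):

```python
import numpy as np

# Sketch of the embedding step's key property. In the real pipeline:
#   from sentence_transformers import SentenceTransformer
#   emb = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks, normalize_embeddings=True)
# Here random vectors stand in for model output.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 384))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize each row

# With unit vectors, cosine similarity is just a dot product.
sim = emb @ emb.T
print(np.allclose(np.diag(sim), 1.0))  # True: each vector is identical to itself
```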
4. Metrics Computation (03_metrics.py)
Computes trajectory metrics for each essay in PCA-reduced space:
- Displacement metrics: mean, variance, and max step sizes between consecutive chunks
- Tortuosity: ratio of path length to straight-line distance (path efficiency)
- Momentum: directional consistency between consecutive steps
- Divergence curves: position-aligned comparison of human vs AI trajectories
- Homogeneity: pairwise similarity within human and AI groups
All metrics are computed using cosine distance in a 138-component PCA space (95% variance explained).
Output:
- output/data/metrics/essay_metrics.csv - Per-essay metrics
- output/data/metrics/divergence_curves.json - Paired trajectory divergence
- output/data/metrics/similarity_matrix.csv - Pairwise similarities
5. Visualization (04_figures.py)
Generates publication-ready figures:
- Figure 1: 2D PCA trajectory visualization (one subplot per student)
- Figure 2: Aggregate metric comparison (human vs AI)
- Figure 2b: Paired metric comparison (per-student lines)
- Figure 3: Divergence curves showing how human/AI trajectories drift apart
- Figure 4: Homogeneity matrix heatmap (pairwise similarity)
- Figures 5-7: Displacement profile visualizations (violin plots, series, distributions)
Output: PDF and PNG figures in plots/
- process_single_essay.py - Processes a single essay from /FINAL through the pipeline
- generate_annex_figures.py - LaTeX tables and supplementary figures
- assess_pca_dimensions.py - Analyzes optimal PCA dimensionality
This project uses uv for dependency management. All scripts should be run using uv run from the project root.
- Install dependencies:
  uv sync
- Copy .env.EXAMPLE to .env and add your Anthropic API key:
  cp .env.EXAMPLE .env
  # Then edit .env and add your actual API key
Run the entire pipeline with a single interactive command:
  uv run python main.py
This will guide you through all steps with:
- Interactive confirmations before using the Anthropic API
- Automatic detection of existing outputs
- Option to run all steps or select specific ones
- Clear progress tracking and error handling
Run scripts individually in order:
# 1. Chunk essays into argumentative units
uv run python scripts/01_chunk.py
# 2. Clean chunks into bullet points (optional but recommended)
uv run python scripts/01_02_clean_chunks_batch.py
# 3. Generate embeddings for chunks
uv run python scripts/02_embed.py
# 4. Compute trajectory metrics
uv run python scripts/03_metrics.py
# 5. Generate figures
uv run python scripts/04_figures.py
Approximate runtimes:
- Chunking (01): ~30 minutes - 1 hour (Anthropic Batch API)
- Chunk Cleaning (01_02): ~30 minutes - 1 hour (Anthropic Batch API)
- Embedding (02): ~1-2 minutes (local model)
All trajectory analyses use cosine distance as the primary distance metric and are performed in PCA-reduced embedding space to reduce noise and focus on main patterns of semantic variation. We work in embedding space (not a decoded schema space) since our essays differ in thesis and argument structure.
Cosine Distance measures the angular separation between embedding vectors:
cosine_distance(u, v) = 1 - (u · v) / (||u|| ||v||)
Range: [0, 2], where 0 = identical, 1 = orthogonal, 2 = opposite
Cosine Similarity (used for the similarity matrix) is the complement:
cosine_similarity(u, v) = 1 - cosine_distance(u, v) = (u · v) / (||u|| ||v||)
Range: [-1, 1], where 1 = identical, 0 = orthogonal, -1 = opposite
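The two formulas above can be transcribed directly; the function names here are illustrative, not the pipeline's actual API:

```python
import numpy as np

# Direct transcription of the cosine distance/similarity formulas above.
def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])                  # orthogonal to u
print(cosine_distance(u, v))              # 1.0
print(cosine_similarity(u, -u))           # -1.0 (opposite vectors)
```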
Before computing trajectory metrics, all embeddings are projected into a shared PCA subspace:
- Fit PCA on pooled embeddings from all essays (human + AI, all students)
- Transform each essay's embeddings using the fitted PCA
- Compute metrics in the PCA-reduced space
Dimensions:
- Original: 384 dimensions (all-MiniLM-L6-v2 model)
- PCA-reduced: 138 components (95% variance explained)
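The fit-then-transform steps above can be sketched with scikit-learn; random arrays stand in for the real pooled embeddings, and variable names are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of the shared-subspace projection. Random arrays stand in
# for the real pooled chunk embeddings (all essays, human + AI).
rng = np.random.default_rng(0)
pooled = rng.normal(size=(200, 384))

# Passing a float to n_components keeps as many components as needed
# to explain that fraction of variance (138 on the real data).
pca = PCA(n_components=0.95, random_state=0)
pca.fit(pooled)

essay = rng.normal(size=(12, 384))   # one essay's chunk embeddings
reduced = pca.transform(essay)       # metrics are computed on this
print(reduced.shape[0])  # 12
```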
All metrics computed in PCA-reduced space using cosine distance:
Cosine distance between consecutive chunk embeddings. Derived metrics:
- Mean displacement: Average semantic step size
- Displacement variance: Variability in step sizes
- Max displacement: Largest single conceptual jump
Interpretation: Higher variance suggests an exploratory process with uneven pacing.
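A minimal sketch of the displacement profile, assuming `traj` holds one essay's PCA-reduced chunk embeddings in order; the function name is illustrative:

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Step sizes between consecutive chunks, summarized as mean/variance/max.
def displacement_stats(traj):
    steps = [cosine_distance(traj[i], traj[i + 1]) for i in range(len(traj) - 1)]
    return {"mean": float(np.mean(steps)),
            "variance": float(np.var(steps)),
            "max": float(np.max(steps))}

rng = np.random.default_rng(0)
traj = rng.normal(size=(10, 138))   # stand-in for a 10-chunk essay
stats = displacement_stats(traj)
print(sorted(stats))  # ['max', 'mean', 'variance']
```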
Ratio of total path length to straight-line endpoint distance:
tortuosity = (sum of displacements) / endpoint_distance
Range: [1, ∞), where 1 = perfectly direct path, >1 = circuitous/wandering
Expected pattern: AI essays should have lower tortuosity (planned all at once, direct path). Human essays should have higher tortuosity (5-day iterative process, revisiting and reframing).
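The ratio above can be sketched directly; random vectors stand in for a real trajectory, and the function name is illustrative:

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# tortuosity = (sum of consecutive displacements) / endpoint distance
def tortuosity(traj):
    path = sum(cosine_distance(traj[i], traj[i + 1]) for i in range(len(traj) - 1))
    return path / cosine_distance(traj[0], traj[-1])

rng = np.random.default_rng(1)
traj = rng.normal(size=(8, 138))    # stand-in for an 8-chunk essay
print(tortuosity(traj) > 1.0)       # a wandering path exceeds the direct one
```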
Average cosine similarity between consecutive direction vectors. Measures whether an essay maintains consistent conceptual direction.
direction[i] = embedding[i+1] - embedding[i]
momentum = mean(cosine_similarity(direction[i], direction[i+1]))
Range: [-1, 1], where 1 = perfect momentum, 0 = orthogonal changes, -1 = reversing
Adapted from: Nour et al. (2025), "Charting trajectories of human thought using large language models" (VECTOR framework)
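The two lines of pseudocode above translate almost directly to NumPy; a sketch, with a straight-line trajectory as a sanity check:

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# momentum = mean cosine similarity between consecutive direction vectors
def momentum(traj):
    directions = np.diff(traj, axis=0)   # direction[i] = traj[i+1] - traj[i]
    return float(np.mean([cosine_similarity(directions[i], directions[i + 1])
                          for i in range(len(directions) - 1)]))

# Sanity check: a straight-line trajectory has perfect momentum.
line = np.outer(np.arange(5), np.ones(3))
print(np.isclose(momentum(line), 1.0))  # True
```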
Position-aligned comparison showing how human and AI trajectories drift apart over the essay:
- Interpolate both trajectories to 20 common points (normalized positions 0 to 1)
- Compute cosine distance at each position:
divergence[t] = cosine_distance(human[t], ai[t])
Interpolation method: Linear interpolation in embedding space.
Expected pattern: Divergence should increase over the essay. Both start from the same blind prompt (similar openings), but the 5-day human process produces increasing departure from AI baseline.
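The interpolate-then-compare steps above can be sketched as follows; function names are illustrative, and random arrays stand in for a real human/AI trajectory pair:

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Resample a trajectory to n points at normalized positions 0..1,
# interpolating linearly in embedding space, dimension by dimension.
def resample(traj, n=20):
    old = np.linspace(0, 1, len(traj))
    new = np.linspace(0, 1, n)
    return np.stack([np.interp(new, old, traj[:, d])
                     for d in range(traj.shape[1])], axis=1)

def divergence_curve(human, ai, n=20):
    h, a = resample(human, n), resample(ai, n)
    return [cosine_distance(h[t], a[t]) for t in range(n)]

rng = np.random.default_rng(0)
curve = divergence_curve(rng.normal(size=(9, 138)), rng.normal(size=(14, 138)))
print(len(curve))  # 20
```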
Measures similarity within human group and within AI group:
- Interpolate all trajectories to 20 points
- Flatten to single vectors (20 × embedding_dim)
- Compute pairwise cosine similarities
- Compare mean within-group similarities
Expected pattern: AI essays should be more homogeneous (higher AI-AI similarity). This operationalizes "process convergence" — collapsing to a single prompt produces more similar outputs across different people.
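The four steps above can be sketched as one function; names are illustrative, and random trajectories of varying length stand in for a real group:

```python
import numpy as np

# Resample a trajectory to n evenly spaced points (linear interpolation
# in embedding space, dimension by dimension).
def resample(traj, n=20):
    old = np.linspace(0, 1, len(traj))
    new = np.linspace(0, 1, n)
    return np.stack([np.interp(new, old, traj[:, d])
                     for d in range(traj.shape[1])], axis=1)

# Within-group homogeneity: resample, flatten to (n * dim) vectors,
# then average the pairwise cosine similarities (off-diagonal only).
def homogeneity(trajs, n=20):
    flat = np.stack([resample(t, n).ravel() for t in trajs])
    flat /= np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T
    iu = np.triu_indices(len(trajs), k=1)
    return float(sim[iu].mean())

rng = np.random.default_rng(0)
group = [rng.normal(size=(int(rng.integers(8, 15)), 138)) for _ in range(4)]
h = homogeneity(group)
print(-1.0 <= h <= 1.0)  # True
```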
To visualize high-dimensional trajectories (384D) in 2D:
- Pool all chunk embeddings from all essays
- Fit PCA extracting top 2 components
- Project all embeddings to 2D
- Plot each essay as a trajectory with points connected by lines
Rationale for pooled PCA: Ensures visual comparability across essays
Limitations: 2D projection captures only ~20-30% of variance. Qualitative illustration only — should not be over-interpreted.
With N=7 student pairs (6 complete pairs), we report:
- Descriptive statistics (means, standard deviations)
- Individual paired comparisons
- Visualizations of patterns
Detailed documentation is available in the docs/ directory:
- TRAJECTORY_ANALYSIS_README.md - Complete analysis methodology, pipeline overview, and design rationale
- trajectory_analysis_method.md - Technical methods documentation including formulas and code examples
- OUTPUT_FORMAT.md - Structure of chunking output JSON files
- README_CHUNKING.md - Chunking script usage and setup
- BATCH_README.md - Anthropic Batch API usage details
- CHUNKING_SETUP.md - Chunking configuration and setup
- CHUNK_CLEANING_README.md - Chunk cleaning/simplification process
- TEST_RESULTS.md - Validation and testing results
These documents provide comprehensive details on:
- Computational methods and mathematical formulas
- PCA preprocessing and dimensionality reduction
- Each trajectory metric with interpretation guides
- Visualization techniques and their limitations
- Implementation code examples
- Design decisions and rationale
These documents were written largely to provide context for development with Claude Code; some may be out of date.
Nour, M. M., et al. (2025). "Charting trajectories of human thought using large language models." Nature.
Caveat: Unlike Nour et al. (2025) which used supervised decoding (participants retold the same known story), our essays differ in thesis, argument, and conclusions. No ground-truth schema exists. We work in semantic embedding space (with PCA preprocessing) — appropriate for our design but means we cannot decode trajectories into shared human-interpretable argumentative moves.
Created with Claude Code, in line with the intensive's theme.