This repo processes and analyzes human- and AI-generated essays produced during a 5-day "Thinking about Thinking" intensive.
It includes scripts for:
- Chunking essays into logical steps
- Cleaning chunks into bullet points
- Embedding cleaned chunks with a sentence-embedding model
- Analyzing trajectories with several metrics
Essay data is organized in two directories:
- FINAL/ - Human-written essays (7 essays: students A-G)
- META/ - AI-generated essays from blind prompts (7 essays: students A-G)
Each essay is a markdown file named student-{ID}-{source}-{title}.md where:
- {ID} is the student identifier (A through G)
- {source} is either FINAL (human) or META (AI)
- {title} is the essay topic
The essays were produced during a 5-day "Thinking about Thinking" intensive. On Day 2, each student captured their essay intentions in a "blind prompt." On Day 5, that frozen prompt was submitted to an LLM with a standardizing metaprompt, producing an AI-generated essay from the same seed. Students also produced their own essays through 5 days of structured thinking.
Processed data outputs are stored in:
- output/data/chunks/ - Essays segmented into argumentative chunks
- output/data/embeddings/ - Semantic embeddings for each chunk
- output/data/metrics/ - Computed trajectory metrics (CSV and JSON)
The analysis pipeline consists of five main steps:
1. Chunking (01_chunk.py)
Segments each essay into sequential argumentative units using the Anthropic Batch API with Claude Sonnet 4.5. Each chunk represents one argumentative move. The chunking preserves exact original text without paraphrasing.
Output: JSON files in output/data/chunks/ with format {student}-{source}.json
2. Chunk Cleaning (01_02_clean_chunks_batch.py)
Cleans and simplifies chunks into bullet points using the Anthropic Batch API. This optional but recommended step condenses verbose chunks into concise bullet-point summaries while preserving semantic content, which improves embedding quality.
Output: Updated JSON files in output/data/chunks/ with added cleaned_chunks field
3. Embedding (02_embed.py)
Generates semantic embeddings for each chunk using the sentence-transformers library with the all-MiniLM-L6-v2 model (384 dimensions). Embeddings are normalized for cosine similarity calculations.
Output: JSON files in output/data/embeddings/ containing chunks and their embedding vectors
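The key property of this step is that embeddings are unit-normalized, so cosine similarity reduces to a dot product. A minimal sketch of that property, using random vectors as a stand-in for real model output (the actual pipeline calls `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`):

```python
import numpy as np

# Sketch of the embedding step's key property. In the real pipeline:
#   from sentence_transformers import SentenceTransformer
#   emb = SentenceTransformer("all-MiniLM-L6-v2").encode(chunks, normalize_embeddings=True)
# Here random vectors stand in for model output.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 384))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # unit-normalize each row

# With unit vectors, cosine similarity is just a dot product.
sim = emb @ emb.T
print(np.allclose(np.diag(sim), 1.0))  # True: each vector is identical to itself
```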
4. Metrics Computation (03_metrics.py)
Computes trajectory metrics for each essay in PCA-reduced space:
- Displacement metrics: mean, variance, and max step sizes between consecutive chunks
- Tortuosity: ratio of path length to straight-line distance (path efficiency)
- Momentum: directional consistency between consecutive steps
- Divergence curves: position-aligned comparison of human vs AI trajectories
- Homogeneity: pairwise similarity within human and AI groups
All metrics are computed using cosine distance in a 138-component PCA space (95% variance explained).
Output:
- output/data/metrics/essay_metrics.csv - Per-essay metrics
- output/data/metrics/divergence_curves.json - Paired trajectory divergence
- output/data/metrics/similarity_matrix.csv - Pairwise similarities
5. Visualization (04_figures.py)
Generates publication-ready figures:
- Figure 1: 2D PCA trajectory visualization (one subplot per student)
- Figure 2: Aggregate metric comparison (human vs AI)
- Figure 2b: Paired metric comparison (per-student lines)
- Figure 3: Divergence curves showing how human/AI trajectories drift apart
- Figure 4: Homogeneity matrix heatmap (pairwise similarity)
- Figures 5-7: Displacement profile visualizations (violin plots, series, distributions)
Output: PDF and PNG figures in plots/
- process_single_essay.py - Processes a single essay from /FINAL through the pipeline
- generate_annex_figures.py - LaTeX tables and supplementary figures
- assess_pca_dimensions.py - Analyzes optimal PCA dimensionality
This project uses uv for dependency management. All scripts should be run using uv run from the project root.
- Install dependencies:
  uv sync
- Copy .env.EXAMPLE to .env and add your Anthropic API key:
  cp .env.EXAMPLE .env
  # Then edit .env and add your actual API key
Run the entire pipeline with a single interactive command:
  uv run python main.py
This will guide you through all steps with:
- Interactive confirmations before using the Anthropic API
- Automatic detection of existing outputs
- Option to run all steps or select specific ones
- Clear progress tracking and error handling
Run scripts individually in order:
# 1. Chunk essays into argumentative units
uv run python scripts/01_chunk.py
# 2. Clean chunks into bullet points (optional but recommended)
uv run python scripts/01_02_clean_chunks_batch.py
# 3. Generate embeddings for chunks
uv run python scripts/02_embed.py
# 4. Compute trajectory metrics
uv run python scripts/03_metrics.py
# 5. Generate figures
uv run python scripts/04_figures.py
Approximate runtimes:
- Chunking (01): ~30 minutes - 1 hour (Anthropic Batch API)
- Chunk Cleaning (01_02): ~30 minutes - 1 hour (Anthropic Batch API)
- Embedding (02): ~1-2 minutes (local model)
All trajectory analyses use cosine distance as the primary distance metric and are performed in PCA-reduced embedding space to reduce noise and focus on main patterns of semantic variation. We work in embedding space (not a decoded schema space) since our essays differ in thesis and argument structure.
Cosine Distance measures the angular separation between embedding vectors:
cosine_distance(u, v) = 1 - (u · v) / (||u|| ||v||)
Range: [0, 2], where 0 = identical, 1 = orthogonal, 2 = opposite
Cosine Similarity (used for the similarity matrix) is the complement:
cosine_similarity(u, v) = 1 - cosine_distance(u, v) = (u · v) / (||u|| ||v||)
Range: [-1, 1], where 1 = identical, 0 = orthogonal, -1 = opposite
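The two formulas above can be transcribed directly; the function names here are illustrative, not the pipeline's actual API:

```python
import numpy as np

# Direct transcription of the cosine distance/similarity formulas above.
def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    return 1.0 - cosine_similarity(u, v)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])                  # orthogonal to u
print(cosine_distance(u, v))              # 1.0
print(cosine_similarity(u, -u))           # -1.0 (opposite vectors)
```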
Before computing trajectory metrics, all embeddings are projected into a shared PCA subspace:
- Fit PCA on pooled embeddings from all essays (human + AI, all students)
- Transform each essay's embeddings using the fitted PCA
- Compute metrics in the PCA-reduced space
Dimensions:
- Original: 384 dimensions (all-MiniLM-L6-v2 model)
- PCA-reduced: 138 components (95% variance explained)
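The fit-then-transform steps above can be sketched with scikit-learn; random arrays stand in for the real pooled embeddings, and variable names are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of the shared-subspace projection. Random arrays stand in
# for the real pooled chunk embeddings (all essays, human + AI).
rng = np.random.default_rng(0)
pooled = rng.normal(size=(200, 384))

# Passing a float to n_components keeps as many components as needed
# to explain that fraction of variance (138 on the real data).
pca = PCA(n_components=0.95, random_state=0)
pca.fit(pooled)

essay = rng.normal(size=(12, 384))   # one essay's chunk embeddings
reduced = pca.transform(essay)       # metrics are computed on this
print(reduced.shape[0])  # 12
```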
All metrics computed in PCA-reduced space using cosine distance:
Cosine distance between consecutive chunk embeddings. Derived metrics:
- Mean displacement: Average semantic step size
- Displacement variance: Variability in step sizes
- Max displacement: Largest single conceptual jump
Interpretation: Higher variance suggests an exploratory process with uneven pacing.
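A minimal sketch of the displacement profile, assuming `traj` holds one essay's PCA-reduced chunk embeddings in order; the function name is illustrative:

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Step sizes between consecutive chunks, summarized as mean/variance/max.
def displacement_stats(traj):
    steps = [cosine_distance(traj[i], traj[i + 1]) for i in range(len(traj) - 1)]
    return {"mean": float(np.mean(steps)),
            "variance": float(np.var(steps)),
            "max": float(np.max(steps))}

rng = np.random.default_rng(0)
traj = rng.normal(size=(10, 138))   # stand-in for a 10-chunk essay
stats = displacement_stats(traj)
print(sorted(stats))  # ['max', 'mean', 'variance']
```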
Ratio of total path length to straight-line endpoint distance:
tortuosity = (sum of displacements) / endpoint_distance
Range: [1, ∞), where 1 = perfectly direct path, >1 = circuitous/wandering
Expected pattern: AI essays should have lower tortuosity (planned all at once, direct path). Human essays should have higher tortuosity (5-day iterative process, revisiting and reframing).
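The ratio above can be sketched directly; random vectors stand in for a real trajectory, and the function name is illustrative:

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# tortuosity = (sum of consecutive displacements) / endpoint distance
def tortuosity(traj):
    path = sum(cosine_distance(traj[i], traj[i + 1]) for i in range(len(traj) - 1))
    return path / cosine_distance(traj[0], traj[-1])

rng = np.random.default_rng(1)
traj = rng.normal(size=(8, 138))    # stand-in for an 8-chunk essay
print(tortuosity(traj) > 1.0)       # a wandering path exceeds the direct one
```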
Average cosine similarity between consecutive direction vectors. Measures whether an essay maintains consistent conceptual direction.
direction[i] = embedding[i+1] - embedding[i]
momentum = mean(cosine_similarity(direction[i], direction[i+1]))
Range: [-1, 1], where 1 = perfect momentum, 0 = orthogonal changes, -1 = reversing
Adapted from: Nour et al. (2025), "Charting trajectories of human thought using large language models" (VECTOR framework)
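The two lines of pseudocode above translate almost directly to NumPy; a sketch, with a straight-line trajectory as a sanity check:

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# momentum = mean cosine similarity between consecutive direction vectors
def momentum(traj):
    directions = np.diff(traj, axis=0)   # direction[i] = traj[i+1] - traj[i]
    return float(np.mean([cosine_similarity(directions[i], directions[i + 1])
                          for i in range(len(directions) - 1)]))

# Sanity check: a straight-line trajectory has perfect momentum.
line = np.outer(np.arange(5), np.ones(3))
print(np.isclose(momentum(line), 1.0))  # True
```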
Position-aligned comparison showing how human and AI trajectories drift apart over the essay:
- Interpolate both trajectories to 20 common points (normalized positions 0 to 1)
- Compute cosine distance at each position:
divergence[t] = cosine_distance(human[t], ai[t])
Interpolation method: Linear interpolation in embedding space.
Expected pattern: Divergence should increase over the essay. Both start from the same blind prompt (similar openings), but the 5-day human process produces increasing departure from AI baseline.
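The interpolate-then-compare steps above can be sketched as follows; function names are illustrative, and random arrays stand in for a real human/AI trajectory pair:

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Resample a trajectory to n points at normalized positions 0..1,
# interpolating linearly in embedding space, dimension by dimension.
def resample(traj, n=20):
    old = np.linspace(0, 1, len(traj))
    new = np.linspace(0, 1, n)
    return np.stack([np.interp(new, old, traj[:, d])
                     for d in range(traj.shape[1])], axis=1)

def divergence_curve(human, ai, n=20):
    h, a = resample(human, n), resample(ai, n)
    return [cosine_distance(h[t], a[t]) for t in range(n)]

rng = np.random.default_rng(0)
curve = divergence_curve(rng.normal(size=(9, 138)), rng.normal(size=(14, 138)))
print(len(curve))  # 20
```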
Measures similarity within human group and within AI group:
- Interpolate all trajectories to 20 points
- Flatten to single vectors (20 × embedding_dim)
- Compute pairwise cosine similarities
- Compare mean within-group similarities
Expected pattern: AI essays should be more homogeneous (higher AI-AI similarity). This operationalizes "process convergence" — collapsing to a single prompt produces more similar outputs across different people.
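The four steps above can be sketched as one function; names are illustrative, and random trajectories of varying length stand in for a real group:

```python
import numpy as np

# Resample a trajectory to n evenly spaced points (linear interpolation
# in embedding space, dimension by dimension).
def resample(traj, n=20):
    old = np.linspace(0, 1, len(traj))
    new = np.linspace(0, 1, n)
    return np.stack([np.interp(new, old, traj[:, d])
                     for d in range(traj.shape[1])], axis=1)

# Within-group homogeneity: resample, flatten to (n * dim) vectors,
# then average the pairwise cosine similarities (off-diagonal only).
def homogeneity(trajs, n=20):
    flat = np.stack([resample(t, n).ravel() for t in trajs])
    flat /= np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T
    iu = np.triu_indices(len(trajs), k=1)
    return float(sim[iu].mean())

rng = np.random.default_rng(0)
group = [rng.normal(size=(int(rng.integers(8, 15)), 138)) for _ in range(4)]
h = homogeneity(group)
print(-1.0 <= h <= 1.0)  # True
```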
To visualize high-dimensional trajectories (384D) in 2D:
- Pool all chunk embeddings from all essays
- Fit PCA extracting top 2 components
- Project all embeddings to 2D
- Plot each essay as a trajectory with points connected by lines
Rationale for pooled PCA: Ensures visual comparability across essays
Limitations: 2D projection captures only ~20-30% of variance. Qualitative illustration only — should not be over-interpreted.
With N=7 student pairs (6 complete pairs), we report:
- Descriptive statistics (means, standard deviations)
- Individual paired comparisons
- Visualizations of patterns
Detailed documentation is available in the docs/ directory:
- TRAJECTORY_ANALYSIS_README.md - Complete analysis methodology, pipeline overview, and design rationale
- trajectory_analysis_method.md - Technical methods documentation including formulas and code examples
- OUTPUT_FORMAT.md - Structure of chunking output JSON files
- README_CHUNKING.md - Chunking script usage and setup
- BATCH_README.md - Anthropic Batch API usage details
- CHUNKING_SETUP.md - Chunking configuration and setup
- CHUNK_CLEANING_README.md - Chunk cleaning/simplification process
- TEST_RESULTS.md - Validation and testing results
These documents provide comprehensive details on:
- Computational methods and mathematical formulas
- PCA preprocessing and dimensionality reduction
- Each trajectory metric with interpretation guides
- Visualization techniques and their limitations
- Implementation code examples
- Design decisions and rationale
These documents were written largely to provide context for development with Claude Code; some may be out of date.
Nour, M. M., et al. (2025). "Charting trajectories of human thought using large language models." Nature.
Caveat: Unlike Nour et al. (2025) which used supervised decoding (participants retold the same known story), our essays differ in thesis, argument, and conclusions. No ground-truth schema exists. We work in semantic embedding space (with PCA preprocessing) — appropriate for our design but means we cannot decode trajectories into shared human-interpretable argumentative moves.
Created with Claude Code, in line with the intensive's theme.