
Commit 44d4aea

Update course website with example notebooks and fix gitignores

1 parent 5ebbfa9 commit 44d4aea

15 files changed: +732 -0 lines changed

.gitignore

Lines changed: 5 additions & 0 deletions

@@ -8,3 +8,8 @@
 /trackchanges.sty

 /.luarc.json
+AGENT_INSTRUCTIONS.md
+.agents/
+.agent/
+_agents/
+_agent/
week10_summary.md

Lines changed: 52 additions & 0 deletions
# Week 10 Summary: ML for characterization signals

## Cross-Book Summary

### 1. Clustering Spectral Data (Neuer Ch 5, McClarren Ch 4)
- **K-Means Clustering:** A fundamental tool for grouping similar signals (e.g., XRD or EDS spectra). By minimizing the variance within clusters, we can automatically identify distinct phases or chemical environments in a dataset (Neuer Ch 5.3).
- **Mini-Batch K-Means:** Essential for high-throughput characterization where millions of spectra are collected in a single mapping session.
- **Visualization with t-SNE:** High-dimensional spectra (e.g., 2048 channels) are impossible to visualize directly. t-SNE projects them into 2D while preserving "neighborhood" relationships, making it easy to spot outliers or transitional states (Neuer Ch 5.4).
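The clustering-plus-visualization workflow above can be sketched with scikit-learn; the synthetic two-phase "spectra" and all parameter values below are illustrative assumptions, not course data:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
channels = np.arange(256)

def peak(center, width=8.0):
    """Gaussian peak over the channel axis."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

# Synthetic two-phase dataset: 500 noisy spectra per "phase"
phase_a = peak(80) + 0.05 * rng.standard_normal((500, 256))
phase_b = peak(170) + 0.05 * rng.standard_normal((500, 256))
X = np.vstack([phase_a, phase_b])

# Mini-batch K-Means scales to millions of spectra; here 2 clusters = 2 phases
km = MiniBatchKMeans(n_clusters=2, batch_size=256, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.cluster_centers_.shape)  # (2, 256): each centroid is a mean spectrum

# t-SNE projection to 2D for visualization (embedding distances are only qualitative)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (1000, 2)
```

Each cluster centroid can be inspected as an "average spectrum" of the phase it represents, which is a useful sanity check before trusting the labels.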
### 2. Autoencoders for Signal Processing (McClarren Ch 8)
- **Latent Representations:** An autoencoder learns to compress a spectrum into a few "latent variables" that capture the essential physical information (peak positions, intensities).
- **Denoising:** By training an autoencoder to reconstruct a clean signal from a noisy input, we can effectively remove experimental fluctuations without the blurring associated with traditional filters (McClarren Ch 8.3.2).
- **Non-linear Compression:** Unlike PCA, autoencoders can capture non-linear relationships in spectral data, enabling much higher compression ratios for massive characterization libraries (McClarren Ch 8.2).
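As a minimal, framework-free illustration of the denoising idea, the sketch below uses scikit-learn's `MLPRegressor` as a stand-in autoencoder with an hourglass layer layout (128 → 32 → 4 → 32 → 128), trained to map noisy synthetic spectra back to their clean versions. The data, architecture, and hyperparameters are illustrative assumptions; a production autoencoder would typically be built in TensorFlow or PyTorch:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
channels = np.arange(128)

# Clean spectra: one Gaussian peak at a random position per sample
centers = rng.uniform(30.0, 100.0, size=800)
clean = np.exp(-0.5 * ((channels[None, :] - centers[:, None]) / 5.0) ** 2)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)  # heavy sensor noise

# Hourglass MLP: 128 -> 32 -> 4 (bottleneck) -> 32 -> 128,
# trained to reconstruct the clean signal from the noisy input (denoising setup)
ae = MLPRegressor(hidden_layer_sizes=(32, 4, 32), activation="tanh",
                  max_iter=500, random_state=0)
ae.fit(noisy, clean)
denoised = ae.predict(noisy)

# The reconstruction should sit closer to the clean signal than the raw input does
err_noisy = float(np.mean((noisy - clean) ** 2))
err_denoised = float(np.mean((denoised - clean) ** 2))
print(err_noisy, err_denoised)
```

The 4-unit bottleneck is the "latent space": it is forced to encode only the essentials (roughly, peak position and amplitude), which is exactly why the noise cannot pass through.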
### 3. Scientific Integrity in ML
- **Peak Preservation:** The goal of ML in characterization is to assist the scientist, not replace the physics. Models must be validated to ensure they do not "invent" peaks or smooth away critical structural information.

---
## 90-Minute Lecture Strategy (50 Slides)

### Part 1: High-Dimensional Signals (Slides 1-10)
- The digital footprint of materials: XRD, EDS, EELS, and Raman.
- Why manual peak-picking fails in high-throughput experiments.
- The "Vector" representation of a spectrum.

### Part 2: Discovering Structure with Clustering (Slides 11-20)
- K-Means: Geometry and Algorithm.
- The "Elbow Method": Deciding how many phases are in your sample.
- Case Study: Mapping a ternary alloy system with K-Means.

### Part 3: Visualizing the Unseen (Slides 21-30)
- t-SNE: The intuition of "Stochastic Proximity."
- Finding "Hidden" relationships in spectral libraries.
- Pitfalls: Why t-SNE distances can be misleading.

### Part 4: Autoencoders & Denoising (Slides 31-45)
- The Hourglass Architecture: Encoder, Bottleneck, Decoder.
- Applications: Compressing leaf spectra (McClarren Ch 8.2).
- Denoising characterization signals: Improving SNR with Deep Learning.
- Feature extraction: Using the bottleneck as a physical descriptor.

### Part 5: From Data to Discovery (Slides 46-50)
- Real-time spectral analysis during experiments.
- Ensuring physical consistency in ML outputs.
- Summary: The automated characterization pipeline.

---

## Quarto Website Update (Summary)

**Summary for ML-PC Week 10:**
This unit focuses on the processing of high-dimensional **Characterization Signals** (like XRD, EDS, and EELS) using unsupervised learning. We introduce **K-Means Clustering** and **t-SNE** for the automatic identification and visualization of phases in large experimental libraries. We then explore **Autoencoders**—neural networks that learn to compress complex spectra into a low-dimensional "latent space." This allows for advanced denoising and feature extraction, enabling scientists to handle the massive data volumes produced by modern high-throughput characterization tools without losing physical insight.

week11_summary.md

Lines changed: 53 additions & 0 deletions
# Week 11 Summary: Automation in microscopy and characterization

## Cross-Book Summary

### 1. Multi-Modal Data Fusion (Murphy Ch 11, Neuer Ch 2)
- **Beyond a Single Sensor:** In modern characterization, we often collect images (SEM), chemistry (EDS), and orientations (EBSD) simultaneously. Fusing these data streams provides a more complete physical picture than any single modality.
- **Bayesian Sensor Fusion:** A mathematical framework for combining uncertain measurements. If two sensors (e.g., two thermocouples) provide conflicting information, the Bayesian posterior weights them by their respective precisions (inverse variances), allowing for robust state estimation (Murphy Ch 4.6.4).
- **Latent Fusion:** Using autoencoders or PCA to find a shared low-dimensional embedding where different data types (images and spectra) can be compared and combined (Murphy Ch 19).
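For two scalar sensors, the precision-weighted fusion rule described above reduces to a few lines; the temperature readings below are made-up numbers for illustration:

```python
def fuse(mu1, var1, mu2, var2):
    """Precision-weighted (Bayesian) fusion of two independent Gaussian readings."""
    w1, w2 = 1.0 / var1, 1.0 / var2          # precisions = inverse variances
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)   # posterior mean
    var = 1.0 / (w1 + w2)                    # posterior variance
    return mu, var

# Two thermocouples disagree; the more precise one (variance 4) dominates,
# and the fused variance is smaller than either sensor's alone
mu, var = fuse(500.0, 4.0, 520.0, 16.0)
print(mu, var)  # 504.0 3.2
```

Note that the fused estimate is pulled toward the more precise sensor and that its variance (3.2) is below both input variances, which is the formal sense in which fusion "adds information."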
### 2. Reinforcement Learning for Control (McClarren Ch 9)
- **The Autonomous Agent:** In RL, an agent learns to interact with an environment (e.g., a microscope or a furnace) to maximize a reward.
- **The RL Loop:** State (current image), Action (adjusting focus/stigmation), and Reward (image sharpness/SNR).
- **Policy Gradients:** A method for training deep neural networks to make a sequence of decisions that lead to an optimal scientific outcome (McClarren Ch 9.1).
- **Case Study (McClarren):** Using RL to control the complex cooling cycles of glass, demonstrating the transition from monitoring to active control.
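The State → Action → Reward loop can be made concrete with a toy tabular Q-learning "autofocus" agent. The environment below (a 21-position focus axis whose sharpness reward peaks at a hypothetical true focus) is entirely synthetic, and tabular Q-learning stands in for the policy-gradient methods named above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "autofocus" environment: 21 focus positions, sharpness peaks at true_focus
n_positions, true_focus = 21, 7
actions = (-1, +1)  # nudge focus down or up

def reward(pos):
    return -abs(pos - true_focus)  # higher (less negative) nearer the true focus

# Tabular Q-learning over (position, action) pairs
Q = np.zeros((n_positions, 2))
alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(200):                      # episodes
    pos = int(rng.integers(n_positions))  # random initial focus setting
    for _ in range(30):                   # steps per episode
        # epsilon-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[pos]))
        new_pos = int(np.clip(pos + actions[a], 0, n_positions - 1))
        r = reward(new_pos)
        Q[pos, a] += alpha * (r + gamma * np.max(Q[new_pos]) - Q[pos, a])
        pos = new_pos

# The greedy policy at each end of the range should point toward the true focus
print(int(np.argmax(Q[0])), int(np.argmax(Q[n_positions - 1])))
```

After training, reading out `argmax(Q[pos])` gives a control policy: from any focus setting, the agent moves toward the sharpness maximum without ever being told where it is.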
### 3. Computer Vision in the Lab (ML-PC Index)
- **Automated Workflows:** Using CNNs for real-time region-of-interest (ROI) detection, automated autofocus, and high-speed classification of diffraction patterns (e.g., EBSD Kikuchi bands).

---
## 90-Minute Lecture Strategy (50 Slides)

### Part 1: Toward the Self-Driving Lab (Slides 1-10)
- The bottleneck of human-operated characterization.
- The concept of "Autonomous Characterization": Scan, Analyze, Decide, Repeat.
- Overview of the automation stack.

### Part 2: ML-Assisted Instrument Tuning (Slides 11-20)
- Computer Vision for Autofocus and Beam Alignment.
- Real-time feedback loops: Turning pixels into control signals.
- Case Study: Automated EBSD mapping.

### Part 3: Fusing Multi-Modal Data (Slides 21-35)
- Why fuse? Structure vs. Chemistry vs. Properties.
- Bayesian Fusion: Handling sensor noise and conflicts (Murphy Ch 11.4).
- Multi-head NNs for multi-modal classification.
- Case Study: Combining XRD and EDS for phase identification.

### Part 4: Reinforcement Learning for Lab Control (Slides 36-45)
- Introduction to the RL Framework (McClarren Ch 9).
- Defining Reward Functions for scientific experiments.
- Case Study: Closing the loop in industrial glass processing.

### Part 5: Summary: The Integrated Pipeline (Slides 46-50)
- The shift from "Post-mortem" analysis to "On-the-fly" discovery.
- Challenges: Latency, safety, and physical limits of automation.
- Summary: The vision of autonomous materials characterization.

---

## Quarto Website Update (Summary)

**Summary for ML-PC Week 11:**
This unit explores the cutting edge of **Autonomous Characterization**, where machine learning moves from passive data analysis to active instrument control. We introduce **Multi-Modal Data Fusion** techniques to combine information from diverse sensors like SEM images, EDS spectra, and process logs using Bayesian frameworks. We then discuss **Reinforcement Learning (RL)** as a tool for automating complex laboratory tasks, such as instrument tuning and process optimization. Through case studies in microscopy and industrial processing, students learn how to build integrated pipelines that can autonomously find, characterize, and decide the next steps of an experiment.

week12_summary.md

Lines changed: 52 additions & 0 deletions
# Week 12 Summary: Uncertainty-aware regression & Gaussian Processes

## Cross-Book Summary

### 1. The Value of "Knowing what you don't know" (Neuer Ch 6, Murphy Ch 15)
- **Epistemic vs. Aleatoric Uncertainty:**
  - **Aleatoric:** The inherent randomness in the physical process (e.g., sensor noise).
  - **Epistemic:** The model's ignorance due to lack of training data in a specific region of the parameter space.
- **Danger of Overconfidence:** Standard neural networks often provide "point estimates" that can be wildly overconfident when extrapolating into unknown physical regimes.
### 2. Gaussian Processes (GPs) (Murphy Ch 15, Bishop Ch 6)
- **Distribution over Functions:** A GP defines a prior over an infinite space of functions. After seeing data, it provides a posterior distribution, yielding both a mean prediction and a variance (uncertainty).
- **Kernels as Physical Priors:** The kernel function (e.g., Radial Basis Function or Matérn) encodes our assumptions about the smoothness and length scales of the physical phenomenon (Bishop Ch 6.4).
- **Non-Parametric Nature:** Unlike NNs, GPs don't have a fixed number of parameters; they scale with the number of training points, making them ideal for "small but high-quality" materials datasets.
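A minimal GP-regression sketch with scikit-learn, using an RBF kernel plus a white-noise term. The four "property vs. temperature" data points and the hand-set hyperparameters are illustrative assumptions (the optimizer is disabled so the example stays deterministic):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical small dataset: a property measured at four temperatures,
# with a deliberate data gap between 500 and 700
X = np.array([[300.0], [400.0], [500.0], [700.0]])
y = np.array([1.2, 1.8, 2.1, 1.5])

# RBF encodes smoothness (length scale ~100); WhiteKernel models aleatoric noise.
# optimizer=None keeps the hand-set hyperparameters fixed for illustration.
kernel = 1.0 * RBF(length_scale=100.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None,
                              normalize_y=True).fit(X, y)

X_test = np.array([[250.0], [375.0], [500.0], [625.0], [750.0]])
mean, std = gp.predict(X_test, return_std=True)
# Epistemic uncertainty is small at the training point (500) and grows
# when extrapolating (250, 750) or crossing the data gap (625)
print(np.round(std, 3))
```

Plotting `mean ± 2*std` over a fine grid produces exactly the "confidence ribbon" described below: narrow near data, wide in the gap and outside the measured range.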
### 3. GP-Based Process Maps (ML-PC Index)
- **Confidence Ribbons:** Visualizing the uncertainty allows engineers to see where a process map is reliable and where more experiments are needed.
- **Kriging:** GP regression is closely related to Kriging, a method long used in geostatistics and now widely applied to interpolate materials property surfaces.

---
## 90-Minute Lecture Strategy (50 Slides)

### Part 1: Uncertainty in Science (Slides 1-10)
- Why a single number is never enough.
- Risk management in materials processing: The cost of being wrong.
- Visualizing distributions: Histograms, error bars, and density plots.

### Part 2: Gaussian Process Fundamentals (Slides 11-25)
- The Bayesian viewpoint: Function space vs. Parameter space.
- Kernels: How do we define "Similarity" between two material states?
- The GP Math: Conditional Gaussians and Matrix Inversion.
- Interpreting the Variance: Where does the "Shaded region" come from?

### Part 3: GP Case Studies (Slides 26-40)
- Case Study: Predicting tensile strength across a temperature-strain rate space.
- GP for Experimental Design: Identifying the "Gaps" in a database.
- Multi-Task GPs: Sharing information between related properties (e.g., Hardness and Yield Strength).

### Part 4: Advanced Probabilistic ML (Slides 41-45)
- Mixture Density Networks (MDNs): Handling multi-modal uncertainties (Neuer Ch 6.4).
- Dropout as a Bayesian approximation in deep NNs.

### Part 5: Summary: Decision Making Under Uncertainty (Slides 46-50)
- Using confidence intervals to define "Safe" process windows.
- Summary: Building models that scientists can trust.

---

## Quarto Website Update (Summary)

**Summary for ML-PC Week 12:**
This unit introduces **Probabilistic Machine Learning**, focusing on the quantification of uncertainty in materials models. We explore why point estimates can be dangerous in engineering and introduce **Gaussian Processes (GPs)** as a powerful tool for uncertainty-aware regression. Students learn how kernels encode physical assumptions about data smoothness and how the resulting predictive distributions can be used to build robust process maps. We also discuss the difference between aleatoric (noise) and epistemic (ignorance) uncertainty and how to use confidence intervals to drive scientific decision-making.

week13_summary.md

Lines changed: 52 additions & 0 deletions
# Week 13 Summary: Physics-informed and constrained ML

## Cross-Book Summary

### 1. Physics-Informed Neural Networks (PINNs) (Neuer Ch 6)
- **Embedding Laws:** Instead of letting a network learn purely from data, we enforce physical laws (ODEs, PDEs) by including them in the loss function. This ensures the model's predictions are physically consistent (e.g., mass or energy is conserved).
- **Automatic Differentiation:** Modern deep learning frameworks (TensorFlow, PyTorch) allow for the exact calculation of derivatives of the network's output with respect to its inputs. This enables the network to "evaluate" physical equations during the training process (Neuer Ch 6.3.1).
- **Boundary Conditions:** Techniques like the Lagaris substitution allow us to force the network to satisfy initial or boundary conditions by design, rather than only as a soft constraint in the loss function (Neuer Ch 6.3.3).
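The chapter cited above works with TensorFlow's GradientTape; the same exact-derivative machinery is sketched below in PyTorch, applied to a Lagaris-style trial solution for the decay law du/dx + u = 0. The one-neuron "network" and its parameter values are made up for illustration:

```python
import torch

# Lagaris-style trial solution: u(x) = 1 + x * N(x) satisfies u(0) = 1 by design.
# N(x) is a one-neuron stand-in "network" with fixed illustrative parameters.
w1, b1, w2 = torch.tensor(0.5), torch.tensor(-0.2), torch.tensor(0.3)

def u(x):
    return 1.0 + x * (w2 * torch.tanh(w1 * x + b1))

x = torch.tensor(0.7, requires_grad=True)
ux = u(x)

# Exact derivative du/dx via automatic differentiation (no finite differences)
du_dx, = torch.autograd.grad(ux, x, create_graph=True)

# Physics residual for du/dx + u = 0; a PINN would minimize residual**2
# over many collocation points x during training
residual = du_dx + ux

# Cross-check the autodiff result against the hand-derived derivative
t = torch.tanh(w1 * x + b1)
manual = w2 * t + x * w2 * w1 * (1.0 - t ** 2)
print(abs(float(du_dx - manual)))  # ~0.0, exact up to float rounding
```

Because `create_graph=True` keeps the derivative itself differentiable, the squared residual can be backpropagated through to train the network weights, which is the core mechanic of a PINN.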
### 2. Governing Equation Discovery (McClarren Ch 2.5)
- **Dictionary-Based Regression:** If we don't know the exact law but suspect its form, we can build a "dictionary" of candidate functions (e.g., $\sin \theta, \theta^2, \dot{\theta}$).
- **Sparse Identification:** Using regularized regression (Lasso), we can identify which few terms from the dictionary are actually responsible for the observed behavior, effectively "discovering" the underlying physics from noisy experimental data.
- **Dimensional Reasoning:** Using unit analysis to guide the search for coefficients, ensuring the discovered model is physically plausible (McClarren Ch 2.5.1).
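A sketch of dictionary-based sparse identification for a damped pendulum, in the spirit of McClarren Ch 2.5: build a library of candidate terms, then let Lasso pick the few that explain the noisy "measurements." The synthetic data, candidate library, and regularization strength are all illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
theta = rng.uniform(-1.5, 1.5, size=400)       # sampled angles
theta_dot = rng.uniform(-2.0, 2.0, size=400)   # sampled angular velocities

# "Measured" angular acceleration of a damped pendulum plus sensor noise:
# theta_ddot = -(g/L) sin(theta) - c * theta_dot
theta_ddot = (-9.81 * np.sin(theta) - 0.3 * theta_dot
              + 0.05 * rng.standard_normal(400))

# Dictionary of candidate terms
library = np.column_stack([theta, np.sin(theta), theta ** 2,
                           theta_dot, theta_dot ** 2])
names = ["theta", "sin(theta)", "theta^2", "theta_dot", "theta_dot^2"]

# L1 regularization drives the coefficients of irrelevant terms toward zero,
# leaving (approximately) the sin(theta) and theta_dot terms of the true law
model = Lasso(alpha=0.02, fit_intercept=False).fit(library, theta_ddot)
for name, coef in zip(names, model.coef_):
    print(f"{name}: {coef:+.3f}")
```

Note that `theta` and `sin(theta)` are strongly correlated on small angles, so the sampled range must be wide enough for the data to distinguish them; this is a practical pitfall of dictionary methods.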
### 3. Constraints in Materials Science (ML-PC Index)
- **Monotonicity:** Ensuring that a predicted property (e.g., hardness) always increases with a specific alloying element if the physics demands it.
- **Hybrid Modeling:** Combining a physical "White-Box" model for well-understood parts of a process with a data-driven "Black-Box" for the complex, unknown parts (Grey-Box modeling).

---
## 90-Minute Lecture Strategy (50 Slides)

### Part 1: Why Physics Matters in ML (Slides 1-10)
- The limits of unconstrained "Black-Box" models.
- "Accurate but Physical": Why we need models that respect conservation laws.
- The cost of training: PINNs require significantly less data.

### Part 2: Automatic Differentiation (Slides 11-20)
- The fundamental engine: How GradientTape works (Neuer Ch 6.3).
- Derivatives as first-class citizens in ML architectures.
- Practical Example: Implementing a simple derivative constraint in code.

### Part 3: Solving Physics with NNs (Slides 21-35)
- PINN Architectures: The Data Loss + The Physics Loss.
- Enforcing Boundary and Initial Conditions (The Lagaris Approach).
- Case Study: Heat transfer in 3D printing (solving the Heat Equation).

### Part 4: Equation Discovery from Lab Data (Slides 36-45)
- Sparse Regression and the "Dictionary of Laws."
- Case Study: Discovering the damped pendulum equation (McClarren Ch 2.5).
- Using Unit Analysis to prune the search space.

### Part 5: Summary: The Grey-Box Future (Slides 46-50)
- Hybrid architectures: When to use PINNs vs. traditional FEA.
- Building trust: Why physical models are easier to deploy in industry.
- Summary: ML as a partner to physical intuition.

---

## Quarto Website Update (Summary)

**Summary for ML-PC Week 13:**
This unit explores **Physics-Informed Machine Learning**, a paradigm that combines the flexibility of neural networks with the rigor of physical laws. We introduce **Physics-Informed Neural Networks (PINNs)** and discuss how automatic differentiation allows us to embed ordinary and partial differential equations directly into the training process. We also explore **Governing Equation Discovery**, using sparse regression to "extract" physical laws from noisy experimental data. Students learn how to apply physical constraints like monotonicity and conservation to build hybrid models that are more data-efficient, interpretable, and trustworthy for materials engineering.

week14_summary.md

Lines changed: 50 additions & 0 deletions
# Week 14 Summary: Integration, limits, and reflection

## Cross-Book Summary

### 1. Explainability: Opening the Black Box (Neuer Ch 7)
- **Beyond Prediction:** In engineering, knowing "what" will happen is often less important than knowing "why." Explainability builds the trust necessary for industrial deployment.
- **Sensitivity Analysis:** A local explanation method where we perturb the input variables ($x \to x + \varepsilon$) and observe the change in output. This reveals which process parameters are the primary drivers of material performance (Neuer Ch 7.2).
- **Levels of Explanation:** Explainability must be tailored to the audience, from the management level (KPIs) to the process expert (physical consistency) and the data scientist (feature importance).
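A perturbation-based sensitivity sketch: train any regressor on process data, nudge one input at a time by +ε, and record the change in the output. The feature names, synthetic data, and model choice below are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical process data: hardness driven mostly by temperature, weakly by
# hold time, and not at all by pressure (all inputs scaled to [0, 1])
X = rng.uniform(0.0, 1.0, size=(500, 3))  # columns: [temperature, time, pressure]
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(500)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def sensitivity(model, x, eps=0.05):
    """Local sensitivity: perturb each input by +eps, record the output change."""
    base = model.predict(x.reshape(1, -1))[0]
    deltas = []
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        deltas.append((model.predict(xp.reshape(1, -1))[0] - base) / eps)
    return np.array(deltas)

s = sensitivity(model, np.array([0.5, 0.5, 0.5]))
print(np.round(s, 2))  # temperature should dominate; pressure should be near 0
```

This is a purely local explanation at one operating point; methods like SHAP generalize the same perturbation idea to give consistent global attributions.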
### 2. Causality and Semantics (Neuer Ch 7.1, 7.3)
- **Causal process chains:** Understanding that an anomaly discovered at the end of a process chain (e.g., final inspection) was likely caused by an event early in the chain. ML models should ideally move from **detection** (after the fact) to **prediction** (early warning) to allow for corrective action (Neuer Ch 7.3.3).
- **Ontologies:** Digitizing the "meaning" of materials data. By mapping raw variables to semantic concepts (e.g., "Rolling Force" is a type of "Mechanical Stress"), we allow algorithms to leverage human-like reasoning.
### 3. The Limits of AI in Materials Science (Sandfeld Ch 1, McClarren)
- **Data Bias:** Models are only as good as the history they have seen. If a database only contains "successful" experiments, the AI will be blind to failure modes.
- **AI Hallucinations:** Large models can produce patterns that look physically plausible but violate fundamental laws. The materials scientist remains the ultimate filter for scientific truth.
- **The Role of the Expert:** AI is a powerful assistant that automates the tedious (peak picking, segmentation) and explores the vast (high-dimensional process maps), but the "Scientific Question" and the "Final Interpretation" remain human tasks.

---
## 90-Minute Lecture Strategy (50 Slides)

### Part 1: Course Synthesis (Slides 1-10)
- Recap: From signal formation (Week 2) to physics-informed models (Week 13).
- The big picture: The AI-driven materials lifecycle.

### Part 2: Explainable ML (Slides 11-20)
- The "Black Box" problem in high-stakes engineering.
- Sensitivity analysis: Using perturbation theory to probe the model (Neuer Ch 7.2).
- Feature importance: SHAP and LIME intuition.

### Part 3: Causality & Process Insight (Slides 21-30)
- Thinking in causal graphs: Cause → Mechanism → Effect.
- Detection vs. Prediction: The value of time in industrial ML (Neuer Ch 7.3.3).
- Introduction to Materials Ontologies: Digitizing expert knowledge.

### Part 4: Ethics and Limits (Slides 31-45)
- Bias in materials data: Representation and "Success" bias.
- The danger of data-driven over-extrapolation.
- Environmental and ethical cost of "Big AI" vs. the efficiency of PINNs.

### Part 5: Final Outlook: The AI 4 Materials Era (Slides 46-50)
- Self-driving labs and the future of the materials scientist.
- Conclusion: AI as a tool for a more sustainable and efficient world.

---

## Quarto Website Update (Summary)

**Summary for ML-PC Week 14:**
This final unit provides a comprehensive reflection on the role of machine learning in materials characterization and processing. We introduce the concepts of **Explainability** and **Sensitivity Analysis**, demonstrating how to look inside "black-box" models to understand the physical drivers of their predictions. We discuss **Causality** in the process chain and the use of **Ontologies** to digitize scientific meaning. Finally, we critically assess the **Limits and Ethics** of AI, focusing on data bias, the risk of physical hallucinations, and the evolving partnership between the human expert and the autonomous algorithm in the future of materials discovery.
