This project contains two components implemented in Python:
- A Principal Component Analysis (PCA) module for dimensionality reduction and image reconstruction.
- A character-level N-gram language model supporting probability computation and text generation.
PCA functions:
- load_and_center_dataset(filename): Loads data and centers it by subtracting the mean.
- get_covariance(dataset): Computes the sample covariance matrix.
- get_eig(S, k): Returns the top k eigenvalues and eigenvectors.
- get_eig_prop(S, prop): Returns the eigenvectors that explain more than the given proportion (prop) of the variance.
- project_and_reconstruct_image(image, U): Projects an image into PCA subspace and reconstructs it.
- display_image(...): Displays original and reconstructed images side by side.
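
The PCA steps listed above can be sketched with NumPy alone. This is an illustrative sketch, not the code in pca_and_ngram.py: the function names, the synthetic data, and the choice of np.linalg.eigh are assumptions (scipy.linalg.eigh would serve equally well), and the variance-proportion rule is interpreted here as "each eigenvalue's share of the total exceeds prop", which may differ from the actual implementation.

```python
import numpy as np

def center(data):
    """Subtract the per-feature mean so every column averages to zero."""
    return data - data.mean(axis=0)

def sample_covariance(centered):
    """Sample covariance S = X^T X / (n - 1) for centered rows X."""
    n = centered.shape[0]
    return centered.T @ centered / (n - 1)

def sorted_eig(S):
    """Eigenvalues and eigenvectors of symmetric S, largest eigenvalue first."""
    values, vectors = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]

def top_k_eig(S, k):
    """Keep only the k largest eigenvalues and their eigenvectors."""
    values, vectors = sorted_eig(S)
    return values[:k], vectors[:, :k]

def eig_above_proportion(S, prop):
    """Assumed rule: keep components whose share of total variance exceeds prop."""
    values, vectors = sorted_eig(S)
    keep = values / values.sum() > prop
    return values[keep], vectors[:, keep]

def project_and_reconstruct(x, U):
    """Project x onto the columns of U, then map back to the original space."""
    return U @ (U.T @ x)

# Synthetic stand-in for a dataset: 200 samples, 5 features of decreasing spread.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

X = center(data)
S = sample_covariance(X)
values, U = top_k_eig(S, 2)
x_hat = project_and_reconstruct(X[0], U)   # rank-2 approximation of one sample
```

When U holds every eigenvector, the reconstruction equals the input exactly; with fewer columns it is the closest approximation within the retained subspace.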
N-gram model methods:
- fit(text): Builds n-gram counts from training text.
- logprob(s): Computes the log-probability of a string.
- prob(s): Computes the probability of a string.
- next_char_distribution(context): Returns the probability distribution over the next character given a context.
- generate(num_chars, seed): Generates num_chars characters of text from the model.
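
A character-level n-gram model with this interface could look roughly like the sketch below. It is a minimal illustration under several assumptions that the list above does not confirm: the class name CharNGram, the constructor parameters n and pad, unsmoothed maximum-likelihood estimates, and the reading of seed as a random seed (it could just as well be seed text in the actual code).

```python
import math
import random
from collections import Counter, defaultdict

class CharNGram:
    """Illustrative character n-gram model using raw (unsmoothed) counts."""

    def __init__(self, n=3, pad="~"):
        self.n = n
        self.pad = pad                       # assumed start-of-text padding symbol
        self.counts = defaultdict(Counter)   # context -> counts of the next char

    def _context(self, history):
        """Last n-1 characters of the history, left-padded if it is too short."""
        k = self.n - 1
        return (self.pad * k + history)[-k:] if k else ""

    def fit(self, text):
        for i, ch in enumerate(text):
            self.counts[self._context(text[:i])][ch] += 1

    def next_char_distribution(self, context):
        seen = self.counts.get(self._context(context))
        if not seen:
            return {}                        # context never observed in training
        total = sum(seen.values())
        return {ch: c / total for ch, c in seen.items()}

    def logprob(self, s):
        total = 0.0
        for i, ch in enumerate(s):
            p = self.next_char_distribution(s[:i]).get(ch, 0.0)
            if p == 0.0:
                return float("-inf")         # unseen n-gram, no smoothing applied
            total += math.log(p)
        return total

    def prob(self, s):
        return math.exp(self.logprob(s))

    def generate(self, num_chars, seed=None):
        rng = random.Random(seed)            # seed treated as a random seed here
        out = ""
        for _ in range(num_chars):
            dist = self.next_char_distribution(out)
            if not dist:
                break                        # stop early on an unseen context
            chars = list(dist)
            out += rng.choices(chars, weights=[dist[c] for c in chars])[0]
        return out

model = CharNGram(n=2)
model.fit("abababab")
sample = model.generate(4, seed=0)
```

With n=2 and this training text, every context has exactly one observed continuation, so generation is deterministic regardless of the seed. Without smoothing, any string containing an unseen n-gram gets probability zero.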
Place your .npy dataset or text input in the project directory and call the appropriate functions.
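
As a rough illustration of the expected input, the sketch below writes and reads back a .npy file with the standard np.save and np.load calls. The array shape (one sample per row) and its contents are hypothetical, and a temporary directory stands in for the project directory.

```python
import os
import tempfile
import numpy as np

# Hypothetical data: 10 samples with 4 features each.
samples = np.arange(40, dtype=float).reshape(10, 4)

# Written to a temporary directory here; in practice the file would sit
# in the project directory next to the code.
path = os.path.join(tempfile.mkdtemp(), "dataset.npy")
np.save(path, samples)

loaded = np.load(path)
```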
Files:
- pca_and_ngram.py: main implementation file
- test_ngram.py: small test/demo script for the n-gram model
- dataset.npy: sample data (optional)
- README.md
Requirements:
- Python 3
- NumPy
- SciPy
- Matplotlib
Install the Python packages with:
pip install -r requirements.txt
Author: Macy Xiang
https://github.com/macyxiangA