GitHub - ETS-Next-Gen/BIOT_Writing_Traits: Code that uses rotation matrices derived using BIOT (Best Interpretable Orthogonal Transformation) to align deBERTa hidden layers with the writing trait model of Deane (2024) [http://doi.org/10.1002/ets2.12377]

This library supports a method for creating interpretable 
dimensions of an LLM embedding, using a method that rotates
the embedding to maximize explainability with respect to a
list of predictor variables.

The method in question is BIOT 
(paper at https://www.sciencedirect.com/science/article/abs/pii/S0925231221006469, 
github repository at https://github.com/rebeccamarion/BIOT, 
fork for this project at https://github.com/ETS-Next-Gen/BIOT).

The code makes the following assumptions:

1. you start with a csv file that contains 
    (a) an ID field
    (b) a text field containing an essay or other documemt
    (c) one or more predictor variables to be used to align the LLM
    (d) possibly other variables of interest.

2. You run the main module in EMBEDDING MODE to produce the datafiles 
   that are needed to run BIOT

3. You run the BIOT code separately to produce rotation matrices.
   This gives you not only the rotation matrices but weights and
   correlation matrices telling you which dimensions in which hidden
   layers are potentially interpretable. 

4. You run main module in TRAIT SUMMARY MODE to produce trait scores 
   for each essay by applying selected rotation matrices to selected 
   hidden layers. This uses files named trait_vector.json and 
   excluded.json that provide a rule base. These files need to be
   manually created based on decisions about which information from
   which layer to use to create trait scores.

5.Alternatively, you run the main module in TOKEN MODE to produce token
  trait scores and/or highlighting for a specific trait on a specific
  document. The token scores basically tell you which parts of the essay
  contributed to high or low overall scores on a specific trait.

Installation:
pip install -r requirements.txt
python -m spacy download en_core_web_sm

Sample run commands:

EMBEDDING MODE
To run the default csv file to produce the embedding and predictor files needed to run BIOT rotation:

python src/main.py -i tests/input/persuade_2.0_sample.csv -r data/rotation/persuasive/ -o tests/output/persuade.csv -s holistic_essay_score,VocabularyLength,Organization,VocabularyFrequency,Interactivity,SentenceLength,SentenceStructure,SentenceComplexity,GrammarUsage,Narrativity,Contextualization,Conventionality,Mechanics,Dialogue,Cohesion,Concreteness,StanceTaking -v -e -n

TRAIT SUMMARY MODE

To run the default csv file containing multiple essays:

python src/main.py -i tests/input/persuade_2.0_sample.csv -r data/rotation/persuasive/ -v -o tests/output/persuade.csv

To do the same run but adding flags that make the key fields in the csv file explicit:

python src/main.py -i tests/input/persuade_2.0_sample.csv -r data/rotation/persuasive/ -v -o tests/output/persuade.csv -d ID -t full_text

to do the same but add a list of fields from the input file that are to be "passed through" and kept in the output file:

python src/main.py -i tests/input/persuade_2.0_sample.csv -r data/rotation/persuasive -v -o tests/output/persuade.csv -s prompt_name,task,assignment,source_text,gender,grade_level,ell_status,race_ethnicity,economically_disadvantaged,student_disability_status,VocabularyLength,Organization,VocabularyFrequency,Interactivity,SentenceLength,SentenceStructure,SentenceComplexity,GrammarUsage,Narrativity,Contextualization,Conventionality,Mechanics,Dialogue,Cohesion,Concreteness,StanceTaking -d ID -t full_text

TOKEN MODE:

to run a single file and output not only trait data for whole essays but trait information by token in a csv file (-c) and visualizations by token (-p):

python src/main.py -i data/essays/test_essay.txt -r data/rotation/persuasive/ -o tests/output/out.json -c -p

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data		data
src		src
tasks		tasks
tests/input		tests/input
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages