VGS_seq_or_parallel_CogSci

This repository contains the instructions and scripts needed to replicate the experiments reported in:

Khorrami, Cruz Blandon & Räsänen: Computational Insights to Acquisition of Phonemes, Words, and Word Meanings in Early Language: Sequential or Parallel Acquisition? Proc. CogSci-2023, Sydney, Australia. (https://escholarship.org/uc/item/79t028n8)

Project Description

This project models an infant statistical language learner that relies on two mechanisms: unsupervised discovery of speech patterns, and association of co-occurring speech with the visually perceived scene. Accordingly, the model is an artificial neural network for unsupervised speech representation learning, trained on unlabelled speech and audiovisual data. The learning mechanisms are a wav2vec 2.0 model for unsupervised speech learning and a Transformer-based visually grounded speech (VGS) model for audio-visual associative learning. The two mechanisms are applied individually and in combination to explore various possible learning scenarios. After training, the language capabilities of the model are evaluated in two ways: semantic audiovisual mapping, using speech and visual embeddings; and phonemic and lexical category recognition, using the activation patterns of the hidden layers of the speech encoder block.

Model Source

This project's model is based on the work from the following repository:

https://github.com/jasonppy/FaST-VGS-Family

To train the model, please download the code from the above repository and follow the provided instructions. Additionally, please ensure that you give credit to the creators for their contributions to the model.

To set the weights of the audio (SSL) and audio-visual (VGS) losses (i.e., the alpha coefficient), modify the "weight_loss" function in the "steps/trainer.py" file of the model's source code.
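
As an illustration of what such a weighting can look like, the sketch below combines the two losses with a single alpha coefficient. It is not the code from "steps/trainer.py": the helper name `combine_losses`, its arguments, and the choice that alpha weights the SSL loss while (1 - alpha) weights the VGS loss are all assumptions made for this example. Check the actual "weight_loss" function before relying on it.

```python
def combine_losses(loss_ssl: float, loss_vgs: float, alpha: float) -> float:
    """Convex combination of the SSL and VGS losses (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: alpha weights the SSL loss and (1 - alpha) the VGS loss.
    # The real "weight_loss" function may use the opposite convention.
    return alpha * loss_ssl + (1.0 - alpha) * loss_vgs


# alpha = 1.0 keeps only the SSL loss, alpha = 0.0 keeps only the VGS loss.
total = combine_losses(loss_ssl=2.0, loss_vgs=4.0, alpha=0.5)
```

Under this convention, the two extreme values of alpha correspond to training with only one of the two mechanisms, and intermediate values to training with both in parallel.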

Model Description

The VGS+ model combines wav2vec 2.0-based speech self-supervised learning (SSL) and Transformer-based visually grounded speech (VGS) learning within a single model. It has been shown that the speech representations obtained from the hidden layers of the trained VGS+ model contain phonemic and lexical information.

How to Use

Phoneme discrimination score

"abx.py" extracts the speech representations of a given hidden layer (of the speech encoder and decoder) for the ABX task. Before running it, set the path to the input data (test audio files) and the path where the output data (speech embeddings) will be saved.

To measure the ABX phoneme discrimination score, follow the instructions in the following repository:

https://github.com/zerospeech/zerospeech2021
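
The evaluation toolkit linked above reads features that have been saved to disk. The sketch below writes the frame-level embeddings of one utterance to a text file, assuming a layout of one frame per line with space-separated values; please verify the expected format against that repository's instructions. The helper name `save_frames_as_text` and the six-decimal formatting are hypothetical and are not taken from "abx.py".

```python
from pathlib import Path
from typing import Sequence


def save_frames_as_text(
    frames: Sequence[Sequence[float]], audio_path: str, out_dir: str
) -> Path:
    """Write one utterance's embeddings as text, one frame per line."""
    # Name the output after the audio file, e.g. utt1.flac -> utt1.txt.
    out_path = Path(out_dir) / (Path(audio_path).stem + ".txt")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w") as f:
        for frame in frames:
            # Assumption: values on a line are separated by single spaces.
            f.write(" ".join(f"{value:.6f}" for value in frame) + "\n")
    return out_path
```

Here `frames` stands for the activations of the chosen hidden layer, with one inner sequence per time frame.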

You can use the template provided at "abx.sh" to obtain the ABX score for each layer (0:11) of a given model (specified by the path to the model's bundle file).
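
When scoring all twelve layers, it helps to keep the embeddings of each layer in a separate directory. The sketch below only builds such a set of paths; the function name `layer_output_dirs` and the `layerN` directory naming are hypothetical and do not come from "abx.sh".

```python
from pathlib import Path
from typing import Dict

N_LAYERS = 12  # layers 0..11, as in the range given above


def layer_output_dirs(output_root: str, n_layers: int = N_LAYERS) -> Dict[int, Path]:
    """Map each layer index to its own output directory (hypothetical layout)."""
    return {layer: Path(output_root) / f"layer{layer}" for layer in range(n_layers)}


# Iterate over the layers in order and report where each one would be stored.
for layer, out_dir in layer_output_dirs("embeddings").items():
    print(layer, out_dir)
```

The same mapping can be reused for the lexical task by passing a different output root.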

For the test data, download the dev-clean subset of LibriSpeech from https://www.openslr.org/12 .

Lexicon discrimination score

"lexical.py" extracts the speech representations of a given hidden layer (of the speech encoder and decoder) for the lexical task. Before running it, set the path to the input data (test audio files) and the path where the output data (speech embeddings) will be saved.

To measure the lexical score, follow the instructions in the following repository:

https://github.com/SPEECHCOG/CDI_lextest

You can use the template provided at "lexica.sh" to obtain the lexical score for each layer (0:11) of a given model (specified by the path to the model's bundle file).
