ENGtoID
An English-to-Indonesian seq2seq translation model, implemented for the final 2024 ICT303 assignment.
TLDR Summary
This project contains code that:
- fetches a dataset
- filters out bad data
- tokenizes the data
- saves language token vocabularies
- trains an LSTM seq2seq model on the data with optional teacher forcing
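The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the function names, the special tokens, the `max_len` threshold, and the JSON vocabulary format are all assumptions made for the example, and `predict_next` is a hypothetical stand-in for one step of the LSTM decoder.

```python
import json
import re

# Assumed special tokens; the project may use different ones.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def filter_pairs(pairs, max_len=20):
    """Drop pairs with an empty side or a side longer than max_len tokens."""
    kept = []
    for src, tgt in pairs:
        src_tok, tgt_tok = tokenize(src), tokenize(tgt)
        if src_tok and tgt_tok and max(len(src_tok), len(tgt_tok)) <= max_len:
            kept.append((src_tok, tgt_tok))
    return kept


def build_vocab(token_lists):
    """Map each token to an integer id, with the special tokens first."""
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def save_vocab(vocab, path):
    """Write a vocabulary to disk as JSON (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False, indent=2)


def decoder_inputs(target_ids, predict_next, sos_id, teacher_forcing_ratio, rng):
    """Choose the decoder input for every step of one target sequence.

    With probability teacher_forcing_ratio the next input is the
    ground-truth token; otherwise it is the model's own prediction.
    predict_next(prev_id) is a hypothetical stand-in for a decoder step.
    """
    inputs = [sos_id]
    for t in range(len(target_ids) - 1):
        predicted = predict_next(inputs[-1])
        use_truth = rng.random() < teacher_forcing_ratio
        inputs.append(target_ids[t] if use_truth else predicted)
    return inputs
```

With `teacher_forcing_ratio=1.0` the decoder always sees the reference translation shifted by one position; with `0.0` it only ever sees its own predictions, which matches inference-time behaviour. Passing a `random.Random` instance as `rng` keeps the choice reproducible.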
You may view the final notebook here