
Feat/adapter #4

Open

peradura wants to merge 13 commits into main from feat/adapter

Conversation

@peradura
Member

  • Add `restoration_training.ipynb` for knowledge distillation on the adapter model (a loss sketch follows this list).
  • Review and verify the model-saving and adapter-attachment logic in `usage.ipynb`.
  • Refactor the `Adapter` class in `adapter.py` to improve training stability.
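
As a rough illustration of the distillation objective the notebook targets, here is a minimal sketch assuming a standard temperature-scaled KL loss on logits; `student_logits`, `teacher_logits`, and the temperature value are illustrative, not taken from the repo.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      T: float = 2.0) -> torch.Tensor:
    # Soften both distributions with temperature T, then match them via KL.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)
```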

3GID and others added 6 commits July 1, 2025 14:31
Introduces a Trainer utility class for model training and evaluation, and a dataloader for the OpenAssistant/oasst1 dataset with conversation extraction and tokenization. Updates RetentionEngine to support adaptation via training on OASST data using the new utilities.
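
For context, conversation extraction from oasst1 could look like the following sketch; it assumes the public dataset schema (`message_id`, `parent_id`, `role`, `text`), and `extract_conversations` is a hypothetical name, not the repo's actual dataloader API.

```python
from datasets import load_dataset

def extract_conversations(split: str = "train"):
    """Rebuild (role, text) threads by walking each reply's parent chain."""
    rows = load_dataset("OpenAssistant/oasst1", split=split)
    by_id = {r["message_id"]: r for r in rows}
    for r in rows:
        if r["role"] != "assistant":
            continue  # start from assistant replies and walk up to the root prompt
        thread, node = [], r
        while node is not None:
            thread.append((node["role"], node["text"]))
            node = by_id.get(node["parent_id"])  # parent_id is None at the root
        yield list(reversed(thread))  # root prompt first, assistant reply last
```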
Replaces the OASST dataloader with a new PG19 dataloader under retentionengine/datasets, removing the old dataloader. Updates RetentionEngine and Adapter to use the new dataloader and support distillation training with PG19. Refactors training logic to use the new Adapter class and updates training parameters for long document handling.
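
A minimal sketch of what a PG19 chunking loader might do, assuming the Hugging Face `deepmind/pg19` dataset and any pretrained tokenizer; the chunk length and tokenizer name are placeholders rather than the values used in `retentionengine/datasets`.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

def pg19_chunks(tokenizer_name: str = "gpt2", max_len: int = 2048):
    """Stream PG19 books and yield fixed-length token chunks for long-document training."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name)
    books = load_dataset("deepmind/pg19", split="train", streaming=True)
    for book in books:
        ids = tok(book["text"])["input_ids"]  # one book can be hundreds of pages
        for i in range(0, len(ids) - max_len + 1, max_len):
            yield ids[i : i + max_len]
```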
- Fix redundant forward passes in epoch methods
- Unify data processing to eliminate code duplication

Breaking change: Step methods now return (loss, predictions, targets)
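
Only the `(loss, predictions, targets)` return shape is stated in the commit; the surrounding step body below is an assumed sketch of how the deduplicated forward pass could look.

```python
def train_step(model, batch, criterion):
    """Single forward pass per step; returning predictions and targets lets
    callers compute metrics without the redundant second pass fixed above."""
    inputs, targets = batch
    predictions = model(inputs)
    loss = criterion(predictions, targets)
    return loss, predictions, targets  # breaking change: tuple instead of bare loss
```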
restoration_training.ipynb: Implement restoration training notebook for adapter model.

usage.ipynb: Review and verify adapter usage logic.

adapter.py: Refactor and enhance Adapter class for stability.
@peradura changed the title Feat/adapte → Feat/adapter on Aug 31, 2025
- Add `generate_dataset.py` to create a `.jsonl` dataset for memory training (a writer sketch follows this list).
- Update `restoration_training.ipynb` to load and use the generated dataset, modifying the data loader and tokenizer.
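
A hedged sketch of the `.jsonl` output step; the one-object-per-line format is standard, but the record schema (a `text` field) and file name are assumptions, not `generate_dataset.py`'s actual layout.

```python
import json

def write_jsonl(records, path: str = "memory_dataset.jsonl"):
    """Serialize one JSON object per line, the format .jsonl loaders expect."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Hypothetical usage with a minimal record schema:
write_jsonl([{"text": "Example passage for memory training."}])
```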
adapter.py:
- Replace `torch.autocast` with `transformer_engine` to correctly apply FP8 to the teacher model.
- Add `convert_to_fp8_layers` helper and convert teacher model layers in `__init__`.

restoration_training.ipynb:
- Import the `transformer_engine` library.
- Update the testing loop to use `te.fp8_autocast` for the teacher model's inference, ensuring consistency with the training environment (a conversion sketch follows this list).
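
A minimal sketch of the conversion-plus-inference flow these notes describe, using Transformer Engine's public PyTorch API (`te.Linear`, `te.fp8_autocast`, `DelayedScaling`); how the repo's `convert_to_fp8_layers` actually walks the model, and its recipe settings, are assumptions here.

```python
import torch
import torch.nn as nn
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

def convert_to_fp8_layers(module: nn.Module) -> nn.Module:
    """Recursively swap nn.Linear for te.Linear so fp8_autocast can quantize them."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Note: FP8 GEMMs require dimensions divisible by 16.
            fp8_linear = te.Linear(child.in_features, child.out_features,
                                   bias=child.bias is not None)
            with torch.no_grad():
                fp8_linear.weight.copy_(child.weight)
                if child.bias is not None:
                    fp8_linear.bias.copy_(child.bias)
            setattr(module, name, fp8_linear)
        else:
            convert_to_fp8_layers(child)
    return module

# Teacher inference under the same FP8 regime as training (names hypothetical):
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)
with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    teacher_logits = teacher_model(input_ids)
```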
- Attach the base model's input embeddings to the Titans module for consistency.
- Add support for trimming leading transformer layers based on a size limit (sketch below).
- Refine the config merge logic to replace the merged config with the resized model's config.
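
A sketch of what trimming leading layers under a size limit could look like; it assumes an HF-style decoder whose blocks live in `model.model.layers` (an `nn.ModuleList`) with `num_hidden_layers` in its config, which may not match the repo's actual model layout.

```python
def trim_leading_layers(model, max_params: int):
    """Drop transformer blocks from the front until the model fits max_params,
    keeping the later, more task-specific blocks intact."""
    layers = model.model.layers  # assumed HF-style nn.ModuleList of blocks
    while len(layers) > 1 and sum(p.numel() for p in model.parameters()) > max_params:
        del layers[0]  # nn.ModuleList supports item deletion
    model.config.num_hidden_layers = len(layers)  # keep the resized config in sync
    return model
```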
