Universal self-decoder for efficient, memory-constant depth scaling in transformers.
DepthScale implements a Universal Self-Decoder framework designed to enhance the reasoning capabilities of transformer models while maintaining constant memory overhead. By employing parameter-shared iterative reasoning, the framework improves multi-step logical reasoning accuracy without growing the model's weight footprint.
The core innovation lies in the parameter-sharing scheme, where the same transformer weights are recursively applied across multiple reasoning iterations. This approach ensures semantic coherence throughout the refinement process via specialized attention mechanisms and careful memory management.
```
pip install depth_scale_yo
```

```python
from universal_yoco.yoco_base import UniversalSelfDecoder
import torch

# Initialize the decoder with appropriate configuration
decoder = UniversalSelfDecoder(config_params)

# Run the iterative reasoning process
output = decoder.process_input(initial_prompt)
print(output)
```

The UniversalSelfDecoder implements the core mechanism where transformer weights are reused across multiple reasoning steps. This drastically reduces the memory footprint compared to standard iterative fine-tuning methods.
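The weight-reuse idea can be sketched with a stock PyTorch layer standing in for the shared decoder layer; the layer type, sizes, and iteration count below are illustrative assumptions, not DepthScale internals:

```python
import torch
import torch.nn as nn

# One stock encoder layer stands in for the shared decoder weights.
shared_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

state = torch.randn(2, 10, 64)  # (batch, sequence, hidden)
for step in range(4):
    state = shared_layer(state)  # every iteration reuses the SAME weights

# Effective depth grew 4x, but the parameter count stayed fixed.
n_params = sum(p.numel() for p in shared_layer.parameters())
print(state.shape, n_params)
```

Because every pass routes through the same module, adding iterations deepens the computation without adding weights.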
```python
# Example of applying the shared decoder layer: apply_layer returns the
# refined tensor directly, so no second call is needed.
refined_output = UniversalSelfDecoder.apply_layer(input_tensor, weights)
```

By sharing parameters, DepthScale allows for deep, multi-step reasoning without the memory explosion typically associated with increasing model depth or iteration count.
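To make the memory claim concrete, compare the parameter count of one shared layer reused eight times against eight independent layers; the layer sizes and depth here are illustrative, not DepthScale defaults:

```python
import torch.nn as nn

def param_count(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

shared = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
stacked = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(8)
)

# Reusing `shared` eight times costs no extra parameters; stacking does.
print(param_count(shared), param_count(stacked))
```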
```python
# Demonstrating memory efficiency during scaling
memory_usage = calculate_memory(decoder)
print(f"Memory usage remains constant: {memory_usage}")
```

The architecture revolves around the UniversalSelfDecoder class, which orchestrates the iterative refinement process. It relies on yoco_base.py for the core parameter-sharing logic and types.py for defining the necessary data structures and configurations.
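As a rough illustration of the kind of configuration types.py might hold, here is a dataclass sketch; every field name and default is an assumption, not the actual DepthScale schema:

```python
from dataclasses import dataclass

@dataclass
class DecoderConfig:
    # All fields are hypothetical examples of iterative-decoder settings.
    hidden_size: int = 512
    num_heads: int = 8
    max_iterations: int = 6        # cap on reasoning passes
    convergence_tol: float = 1e-4  # threshold for the convergence check

config = DecoderConfig(max_iterations=10)
print(config)
```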
The flow is: Input → UniversalSelfDecoder (Iteration 1) → UniversalSelfDecoder (Iteration 2) → …, with each iteration delegating the shared-layer application to yoco_base.py.
```mermaid
graph TD
    A[Input Data] --> B{UniversalSelfDecoder};
    B --> C{Parameter Sharing Logic};
    C --> D[Transformer Weights];
    D --> B;
    B --> E{Refinement Step};
    E --> F{Convergence Check};
    F -- Not Converged --> B;
    F -- Converged --> G[Final Output];
```
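The loop in the diagram (refine, check convergence, repeat) can be sketched as follows; the stand-in layer, damping, tolerance, and iteration cap are assumptions for illustration only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shared_layer = nn.Linear(16, 16)  # stand-in for the shared transformer weights

def iterative_refine(x: torch.Tensor, tol: float = 1e-3, max_iters: int = 50):
    state = x
    for step in range(1, max_iters + 1):
        # Refinement step: damped update through the shared weights.
        new_state = 0.5 * (state + torch.tanh(shared_layer(state)))
        # Convergence check: stop when the state stops changing.
        if torch.norm(new_state - state) < tol:
            return new_state, step
        state = new_state
    return state, max_iters

out, iters = iterative_refine(torch.randn(4, 16))
print(out.shape, iters)
```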
`UniversalSelfDecoder`: the main class managing the iterative process.

- Signature: `__init__(self, config_params: dict)`
  Description: Initializes the self-decoder with specific configuration parameters.
- Signature: `process_input(self, initial_prompt: torch.Tensor) -> torch.Tensor`
  Description: Executes the full iterative reasoning process on the input tensor.

`yoco_base.py`: contains the low-level implementation of the shared transformer layer.

- Signature: `apply_layer(input: torch.Tensor, weights: torch.Tensor) -> torch.Tensor`
  Description: Applies the shared transformer layer weights to the current state.
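A hedged sketch of what a function matching the `apply_layer` signature could look like; the real yoco_base implementation involves attention and normalization, so this only mirrors the interface:

```python
import torch

def apply_layer(input: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # Illustrative body: a shared projection plus a residual connection,
    # so repeated application refines rather than replaces the state.
    return input + torch.relu(input @ weights)

x = torch.randn(2, 8)
w = 0.1 * torch.randn(8, 8)  # the same `w` would be reused every iteration
out = apply_layer(x, w)
print(out.shape)
```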
This implementation is derived from research exploring efficient reasoning augmentation in large language models. It builds upon concepts of iterative refinement and parameter efficiency in deep learning architectures. For the foundational concepts, please refer to related work on structured reasoning and memory-efficient transformers.
Tests are available in the repository, covering core functionality and parameter handling.
Contributions are welcome! Please see the contribution guidelines in the repository for details on submitting pull requests.
[Citation details would be added here if available]
This project is licensed under the MIT License - see the LICENSE file for details.
