
# DepthScale

Universal self-decoder for efficient, memory-constant depth scaling in transformers.



DepthScale implements a Universal Self-Decoder framework that enhances the reasoning capabilities of transformer models while keeping memory overhead constant. It improves multi-step logical reasoning accuracy through parameter-shared iterative reasoning.

The core innovation lies in the parameter-sharing scheme, where the same transformer weights are recursively applied across multiple reasoning iterations. This approach ensures semantic coherence throughout the refinement process via specialized attention mechanisms and careful memory management.


## Quick Start

```shell
pip install depth_scale_yo
```

```python
import torch
from universal_yoco.yoco_base import UniversalSelfDecoder

# Initialize the decoder with appropriate configuration
decoder = UniversalSelfDecoder(config_params)

# Run the iterative reasoning process
output = decoder.process_input(initial_prompt)
print(output)
```

## What Can You Do?

### Parameter-Shared Iterative Reasoning

The UniversalSelfDecoder implements the core mechanism where transformer weights are reused across multiple reasoning steps. This drastically reduces the memory footprint compared to standard iterative fine-tuning methods.

```python
# Apply the shared decoder layer to the current state;
# apply_layer returns the refined tensor directly.
refined_output = UniversalSelfDecoder.apply_layer(input_tensor, weights)
```

### Memory-Constant Depth Scaling

By sharing parameters, DepthScale allows for deep, multi-step reasoning without the memory explosion typically associated with increasing model depth or iteration count.

```python
# Demonstrating memory efficiency during scaling
memory_usage = calculate_memory(decoder)
print(f"Memory usage remains constant: {memory_usage}")
```
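A self-contained way to see the effect (a hypothetical comparison, independent of the library): stacking N distinct layers multiplies the parameter count by N, while reusing one shared layer N times keeps it fixed.

```python
import torch.nn as nn

def param_count(module: nn.Module) -> int:
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in module.parameters())

# Hypothetical comparison: 8 distinct layers vs. one layer reused 8 times.
stacked = nn.ModuleList(nn.Linear(256, 256) for _ in range(8))  # grows with depth
shared = nn.Linear(256, 256)                                    # constant size

ratio = param_count(stacked) // param_count(shared)
```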

## Architecture

The architecture revolves around the UniversalSelfDecoder class, which orchestrates the iterative refinement process. It relies on yoco_base.py for the core parameter-sharing logic and types.py for defining the necessary data structures and configurations.

The flow is: Input $\rightarrow$ UniversalSelfDecoder (Iteration 1) $\rightarrow$ Refined State $\rightarrow$ UniversalSelfDecoder (Iteration 2) $\rightarrow$ ... $\rightarrow$ Final Output. The parameter sharing ensures that the weights used in each iteration are identical, governed by the logic in yoco_base.py.
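The loop above can be sketched as follows (a hypothetical driver, not the library's code): the parameter-shared step is applied repeatedly until the refined state stops changing, or an iteration budget is exhausted.

```python
import torch

def run_until_converged(step, x: torch.Tensor,
                        tol: float = 1e-4, max_iters: int = 16) -> torch.Tensor:
    # Apply the same (parameter-shared) step each iteration and stop
    # once successive states differ by less than tol.
    for _ in range(max_iters):
        x_next = step(x)
        if torch.norm(x_next - x) < tol:  # convergence check
            return x_next
        x = x_next
    return x

# Toy step: a contraction mapping, so the loop is guaranteed to converge.
out = run_until_converged(lambda x: 0.5 * x, torch.ones(4))
```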

```mermaid
graph TD
    A[Input Data] --> B{UniversalSelfDecoder};
    B --> C{Parameter Sharing Logic};
    C --> D[Transformer Weights];
    D --> B;
    B --> E{Refinement Step};
    E --> F{Convergence Check};
    F -- Not Converged --> B;
    F -- Converged --> G[Final Output];
```

## API Reference

### UniversalSelfDecoder

The main class managing the iterative process.

- `__init__(self, config_params: dict)`
  Initializes the self-decoder with specific configuration parameters.

- `process_input(self, initial_prompt: torch.Tensor) -> torch.Tensor`
  Executes the full iterative reasoning process on the input tensor.

### yoco_base.py Functions

Contains the low-level implementation of the shared transformer layer.

- `apply_layer(input: torch.Tensor, weights: torch.Tensor) -> torch.Tensor`
  Applies the shared transformer layer weights to the current state.

## Research Background

This implementation is derived from research exploring efficient reasoning augmentation in large language models. It builds upon concepts of iterative refinement and parameter efficiency in deep learning architectures. For the foundational concepts, please refer to related work on structured reasoning and memory-efficient transformers.

## Testing

Tests are available in the repository, covering core functionality and parameter handling.

## Contributing

Contributions are welcome! Please see the contribution guidelines in the repository for details on submitting pull requests.

## Citation

[Citation details would be added here if available]

## License

This project is licensed under the MIT License - see the LICENSE file for details.
