Decoder Architecture

Thank you for this nice work.

In the decoder, you concatenate stages 2, 3, and 4 from the encoder. Taking the tiny model as an example, the number of channels will be 64 + 160 + 256 = 480, while the decoder dimension is 256, so we need a 1x1 conv for channel mixing with 480 input channels and 256 output channels. However, that doesn't seem to be the case in your released weights:

<img width="527" height="990" alt="Image" src="https://github.com/user-attachments/assets/8d0f3a22-b8a4-4fec-97f7-ac644b8bb680" />
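To make the expectation concrete, here is a minimal sketch of the shape I would expect for the channel-mixing conv, and a small helper to look for it among checkpoint weight shapes. The helper names and the example keys (`decoder.mix.weight`, `decoder.classifier.weight`) are made up for illustration and are not the actual names in the released weights; the class count of 150 is also just a placeholder.

```python
from typing import Dict, List, Sequence, Tuple


def expected_squeeze_shape(stage_channels: Sequence[int], decoder_dim: int) -> Tuple[int, int, int, int]:
    """Weight shape of a 1x1 conv that mixes the concatenated stage features.

    PyTorch stores Conv2d weights as (out_channels, in_channels, kH, kW).
    """
    return (decoder_dim, sum(stage_channels), 1, 1)


def find_convs_with_input(shapes: Dict[str, Tuple[int, ...]], in_channels: int) -> List[str]:
    """Return the names of 4-D (conv) weights whose input channel count matches."""
    return [
        name
        for name, shape in shapes.items()
        if len(shape) == 4 and shape[1] == in_channels
    ]


# Channel counts for stages 2, 3, and 4 of the tiny model, as quoted above.
tiny_stage_channels = [64, 160, 256]
expected = expected_squeeze_shape(tiny_stage_channels, decoder_dim=256)
print(expected)  # (256, 480, 1, 1)

# Hypothetical checkpoint shapes, for illustration only (not the real keys).
example_shapes = {
    "decoder.mix.weight": (256, 480, 1, 1),
    "decoder.classifier.weight": (150, 256, 1, 1),
}
matches = find_convs_with_input(example_shapes, sum(tiny_stage_channels))
print(matches)  # ['decoder.mix.weight']
```

With a real checkpoint, the shape dictionary could be built with something like `{k: tuple(v.shape) for k, v in state_dict.items()}`, assuming `state_dict` is a plain mapping from names to tensors. An empty result for 480 input channels is what I mean by the weights not matching the expected layout.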

Another question: which matrix decomposition did you adopt in the Hamburger decoder? Also, to confirm, is the head in the decoder a Hamburger head, and if so, what are its parameters? Finally, is the final MLP a single layer that maps to the number of classes, or does it have several hidden layers?

<img width="358" height="253" alt="Image" src="https://github.com/user-attachments/assets/f62c8263-cc8b-4eaa-aee1-ef9200312f47" />

Once again, thank you for this amazing work and for your effort.

Decoder Architecture #76
