Residual head connection #47

contentis · 2023-09-21T12:19:04Z

Hello World!

Upfront disclaimer: I'm no LLM researcher.

Looking at the additional heads, I'm wondering whether the model could benefit from a residual connection from head N to head N+1. Given that token N+1 strongly depends on token N, I would expect accuracy to improve, especially as the number of heads grows.

In its simplest form:

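# Logits of the base LM head
medusa_logits.append(self.base_model.lm_head(hidden_states))
for i in range(self.medusa):
    # Head i also receives the previous entry of medusa_logits as a residual input
    medusa_logits.append(self.medusa_head[i](medusa_logits[i] + hidden_states))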
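
One wrinkle with the snippet above: if each head maps hidden states to vocabulary logits, `medusa_logits[i]` and `hidden_states` have different last dimensions and cannot be added directly. A minimal sketch of one way around this, assuming the residual is carried in hidden space instead of logit space, is shown below. The class name `ResidualMedusaHeads`, the split into `blocks` and `projections`, and the toy sizes are all hypothetical and are not taken from the repository.

```python
import torch
import torch.nn as nn


class ResidualMedusaHeads(nn.Module):
    """Hypothetical sketch: head N+1 builds on the features of head N."""

    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int):
        super().__init__()
        # One feature block per extra head (hidden -> hidden). Assumed layout.
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU())
             for _ in range(num_heads)]
        )
        # One vocabulary projection per extra head (hidden -> vocab).
        self.projections = nn.ModuleList(
            [nn.Linear(hidden_size, vocab_size, bias=False)
             for _ in range(num_heads)]
        )

    def forward(self, hidden_states: torch.Tensor, base_logits: torch.Tensor) -> torch.Tensor:
        logits = [base_logits]
        features = hidden_states
        for block, proj in zip(self.blocks, self.projections):
            # Residual connection from one head to the next, kept in hidden
            # space so the shapes match before the vocabulary projection.
            features = features + block(features)
            logits.append(proj(features))
        # Shape: (num_heads + 1, batch, seq_len, vocab_size)
        return torch.stack(logits, dim=0)


# Toy usage with made-up sizes.
torch.manual_seed(0)
hidden_size, vocab_size, num_heads = 16, 32, 3
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)  # stand-in for the base LM head
heads = ResidualMedusaHeads(hidden_size, vocab_size, num_heads)
hidden_states = torch.randn(2, 5, hidden_size)
medusa_logits = heads(hidden_states, lm_head(hidden_states))
print(medusa_logits.shape)  # torch.Size([4, 2, 5, 32])
```

Whether chaining the heads this way actually improves accuracy is an open question; the sketch only shows that the shapes line up.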

leeyeehoo (Maintainer) · 2023-09-22T02:37:52Z

Agreed. We have ongoing discussions within the team about this. However, it might take some time for us to figure out a better structure. You can check the discussion in the closed PR.
