Conversation
one-optimize issue
circle2circle: /home/seongwoo/ONE/compiler/luci/lang/src/Nodes/CircleConst.cpp:48: typename loco::DataTypeImpl<DT>::Type& luci::CircleConst::at(uint32_t) [with loco::DataType DT = loco::DataType::U4; typename loco::DataTypeImpl<DT>::Type = unsigned char; uint32_t = unsigned int]: Assertion `n < size<DT>()' failed.
[1] 2252871 abort (core dumped) ~/ONE/build/compiler/circle2circle/circle2circle decoder_layer.q.circle

RMSNorm issue

This is not related to this PR, but the current main branch gets errors. It seems this was because of circle2circle: ERROR: Optimized graph is invalid. When I tested this issue with
@mhs4670go I managed to run
After the patch, the error in the main branch has been resolved, but the current branch still gets an error.

circle2circle: /home/seongwoo/ONE/compiler/luci/lang/src/Nodes/CircleConst.cpp:47: typename loco::DataTypeImpl<DT>::Type& luci::CircleConst::at(uint32_t) [with loco::DataType DT = loco::DataType::FLOAT32; typename loco::DataTypeImpl<DT>::Type = float; uint32_t = unsigned int]: Assertion `dtype() == DT' failed.

The error happened in

Turns out that
@mhs4670go
@stamalakhov Ah, it seems the below is the problem.
Some passes haven't considered quantized inputs. Maybe this is the case here. I'll update the code soon.
attn_weights = torch.cat(attn_weights_parts, dim=1)
attn_out_h = torch.cat(attn_out_parts, dim=1)
@mhs4670go
I believe attn_weights should also be quantized; currently it comes out as float. Or is this intended?
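A minimal sketch of what quantizing the concatenated attn_weights could look like, using plain torch.fake_quantize_per_tensor_affine as a stand-in for the project's quantizer; the uint8 scheme and range-based scale are illustrative assumptions, not the actual wrapq API:

import torch

def fake_quantize_uint8(x: torch.Tensor) -> torch.Tensor:
    # Illustrative only: derive an asymmetric uint8 scale/zero-point from the
    # tensor's own min/max and run a quantize-dequantize pass so downstream
    # consumers see quantized values rather than raw floats.
    x_min, x_max = x.min().item(), x.max().item()
    scale = max(x_max - x_min, 1e-8) / 255.0
    zero_point = max(0, min(255, int(round(-x_min / scale))))
    return torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)

# Hypothetical usage at the concatenation quoted above:
# attn_weights = fake_quantize_uint8(torch.cat(attn_weights_parts, dim=1))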
@stamalakhov Ah, I missed it. Thanks for letting me know!
I am observing a significant increase in PEIR after introducing the changes, even though I ran it with float RMSNorm.

python tico/quantization/wrapq/examples/quantize_llama_decoder_layer.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.037503
│ PEIR : 60.925078 %
└──────────────────────────────────────────────────────
(Scatter plot from the script output; both axes span roughly -8.6 to 20.6.)
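For reference, a rough sketch of how a summary like the one above could be computed, assuming "Mean |diff|" is the mean absolute error and PEIR is the peak error-to-interval ratio (peak |diff| over the reference value range); both definitions are assumptions about the script, not taken from it:

import torch

def error_summary(quantized: torch.Tensor, reference: torch.Tensor):
    # Assumed definitions: mean absolute error, and peak absolute error divided
    # by the reference value range (max - min), reported as a percentage.
    diff = (quantized - reference).abs()
    mean_abs_diff = diff.mean().item()
    interval = (reference.max() - reference.min()).clamp_min(1e-12).item()
    peir = 100.0 * diff.max().item() / interval
    return mean_abs_diff, peir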
@stamalakhov FYI, RMSNorm seems to have a similar problem. After applying QuantRMSNorm, even though I set it to int16, it has high PEIR.
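For context, a plain float RMSNorm in the usual Llama formulation; it is assumed here that QuantRMSNorm quantizes something equivalent to this:

import torch

class RMSNorm(torch.nn.Module):
    # Plain float RMSNorm in the usual Llama style, shown only to make clear
    # what a QuantRMSNorm replacement is expected to approximate.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)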
@mhs4670go
L = hidden_states.size(1)
attention_mask = self._slice_causal(L, hidden_states.device)
if position_embeddings is None:
@mhs4670go I believe these should be used unconditionally; otherwise LlamaModel will send its own position_embeddings and they will be desynchronized with _rot(). IMHO
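A minimal sketch of the unconditional variant, assuming the wrapper owns a Hugging Face-style rotary embedding module; the rotary_emb call signature is an assumption, not taken from the PR:

import torch

def make_position_embeddings(rotary_emb, hidden_states):
    # Always rebuild cos/sin from the wrapper's own rotary embedding instead of
    # trusting a position_embeddings argument coming from LlamaModel, so the
    # values consumed by _rot() cannot drift out of sync. The call signature
    # (x, position_ids) -> (cos, sin) follows recent Hugging Face
    # LlamaRotaryEmbedding and is an assumption here.
    position_ids = torch.arange(
        hidden_states.size(1), device=hidden_states.device
    ).unsqueeze(0)
    return rotary_emb(hidden_states, position_ids)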
@mhs4670go This seems to be the problem. Position embeddings are wrong, they don't have -[]. Moreover right now seq_len is hardcoded to 256, so the whole sequence should be padded to 256.
pad_len = model.config.max_position_embeddings - ids["input_ids"].shape[1]
# Fill the padding with pad_token_id for every sequence in the batch.
pads = torch.full((ids["input_ids"].shape[0], pad_len), tokenizer.pad_token_id, dtype=ids["input_ids"].dtype)
ids["input_ids"] = torch.cat((ids["input_ids"], pads), dim=1)
or in some other way, but sequence length should be 256.
@mhs4670go ids should be padded to 256, maybe this way:
pad_len = model.config.max_position_embeddings - ids["input_ids"].shape[1]
# Fill the padding with pad_token_id for every sequence in the batch.
pads = torch.full((ids["input_ids"].shape[0], pad_len), tokenizer.pad_token_id, dtype=ids["input_ids"].dtype)
ids["input_ids"] = torch.cat((ids["input_ids"], pads), dim=1)
or in some other way.
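If a Hugging Face tokenizer is in use, an equivalent sketch that leans on the tokenizer's built-in max-length padding; prompt, tokenizer, and model are the script's assumed variables:

# Assumes a Hugging Face tokenizer. Llama tokenizers may first need
# tokenizer.pad_token = tokenizer.eos_token, since they have no pad token by default.
ids = tokenizer(
    prompt,
    padding="max_length",
    max_length=model.config.max_position_embeddings,  # 256 in this setup
    return_tensors="pt",
)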
@mhs4670go which seems to be valid.
After a bug fix following @stamalakhov's suggestion, the error is resolved. Thanks a lot!

┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.020452
│ PEIR : 1.983250 %
└──────────────────────────────────────────────────────
(Scatter plot from the script output; both axes span roughly -12.2 to 22.5.)
This commit unrolls GQA and removes neg ops.

TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>
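A rough sketch of what unrolling GQA can look like, matching the attn_weights_parts/attn_out_parts naming in the diff quoted earlier; the shapes, scaling, and omitted causal mask are illustrative assumptions, not the actual wrapper code:

import torch

def unrolled_gqa(q, k, v, n_kv_heads):
    # q: [B, n_q_heads, L, D], k/v: [B, n_kv_heads, L, D]. Each group of
    # n_q_heads // n_kv_heads query heads shares one KV head. Instead of
    # repeat_kv/expand plus one batched matmul, attend group by group and
    # concatenate the parts (causal masking omitted for brevity).
    group = q.size(1) // n_kv_heads
    attn_weights_parts, attn_out_parts = [], []
    for h in range(n_kv_heads):
        q_h = q[:, h * group:(h + 1) * group]   # [B, group, L, D]
        k_h = k[:, h:h + 1]                     # [B, 1, L, D]
        v_h = v[:, h:h + 1]                     # [B, 1, L, D]
        w = torch.softmax(q_h @ k_h.transpose(-1, -2) / q_h.size(-1) ** 0.5, dim=-1)
        attn_weights_parts.append(w)
        attn_out_parts.append(w @ v_h)
    attn_weights = torch.cat(attn_weights_parts, dim=1)   # [B, n_q_heads, L, L]
    attn_out = torch.cat(attn_out_parts, dim=1)           # [B, n_q_heads, L, D]
    return attn_weights, attn_out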
This commit rewrites the wrapper as described below.
The changes make the exported graph hardware-friendly.
TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>