bugfix - fixes utf8 exception in detect_paper_name#1492

Open
aw-cloud wants to merge 1 commit into ahrm:development from aw-cloud:bugfix_invalid_utf8_detect_paper_name

@aw-cloud
Contributor

Fixes #1411
utf8_decode, called from detect_paper_name, can throw, in particular when given invalid UTF-8. This PR adds a try/catch block in the calling function to handle that case. With this change, invalid UTF-8 leaves the paper name blank and an error message is logged to stderr.
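For readers without the diff at hand, the pattern described above can be sketched roughly as follows. This is not the code from the PR: `decode_or_throw` and `paper_name_or_blank` are hypothetical names, and the stand-in decoder is a simplification that assumes the real utf8_decode throws an exception derived from std::exception on malformed input.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for utf8_decode (assumption: the real function
// throws on malformed input). Simplified: it does not reject overlong
// forms or surrogates, and it stores each code point in one wchar_t,
// which is only adequate for non-BMP text where wchar_t is 32 bits wide.
std::wstring decode_or_throw(const std::string& s) {
    std::wstring out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // continuation bytes expected after b
        unsigned long cp = 0;   // code point being assembled
        if (b < 0x80)                { extra = 0; cp = b; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= s.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}

// Hypothetical caller modelled on the behaviour described above:
// on a decoding failure, log to stderr and leave the name blank.
std::wstring paper_name_or_blank(const std::string& raw) {
    try {
        return decode_or_throw(raw);
    } catch (const std::exception& e) {
        std::cerr << "detect_paper_name: " << e.what() << '\n';
        return L"";
    }
}
```

Calling `paper_name_or_blank` with well-formed input returns the decoded string, while a truncated or otherwise malformed byte sequence yields an empty string plus one line on stderr.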

@ahrm
Owner

ahrm commented Oct 27, 2025

Thanks. I am not a huge fan of try/catch; ideally I would like the issue to be handled without exceptions (but I am open to this if there is no other way). Unfortunately, I cannot reproduce the crash on my system. What happens if we replace utf8_decode with this:

std::wstring utf8_decode(const std::string& encoded_str) {
    return QString::fromUtf8(encoded_str).toStdWString();
}

?
Does it still crash?

@aw-cloud
Contributor Author

Changing utf8_decode to use QString::fromUtf8 does stop the crash. However, it leaves "broken" Unicode strings, containing replacement characters, all over the place. For example, L"评点本金庸武侠全集 倚天屠龙记 连城决��" is the detected_paper_name from the sample document @ShadiZade provided. I don't know whether that could cause problems with the database or with other important code that relies on utf8_decode.

If using those replacement characters is safe then you can switch to QString::fromUtf8.
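One possible middle ground, sketched here only as an illustration, is to validate the bytes up front so that the caller can leave the paper name blank without either exceptions or stored replacement characters. `is_valid_utf8` is a hypothetical helper, not something that exists in the project; it assumes the input is a std::string of raw bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true only if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates, and values above U+10FFFF. Never throws.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;     // continuation bytes expected after b
        unsigned long cp = 0;      // code point being assembled
        unsigned long min_cp = 0;  // smallest value this length may encode
        if (b < 0x80)                { extra = 0; cp = b;        min_cp = 0; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min_cp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min_cp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min_cp = 0x10000; }
        else return false;                      // invalid lead byte
        if (extra > n - i - 1) return false;    // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // outside Unicode
        i += extra + 1;
    }
    return true;
}
```

A caller could check `is_valid_utf8(raw)` first and only pass the string on to the decoder when it returns true. Whether a blank name or a name with replacement characters is preferable remains the open question in this thread.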

@NerdAlert2023

Couldn't you just substitute replacement characters for the invalid bytes instead?

There shouldn't be any crashes if you do that.

return QString::fromUtf8(buffer, strlen(buffer)).toStdWString();

That way there is no need for try/catch.
