add support for base64 embedded images #56
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,7 @@ | |
| from collections import OrderedDict | ||
| import requests | ||
| from io import BytesIO | ||
| import base64 | ||
|
|
||
|
|
||
| # __________________________________________________________________________________________________ | ||
|
|
@@ -543,6 +544,24 @@ def handle_starttag(self, tag, attrs): | |
| except: | ||
| pass | ||
|
|
||
| if attrs[HTML.Attrs.SRC].startswith(("data:image/jpeg;base64,")): | ||
| try: | ||
| image = Image.open( | ||
| BytesIO(base64.b64decode(attrs[HTML.Attrs.SRC][23:].encode("utf-8"))) | ||
|
suggestion: The code uses hardcoded offsets for slicing the base64 string. Hardcoded offsets are brittle and may fail if the prefix changes or new formats are introduced. Use string splitting to extract the base64 data for better reliability.
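One way the string-splitting suggestion could look, as a minimal sketch rather than the PR's actual code: `split_data_uri` and `SUPPORTED_MIME_TYPES` are hypothetical names introduced here for illustration. The sketch splits the `src` value at the first comma, reads the MIME type from the header, and decodes the payload, so no per-format offset such as `[23:]` or `[22:]` is needed. Decoding errors are deliberately left to propagate to the caller.

```python
import base64
from typing import Optional, Tuple

# Hypothetical allow-list mirroring the three formats handled in this PR.
SUPPORTED_MIME_TYPES = ("image/jpeg", "image/png", "image/gif")


def split_data_uri(src: str) -> Optional[Tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for a supported base64 data URI, else None."""
    # Everything before the first comma is the header, everything after is payload.
    header, sep, payload = src.partition(",")
    if not sep or not header.startswith("data:"):
        return None

    # Header looks like "data:image/png;base64" (optionally with extra parameters).
    meta = header[len("data:"):].split(";")
    mime_type = meta[0]
    if "base64" not in meta[1:] or mime_type not in SUPPORTED_MIME_TYPES:
        return None

    # b64decode may raise binascii.Error on malformed input; the caller decides
    # how to handle that.
    return mime_type, base64.b64decode(payload)
```

The returned bytes could then be wrapped in `BytesIO` and passed to `Image.open` exactly as the diff already does, with a single code path for all three formats.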
||
| ) | ||
| self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image) | ||
| except: | ||
| pass | ||
|
|
||
| if attrs[HTML.Attrs.SRC].startswith(("data:image/png;base64,", "data:image/gif;base64,")): | ||
|
Comment on lines +547 to +556

suggestion: The code for handling base64 image decoding is duplicated for each image type. Refactor the base64 decoding logic into a shared helper to simplify maintenance and future extensions.

Suggested implementation:

    from collections import OrderedDict
    import requests
    from io import BytesIO
    import base64
    from PIL import Image
    from copy import deepcopy


    def decode_base64_image(src: str) -> "Image.Image|None":
        """
        Decodes a base64-encoded image from a data URI.
        Supports JPEG, PNG, and GIF formats.
        Returns a PIL Image or None if decoding fails.
        """
        prefixes = {
            "data:image/jpeg;base64,": 23,
            "data:image/png;base64,": 22,
            "data:image/gif;base64,": 22,
        }
        for prefix, offset in prefixes.items():
            if src.startswith(prefix):
                try:
                    image_data = base64.b64decode(src[offset:].encode("utf-8"))
                    return Image.open(BytesIO(image_data))
                except Exception:
                    return None
        return None

    image = decode_base64_image(attrs[HTML.Attrs.SRC])
    if image is not None:
        self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image)
||
| try: | ||
| image = Image.open( | ||
| BytesIO(base64.b64decode(attrs[HTML.Attrs.SRC][22:].encode("utf-8"))) | ||
| ) | ||
| self.cached_images[attrs[HTML.Attrs.SRC]] = deepcopy(image) | ||
| except: | ||
| pass | ||
|
|
||
| if attrs[HTML.Attrs.SRC] in self.cached_images.keys(): | ||
| image = deepcopy(self.cached_images[attrs[HTML.Attrs.SRC]]) | ||
| elif os.path.exists(attrs[HTML.Attrs.SRC]): | ||
|
|
||
issue (bug_risk): Bare except statements are used, which can hide unexpected errors.
Catching all exceptions makes it harder to identify and address real issues. Please catch only the relevant exceptions to improve error handling.
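To illustrate the point about bare `except`, here is a hedged sketch of the base64 decoding step with only the relevant exceptions caught. `safe_b64decode` and the logger are hypothetical names, not part of the PR. For the `Image.open` call itself, the analogous narrow catch would likely be `OSError`, since Pillow's `UnidentifiedImageError` is a subclass of it.

```python
import base64
import binascii
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def safe_b64decode(payload: str) -> Optional[bytes]:
    """Decode base64 text, returning None (and logging) on malformed input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        # binascii.Error is itself a ValueError subclass; both are listed
        # to make the intent explicit. Anything else (KeyboardInterrupt,
        # programming errors) still propagates.
        logger.warning("Skipping malformed base64 image data: %s", exc)
        return None
```

Unlike `except: pass`, this keeps the "skip bad images" behavior while leaving unrelated failures visible.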