Hi,
If my understanding is correct, when dealing with PDFs the process is roughly this:
- each page is converted to an image at 200 dpi (the pdf2image default)
- the image is then resized to max_img_size x max_img_size
Wouldn't this process possibly reduce the quality a lot, even if max_img_size is set very large?
The resize also alters the aspect ratio of the images. Is this done on purpose?
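
To illustrate the concern, here is a minimal sketch using Pillow only. It assumes an A4 page, which at 200 dpi comes out at roughly 1654 x 2339 px, and a made-up `max_img_size` of 1024. The blank image just stands in for a rendered PDF page.

```python
from PIL import Image

# Stand-in for one rendered PDF page: A4 (8.27 x 11.69 in) at 200 dpi
# is roughly 1654 x 2339 px. A blank image is enough to show the sizes.
page = Image.new("RGB", (1654, 2339), "white")

max_img_size = 1024  # assumed value, for illustration only

# Same kind of call as in the code quoted in this issue: the target is a
# square, so the portrait page gets stretched horizontally.
squared = page.resize((max_img_size, max_img_size))

print("original:", page.size, "aspect", round(page.width / page.height, 3))
print("resized: ", squared.size, "aspect", round(squared.width / squared.height, 3))
```

The page goes from a portrait shape to a square, so the text on it would be stretched sideways, and the height is cut to less than half.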
thanks
Here are the relevant code bits:

img = Image.open(file_path)
img = img.resize((max_img_size, max_img_size))

file_paths = convert_files_to_images(file_paths)
resize_images(file_paths, max_img_size)

return convert_from_path(file_path)
docext/docext/core/utils.py
Lines 51 to 52 in 668eee6
docext/docext/core/pdf2md/pdf2md.py
Lines 86 to 87 in 668eee6
docext/docext/core/file_converters/pdf_converter.py
Line 14 in 668eee6
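
For comparison, this is a rough sketch of what I would have expected: a resize that treats max_img_size as an upper bound on the longer side and keeps the aspect ratio. `resize_keep_aspect` is a hypothetical helper of mine, not something from docext, and the choice of LANCZOS resampling and of never upscaling are my own assumptions.

```python
from PIL import Image


def resize_keep_aspect(img: Image.Image, max_img_size: int) -> Image.Image:
    """Hypothetical helper: fit img inside a max_img_size x max_img_size box
    without changing its aspect ratio. Smaller images are returned unchanged."""
    # Scale factor that makes the longer side equal to max_img_size.
    scale = min(max_img_size / img.width, max_img_size / img.height)
    if scale >= 1:
        # Image already fits; upscaling would not add any detail.
        return img.copy()
    new_size = (
        max(1, round(img.width * scale)),
        max(1, round(img.height * scale)),
    )
    # LANCZOS is a high-quality filter for downscaling.
    return img.resize(new_size, Image.LANCZOS)


if __name__ == "__main__":
    page = Image.new("RGB", (1654, 2339), "white")  # stand-in for an A4 page at 200 dpi
    print(resize_keep_aspect(page, 1024).size)
```

With something like this, a large max_img_size would leave the 200 dpi render untouched instead of stretching it to a square.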