Releases · scribeocr/scribe.js · GitHub

15 Nov 07:28

Balearica

v0.9.3 Latest

What's Changed

Fixed bug causing text layer in PDF exports to be broken (#58)
- This issue affects all PDFs created with the two patch releases from the past week (0.9.1 and 0.9.2). Anyone using those versions should update as soon as possible.

Full Changelog: v0.9.2...v0.9.3

14 Nov 04:42

Balearica

v0.9.2

What's Changed

Fixed bug causing crash on single-core systems (#56)
Updated scribe.opt.workerN option to cap workers created for PDF rendering
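
As a rough sketch of how a caller might choose a value for this option, the helper below clamps a requested worker count to the number of available cores before assigning it. Only the option name scribe.opt.workerN comes from the notes above; the helper names capWorkers and setWorkerCap and the clamping policy are assumptions for illustration, not part of scribe.js.

```javascript
// Hypothetical helper (not part of scribe.js): pick a worker count that is
// at least 1 and never larger than the number of available cores.
function capWorkers(requested, availableCores) {
  // Fall back to 1 when a value is missing, zero, or not a number.
  const cores = Math.max(1, Math.floor(availableCores) || 1);
  const wanted = Math.max(1, Math.floor(requested) || 1);
  return Math.min(wanted, cores);
}

// Assign the capped value to `opt.workerN` on the scribe object passed in.
// The object is taken as a parameter so this sketch does not depend on how
// the library is imported.
function setWorkerCap(scribe, requested, availableCores) {
  scribe.opt.workerN = capWorkers(requested, availableCores);
  return scribe.opt.workerN;
}
```

In Node.js, availableCores could be taken from os.cpus().length, and in a browser from navigator.hardwareConcurrency. On a single-core system this sketch always yields 1.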

Full Changelog: v0.9.1...v0.9.2

07 Nov 07:28

Balearica

v0.9.1

What's Changed

Various updates to experimental and debugging-related features.
- None of the documented features should change with this release.

Full Changelog: v0.9.0...v0.9.1

08 Sep 08:15

Balearica

v0.9.0

What's Changed

Added URW Gothic font
Added Deno support
Updated .html export format
- This format produces a .html file that should closely resemble the original document.
- This should be useful for converting .pdf files to a format that can be displayed natively in the browser.
Added experimental .txt import format
- For obvious reasons, importing .txt files will not work with most operations.
- This mode is currently useful only for development/debugging purposes and for making basic .pdf files from .txt files.
Performance improvements to PDF exports
Various refactoring and minor updates.

Full Changelog: v0.8.0...v0.9.0

09 Mar 09:39

Balearica

v0.8.0

What's Changed

Added scribe CLI command
- If scribe.js is installed globally (npm i -g scribe.js-ocr), the scribe command can be used to process documents from the command line.
  - For example, scribe recognize analyst_report.png runs OCR on an image and saves the result as a PDF.
- This feature is still experimental; command names, argument names, and features may change without warning.
Added new intermediate data format .scribe for storing and loading document data.
- Because OCR is computationally expensive, it is often desirable to save results for later use without losing data.
- By saving results to .scribe files, results can be re-loaded later (e.g. to export with slightly different settings).
  - While several other output formats can be re-loaded later (notably .hocr and .pdf), only .scribe can be re-loaded without any data being lost in the export/import process.
  - .scribe files only contain the text layer; they do not contain embedded images or PDF files.
    - .scribe files can be loaded alongside image/PDF files to restore both image and text data.

Full Changelog: v0.7.4...v0.8.0

03 Mar 08:08

Balearica

v0.7.4

What's Changed

Fixed bug causing crash for certain PDF input documents.
Added support for bold + italic style (previously only bold or italic style)
Added support for underline style.
- Underlined text is currently detected automatically when importing a text-native PDF or Abbyy XML file.
Disabled ligatures by default.
- To re-enable, set scribe.opt.ligatures to true.
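
A minimal sketch of toggling this option is shown below. The option name scribe.opt.ligatures comes from the note above; the wrapper function setLigatures is a hypothetical convenience, not part of scribe.js, and it takes the scribe object as a parameter so it makes no assumption about how the library is imported.

```javascript
// Hypothetical wrapper (not part of scribe.js): turn ligatures on or off
// on the scribe object passed in, and return the resulting setting.
function setLigatures(scribe, enabled) {
  // Coerce to a strict boolean so the option is always true or false.
  scribe.opt.ligatures = Boolean(enabled);
  return scribe.opt.ligatures;
}
```

With the library loaded, calling setLigatures(scribe, true) before exporting would restore the pre-0.7.4 behavior described above, assuming the option is read at export time.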

Full Changelog: v0.7.3...v0.7.4

03 Mar 08:02

Balearica

v0.7.3

What's Changed

Updated HTML export to support Node.js

Full Changelog: v0.7.2...v0.7.3

20 Feb 04:25

Balearica

v0.7.2

What's Changed

Added HTML output format (browser only).
- This implementation is still preliminary and may change substantially in future versions.
Standardized fonts and font names

Full Changelog: v0.7.1...v0.7.2

09 Feb 19:46

Balearica

v0.7.1

What's Changed

Standardized fonts and font names

Full Changelog: v0.7.0...v0.7.1

07 Jan 08:38

Balearica

v0.7.0

What's Changed

Major rework of PDF export implementation.
- Writing to PDF is faster and uses less memory.
  - Documents that used to crash due to memory errors now run almost instantly.
- For many inputs, output PDF file sizes are now much smaller.
Fixed memory leaks within OCR module.
Miscellaneous bug fixes.

Full Changelog: v0.6.1...v0.7.0
