fast-aug is a library for fast text augmentation, available for both Rust and Python as fast-aug.
It is designed with a focus on performance and real-time usage (e.g. during training), while providing a wide range of text augmentation methods.
Note: 25 times faster than nlpaug!
fast-aug is available on PyPI.
pip install fast-aug

from fast_aug.text import CharsRandomSwapAugmenter

text_data = "Some text!"
augmenter = CharsRandomSwapAugmenter(
    0.5,  # probability of words selection
    0.5,  # probability of characters selection
    None,  # stopwords
)
assert augmenter.augment(text_data) != text_data
assert augmenter.augment_batch([text_data]) != [text_data]

TBA
Comparison of the fast-aug library with other NLP augmentation libraries.
- fast-aug - this, Fast Augmentation library written in Rust, with Python bindings
- nlpaug - The most popular NLP augmentation library
- fasttextaug - re-write of some nlpaug's augmenters in Rust with Python bindings
- augly - not included as "Our text augmentations use nlpaug as their backbone"
- augmenty - not included as it is too slow (2-8 times slower than nlpaug)
This is an end-to-end comparison, including dataset loading, class initialization, and augmentation of all samples (one by one or provided as a list).
See ./benchmarks/compare_text.py for details of the comparison.
All libraries are compared on the tweeteval dataset (sentiment test set, 12k samples).
Note: the dataset text file is 1.1 MB, and it is included in the reported memory usage.
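For illustration only, an end-to-end measurement of this kind might look roughly like the sketch below. This is not the code from ./benchmarks/compare_text.py. The `benchmark` function and the `UppercaseAugmenter` stand-in are hypothetical names invented for this example. The sketch assumes only that an augmenter exposes `augment` and `augment_batch`, as in the usage example above. Note that `tracemalloc` sees only memory allocated through Python's allocators, so it may under-report memory used by native (e.g. Rust) code.

```python
import time
import tracemalloc
from typing import Callable, Dict, List


class UppercaseAugmenter:
    """Hypothetical stand-in so the sketch runs without any library installed.

    Swap in a real augmenter (e.g. CharsRandomSwapAugmenter) that provides
    the same two methods.
    """

    def augment(self, text: str) -> str:
        return text.upper()

    def augment_batch(self, texts: List[str]) -> List[str]:
        return [t.upper() for t in texts]


def benchmark(
    load_dataset: Callable[[], List[str]],
    make_augmenter: Callable[[], object],
    batched: bool,
) -> Dict[str, float]:
    """Time dataset loading, augmenter creation, and augmentation together."""
    tracemalloc.start()  # tracks Python-level allocations only
    start = time.perf_counter()

    samples = load_dataset()      # dataset loading is part of the measurement
    augmenter = make_augmenter()  # so is class initialization

    if batched:
        results = augmenter.augment_batch(samples)  # all samples as one list
    else:
        results = [augmenter.augment(s) for s in samples]  # one by one

    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": elapsed, "peak_bytes": peak, "n_samples": len(results)}


# Toy in-memory "dataset"; a real run would read the dataset text file here.
toy_loader = lambda: ["Some text!", "Another sample."]
print(benchmark(toy_loader, UppercaseAugmenter, batched=False))
print(benchmark(toy_loader, UppercaseAugmenter, batched=True))
```

Measuring loading and initialization inside the timed region is what makes the figure end-to-end: a library with a fast augmentation call but slow startup is not flattered by the result.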
Any contribution is warmly welcomed!
Please see the GitHub repository README at fast-aug.