A library to encode text as DNA and decode DNA to text.
GeneSpeak lets you encode regular text as DNA using the four
bases (A, T, G, C) and convert the DNA back to the
original text. Both ASCII and UTF-8 text are supported;
the `strategy` keyword argument selects which one is used.
The encoding schema can be any ordering of A, T, G, C, for example `ATCG`.
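
To illustrate how a schema can turn text into bases, here is a minimal sketch. It is not GeneSpeak's implementation. It assumes each character is written as 8 bits and each pair of bits is replaced by the base at that index in the schema, which is consistent with the `Hello World!` output shown in the quickstart. The function name `encode_ascii` and the rule that a schema must use each base exactly once are assumptions made for this example.

```python
BASES = "ATGC"


def encode_ascii(text: str, schema: str = "ATCG") -> str:
    """Map each 8-bit character code to four bases, two bits per base.

    Hypothetical helper for illustration; not part of the genespeak API.
    """
    # Assumption: a valid schema is an ordering of the four bases.
    if sorted(schema) != sorted(BASES):
        raise ValueError("schema must use each of A, T, G, C exactly once")

    data = text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    dna = []
    for byte in data:
        # Walk the byte from its most significant bit pair to the least.
        for shift in (6, 4, 2, 0):
            dna.append(schema[(byte >> shift) & 0b11])
    return "".join(dna)


print(encode_ascii("Hello World!"))
# TACATCTTTCGATCGATCGGACAATTTGTCGGTGACTCGATCTAACAT
```

With schema `ATCG`, the bit pairs `00`, `01`, `10`, `11` map to `A`, `T`, `C`, `G`, so `H` (binary `01001000`) becomes `TACA`. Reordering the schema changes which base stands for which bit pair.
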
You can install the library via pip or conda.

Install with pip:

    pip install genespeak

Install with conda:

    conda install -c conda-forge genespeak

See the quickstart guide here.

| Service | Link/Badge |
|---|---|
| Colab | |
| Binder | |
| SageMaker StudioLab | |
| Deepnote | |
| Kaggle | |

You can try GeneSpeak in this Streamlit app: https://tinyurl.com/genespeak-demo

    import genespeak as gp
    print(f'{gp.__name__} version: {gp.__version__}')

    schema = "ATCG"  # ordering of the four bases used for encoding
    strategy = "ascii"  # text encoding strategy ("ascii" or "utf-8"); not passed to the calls below
    text = "Hello World!"
    dna = gp.text_to_dna(text, schema=schema)
    text_from_dna = gp.dna_to_text(dna, schema=schema)
    print(f'Text: {text}\nEncoded DNA: {dna}\nDecoded Text: {text_from_dna}\nSuccess: {text == text_from_dna}')

Output:

    genespeak version: 0.0.5
    Text: Hello World!
    Encoded DNA: TACATCTTTCGATCGATCGGACAATTTGTCGGTGACTCGATCTAACAT
    Decoded Text: Hello World!
    Success: True

The genespeak docs are maintained here.
The library is available under the MIT license.
You may cite this library as follows.

    @software{ray2022genespeak,
      author = {Ray, Sugato},
      title = {GeneSpeak - A library to encode text as DNA and decode DNA to text},
      url = {https://github.com/sugatoray/genespeak},
      doi = {10.5281/zenodo.5885777},
      month = {1},
      year = {2022}
    }

Let's have some fun! ✨ The following is a GeneSpeak thumbprint of genespeak itself.

| schema | strategy | thumbprint |
|---|---|---|
| ATCG | ascii | TCTGTCTTTCGCTCTTTGAGTGAATCTTTCATTCCG |
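
The thumbprint can be read back by hand. The sketch below is an independent illustration, not the library's code: it assumes every group of four bases encodes one 8-bit character, with each base contributing the two bits given by its position in the schema. The name `decode_ascii` is hypothetical.

```python
def decode_ascii(dna: str, schema: str = "ATCG") -> str:
    """Turn groups of four bases back into 8-bit characters.

    Hypothetical helper for illustration; not part of the genespeak API.
    """
    if len(dna) % 4 != 0:
        raise ValueError("DNA length must be a multiple of 4")

    # Position of each base in the schema gives its 2-bit value.
    index = {base: i for i, base in enumerate(schema)}

    chars = []
    for start in range(0, len(dna), 4):
        value = 0
        for base in dna[start:start + 4]:
            # Shift in two bits per base; an unknown base raises KeyError.
            value = (value << 2) | index[base]
        chars.append(chr(value))
    return "".join(chars)


thumbprint = "TCTGTCTTTCGCTCTTTGAGTGAATCTTTCATTCCG"
print(decode_ascii(thumbprint))  # genespeak
```
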
Includes health and security badges from:
- Sonarcloud
- OSSF Code Quality
