Skimlit

Overview

Skimlit is a sequential sentence classification project that labels each sentence of a medical abstract from the PubMed RCT dataset. It compares models built on character, token, and positional embeddings; the best performer is the Tribrid Pos Char Token Embed model, which combines all three. The models are evaluated on the PubMed 20k and PubMed 200k RCT datasets, and the results are listed under Model Performance.
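To make the three input types concrete, here is a minimal sketch of how one abstract could be split into token, character, and positional features before embedding. The function name `sentence_features` and the dictionary keys are hypothetical and are not taken from notebook.ipynb; the actual preprocessing in the notebook may differ.

```python
def sentence_features(abstract_lines):
    """Build per-sentence inputs for a token/char/position model.

    `abstract_lines` is a list of sentences from ONE abstract, in order.
    The key names below are illustrative assumptions, not the project's API.
    """
    total = len(abstract_lines)
    features = []
    for index, text in enumerate(abstract_lines):
        features.append({
            "text": text,
            "tokens": text.split(),   # token-level input (whitespace split)
            "chars": list(text),      # character-level input
            "line_number": index,     # position of the sentence in the abstract
            "total_lines": total,     # length of the abstract in sentences
        })
    return features


if __name__ == "__main__":
    example = [
        "We studied the effect of drug X .",
        "Patients were randomized into two groups .",
    ]
    for row in sentence_features(example):
        print(row["line_number"], row["total_lines"], row["tokens"][:3])
```

The positional fields give the model a signal that token and character inputs lack: where a sentence sits in its abstract.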

Model Performance

PubMed 20k Dataset

Baseline Model:
- Accuracy: 72.18% | Precision: 0.719 | Recall: 0.722 | F1 Score: 0.699
Custom Token Embed Conv1D:
- Accuracy: 78.67% | Precision: 0.783 | Recall: 0.787 | F1 Score: 0.784
Pretrained Token Embed:
- Accuracy: 78.67% | Precision: 0.783 | Recall: 0.787 | F1 Score: 0.784
Custom Char Embed Conv1D:
- Accuracy: 65.18% | Precision: 0.643 | Recall: 0.652 | F1 Score: 0.643
Hybrid Char Token Embed:
- Accuracy: 73.21% | Precision: 0.733 | Recall: 0.732 | F1 Score: 0.730
Tribrid Pos Char Token Embed:
- Accuracy: 83.42% | Precision: 0.834 | Recall: 0.834 | F1 Score: 0.833

PubMed 200k Dataset

Best Model (Tribrid Pos Char Token Embed):
- Accuracy: 87.51% | Precision: 0.876 | Recall: 0.875 | F1 Score: 0.874
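
For readers who want to reproduce numbers of this kind, the following is a minimal pure-Python sketch of accuracy with support-weighted precision, recall, and F1. Weighted averaging is an assumption: the README does not state which averaging was used, although the reported recall values match accuracy to three decimals, which is what weighted averaging produces. The function name `weighted_metrics` is hypothetical, and the labels in the usage example are only illustrative.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    n = len(y_true)
    support = Counter(y_true)  # number of true examples per label
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)

        p_c = true_pos / predicted if predicted else 0.0
        r_c = true_pos / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0

        weight = count / n  # weight each label by its share of true examples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = ["METHODS", "METHODS", "RESULTS", "RESULTS"]
    y_pred = ["METHODS", "RESULTS", "RESULTS", "RESULTS"]
    print(weighted_metrics(y_true, y_pred))
```

With weighted averaging, recall always equals accuracy, so a gap between the two would indicate a different averaging scheme.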

Getting Started

Clone the Repository:
```
git clone https://github.com/your-username/skimlit.git
```
Navigate to the Project Directory:
```
cd skimlit
```
Set Up Environment: Install required Python packages:
```
pip install -r requirements.txt
```
Train Models: Follow the training steps in notebook.ipynb.
Run Analysis: Review the metrics and model results in notebook.ipynb.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Contact

For any questions or feedback, reach out via email.
