KeyFiltering.py

This Python script is designed for filtering and performing statistical analysis on scientific articles. It processes a given Excel file to filter text based on specified keywords and generates comprehensive statistical reports.

Cite

KARA, B. C. (2025). keyFinder - Filtering and Analyzing Scientific Articles Based on Specific Keywords (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.15167425

Features

Keyword Filtering:
- Filters the Title (TI), Abstract (AB), and Keywords (DE) columns based on predefined keywords.
- Allows specifying a list of keywords to exclude during the filtering process.
Data Cleaning and Categorization:
- Removes duplicate entries based on the DI column.
- Normalizes and categorizes the SC column, reporting the number of articles in each unique category.
Detailed Match Analysis:
- Tracks which columns and rows match specific keywords.
- Highlights matching keywords and saves detailed match data.
Statistical Reporting:
- Saves filtered data, match details, keyword statistics, and category counts in separate Excel files for further analysis.
Advanced Analytics:
- Calculates the percentage of filtered and unique records.
- Provides insights into unique keywords in the DE column, excluding specified keywords.
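The filtering, de-duplication, percentage, and category-count steps listed above can be sketched on a toy table. This is only an illustration, not the code in KeyFiltering.py: the sample rows, the `keywords` and `exclude` lists, and the helper `build_pattern` are hypothetical. It also assumes that a row containing an excluded term is dropped and that SC values are separated by semicolons; the script itself may handle both differently.

```python
import re
import pandas as pd

# Toy records using the column names from this README (TI, AB, DE, DI, SC).
df = pd.DataFrame({
    "TI": ["UAV photogrammetry for mapping", "Deep learning in medicine",
           "UAV photogrammetry for mapping"],
    "AB": ["We survey a site with drones.", "A review of neural networks.",
           "We survey a site with drones."],
    "DE": ["UAV; photogrammetry", "deep learning; health", "UAV; photogrammetry"],
    "DI": ["10.1/a", "10.1/b", "10.1/a"],
    "SC": ["Remote Sensing; Engineering", "Computer Science",
           "remote sensing ; Engineering"],
})

keywords = ["photogrammetry", "lidar"]  # hypothetical include terms
exclude = ["deep learning"]             # hypothetical exclude terms


def build_pattern(terms):
    """Compile a case-insensitive, whole-word alternation of the terms."""
    escaped = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{escaped})\b", flags=re.IGNORECASE)


include_re = build_pattern(keywords)
exclude_re = build_pattern(exclude)

# Build one searchable string per row from the three text columns.
text = df[["TI", "AB", "DE"]].fillna("").astype(str).apply(
    lambda row: " ".join(row), axis=1
)

has_keyword = text.apply(lambda s: bool(include_re.search(s)))
has_excluded = text.apply(lambda s: bool(exclude_re.search(s)))

# Assumption: rows that mention an excluded term are removed.
filtered = df[has_keyword & ~has_excluded]

# Remove duplicates based on the DI column.
unique = filtered.drop_duplicates(subset="DI")

pct_filtered = 100 * len(filtered) / len(df)
pct_unique = 100 * len(unique) / len(df)

# Assumption: SC holds semicolon-separated categories; normalize and count.
categories = (
    unique["SC"].fillna("").str.split(";").explode().str.strip().str.lower()
)
category_counts = categories[categories != ""].value_counts()

print(f"filtered: {pct_filtered:.1f}%  unique: {pct_unique:.1f}%")
print(category_counts)
```

Whole-word matching is one possible choice here; plain substring matching would also catch inflected forms but produces more false positives.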

Input Files

Terms/Surveying_Methods.txt: Contains the list of keywords for filtering.
Terms/Surveying_Methods_Exclude.txt: Specifies keywords to exclude from the filtering process.
Data/WOS+SCP_Raw.xlsx: The Excel file containing the raw dataset for processing.
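As a rough sketch of how the two term files might be read, the snippet below assumes one keyword per line with blank lines ignored; the actual file layout is not stated here, and `load_terms` is a hypothetical helper rather than a function from the script. Temporary files stand in for the real ones under Terms/.

```python
import os
import tempfile


def load_terms(path):
    """Read one keyword per line, trimming whitespace and skipping blanks."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Temporary stand-ins for the two files that normally live in Terms/.
with tempfile.TemporaryDirectory() as tmp:
    include_path = os.path.join(tmp, "Surveying_Methods.txt")
    exclude_path = os.path.join(tmp, "Surveying_Methods_Exclude.txt")
    with open(include_path, "w", encoding="utf-8") as handle:
        handle.write("photogrammetry\n\n  laser scanning  \n")
    with open(exclude_path, "w", encoding="utf-8") as handle:
        handle.write("satellite\n")

    keywords = load_terms(include_path)
    excluded = load_terms(exclude_path)

print(keywords)
print(excluded)
```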

Output Files

Generated outputs are saved in the Result/Istatistic folder:

filtered_keywords_data_full.xlsx: Filtered dataset after applying keyword filters.
match_details.xlsx: Detailed keyword matches, including highlighted text.
match_statistics.xlsx: Statistical breakdown of keyword matches in each column.
de_keywords_statistics.xlsx: Unique keywords from the DE column, excluding specified keywords.
category_counts_statistics.xlsx: Counts of articles in each unique category from the SC column.
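One way to write the five reports to this folder is sketched below. The labels in `OUTPUT_FILES` and the helpers `output_path` and `save_reports` are hypothetical names chosen for illustration; only the folder and file names come from the list above. `save_reports` expects pandas DataFrames and relies on an Excel writer engine being installed.

```python
import os

OUTPUT_DIR = os.path.join("Result", "Istatistic")

# File names from this README, keyed by hypothetical report labels.
OUTPUT_FILES = {
    "filtered": "filtered_keywords_data_full.xlsx",
    "matches": "match_details.xlsx",
    "match_stats": "match_statistics.xlsx",
    "de_keywords": "de_keywords_statistics.xlsx",
    "categories": "category_counts_statistics.xlsx",
}


def output_path(label, out_dir=OUTPUT_DIR):
    """Return the full path of the report identified by `label`."""
    return os.path.join(out_dir, OUTPUT_FILES[label])


def save_reports(frames, out_dir=OUTPUT_DIR):
    """Write each DataFrame in `frames` (label -> DataFrame) to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    for label, frame in frames.items():
        frame.to_excel(output_path(label, out_dir), index=False)
```

A call such as `save_reports({"filtered": filtered_df})` would then create the folder if needed and write `filtered_keywords_data_full.xlsx` into it.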

How to Use

Place the required input files in the specified directories.
Run the script: python KeyFiltering.py.
Check the Result/Istatistic folder for the output files.
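Before running the script, it can help to confirm that the input files are where it expects them. The check below is a small optional sketch, not part of KeyFiltering.py; `missing_inputs` is a hypothetical helper, and the paths are taken from the Input Files section relative to the directory where the script is run.

```python
import os

REQUIRED_INPUTS = [
    os.path.join("Terms", "Surveying_Methods.txt"),
    os.path.join("Terms", "Surveying_Methods_Exclude.txt"),
    os.path.join("Data", "WOS+SCP_Raw.xlsx"),
]


def missing_inputs(base_dir="."):
    """Return the required input paths that do not exist under base_dir."""
    return [
        path for path in REQUIRED_INPUTS
        if not os.path.isfile(os.path.join(base_dir, path))
    ]


missing = missing_inputs()
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found; run: python KeyFiltering.py")
```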

Dependencies

Python 3.x
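Libraries:
- pandas (third-party; reading and writing .xlsx files through pandas also requires an Excel engine such as openpyxl)
- re (Python standard library, no installation needed)
- os (Python standard library, no installation needed)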

Install dependencies via pip:

pip install pandas
