
ATRS

Official implementation of:

Lim, H., Li, X., Park, S., Li, Q., & Kim, J. (2026). Reducing contextual noise in review-based recommendation via aspect term extraction and attention modeling. Information Sciences, 735, 123078.

Overview

This repository is the official implementation of ATRS (Aspect Term-aware Recommender System), published in Information Sciences (2026).

Most review-based recommendation models process entire review bodies indiscriminately, allowing aspect-relevant signal to be diluted by surrounding context. ATRS addresses this by routing review text through a dedicated Aspect Term Extraction (ATE) stage that filters out non-aspect content before downstream encoding.

The retained aspect terms are encoded with a 1D-CNN over Word2Vec embeddings, fused with user/item ID embeddings, and passed through a self-attention block to form aspect-aware user and item representations. These are concatenated and forwarded to an MLP that predicts a continuous rating score as a regression target. Quantitative comparisons against representative recommendation baselines on Amazon and Yelp datasets are reported in Experimental Results.

Repository Structure

├── data/
│   ├── raw/                        # Source datasets — place {fname}.{raw_ext} here
│   ├── processed/                  # Pipeline parquet caches (preprocessed / aspects)
│   ├── ate_output/                 # PyABSA workspace + extraction JSON
│   │   └── .pyabsa/                # Contained pyabsa CWD: checkpoints/, checkpoints.json, result JSON
│   └── ATRS Architecture.png
│
├── model/
│   ├── atrs.py                     # ATRS architecture, trainer, and predictor
│   └── save/                       # Best checkpoint per dataset (best.pth)
│
├── src/
│   ├── config.yaml                 # Single source of truth for all hyperparameters
│   ├── data_processing.py          # DataProcessor pipeline + RecommenderDataset + DataLoader factory
│   ├── aspect_extraction.py        # ATExtractor — PyABSA wrapper for aspect term extraction
│   ├── preprocessing.py            # Review text cleaning + k-core filter
│   ├── path.py                     # Project path constants (auto-creates runtime folders)
│   └── utils.py                    # Metrics, parquet/yaml/seed helpers, gz loader
│
├── main.py                         # Entry point: data preparation → train → test
├── requirements.txt
├── README.md
└── .gitignore

Model Description

ATRS consists of two sequential modules. The full architecture is illustrated below.

ATRS Architecture

1. Aspect Term Extraction Module

A pretrained Transformer encoder (PyABSA's English ATE checkpoint, FAST-LCF-ATEPC over DeBERTa-v3-base) reads each cleaned review and emits BIO-tagged aspect terms. Per-row aspect lists are then aggregated into per-user and per-item aspect sets, which become the inputs to the RS module.

Implementation: src/aspect_extraction.py, invoked from src/data_processing.py.

2. Recommender System Module

Each user and item aspect set is tokenized over a Word2Vec-trained vocabulary, encoded by a 1D-CNN (AspectEncoder), and concatenated with a learned ID embedding. The fused vector is projected and passed through a multi-head self-attention + FFN block (SelfAttentionBlock, Eqs 5–10) to yield aspect-aware user (F_u) and item (F_v) representations. Their concatenation is fed to an MLP regressor that outputs the predicted rating (Eqs 11–12).

Implementation: AspectEncoder, SelfAttentionBlock, ATRS.regressor in model/atrs.py.
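
To make the attention step concrete, here is a toy, dependency-free sketch of scaled dot-product self-attention with identity Q/K/V projections. It is only an illustration of the mechanism; the repo's SelfAttentionBlock is multi-head, uses learned projections, and adds an FFN (Eqs 5-10).

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(a, b):
    # a: (n, k), b: (k, m), as nested lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def self_attention(x):
    """Single-head scaled dot-product self-attention over a
    sequence x of d-dimensional vectors (Q = K = V = x)."""
    d = len(x[0])
    scores = matmul(x, transpose(x))                            # (n, n)
    weights = [softmax([s / math.sqrt(d) for s in row])
               for row in scores]                               # row-wise softmax
    return matmul(weights, x)                                   # weighted mix of inputs

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x)
```

Each output row is a convex combination of the input rows, weighted by similarity, which is why attention lets every aspect representation borrow context from the others.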

How to Run

Configuration

All hyperparameters live in src/config.yaml — it is the single source of truth. Defaults reproduce the paper experiments.

The torch==2.3.1+cu121 / torchvision==0.18.1+cu121 wheels in requirements.txt target an RTX 3080 Ti (CUDA 12.1). A CUDA-capable GPU is required; main.py raises RuntimeError if no CUDA device is detected.

End-to-end run from a fresh checkout:

conda create -n atrs python=3.11
conda activate atrs
pip install -r requirements.txt
python main.py

Data Preparation

Place the dataset as data/raw/{fname}.{raw_ext} where {fname} and {raw_ext} match data.fname / data.raw_ext in config.yaml.

Required columns in raw JSONL: user_id, parent_asin, text, rating (an aspect column with pre-extracted terms is optional — if present, the ATE stage is skipped)
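
A quick pre-flight check for the raw file can be written as below. This is a hypothetical helper (not part of the repo), using the column names listed above.

```python
import json

REQUIRED = {"user_id", "parent_asin", "text", "rating"}

def check_jsonl_columns(lines):
    """Verify every JSONL record has the required columns.
    Returns True if the optional 'aspect' column is present on all
    records (in which case the ATE stage can be skipped)."""
    has_aspect = True
    for i, line in enumerate(lines, 1):
        rec = json.loads(line)
        missing = REQUIRED - rec.keys()
        if missing:
            raise ValueError(f"line {i}: missing columns {sorted(missing)}")
        has_aspect = has_aspect and ("aspect" in rec)
    return has_aspect
```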

The pipeline writes two cached artifacts under data/processed/ plus the final model checkpoint. On re-run, any artifact already on disk is reused as-is — to invalidate, delete the file. The train/test split, Word2Vec embeddings, and sequence padding are rebuilt in memory on every run.

{fname}_preprocessed.parquet — after text cleaning and k-core filter: raw columns + clean_text (HTML/URL-stripped, lowercased, contractions-expanded, stopwords-removed, lemmatized review body)

{fname}_aspects.parquet — after PyABSA aspect extraction and per-user/item aggregation: preprocessed columns + aspect (per-row term list), user_aspect_set (flattened concatenation per user), item_aspect_set (flattened concatenation per item)
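
A rough sketch of the kind of cleaning clean_text applies (HTML/URL stripping, whitespace normalization, lowercasing). The repo's src/preprocessing.py additionally expands contractions, removes stopwords, and lemmatizes.

```python
import re

def clean_text(text):
    """Strip HTML tags and URLs, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"\s+", " ", text).strip()    # normalize whitespace
    return text.lower()

clean_text("Great <b>tone</b>! See https://example.com NOW")
```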

Re-runs and caching

On every call to python main.py, the pipeline auto-skips any cache layer already on disk (aspects → preprocessed → raw). The train/test split, Word2Vec, and sequence padding always run fresh in memory — so changes to test_size, random_state, val_ratio, aspect_length_percentile, or w2v_* take effect immediately on the next run. Only k_core requires manually deleting {fname}_preprocessed.parquet to re-trigger the upstream filter.
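
The skip logic amounts to a chain of cache-existence checks from the most-derived layer down. A minimal sketch, with hypothetical stage names (the real pipeline is in src/data_processing.py):

```python
import os

def resolve_stage(processed_dir, fname):
    """Decide which pipeline stage to resume from, checking caches
    in order: aspects -> preprocessed -> raw."""
    if os.path.exists(os.path.join(processed_dir, f"{fname}_aspects.parquet")):
        return "train"       # both caches usable, go straight to training
    if os.path.exists(os.path.join(processed_dir, f"{fname}_preprocessed.parquet")):
        return "extract"     # rerun aspect extraction only
    return "preprocess"      # start from the raw file
```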

PyABSA's ./checkpoints.json and ./checkpoints/ directory are hardcoded CWD-relative inside the library; ATRS routes them under data/ate_output/.pyabsa/ via a chdir context so they don't pollute the project root.

Experimental Results

ATRS was evaluated on three real-world review datasets: Musical Instruments, Video Games, and Yelp (Pennsylvania). The results demonstrate that ATRS consistently outperforms representative baselines across all evaluation metrics, achieving average improvements of 19.54% in MAE and 11.89% in RMSE.

Musical Instruments

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| PMF | 1.306 | 2.640 | 1.625 | 35.034 |
| NCF | 1.174 | 1.705 | 1.306 | 35.401 |
| DeepCoNN | 0.786 | 1.137 | 1.067 | 29.931 |
| NARRE | 0.767 | 0.993 | 0.997 | 29.459 |
| AENAR | 0.665 | 0.970 | 0.985 | 27.193 |
| SAFMR | 0.705 | 0.975 | 0.987 | 28.388 |
| MFNR | 0.708 | 0.965 | 0.982 | 26.922 |
| ATRS (Proposed) | 0.640 | 0.933 | 0.966 | 26.638 |

Video Games

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| PMF | 1.220 | 2.407 | 1.551 | 33.948 |
| NCF | 0.948 | 1.331 | 1.154 | 35.032 |
| DeepCoNN | 0.847 | 1.263 | 1.124 | 32.850 |
| NARRE | 0.776 | 1.173 | 1.083 | 30.518 |
| AENAR | 0.693 | 1.002 | 1.001 | 28.039 |
| SAFMR | 0.711 | 1.033 | 1.016 | 30.016 |
| MFNR | 0.730 | 0.980 | 0.990 | 27.863 |
| ATRS (Proposed) | 0.646 | 0.970 | 0.985 | 27.537 |

Yelp

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| PMF | 1.276 | 2.803 | 1.674 | 38.330 |
| NCF | 1.085 | 1.674 | 1.294 | 39.320 |
| DeepCoNN | 0.937 | 1.381 | 1.175 | 38.276 |
| NARRE | 0.886 | 1.212 | 1.101 | 36.724 |
| AENAR | 0.845 | 1.177 | 1.085 | 35.605 |
| SAFMR | 0.881 | 1.229 | 1.109 | 36.076 |
| MFNR | 0.855 | 1.174 | 1.084 | 33.923 |
| ATRS (Proposed) | 0.832 | 1.163 | 1.078 | 34.917 |
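
The four metrics reported above follow the standard definitions (with MAPE expressed in percent). A plain restatement:

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and MAPE (%) for rating prediction."""
    n = len(y_true)
    errs = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    mse = sum(e * e for e in errs) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errs, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

m = regression_metrics([4.0, 2.0], [3.5, 2.5])
```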

Citation

If you use this repository in your research, please cite:

@article{LIM2026123078,
  title = {Reducing contextual noise in review-based recommendation via aspect term extraction and attention modeling},
  author = {Heena Lim and Xinzhe Li and Seonu Park and Qinglong Li and Jaekyeong Kim},
  journal = {Information Sciences},
  volume = {735},
  pages = {123078},
  year = {2026},
  doi = {10.1016/j.ins.2026.123078}
}

Contact

For research inquiries or collaborations, please contact:

Seonu Park, Ph.D. Student, Department of Big Data Analytics, Kyung Hee University. Email: sunu0087@khu.ac.kr

Qinglong Li, Assistant Professor, Division of Computer Engineering, Hansung University. Email: leecy@hansung.ac.kr

Last updated: April 2026