Hub-Tian/CogRail

CogRail: Benchmarking VLMs for Cognitive Railway Intrusion Perception

Project Overview

CogRail is the first multimodal benchmark and open-source framework dedicated to cognitive railway intrusion perception. It is built on real-world surveillance scenes with cognition-driven, multi-dimensional, instruction-level annotations (the CogRail dataset), and it supports spatio-temporal reasoning, motion prediction, and threat assessment for objects of interest (OOIs) in railway environments. The project integrates visual question-answer annotations with expert-defined threat semantics and uses instance synthesis to increase data diversity while keeping a consistent label space across subsets.

CogRail systematically evaluates state-of-the-art vision-language models (VLMs), such as Qwen-VL and LLaVA, in railway scenarios, revealing their strengths and limitations in complex spatio-temporal reasoning. It also introduces RAILGPT, a multi-task fine-tuning framework that combines visual prompts, textual instructions, and specialized agents to strengthen cognitive capabilities across three tasks: position awareness (CogRailPos), motion prediction (CogRailMove), and threat analysis (CogRailThreat). After joint fine-tuning, RAILGPT achieves an 18.6% F1 improvement on the threat analysis task, demonstrating the effectiveness of structured multi-task learning in safety-critical scenarios and providing a complete benchmark toolchain for both research and engineering applications.

The paper is available at https://arxiv.org/abs/2601.09613


Key Contributions (Highlights)

  • First CogRail Benchmark: Integrates open-source surveillance scenarios with cognition-driven question-answer annotations, supporting spatio-temporal reasoning and intrusion risk prediction.
  • Systematic Evaluation of Representative VLMs: Reveals model strengths and weaknesses in cognitive railway scenarios.
  • Multi-task Joint Fine-tuning (RAILGPT): Employs visual prompts + textual prompts + dedicated agents to significantly enhance accuracy and interpretability.

✨ Benchmark

CogRail systematically evaluates vision-language models in railway intrusion perception scenarios. It defines three core tasks and provides unified annotations, with synthesized instances adding data diversity.

Three Core Tasks

  • CogRailPos (Spatial Awareness): Determine the location of an OOI relative to railway infrastructure.
  • CogRailMove (Motion Prediction): Predict the threat level implied by the OOI's movement.
  • CogRailThreat (Threat Assessment): Integrate spatial and motion information to assess the overall threat.

Dataset Sources & Labels

  • Sources:
  • Labels:
  • Unified Label Space:

The CogRail dataset contains two main folders: Cog-MRSI/ and Cog-RailSem19/. Each folder has a training set (train) and a test set (test).
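The folder layout described above can be inspected with a short script. This is a minimal sketch, assuming only the two subset folders and the train/test split names given here. The file names and annotation formats inside each split are not documented in this README, so the sketch only lists files rather than parsing them. The helper names `list_split_files` and `summarize` are hypothetical.

```python
from pathlib import Path

# Folder and split names come from the README; everything below them
# (file names, formats) is unknown here, so files are only listed.
SUBSETS = ("Cog-MRSI", "Cog-RailSem19")
SPLITS = ("train", "test")


def list_split_files(root):
    """Map (subset, split) to the sorted files found under root/subset/split."""
    root = Path(root)
    index = {}
    for subset in SUBSETS:
        for split in SPLITS:
            split_dir = root / subset / split
            if split_dir.is_dir():
                files = sorted(p for p in split_dir.rglob("*") if p.is_file())
            else:
                files = []  # missing split: report it as empty rather than fail
            index[(subset, split)] = files
    return index


def summarize(index):
    """Return one 'subset/split: N files' line per split."""
    return [
        f"{subset}/{split}: {len(files)} files"
        for (subset, split), files in index.items()
    ]
```

Calling `summarize(list_split_files("path/to/CogRail"))` gives a quick check that all four splits are present before any training or evaluation run.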

CogRail Dataset Construction Pipeline

[Figure: Dataset Pipeline]

Threat Level Distribution & Object Composition

[Figure: Statistics]

📊 CogRail Dataset

The dataset can be accessed at: https://huggingface.co/datasets/BITZhangqy/Cog-Rail/
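One way to fetch the dataset programmatically is the `huggingface_hub` client. The sketch below assumes the repository is public and that `huggingface_hub` is installed. The repository ID is read off the URL above, and the function name `download_cograil` is hypothetical.

```python
REPO_ID = "BITZhangqy/Cog-Rail"  # taken from the dataset URL above


def download_cograil(local_dir=None):
    """Download a snapshot of the dataset repository and return its local path.

    Requires `pip install huggingface_hub`. The import is done lazily so this
    module can be imported without the dependency.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the URL is under /datasets/, not a model repo
        local_dir=local_dir,  # None keeps the files in the default HF cache
    )
```

The returned path can then be used as the dataset root for any script that expects the Cog-MRSI/ and Cog-RailSem19/ folders.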

RAILGPT Multi-Task Learning Architecture

[Figure: Framework]


📈 Experimental Results

Performance Comparison among SOTA VLMs on CogRail, Averaged over Prompt Types and Sub-datasets

[Figure: Performance]

Performance (F1) Comparison on Type-I Visual Prompt in Cog-MRSI dataset via Individual Fine-tuning

[Figure: Type-I MRSI]

Performance (F1) Comparison on Type-II Visual Prompt in Cog-MRSI dataset via Individual Fine-tuning

[Figure: Type-II MRSI]

Performance (F1) Comparison on Type-I Visual Prompt in Cog-RailSem19 dataset via Individual Fine-tuning

[Figure: Type-I RailSem19]

Performance (F1) Comparison on Type-II Visual Prompt in Cog-RailSem19 dataset via Individual Fine-tuning

[Figure: Type-II RailSem19]
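The comparisons above are reported as F1 scores. For readers who want to score their own predictions, here is a minimal sketch of per-class and macro-averaged F1 in plain Python. Macro averaging is an assumption: this README does not state which averaging scheme the reported numbers use. The labels in the usage note are hypothetical placeholders, not the dataset's actual label space.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); denom is never 0 for a label that occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

For example, with hypothetical labels `y_true = ["high", "high", "low", "low"]` and `y_pred = ["high", "low", "low", "low"]`, the per-class scores are 2/3 for "high" and 0.8 for "low", so `macro_f1` returns about 0.733.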

Citation

If you find our work helpful in your research, please consider citing:

@misc{tian2026cograilbenchmarkingvlmscognitive,
      title={CogRail: Benchmarking VLMs in Cognitive Intrusion Perception for Intelligent Railway Transportation Systems}, 
      author={Yonglin Tian and Qiyao Zhang and Wei Xu and Yutong Wang and Yihao Wu and Xinyi Li and Xingyuan Dai and Hui Zhang and Zhiyong Cui and Baoqing Guo and Zujun Yu and Yisheng Lv},
      year={2026},
      eprint={2601.09613},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.09613}, 
}
