Geo-Localization Analysis of Tweets

This project analyzes the spatial distribution of geo-tagged tweets in London. It covers data preprocessing, grid-based spatial analysis, and a newsworthiness scoring model that is used to filter tweets by relevance.

Overview

This project:

Organizes tweets into a grid of 1 km x 1 km cells covering London.
Develops a newsworthiness scoring method built from high-quality, low-quality, and background tweets.
Applies the newsworthiness scores to the geo-tagged data to assess the spatial distribution of newsworthy tweets.

Dataset

The dataset comprises geo-tagged tweets from London, split across several JSON files. Separate files hold the background tweets and the high- and low-quality tweets used to build the newsworthiness model.

Geo-tagged tweets: geoLondonSep2022_*.json
Background tweets: bgQuality.json
High-quality tweets: highQuality.json
Low-quality tweets: lowQuality.json
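
The files above could be loaded along the following lines. This is a minimal sketch, not the project's own loader: the helper names `load_tweets` and `load_geo_tweets` are hypothetical, and because the layout of the JSON files is not documented here, the code assumes each file is either a single JSON array or one JSON object per line.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Load tweets from one file (assumed: a JSON array, or one JSON object per line)."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)  # the whole file is a single JSON array
    # Otherwise treat the file as newline-delimited JSON and skip blank lines.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_geo_tweets(data_dir, pattern="geoLondonSep2022_*.json"):
    """Concatenate every geo-tagged file in data_dir that matches the pattern."""
    tweets = []
    for path in sorted(Path(data_dir).glob(pattern)):
        tweets.extend(load_tweets(path))
    return tweets
```

The same `load_tweets` helper would apply to bgQuality.json, highQuality.json, and lowQuality.json if they follow one of the two assumed layouts.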

Installation

To run this project, you need to install the required Python libraries:

pip install -r requirements.txt

requirements.txt:

nltk
geopandas
matplotlib
numpy
pandas
seaborn
shapely

Usage

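Navigate to the data directory containing the JSON files. The path below is the one used on the author's machine, so substitute your own; `%cd` is an IPython/Jupyter magic command, and a plain `cd` does the same in a regular shell.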

%cd C:\Users\Simran\Desktop\neccchv\Simran\data\datajson

Run the main script:
```
python geo_localization_analysis.py
```

This script will:

Combine and preprocess tweet data.
Calculate tweet density in 1km x 1km grids.
Apply newsworthiness scoring to the tweets.
Visualize the results.

Methods

1. Grid-Based Spatial Analysis

Compute Haversine Distance: Calculate the great-circle distance between two latitude/longitude points.
Grid Dimensions: Determine the number of rows and columns for the grid covering the London area.
Tweet Distribution: Count the number of tweets in each grid cell.
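
The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the project's own code: the function names are hypothetical, the bounding box is passed as `(min_lon, min_lat, max_lon, max_lat)`, and tweets are represented as `(lon, lat)` pairs. Because the number of rows and columns is rounded up, each cell is at most 1 km on a side rather than exactly 1 km.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def grid_dimensions(bbox, cell_km=1.0):
    """Number of (rows, cols) needed to cover the bounding box with cells of about cell_km."""
    min_lon, min_lat, max_lon, max_lat = bbox
    height_km = haversine_km(min_lat, min_lon, max_lat, min_lon)  # south to north
    width_km = haversine_km(min_lat, min_lon, min_lat, max_lon)   # west to east, southern edge
    return math.ceil(height_km / cell_km), math.ceil(width_km / cell_km)


def count_tweets_per_cell(points, bbox, cell_km=1.0):
    """Count (lon, lat) points per (row, col) cell; points outside the box are ignored."""
    min_lon, min_lat, max_lon, max_lat = bbox
    rows, cols = grid_dimensions(bbox, cell_km)
    counts = Counter()
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        # Scale linearly into the grid; clamp so points on the max edge fall in the last cell.
        row = min(int((lat - min_lat) / (max_lat - min_lat) * rows), rows - 1)
        col = min(int((lon - min_lon) / (max_lon - min_lon) * cols), cols - 1)
        counts[(row, col)] += 1
    return counts
```

Mapping coordinates linearly into rows and columns treats the bounding box as flat, which is a reasonable approximation over an area the size of a city.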

2. Newsworthiness Scoring

Data Preprocessing: Tokenize and remove stopwords from tweets.
Term Frequency Calculation: Compute term and document frequencies.
Likelihood Ratios: Calculate likelihood ratios for terms based on term frequencies in high-quality, low-quality, and background tweets.
Scoring Tweets: Assign newsworthiness scores to tweets based on term likelihood ratios.
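
A compact sketch of these four steps is shown below. The exact formula is not spelled out in this README, so the code uses one common formulation as an assumption: a term's likelihood ratio is its relative frequency in a quality set divided by its relative frequency in the background set, ratios below a cut-off (`min_ratio`, set to 2.0 here purely for illustration) are discarded, and a tweet's score is the log ratio of its summed high-quality and low-quality term ratios. The function names and the tiny stopword list are hypothetical; the nltk English stopword corpus could be substituted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); nltk's stopword corpus could replace it.
STOPWORDS = {"a", "an", "and", "are", "at", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]


def term_frequencies(tweets):
    """Total term counts across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tokenize(text))
    return counts


def likelihood_ratios(quality_tf, background_tf, min_ratio=2.0):
    """Relative frequency in the quality set over relative frequency in the background.

    Terms missing from the background, or with a ratio below min_ratio, are dropped
    so that only clearly over-represented terms contribute to a score.
    """
    quality_total = sum(quality_tf.values())
    background_total = sum(background_tf.values())
    ratios = {}
    for term, count in quality_tf.items():
        background_count = background_tf.get(term, 0)
        if background_count == 0:
            continue
        ratio = (count / quality_total) / (background_count / background_total)
        if ratio >= min_ratio:
            ratios[term] = ratio
    return ratios


def newsworthiness(text, hq_ratios, lq_ratios):
    """log2 of (1 + summed high-quality ratios) over (1 + summed low-quality ratios)."""
    tokens = tokenize(text)
    hq = sum(hq_ratios.get(t, 0.0) for t in tokens)
    lq = sum(lq_ratios.get(t, 0.0) for t in tokens)
    return math.log2((1 + hq) / (1 + lq))
```

Under this formulation a positive score means the tweet leans towards high-quality vocabulary, a negative score means it leans towards low-quality vocabulary, and a tweet with no scored terms gets exactly 0.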

3. Analysis and Visualization

Distribution Visualization: Create histograms and heatmaps to visualize tweet distribution and newsworthiness scores.
Statistical Analysis: Compute and visualize statistics of tweet distribution across grid cells.
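
The statistics and plots described above might look like the sketch below. It assumes per-cell counts are held in a mapping from `(row, col)` to a tweet count, which is a hypothetical representation chosen for the example; the function names and the output file name are likewise illustrative. The plotting function uses matplotlib, which is listed in requirements.txt.

```python
import statistics


def cell_count_stats(counts, rows, cols):
    """Summary statistics over every cell of a rows x cols grid, empty cells included."""
    values = [counts.get((r, c), 0) for r in range(rows) for c in range(cols)]
    return {
        "cells": len(values),
        "empty_cells": sum(1 for v in values if v == 0),
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def plot_distribution(counts, rows, cols, out_path="tweet_distribution.png"):
    """Save a histogram of per-cell counts next to a heatmap of the grid."""
    # Imported here so cell_count_stats stays usable without a plotting stack installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    grid = [[counts.get((r, c), 0) for c in range(cols)] for r in range(rows)]
    flat = [v for row in grid for v in row]

    fig, (ax_hist, ax_heat) = plt.subplots(1, 2, figsize=(12, 5))
    ax_hist.hist(flat, bins=30)
    ax_hist.set_xlabel("Tweets per cell")
    ax_hist.set_ylabel("Number of cells")
    image = ax_heat.imshow(grid, origin="lower", cmap="viridis")  # row 0 drawn at the bottom
    ax_heat.set_xlabel("Column (west to east)")
    ax_heat.set_ylabel("Row (south to north)")
    fig.colorbar(image, ax=ax_heat, label="Tweets per cell")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

Including empty cells in the statistics matters here: tweet counts tend to be concentrated in a few cells, so the median can sit well below the mean.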

Visualizations

Figure: Distribution of Tweets in Grid Cells
Figure: Heatmap of Tweet Distribution
Figure: Newsworthiness Score Distribution

Discussion

Spatial Distribution: The tweet density varies significantly across London, with certain areas having a higher concentration of tweets.
Newsworthiness: The newsworthiness score is used to filter tweets, keeping those more relevant for analysis. The chosen threshold separates more newsworthy from less newsworthy tweets with a reasonable balance between sensitivity and specificity.
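
The trade-off between sensitivity and specificity can be checked for any candidate threshold with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, it assumes a set of tweets whose newsworthiness has been labelled by hand, and the scores and labels in the usage lines are made up rather than taken from this project.

```python
def sensitivity_specificity(scored, threshold):
    """Return (sensitivity, specificity) for (score, is_newsworthy) pairs at a threshold."""
    tp = fn = tn = fp = 0
    for score, is_newsworthy in scored:
        predicted = score >= threshold  # tweets at or above the threshold are kept
        if is_newsworthy:
            if predicted:
                tp += 1
            else:
                fn += 1
        else:
            if predicted:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up scores and labels, for illustration only.
labelled = [(2.1, True), (0.7, True), (0.4, False), (-1.3, False), (1.2, False)]
for threshold in (0.5, 1.5):
    print(threshold, sensitivity_specificity(labelled, threshold))
```

Raising the threshold keeps fewer tweets, which tends to raise specificity at the cost of sensitivity; scanning a range of thresholds this way is one means of choosing a balance between the two.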

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Commit your changes (git commit -am 'Add new feature').
Push to the branch (git push origin feature-branch).
Create a new Pull Request.

License

This project is released under the author's name: Simran Garg (GitHub: https://github.com/Mejorarsim).

