CBMI (Cerberus-Misinformation) is a Python-based tool designed to analyze social media activity on X and Reddit, helping you identify potential misinformation campaigns. It pulls posts based on your keywords, crunches the data with sentiment analysis, tracks user behavior, and cross-references it with Google Trends to give you a fuller picture. The output? A neat PDF report packed with insights—think sentiment trends, frequency charts, and a quick “is this fishy?” assessment—all backed by visuals you can actually use.
Here’s the technical rundown of how CBMI works:
- Data Collection:
  - Hits X’s API (v2 via `tweepy`) and Reddit’s API (via `praw`) to grab posts matching your keywords within a specified time frame. It respects the free tier limits, capping at 10,000 posts total (split evenly between platforms).
  - Pulls Google Trends data (via `pytrends`) to see how your keywords are trending in search interest over the same period.
- Analysis:
  - Runs sentiment analysis on each post using `TextBlob` (scores range from -1 for negative to 1 for positive) and tracks daily averages to spot mood swings.
  - Counts post frequency per day with `pandas` to catch sudden spikes that might signal a coordinated push.
  - Extracts the top 10 keywords from posts using `scikit-learn`’s `CountVectorizer` for a peek at what’s buzzing.
  - Checks user behavior: flags accounts less than 30 days old posting more than 5 times as potential bots.
- Visualization:
  - Generates three plots with `matplotlib` (using the `Agg` backend for file-only output, handy for iOS or headless systems):
    - Post frequency over time.
    - Sentiment trend over time.
    - A dual-axis chart comparing social media posts to Google Trends interest.
- Misinformation Detection:
  - Looks for red flags like frequency spikes (3x the average), big sentiment shifts (>0.5), or bot-like activity. Correlates social media sentiment with Google Trends to see if chatter aligns with public curiosity.
  - Wraps it up in an “Intelligence Brief” with a verdict (e.g., “Potential misinformation detected”) and reasoning tied to the data.
- Reporting:
  - Compiles everything into a PDF via `reportlab`, including raw stats, plots, and the brief. You also get the raw `.png` plot files as a bonus.
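
As a rough illustration of the detection step, the thresholds listed above (3x the average frequency, a sentiment shift above 0.5, accounts under 30 days old with more than 5 posts) can be sketched in plain Python. This is a minimal sketch, not CBMI’s actual code: the function names and the post fields (`day`, `sentiment`, `user`, `account_age_days`) are assumptions, the standard library stands in for `pandas`, and comparing consecutive days is only one possible reading of “big sentiment shifts”.

```python
from collections import Counter, defaultdict
from statistics import mean

# Thresholds quoted in the rundown above.
SPIKE_FACTOR = 3.0             # frequency spike: 3x the daily average
SENTIMENT_SHIFT = 0.5          # day-to-day change in mean sentiment
BOT_MAX_ACCOUNT_AGE_DAYS = 30  # "less than 30 days old"
BOT_MIN_POSTS = 5              # "posting more than 5 times"


def daily_stats(posts):
    """Return (posts per day, mean sentiment per day), both keyed by day."""
    counts = Counter(p["day"] for p in posts)
    scores = defaultdict(list)
    for p in posts:
        scores[p["day"]].append(p["sentiment"])
    sentiment = {day: mean(values) for day, values in scores.items()}
    return counts, sentiment


def frequency_spikes(counts):
    """Days whose post count reaches SPIKE_FACTOR times the daily average."""
    if not counts:
        return []
    average = mean(counts.values())
    return sorted(d for d, n in counts.items() if n >= SPIKE_FACTOR * average)


def sentiment_shifts(sentiment):
    """Consecutive-day pairs whose mean sentiment moved by more than the threshold."""
    days = sorted(sentiment)
    return [
        (a, b)
        for a, b in zip(days, days[1:])
        if abs(sentiment[b] - sentiment[a]) > SENTIMENT_SHIFT
    ]


def bot_like_accounts(posts):
    """Young accounts that posted more than BOT_MIN_POSTS times in the sample."""
    per_user = Counter(p["user"] for p in posts)
    ages = {p["user"]: p["account_age_days"] for p in posts}
    return sorted(
        user
        for user, n in per_user.items()
        if n > BOT_MIN_POSTS and ages[user] < BOT_MAX_ACCOUNT_AGE_DAYS
    )


def red_flags(posts):
    """Collect all three signals into one dictionary."""
    counts, sentiment = daily_stats(posts)
    return {
        "frequency_spikes": frequency_spikes(counts),
        "sentiment_shifts": sentiment_shifts(sentiment),
        "bot_like_accounts": bot_like_accounts(posts),
    }
```

Each post is expected to be a dictionary such as `{"day": date(2023, 10, 6), "sentiment": -0.8, "user": "someone", "account_age_days": 10}`. An empty result from `red_flags` means none of the three thresholds was crossed.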
It’s not rocket science, but it’s built to cut through the noise and give you something actionable—whether you’re tracking rumors, scams, or just curious about what’s trending.
Before you can use CBMI, you’ll need to set it up. Here’s the step-by-step:
- Python: Version 3.9 or higher. Check with `python --version`.
- API Access: You’ll need keys for X and Reddit (details below).
- A Terminal: For CLI usage or initial setup.
- Optional (GUI): A desktop OS (Windows, Mac, Linux) with Tkinter installed for the GUI version.
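
If you would rather check the prerequisites from inside Python, something like the following works as a quick sanity test. It is a sketch under assumptions: the helper names are made up for illustration, and `importlib.util.find_spec` only confirms that a module can be located, not that the GUI will actually start.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites


def python_ok(version_info=sys.version_info):
    """True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def has_module(name):
    """True if the named top-level module can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python 3.9+:", python_ok())
    print("Tkinter found:", has_module("tkinter"))
```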
- Clone the Repository:
  - Grab the code from GitHub, then move into the project folder: `git clone https://github.com/jdmx0/cbmi.git` followed by `cd cbmi`.
- Install Dependencies:
  - Run this in your terminal to get all the Python libraries CBMI needs: `pip install -r requirements.txt`
  - This pulls in `tweepy`, `praw`, `textblob`, `pandas`, `matplotlib`, `reportlab`, `pytrends`, and `scikit-learn`. Tkinter normally ships with Python and is not installed through `pip`; if the GUI cannot find it, install it with your OS package manager (e.g., `sudo apt install python3-tk` on Debian/Ubuntu).
- Configure API Credentials:
  - Open `cbmi/config.py` in a text editor (e.g., Notepad, VS Code).
  - Add your API keys:
    - X: Get these from the X Developer Portal:
      - `CONSUMER_KEY`
      - `CONSUMER_SECRET`
      - `ACCESS_TOKEN`
      - `ACCESS_TOKEN_SECRET`
    - Reddit: Create an app at Reddit Apps:
      - `REDDIT_CLIENT_ID`
      - `REDDIT_CLIENT_SECRET`
      - Leave `REDDIT_USER_AGENT` as `"cbmi"` or tweak it if you want.
  - Save the file. These keys let CBMI talk to X and Reddit; without them, you’ll get nada.
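
For orientation, a `config.py` might look roughly like the sketch below. Only the variable names come from the steps above; the file layout, the empty-string placeholders, and the `missing_keys` helper are assumptions for illustration, so defer to the file shipped in the repository.

```python
# Hypothetical layout for cbmi/config.py. Fill in your own values.
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""
REDDIT_CLIENT_ID = ""
REDDIT_CLIENT_SECRET = ""
REDDIT_USER_AGENT = "cbmi"

# Names that must be non-empty before CBMI can reach X and Reddit.
REQUIRED = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
)


def missing_keys(settings):
    """Return the required names that are absent or blank (hypothetical helper)."""
    return [name for name in REQUIRED if not str(settings.get(name, "")).strip()]


if __name__ == "__main__":
    # Running the file directly lists whatever is still unset.
    print("Still empty:", missing_keys(globals()))
```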
CBMI offers two ways to run it: Command-Line Interface (CLI) for quick, no-frills analysis, and Graphical User Interface (GUI) for a point-and-click experience. Here’s how to use both.
The CLI is perfect if you’re comfortable with a terminal and want to get straight to it.
- Navigate to the Folder:
  - In your terminal, move to the `cbmi` directory: `cd path/to/cbmi`
- Launch the Script:
  - Type and hit Enter: `python main.py`
- Answer the Prompts:
  - Keywords: Type space-separated words (e.g., `election fraud`). Hit Enter.
  - Sample Size: Enter a number between 1 and 1,000,000 (e.g., `500`). It’s capped at 10,000 due to API limits. Hit Enter.
  - Start Time: Enter a date like `2023-10-01` (YYYY-MM-DD). Must be within X’s 7-day free API window. Hit Enter.
  - End Time: Enter a later date like `2023-10-07`. Hit Enter.
- Watch It Work:
  - The terminal shows progress: “Collecting up to 250 posts from X and Reddit...” (half of a 500-post sample, since the total is split evenly between platforms), then “PDF report generated: report.pdf”.
  - It grabs posts, analyzes them, and saves the report.
- Check the Output:
  - Find `report.pdf` in the `cbmi` folder, along with `.png` plot files (`frequency.png`, `sentiment.png`, `trends.png`).
  - Open the PDF to see your analysis.

Example session:

    $ python main.py
    Enter keywords (space-separated): vaccine conspiracy
    Enter sample size (1-1000000): 1000
    Enter start time (YYYY-MM-DD): 2023-10-01
    Enter end time (YYYY-MM-DD): 2023-10-07
    Collecting up to 500 posts from X and Reddit...
    PDF report generated: report.pdf
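
To make the prompt rules concrete, here is a small sketch of how the inputs could be validated: the 1 to 1,000,000 range, the 10,000-post cap, the even split between platforms, and the YYYY-MM-DD dates. The function names are hypothetical and this is not taken from `main.py`; it only mirrors the behavior described in this README.

```python
from datetime import datetime

MAX_TOTAL_POSTS = 10_000  # cap quoted in this README


def parse_keywords(text):
    """Split a space-separated keyword string into a list."""
    return text.split()


def per_platform_limit(sample_size):
    """Apply the cap, then split the sample evenly between X and Reddit."""
    if not 1 <= sample_size <= 1_000_000:
        raise ValueError("sample size must be between 1 and 1,000,000")
    return min(sample_size, MAX_TOTAL_POSTS) // 2


def parse_window(start_text, end_text):
    """Parse two YYYY-MM-DD strings and require the end to follow the start."""
    start = datetime.strptime(start_text, "%Y-%m-%d").date()
    end = datetime.strptime(end_text, "%Y-%m-%d").date()
    if end <= start:
        raise ValueError("end time must be later than start time")
    return start, end
```

With these rules, a sample size of 1000 gives 500 posts per platform, which lines up with the “Collecting up to 500 posts” message in the session above. The sketch does not check X’s 7-day window, since that depends on the current date.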