4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to make the capture and analysis of data from these platforms accessible to people through a web interface, without requiring any programming or web scraping skills. Our target audience is researchers, students and journalists interested in using Digital Methods in their work.
In 4CAT, you create a dataset from a given platform according to a given set of parameters; the result of this (usually a CSV file containing matching items) can then be downloaded or analysed further with a suite of analytical 'processors', which range from simple frequency charts to more advanced analyses such as the generation and visualisation of word embedding models.
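To give a flavour of what a simple "frequency" processor does with such a dataset, here is a hedged, self-contained sketch that tallies word frequencies in one column of a CSV file. The `body` column name and the in-memory sample are illustrative, not 4CAT's actual export format or processor code:

```python
import csv
import io
from collections import Counter

def word_frequencies(csv_file, column="body", top=5):
    """Count the most frequent words in one column of a CSV dataset."""
    counter = Counter()
    for row in csv.DictReader(csv_file):
        counter.update(row[column].lower().split())
    return counter.most_common(top)

# Tiny in-memory stand-in for a downloaded dataset
sample = io.StringIO("id,body\n1,hello world\n2,hello again\n")
print(word_frequencies(sample))  # [('hello', 2), ('world', 1), ('again', 1)]
```

Real 4CAT processors work on the stored dataset files and write their results back into the tool, but the core idea is the same: read items, derive something, output a new dataset.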
4CAT has a (growing) number of supported data sources corresponding to popular platforms that are part of the tool, but you can also add additional data sources using 4CAT's Python API. The following data sources are currently actively supported:
- 4chan
- 8kun
- Bitchute
- Parler
- Telegram
- Twitter API (Academic and regular tracks)
The following platforms are supported through other tools, from which you can import data into 4CAT for analysis:
- Facebook (via CrowdTangle exports)
- Instagram (via CrowdTangle)
- TikTok (via Zeeschuimer or tiktok-scraper)
A number of other platforms have built-in support that is untested, or requires e.g. special API access. You can view the full list of data sources in the GitHub repository.
You can install 4CAT locally or on a server, via Docker or manually. Copying our `docker-compose_prod.yml` file and running

    docker-compose -f docker-compose_prod.yml up

will pull the latest version from Docker Hub; detailed and alternative installation instructions are available in our wiki. Currently the 4chan, 8chan, and 8kun data sources require additional steps; please see the wiki.
Please check our issues and create one if you experience any problems (pull requests are also very welcome).
4CAT consists of several components, each in a separate folder:
- `backend`: A standalone daemon that collects and processes data, as queued via the tool's web interface or API.
- `webtool`: A Flask app that provides a web front-end to search and analyse the stored data with.
- `common`: Assets and libraries.
- `datasources`: Data source definitions: a set of configuration options, database definitions and Python scripts to process this data with. If you want to set up your own data sources, refer to the wiki.
- `processors`: A collection of data processing scripts that can plug into 4CAT to manipulate or process datasets created with 4CAT. There is an API you can use to make your own processors.
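The processor API itself is documented in the wiki. Purely as a sketch of the plug-in idea — the class and attribute names below are illustrative, not 4CAT's actual base class — a processor is essentially a class that declares some metadata for the web interface and implements a method that turns one dataset into a derived result:

```python
# Illustrative only: 4CAT's real processor base class lives in the backend package
class ProcessorSketch:
    type = "example-count"   # unique identifier for this processor
    title = "Item counter"   # name that would be shown in the web interface

    def process(self, items):
        """Turn a dataset (an iterable of items) into a derived result."""
        return {"item_count": sum(1 for _ in items)}

result = ProcessorSketch().process([{"id": 1}, {"id": 2}])
print(result)  # {'item_count': 2}
```

Because processors share a common interface like this, the web tool can offer every registered processor on any compatible dataset without knowing what each one does internally.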
Documentation, in this documentation branch, is generated (semi-)automatically from docstrings formatted in reStructuredText, using Sphinx (v4.5) and its autodoc and autosummary modules. When merging commits and general code updates into this branch, be aware that the overall code structure has changed slightly, mostly in that directories and filenames have been renamed with underscores taking the place of hyphens throughout. This is because Sphinx needs to import all relevant directories and .py files as modules and packages; it is also why there are a lot of empty `__init__.py` files.
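The Sphinx setup described above is driven by a `conf.py` in the documentation source directory. The actual `conf.py` in this branch may differ; as a sketch of the settings that matter here (path tweak, autodoc/autosummary, theme), it might look roughly like this:

```python
# conf.py (sketch) -- enable autodoc/autosummary so docstrings are picked up
import os
import sys

# Sphinx must be able to import the project as modules and packages,
# hence the path tweak (and the underscore-only names and __init__.py files)
sys.path.insert(0, os.path.abspath(".."))

project = "4CAT"
extensions = [
    "sphinx.ext.autodoc",      # pull documentation from docstrings
    "sphinx.ext.autosummary",  # generate summary tables and stub pages
]
autosummary_generate = True
html_theme = "sphinx_rtd_theme"
```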
Currently, many functions lack thorough documentation, or any documentation at all. Please follow the steps below to update the docs as you encounter unfinished documentation!
Please follow the official ReStructuredText formatting for your docstrings:
    """
    [Summary]

    [Detailed description if needed]

    :param [ParamName]: [ParamDescription], defaults to [DefaultParamVal]
    :type [ParamName]: [ParamType](, optional)
    :raises [ErrorType]: [ErrorDescription]
    :return: [ReturnDescription]
    :rtype: [ReturnType]
    """
Make sure to, at a minimum, keep the empty lines between the summary and the parameter list.
For more info, this short guide might be helpful.
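For example, a fully documented (hypothetical) helper function following this template might look like:

```python
def repeat_string(text, times=2):
    """
    Repeat a string a number of times.

    Concatenates `text` with itself until it occurs `times` times.

    :param text: The string to repeat
    :type text: str
    :param times: How often to repeat it, defaults to 2
    :type times: int, optional
    :raises ValueError: If `times` is negative
    :return: The repeated string
    :rtype: str
    """
    if times < 0:
        raise ValueError("times must not be negative")
    return text * times

print(repeat_string("ab"))  # abab
```

Note the blank lines between the summary, the description and the field list; autodoc relies on these to render the docstring correctly.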
Make sure that you have sphinx, sphinx-rtd-theme and sphinx-mdinclude installed via pip in your environment:

    $ (sudo) pip install sphinx
    $ (sudo) pip install sphinx-rtd-theme
    $ (sudo) pip install sphinx-mdinclude
- Update project files and merge commits into this documentation branch; be aware of possible conflicts within naming conventions and resolve them accordingly
- Open the "documentation" folder in your terminal
- Run `make clean && make html`. This will clear the existing HTML and regenerate it, including your new docstrings; you can do what you want with the newly generated HTML files in the /build directory - success!
If rst files are missing or outdated (e.g. because modules were added or renamed):

- Update project files and merge commits into this documentation branch; be aware of possible conflicts within naming conventions and resolve them accordingly
- Open the "documentation" folder in your terminal
- Either:
  - delete the rst file(s) in question within the "source" directory (leave the index.rst!), or
  - manually add the stub in the relevant rst file within the "source" directory as required
- Regenerate the missing rst files by running `sphinx-apidoc -o ./source .. "/*setup*" "/*4cat-daemon*" "/*config*"` in your terminal, and reformat the result accordingly for layout, as all formatting is lost upon deletion. The last part of the command lists the parts of the code that we want to exclude, as these work poorly for Sphinx documentation.
- Run `make clean && make html`. This will clear the existing HTML and regenerate it, including your new docstrings; you can do what you want with the newly generated HTML files in the /build directory - success!
Feel free to contact Martin on Slack about the documentation.
4CAT was created at OILab and the Digital Methods Initiative at the University of Amsterdam. The tool was inspired by DMI-TCAT, a tool with comparable functionality that can be used to scrape and analyse Twitter data.
4CAT development is supported by the Dutch PDI-SSH foundation through the CAT4SMR project.
4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE file for more information.

