Get a representative sample of articles for a domain, which can then be used for further studies such as a credibility assessment of the domain or any other type of analysis.
The project is made up of two stages:
- Size reduction stage: the number of articles is reduced based on statistical sample-size theory for a limited (finite) population.
- Topic sampling stage: a representative sample is taken from each topic to keep the final sample diverse and representative. Topics are identified with BERTopic.
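
The two stages can be illustrated with a minimal sketch. It is not the project's actual code: it assumes the size reduction uses Cochran's sample-size formula with a finite population correction (the text only says "limited population theory"), and that every article already has a topic label, which in the project would come from BERTopic. The function names, the default 95% confidence level (`z=1.96`), and the 5% margin of error are all hypothetical choices for illustration.

```python
import math
import random
from collections import defaultdict


def finite_population_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for a finite population (assumed formula: Cochran with
    finite population correction). p=0.5 is the most conservative choice."""
    if population <= 0:
        return 0
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)          # finite population correction
    return min(population, math.ceil(n))


def sample_per_topic(topic_of, fraction, seed=0):
    """Draw the same fraction of articles from every topic, keeping at
    least one article per topic so that no topic disappears.

    topic_of maps article id -> topic label (hypothetical input; in the
    project the labels would be produced by BERTopic)."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for article_id, topic in topic_of.items():
        by_topic[topic].append(article_id)

    sample = []
    for topic in sorted(by_topic):
        ids = sorted(by_topic[topic])
        k = max(1, round(fraction * len(ids)))
        sample.extend(rng.sample(ids, k))
    return sample


# A domain with 10,000 articles needs about 370 of them at these settings.
target = finite_population_sample_size(10_000)  # 370
```

Because each topic contributes at least one article and per-topic counts are rounded, the size of the final sample can differ slightly from the target computed in the first stage.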