Review notes #1
Open
zolizoli opened on Apr 8, 2024
- [Technological note] There is no requirements.txt and no information on the Python version or package versions. Suggestion: use Poetry for dependency management and version pinning.
- [Technological note] What is the purpose of the repo? Should it be a package that can be used in other applications, or a standalone app? Suggestion: clarify the purpose and use a Cookiecutter template to structure the code.
- [Technological note] Although the code works, it is not general and is hard to follow.
- [Methodological note] Please define "over-" and "underrepresentation" of a specific group. Do you want to sample according to a predefined distribution? For example, 70% of your respondents are male, but you know that the proportion of males in the whole population is about 49%, so you adjust your sample accordingly. In that case your inputs should be a) the respondent data, b) the sample size, and c) the "true" or desired distribution of the categories to be sampled.
- [Technological/Methodological note] Do some exploratory data analysis on your data and check the distribution of each categorical variable. Check the counts for the smallest categories. If they are large enough (random undersampling shrinks every category to the size of the smallest one), apply the imbalanced-learn random undersampling method first. Then take a simple random sample from the undersampled data to get what you need.
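If the goal is the first scenario in the methodological note (matching a known or desired distribution), the three inputs map onto a small stratified-sampling routine. The sketch below is plain Python and rests on assumptions not in the repo: respondents are a list of dicts, the categorical field is called `gender`, and the function name `sample_to_distribution` is purely illustrative. Per-category quotas are rounded with the largest-remainder method so they add up exactly to the requested sample size.

```python
import random
from collections import Counter

# Hypothetical helper illustrating the review note; not part of the reviewed repo.
def sample_to_distribution(respondents, key, sample_size, target, seed=None):
    """Draw a sample whose category proportions follow `target`.

    respondents: list of dict records (assumed input format)
    key: name of the categorical field, e.g. "gender"
    sample_size: total number of records to draw
    target: desired proportions per category, e.g. {"male": 0.49, "female": 0.51}
    """
    if abs(sum(target.values()) - 1.0) > 1e-9:
        raise ValueError("target proportions must sum to 1")
    rng = random.Random(seed)

    # Group respondents by category.
    groups = {}
    for r in respondents:
        groups.setdefault(r[key], []).append(r)

    # Quotas: floor each ideal count, then hand the leftover units
    # to the categories with the largest fractional remainders.
    raw = {cat: p * sample_size for cat, p in target.items()}
    quotas = {cat: int(v) for cat, v in raw.items()}
    shortfall = sample_size - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - quotas[c], reverse=True)
    for cat in by_remainder[:shortfall]:
        quotas[cat] += 1

    sample = []
    for cat, n in quotas.items():
        pool = groups.get(cat, [])
        if n > len(pool):
            raise ValueError(f"only {len(pool)} respondents in {cat!r}, need {n}")
        # Simple random sample without replacement within the category.
        sample.extend(rng.sample(pool, n))
    return sample


if __name__ == "__main__":
    # 70% male respondents, as in the example from the review note.
    data = [{"id": i, "gender": "male" if i < 70 else "female"} for i in range(100)]
    s = sample_to_distribution(data, "gender", 40, {"male": 0.49, "female": 0.51}, seed=1)
    print(Counter(r["gender"] for r in s))
```

With 70 male and 30 female respondents, a 49%/51% target and a sample size of 40, the ideal counts are 19.6 and 20.4, so the quotas come out as 20 and 20 (the leftover unit goes to the larger remainder). If a category has fewer respondents than its quota, the function raises an error rather than silently drawing with replacement.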
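The suggested workflow (check category counts, undersample, then take a simple random sample) can be sketched in plain Python as below. It assumes records are dicts with a categorical field (the name `region` and all helper names are hypothetical). With imbalanced-learn itself, the undersampling step would be `RandomUnderSampler(random_state=42).fit_resample(X, y)` on a feature matrix `X` and label vector `y`, and in pandas `df[col].value_counts()` covers the counting step; the helpers here only mirror that behaviour without extra dependencies.

```python
import random
from collections import Counter

# Hypothetical helpers illustrating the review note; not part of the reviewed repo.

def category_counts(records, key):
    """EDA step: number of records in each category of `key`."""
    return Counter(r[key] for r in records)


def random_undersample(records, key, seed=None):
    """Randomly shrink every category to the size of the smallest one.

    Mirrors imbalanced-learn's RandomUnderSampler with its defaults
    (sampling_strategy="auto", replacement=False), in plain Python.
    """
    rng = random.Random(seed)
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    n_min = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        # Draw n_min records without replacement from each category.
        balanced.extend(rng.sample(g, n_min))
    return balanced


def simple_random_sample(records, n, seed=None):
    """Simple random sample of n records, without replacement."""
    return random.Random(seed).sample(records, n)


if __name__ == "__main__":
    data = ([{"id": i, "region": "north"} for i in range(70)]
            + [{"id": i, "region": "south"} for i in range(70, 95)]
            + [{"id": i, "region": "west"} for i in range(95, 100)])
    print(category_counts(data, "region"))  # inspect the smallest category first
    balanced = random_undersample(data, "region", seed=42)
    print(category_counts(balanced, "region"))
    sample = simple_random_sample(balanced, 9, seed=42)
    print(len(sample))
```

The final simple random sample is balanced only in expectation; if exact per-category counts matter, draw a fixed number from each category instead.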