
Review notes #1

@zolizoli

  1. [Technological note] There is no requirements.txt and no information on the Python version or the package versions. Suggestion: use Poetry for dependency pinning and package management.
  2. [Technological note] What is the purpose of the repo? Should it be a package that can be used in other applications? Is it a standalone app? Suggestion: clarify the purpose and use a cookiecutter template to structure the code.
  3. [Technological note] Although the code works, it is not general and it is hard to follow.
  4. [Methodological note] Please define "over" and "under" representation of a specific group. Do you want to sample according to a predefined distribution? For example, 70% of your respondents are male, but you know that the proportion of males in the whole population is about 49%, so you adjust your sample accordingly. In that case your input should be a) the respondent data, b) the sample size, and c) the "true" or desired distribution of the categories to be sampled.
  5. [Technological/Methodological note] Do some exploratory data analysis on your data and check the distribution of each categorical variable. Check the counts for the smallest categories. If they are high enough, first apply random under-sampling with imbalanced-learn (e.g. its RandomUnderSampler), then take a simple random sample from the under-sampled data to get what you need.
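
To make notes 4 and 5 concrete, here is a minimal sketch of sampling to a desired distribution, assuming the three inputs listed in note 4. It uses only the standard library as a stand-in and is not imbalanced-learn's API. The function name `sample_to_target`, the `gender` column, and the toy data are all hypothetical and not taken from the repo.

```python
import random
from collections import Counter, defaultdict


def sample_to_target(respondents, key, target, sample_size, seed=0):
    """Draw `sample_size` rows so that the share of each category of `key`
    follows `target` (a dict mapping category -> desired proportion)."""
    if abs(sum(target.values()) - 1.0) > 1e-9:
        raise ValueError("target proportions must sum to 1")

    # Group rows by category; the group sizes are the counts to inspect first.
    groups = defaultdict(list)
    for row in respondents:
        groups[row[key]].append(row)

    # Per-category quotas, rounded with the largest-remainder method
    # so that the quotas add up to exactly `sample_size`.
    exact = {cat: p * sample_size for cat, p in target.items()}
    quotas = {cat: int(x) for cat, x in exact.items()}
    leftover = max(0, sample_size - sum(quotas.values()))
    by_remainder = sorted(exact, key=lambda c: exact[c] - quotas[c], reverse=True)
    for cat in by_remainder[:leftover]:
        quotas[cat] += 1

    # Simple random sample without replacement inside each category.
    rng = random.Random(seed)
    sample = []
    for cat, quota in quotas.items():
        available = groups.get(cat, [])
        if quota > len(available):
            raise ValueError(
                f"category {cat!r}: need {quota} rows, only {len(available)} available"
            )
        sample.extend(rng.sample(available, quota))
    return sample


# Toy data (made up): 70% of respondents are male.
respondents = [{"id": i, "gender": "male"} for i in range(700)]
respondents += [{"id": 700 + i, "gender": "female"} for i in range(300)]

# Desired distribution, as in the example of note 4 (about 49% male).
target = {"male": 0.49, "female": 0.51}

print(Counter(r["gender"] for r in respondents))  # counts before sampling
sample = sample_to_target(respondents, "gender", target, sample_size=100)
print(Counter(r["gender"] for r in sample))       # counts after sampling
```

The function raises an error when a category is too small for its quota, which is the "check the numbers for the smallest categories" step from note 5. Whether to fail, shrink the sample, or sample with replacement in that case is a design decision the repo should state explicitly.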
