
gazdagergo/diverse_groups

Diverse Group Selection Script

Overview

This script selects a diverse group of individuals from a larger dataset, aiming for balanced representation across several dimensions: gender, age group, education, residence, disability status, and level of interest. It can also bias the selection to over-represent or under-represent certain groups according to predefined criteria, so the selection can be tailored to specific needs.
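One way to approximate balanced representation is a greedy heuristic: repeatedly pick the candidate whose attribute values have been selected least often so far. The sketch below illustrates that idea with pandas. It is not necessarily how main.py works, and the function name select_balanced_greedy is hypothetical.

```python
import pandas as pd

# Dimensions to balance, taken from the column list under "Data Structure".
DIMENSIONS = ("Gender", "Age Group", "Education", "Residence", "Disability", "Interest")


def select_balanced_greedy(df: pd.DataFrame, size: int, dimensions=DIMENSIONS) -> pd.DataFrame:
    """Hypothetical helper: greedily pick `size` rows, each time taking the
    candidate whose values are least represented among the rows picked so far."""
    if size > len(df):
        raise ValueError("group size exceeds the number of records")
    # counts[dimension][value] = how many selected rows have that value
    counts = {dim: {} for dim in dimensions}
    remaining = df
    chosen = []
    for _ in range(size):
        # Lower score = the candidate's values are rarer in the current selection.
        scores = remaining.apply(
            lambda row: sum(counts[d].get(row[d], 0) for d in dimensions), axis=1
        )
        best = scores.idxmin()  # ties go to the earliest remaining row
        chosen.append(best)
        for d in dimensions:
            value = remaining.at[best, d]
            counts[d][value] = counts[d].get(value, 0) + 1
        remaining = remaining.drop(index=best)
    return df.loc[chosen]
```

A greedy pass is deterministic and easy to reason about, but it does not guarantee a globally optimal balance; it only avoids repeatedly picking the same attribute values.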

Aim

The primary aim of this script is to build a diverse group from a dataset so that the final selection reflects a balanced mix of the attributes above. It is particularly useful where equitable representation matters, such as in surveys, research studies, or team formation.

Installation

To run this script, you will need Python installed on your machine, along with the following libraries:

  • pandas
  • NumPy

You can install these libraries using pip:

$ pip install pandas numpy

Test data generation

To generate a CSV file with test data, run:

$ python generate_csv.py
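The contents of generate_csv.py are not reproduced here. As a rough illustration of what a test-data generator for the format described under "Data Structure" might look like, the sketch below draws each attribute uniformly at random with NumPy. The function name generate_people and the age groups beyond '18-29' and '30-39' are assumptions.

```python
import numpy as np
import pandas as pd

# Category values from the README examples; "40-49", "50-59" and "60+"
# are assumed, since the README only says "etc." for age groups.
CATEGORIES = {
    "Age Group": ["18-29", "30-39", "40-49", "50-59", "60+"],
    "Education": ["Elementary", "Secondary", "Higher"],
    "Gender": ["Male", "Female"],
    "Residence": ["Capital", "Non-Capital"],
    "Disability": ["Yes", "No"],
    "Interest": ["No", "Some", "Yes"],
}


def generate_people(n: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical generator: n records with a 1-based incrementing ID and
    each attribute drawn uniformly at random from CATEGORIES."""
    rng = np.random.default_rng(seed)
    data = {"ID": np.arange(1, n + 1)}
    for column, values in CATEGORIES.items():
        data[column] = rng.choice(values, size=n)
    return pd.DataFrame(data)
```

For example, generate_people(500).to_csv("example_people_data.csv", index=False) writes a file with the column order shown under "Data Structure". Uniform draws make every category roughly equally common; real data is usually skewed, which is exactly where balanced selection matters.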

Execution

To execute the script, follow these steps:

  1. Prepare your dataset in the format described under "Data Structure" and save it as a CSV file.
  2. Run the script with a Python interpreter. The first argument is the path to the CSV file and the second is the target group size:
$ python main.py example_people_data.csv 60
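Reading the two positional arguments could look like the following sketch, which uses only the standard library. The helper parse_args and its validation rules (an integer size greater than zero) are assumptions, not necessarily what main.py does.

```python
def parse_args(argv):
    """Hypothetical helper: return (csv_path, group_size) from an argv list
    such as ["main.py", "example_people_data.csv", "60"]."""
    if len(argv) != 3:
        prog = argv[0] if argv else "main.py"
        raise SystemExit(f"usage: python {prog} <csv_path> <group_size>")
    csv_path = argv[1]
    try:
        group_size = int(argv[2])
    except ValueError:
        raise SystemExit("group_size must be an integer") from None
    if group_size <= 0:
        raise SystemExit("group_size must be a positive integer")
    return csv_path, group_size
```

In a script, this would be called as csv_path, group_size = parse_args(sys.argv) after import sys. Raising SystemExit with a message prints it to stderr and exits with status 1. For more than two positional arguments, the standard argparse module would be the more idiomatic choice.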

Biasing Options

The script can bias the selection to over-represent or under-represent specific groups within the dataset. This gives finer control over the composition of the selected group when a strictly balanced selection does not match the goals of the selection.

How to Use Biasing

Biasing is applied through criteria predefined in the script. To change them, modify the bias_weights calculation, which assigns a weight to each individual based on attributes such as 'Disability', 'Age Group', or any other column in the dataset; individuals with higher weights are more likely to be selected.

For example, to over-represent individuals with disabilities, a higher weight is assigned to records where Disability == 'Yes'. Conversely, to under-represent middle-aged males, a lower weight can be assigned to records matching this criterion.
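A bias_weights calculation along these lines might look like the sketch below. The multipliers (2.0 and 0.5), the function name compute_bias_weights, and the choice of '40-49' and '50-59' as the "middle-aged" groups are assumptions for illustration; the actual values live in main.py.

```python
import pandas as pd

# Assumed definition of "middle-aged"; adjust to the age groups in your data.
MIDDLE_AGED = ["40-49", "50-59"]


def compute_bias_weights(df: pd.DataFrame) -> pd.Series:
    """Return one weight per row: 1.0 is neutral, >1 favours, <1 disfavours."""
    weights = pd.Series(1.0, index=df.index)
    # Over-represent people with disabilities.
    weights.loc[df["Disability"] == "Yes"] *= 2.0
    # Under-represent middle-aged males.
    is_middle_aged_male = df["Age Group"].isin(MIDDLE_AGED) & (df["Gender"] == "Male")
    weights.loc[is_middle_aged_male] *= 0.5
    return weights
```

Because the rules multiply, a record that matches both (a middle-aged man with a disability) ends up at the neutral weight of 1.0. If one rule should take precedence instead, apply it last and assign rather than multiply.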

Customizing Bias Criteria

You can customize the bias criteria by editing the select_diverse_group_with_bias function in main.py.
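One way to make the criteria easier to edit is to keep them as data, separate from the selection logic. The sketch below assumes a hypothetical rule format of (column, value, multiplier) and a hypothetical function select_with_bias; it is an illustration, not the current signature of select_diverse_group_with_bias. The 'Interest' rule is an invented example. The group is drawn with pandas' DataFrame.sample, which performs weighted sampling without replacement.

```python
import pandas as pd

# Hypothetical rule format: (column, value, weight multiplier).
# Editing this tuple changes the bias without touching the selection code.
BIAS_RULES = (
    ("Disability", "Yes", 2.0),  # over-represent
    ("Interest", "No", 0.5),     # under-represent (invented example)
)


def select_with_bias(df: pd.DataFrame, size: int, rules=BIAS_RULES, seed=None) -> pd.DataFrame:
    """Draw `size` distinct rows; rows with larger weights are more likely to be picked."""
    weights = pd.Series(1.0, index=df.index)
    for column, value, multiplier in rules:
        weights.loc[df[column] == value] *= multiplier
    # DataFrame.sample draws without replacement by default and
    # normalises the weights so they sum to 1.
    return df.sample(n=size, weights=weights, random_state=seed)
```

Passing a fixed seed makes a run reproducible. A multiplier of 0.0 excludes matching records entirely, so sampling fails if fewer than size records keep a non-zero weight.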

Data Structure

Your dataset should be a CSV file with the following columns:

  • ID: A unique, incrementing integer identifying each record.
  • Age Group: Categorized age groups, e.g., '18-29', '30-39', etc.
  • Education: Level of education, e.g., 'Elementary', 'Secondary', 'Higher'.
  • Gender: Gender identification, e.g., 'Male', 'Female'.
  • Residence: Type of residence, e.g., 'Capital', 'Non-Capital'.
  • Disability: Disability status, e.g., 'Yes', 'No'.
  • Interest: Level of interest, e.g., 'No', 'Some', 'Yes'.

Example CSV dataset structure:

ID,Age Group,Education,Gender,Residence,Disability,Interest
1,18-29,Higher,Female,Capital,No,Yes
2,30-39,Secondary,Male,Non-Capital,Yes,Some
...
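Before running the selection, it can help to check that a file matches this layout. The sketch below is a hypothetical validation step (the name load_and_validate and the specific checks are assumptions); it reads the two example rows above from an in-memory string so it runs without a file.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["ID", "Age Group", "Education", "Gender", "Residence", "Disability", "Interest"]


def load_and_validate(source) -> pd.DataFrame:
    """Read a CSV (path or file-like object) and check the expected structure."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not pd.api.types.is_integer_dtype(df["ID"]):
        raise ValueError("ID must contain integers")
    if not df["ID"].is_unique:
        raise ValueError("ID values must be unique")
    return df


# The two example rows from above, held in memory instead of a file.
example = io.StringIO(
    "ID,Age Group,Education,Gender,Residence,Disability,Interest\n"
    "1,18-29,Higher,Female,Capital,No,Yes\n"
    "2,30-39,Secondary,Male,Non-Capital,Yes,Some\n"
)
people = load_and_validate(example)
```

pandas keeps values such as 'Yes', 'No', and '18-29' as strings, so they can be compared directly in bias rules like Disability == 'Yes'.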

AI sources

This script was made with the help of ChatGPT v4. The related conversations are linked below:

Simple diverse selection:

https://chat.openai.com/share/1f1c89a2-59ae-44a6-979c-ce720b279229

Extended dimensions and bias:

https://chat.openai.com/share/437498d5-7535-4788-87ca-c0dc1e35a2c2

Contributing

Contributions are welcome! If you have suggestions or enhancements, please open an issue or submit a pull request.

License

MIT License: feel free to use, modify, and distribute this script as you see fit.
