This repository contains scripts used for generating various types of charts and enriching them with contextual metadata using GPT. The charts include bar, line, and pie charts, with options for creating both standard and misleading visualizations.
- ATTENTION: This repository is not designed to function as a standalone program. These scripts are intended for research purposes, and the researcher will need to manually enter file names, adjust parameters in the main scripts, and run validation scripts themselves.
- OpenAI API Key: You will need an OpenAI API key to run these scripts, as they utilize GPT for generating metadata and descriptions. Ensure that your API key is set up in your environment.
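
One way to check the key before starting a long generation run is sketched below. It assumes the key is stored in the `OPENAI_API_KEY` environment variable, which is the name the official OpenAI Python client reads by default. The scripts in this repository may load the key differently, and `require_api_key` is a hypothetical helper, not part of the repository.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: the variable name is an assumption, so adjust it
    to whatever the scripts you are running actually read.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the scripts."
        )
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of failing partway through a batch of GPT requests.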
- generate_chart_context.py: Generates charts with contextual metadata.
- generate_chart_x_categories.py: Generates the x-axis categories for the charts.
- generate_chart_y_values.py: Generates the y-axis values for the charts.
- generate_chart_captions.py: Generates captions and additional descriptive metadata for the charts.
- add_misleading_feature.py: Programmatically adds a specified misleading feature to existing charts. You choose which feature to apply, such as a non-zero baseline or inconsistent time intervals, and the script modifies the chart's metadata accordingly.
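
As a rough illustration of what "modifies the chart's metadata" could mean for a non-zero baseline, here is a minimal sketch. The metadata layout is an assumption: the keys `y_values`, `y_axis_min`, `is_misleading`, and `misleading_feature`, and the function `add_non_zero_baseline`, are all hypothetical and do not necessarily match what add_misleading_feature.py uses. The sketch also assumes all y-values are positive.

```python
from copy import deepcopy


def add_non_zero_baseline(chart: dict, fraction: float = 0.8) -> dict:
    """Return a copy of `chart` whose y-axis starts above zero.

    Hypothetical schema: `y_values` holds positive numbers, and the axis
    minimum is stored under `y_axis_min`.
    """
    modified = deepcopy(chart)  # leave the input chart untouched
    # Start the axis at a fraction of the smallest value, which
    # exaggerates the visual differences between bars or points.
    modified["y_axis_min"] = round(min(chart["y_values"]) * fraction, 2)
    modified["is_misleading"] = True
    modified["misleading_feature"] = "non_zero_baseline"
    return modified
```

Returning a modified copy rather than editing in place keeps the standard version of each chart available alongside its misleading counterpart.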
- remove_parse_error_charts.py: Removes charts that have parsing errors, which can occur when the LLM returns unexpected output. Run it after every run-through of an LLM script.
- validate_generated_charts.py: Validates the output of charts generated by the LLM-based script. It checks for empty or null values in generated fields and logs invalid charts in a separate file. To be run after generate_chart_context.py.
- validate_x_categories.py: Checks for duplicate or null values in the x-axis categories of generated charts; these are removed from the dataset, and the cleaned data is saved to a new file. To be run after generate_chart_x_categories.py.
- validate_y_values.py: Checks for duplicate or null values in the y-axis values of generated charts; these are removed from the dataset, and the cleaned data is saved to a new file. To be run after generate_chart_y_values.py.
- validate_chart_captions.py: Validates the captions generated for charts and removes charts with null or empty values. To be run after generate_chart_captions.py.
- validate_charts_misleading_features.py: Validates the misleading features applied to charts. It checks whether charts marked as misleading have the correct features: non-zero baselines or inconsistent time intervals for bar and line charts, and segments that do not sum to 100% or over-segmentation for pie charts. Any discrepancies are logged for further review. To be run after add_misleading_feature.py.
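
The duplicate and null checks described above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's implementation: it assumes each chart is a dict with a list under a hypothetical `x_categories` key, and that a chart with any bad value is dropped as a whole.

```python
def has_valid_categories(categories) -> bool:
    """True if the list is non-empty with no null, blank, or duplicate items."""
    if not categories:
        return False
    if any(c is None or str(c).strip() == "" for c in categories):
        return False
    return len(set(categories)) == len(categories)


def split_valid(charts, field="x_categories"):
    """Split charts into (valid, invalid) based on the given list field."""
    valid, invalid = [], []
    for chart in charts:
        target = valid if has_valid_categories(chart.get(field)) else invalid
        target.append(chart)
    return valid, invalid
```

The `valid` list would then be written to a new file, and `invalid` could be logged for manual review.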
- generate_bar_images.py
- generate_line_images.py
- generate_pie_images.py
- prepare_final_metadata_for_hugging_face.py
- push_to_hugging_face.py