Welcome

Hello I'm YuChen. This project aims to solve the Text-to-SQL issue by Langgraph new agent framework Deep Agent.

Expected Abilities

Flexible Utilization: User only need to ask in natural language and attatch the datasets as input.
Automatically Data Schema Analyze: System will read and understand the user query and data schema.
Text-to-SQL: Transfer user's query to SQL language.
Visualization Present: Present the analysis to answer user's query.

Present

Example 1: Dataset Analysis

Task: Analyze the count of surviving males over age 20 in the Titanic dataset.
Input: titanic_dataset.csv
Response: 在這次的數據分析任務中，您要求分析泰坦尼克號數據集中20歲以上存活男性的數量。根據分析結果，泰坦尼克號數據集中20歲以上的存活男性數量為64人。

Planning:

{
     "planning": 
     [
          {
               "task_index": 1, 
               "planning_type": "data_analysis", 
               "question": "Analyze the count of surviving males over age 20 in the Titanic dataset.", 
               "answer": "The count of surviving males over age 20 in the Titanic dataset is 64.", 
               "dataset_path": "path to titanic_dataset.csv", 
               "finish": true
          }
     ]
}

Example 2: Dataset Cleaning

Task: Clean the Titanic dataset to make sure there are no missing values and save it for me.
Input: titanic_dataset.csv

Response:

在這次計劃中，我們針對泰坦尼克號數據集進行了清理和分析，具體步驟如下：

1. **檢查缺失值**：
- 在泰坦尼克號數據集中，發現以下缺失值：
     - **年齡 (Age)**: 177 個缺失值
     - **艙位 (Cabin)**: 687 個缺失值
     - **登船港口 (Embarked)**: 2 個缺失值

2. **處理缺失值的建議**：
- 提出了多種處理缺失值的方法，包括：
     - 刪除缺失值的行
     - 使用均值、中位數或眾數進行填補
     - 創建新類別來標記缺失值
     - 使用插值法估算缺失值
     - 標記缺失值的二元列
     - 根據領域知識採用特定方法

3. **保存清理後的數據集**：
- 清理後的泰坦尼克號數據集已成功保存，路徑為：`path to titanic_dataset.csv cleaned_titanic_dataset.csv`。

如需進一步的幫助或示例代碼，請隨時告知！

Planning:

{
     "planning":
     [
          {
               "task_index": 1,
               "planning_type": "data_analysis",
               "question": "Check the Titanic dataset for any missing values and provide a summary of the findings.",
               "answer": "The summary of missing values in the Titanic dataset is as follows:\n\n- **Age**: 177 missing values\n- **Cabin**: 687 missing values\n- **Embarked**: 2 missing values",
               "dataset_path": "path to titanic_dataset.csv",
               "finish": true
          },
          {
               "task_index": 2,
               "planning_type": "data_analysis",
               "question": "If there are missing values in the Titanic dataset, suggest appropriate methods for handling them.",
               "answer": "The Titanic dataset contains several columns with missing values, particularly in the \"Age\", \"Cabin\", and \"Embarked\" columns. Here are some appropriate methods for handling these missing values:\n\n1. **Remove Rows with Missing Values**:\n   - If the number of missing values is small, you can simply remove the rows with missing data.\n\n2. **Imputation**:\n   - **Mean/Median Imputation**: For numerical columns like \"Age\", you can fill missing values with the mean or median of the column.\n   - **Mode Imputation**: For categorical columns like \"Embarked\", you can fill missing values with the mode (most frequent value).\n   - **Predictive Imputation**: Use machine learning models to predict and fill missing values based on other features.\n\n3. **Create a New Category**:\n   - For categorical variables, you can create a new category for missing values (e.g., \"Unknown\" for \"Cabin\" or \"Embarked\").\n\n4. **Use Interpolation**:\n   - For numerical data, you can use interpolation methods to estimate missing values based on other data points.\n\n5. **Flag Missing Values**:\n   - Create a new binary column indicating whether the value was missing, which can be useful for models that can leverage this information.\n\n6. **Domain-Specific Methods**:\n   - Depending on the context, you might have specific methods to handle missing values based on domain knowledge.\n\nWould you like to see a Python code example implementing any of these methods?",
               "dataset_path": "path to titanic_dataset.csv",
               "finish": true
          },
          {
               "task_index": 3,
               "planning_type": "data_analysis",
               "question": "After cleaning the dataset, save the cleaned version of the Titanic dataset.",
               "answer": "The cleaned version of the Titanic dataset has been saved to: `path to cleaned_titanic_dataset.csv`.",
               "dataset_path": "path to titanic_dataset.csv",
               "finish": true
          }
     ]

}

Example 3: Dataset Analysis and Visualization

Task: Clean the Titanic dataset to make sure there are no missing values and visualize the age and distribution with picture results.
Input: titanic_dataset.csv

Response:

在這次計劃中，我們針對泰坦尼克號數據集進行了數據清理和可視化分析。

1. **數據清理**：
- 我們清理了泰坦尼克號數據集，以確保沒有缺失值。清理前的缺失值統計如下：
     - 年齡 (Age): 177
     - 船艙 (Cabin): 687
     - 登船港口 (Embarked): 2
- 清理後的數據集已保存為 `titanic_dataset_cleaned.csv`。

2. **數據可視化**：
- 我們成功地可視化了乘客的年齡分佈，並將結果保存為 PNG 文件。該文件的路徑為 `titanic_age_distribution.png`。

整體來說，計劃已順利完成，數據集已清理並可視化。

Planning:

{
     "planning":
     [
          {
               "task_index": 1,
               "planning_type": "data_analysis",
               "question": "Clean the Titanic dataset to ensure there are no missing values.",
               "answer": "The Titanic dataset has been cleaned to remove rows with missing values. Here are the details:\n\n- **Missing values before cleaning:**\n  - Age: 177\n  - Cabin: 687\n  - Embarked: 2\n\nThe cleaned dataset has been saved as `titanic_dataset_cleaned.csv`.",
               "dataset_path": "path to titanic_dataset.csv",
               "finish": true
          },
          {
               "task_index": 2,
               "planning_type": "data_analysis",
               "question": "Visualize the age distribution of passengers in the Titanic dataset.",
               "answer": "The age distribution of passengers in the Titanic dataset has been successfully visualized and saved as a PNG file. You can find the file at the following path:\n\n**titanic_age_distribution.png**  \n`path to titanic_age_distribution.png`",
               "dataset_path": " path to titanic_dataset.csv",
               "finish": true
          }
     ]  
}

Architecture

User Question
     ↓
Intent Agent 
     └─ Check if user's question aligns with the system's capabilities.
     ↓
Planning Agent 
     └─ Orchestrate specialized sub-agents and manage complex, multi-step procedure.
     ↓
Specialized Sub-agents
     ├─  Dataframe Agent
     │    ├─ Helper Tools
     │    │    ├─ execute_python()
     │    ├─ Filesystem Tools (optional)
     │    │    ├─
     │    │    ├─ ls
     │    │    ├─ grep
     │    │    ├─ read_file
     │    │    ├─ write_file
     │    │    └─ edit_file
     └─ Other Specialized Sub-agents ... (waiting)
     ↓
Summary Agent
     ↓
Formatted Answer

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
agents		agents
images		images
utils/AI_utils		utils/AI_utils
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
graph.py		graph.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Welcome

Expected Abilities

Present

Example 1: Dataset Analysis

Example 2: Dataset Cleaning

Example 3: Dataset Analysis and Visualization

Architecture

Reference

Reading Reference

Datasets Reference

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Welcome

Expected Abilities

Present

Example 1: Dataset Analysis

Example 2: Dataset Cleaning

Example 3: Dataset Analysis and Visualization

Architecture

Reference

Reading Reference

Datasets Reference

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages