- Overview
- Features
- ML Algorithm & Training
- Project Structure
- Setup
- App and Script flags
- Browser Dashboard
pHorest is a cyber-physical system designed to combat soil acidification and nutrient runoff, two major environmental concerns in modern chemistry. By integrating IoT sensors with a Random Forest machine learning model, the system identifies the soil's chemical state and provides precise neutralization strategies by recommending crops and fertilizers.
- Hardware Sensing: Real-time pH, TDS, and Temperature monitoring via Arduino.
- Cloud Rainfall Input: 30-day rainfall is fetched online by location (Open-Meteo) for model input.
- AI Inference: Random Forest Classifier trained on agricultural datasets.
- Smart Recommendations: Crop prediction + fertilizer suggestion with pH-aware chemistry advice.
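As a rough illustration of what pH-aware advice could look like, the sketch below classifies a reading and attaches a generic correction hint. The thresholds (6.5 and 7.5), the function names, and the advice strings are assumptions for illustration only, not the project's actual rules.

```python
# Hypothetical cut-offs for illustration; the project's real thresholds may differ.
ACIDIC_BELOW = 6.5
ALKALINE_ABOVE = 7.5


def chemical_status(ph: float, check_range: bool = True) -> str:
    """Classify a pH reading as 'acidic', 'neutral', or 'alkaline'."""
    # The conventional pH scale runs from 0 to 14; values outside it
    # usually point to a sensor or wiring fault.
    if check_range and not 0.0 <= ph <= 14.0:
        raise ValueError(f"pH reading {ph} is outside the 0-14 range")
    if ph < ACIDIC_BELOW:
        return "acidic"
    if ph > ALKALINE_ABOVE:
        return "alkaline"
    return "neutral"


def ph_advice(ph: float, check_range: bool = True) -> str:
    """Return the status plus a generic, illustrative correction hint."""
    status = chemical_status(ph, check_range)
    advice = {
        "acidic": "consider liming (e.g. calcium carbonate) to raise pH",
        "alkaline": "consider elemental sulfur or an acidifying fertilizer to lower pH",
        "neutral": "no pH correction needed",
    }
    return f"{status}: {advice[status]}"
```

Passing `check_range=False` mirrors the idea of skipping range validation while still reporting a chemical status.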
- Algorithm: `RandomForestClassifier`
- Input features (strict order): `Nitrogen`, `phosphorus`, `potassium`, `temperature`, `humidity`, `ph`, `rainfall`
- Target label: `label` (crop name)
- Train/test split: 80/20 using `train_test_split(..., test_size=0.2, random_state=42)`
- Model params used: `n_estimators=100`, `random_state=42`
- Saved model artifact: `random forest/model/soil_model.pkl` (via `joblib.dump`)
- Trained in Jupyter Notebook.
- Training code used: same as `random forest/training/train.ipynb` in this repo.
- Dataset used: `Crop_recommendation.csv` (same schema as `random forest/dataset/Crop_recommendation.csv` in this repo).
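The training settings listed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a copy of `train.ipynb`: the function name `train_and_save` and the shape check are hypothetical, and the feature names are taken from the list above.

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Feature names as listed in this README, in training order.
FEATURES = ["Nitrogen", "phosphorus", "potassium", "temperature",
            "humidity", "ph", "rainfall"]


def train_and_save(X, y, model_path):
    """Fit a Random Forest on an 80/20 split and save it with joblib."""
    if np.asarray(X).shape[1] != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} feature columns")
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    # Mean accuracy on the held-out 20%.
    accuracy = model.score(X_test, y_test)
    joblib.dump(model, model_path)
    return model, accuracy
```

With the dataset loaded through pandas, `X` would be `df[FEATURES]` and `y` would be `df["label"]`, assuming the CSV columns carry exactly those names.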
- Place `random forest/model/soil_model.pkl` in the `random forest/model/` directory.
- Ensure runtime feature order exactly matches training feature order.
- Run `scripts/ser_script.py`, `scripts/sim_script.py`, or `app/app.py` to load the model and predict crops.
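One way to guard the feature order at runtime is to build every input row from a single ordered tuple. The helper below is a hypothetical sketch (the name `build_feature_row` and the dictionary keys are assumptions based on the feature list in this README), not code taken from the scripts.

```python
# Training order as listed in this README; key names are assumed to match.
FEATURE_ORDER = ("Nitrogen", "phosphorus", "potassium", "temperature",
                 "humidity", "ph", "rainfall")


def build_feature_row(readings: dict) -> list:
    """Return the readings as floats in training order, failing loudly on gaps."""
    missing = [name for name in FEATURE_ORDER if name not in readings]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(readings[name]) for name in FEATURE_ORDER]
```

A loaded model could then be queried with `model.predict([build_feature_row(readings)])`, regardless of the order in which the readings were collected.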
- `app/app.py`: Browser dashboard (live readings + crop + fertilizer recommendation).
- `scripts/`: Contains utility scripts.
  - `ser_script.py`: Local Python bridge between Arduino and ML model.
  - `sim_script.py`: Simulation script for software-only demonstration.
- `random forest/`: Contains all machine learning related files.
  - `dataset/`: Contains the dataset.
    - `Crop_recommendation.csv`: The dataset used for training.
  - `model/`: Contains the trained model.
    - `soil_model.pkl`: Serialized Random Forest model.
  - `training/`: Contains the training notebook.
    - `train.ipynb`: Jupyter notebook for training the model.
- `docs/setup.md`: Required hardware components and Arduino wiring map.
- `arduino/`: Directory containing Arduino firmware files for data acquisition.
Before proceeding, set up the hardware per `docs/setup.md` and flash `arduino/arduino_full/arduino_full.ino` via the Arduino IDE.
- Connect Arduino (for live/serial mode), commonly `/dev/ttyACM0` or `/dev/ttyUSB0` for Linux/macOS, or `COM3` for Windows.
- Install dependencies:
Linux / macOS:
python3 -m pip install pandas joblib pyserial scikit-learn flask
Windows:
py -m pip install pandas joblib pyserial scikit-learn flask
To run the scripts, use the appropriate command for your operating system:
- Terminal:
Linux / macOS:
# Simulation
python3 scripts/sim_script.py
# Serial
python3 scripts/ser_script.py --port=/dev/ttyACM0
Windows:
# Simulation
py scripts/sim_script.py
# Serial
py scripts/ser_script.py --port=COM3
- Browser dashboard:
Linux / macOS:
# Simulation (default)
python3 app/app.py
# Serial (explicit port)
python3 app/app.py --serial /dev/ttyACM0
# Serial (auto-detect first available Arduino-like port)
python3 app/app.py --serial
Windows:
# Simulation (default)
py app/app.py
# Serial
py app/app.py --serial COM3
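Port auto-detection could work along the lines of the sketch below: prefer a known port, otherwise take the first device whose name looks Arduino-like. The name patterns and the `pick_port` helper are assumptions for illustration; the dashboard's real detection logic may differ.

```python
import fnmatch

# Hypothetical name patterns for "Arduino-like" ports; adjust for your boards.
ARDUINO_PATTERNS = ("/dev/ttyACM*", "/dev/ttyUSB*", "COM*")


def pick_port(devices, preferred="/dev/ttyACM0"):
    """Return the preferred port if present, else the first Arduino-like one, else None."""
    devices = list(devices)
    if preferred in devices:
        return preferred
    for device in devices:
        if any(fnmatch.fnmatchcase(device, pattern) for pattern in ARDUINO_PATTERNS):
            return device
    return None
```

With pyserial installed, the device names can be gathered with `[p.device for p in serial.tools.list_ports.comports()]` (after `import serial.tools.list_ports`) and passed to `pick_port`.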
- `app/app.py` → starts in SIM mode
- `app/app.py --sim` → explicitly starts in SIM mode
- `app/app.py --serial <PORT>` → starts in SERIAL mode
- `app/app.py --serial` → starts in SERIAL mode and auto-detects the port if `/dev/ttyACM0` is unavailable
- `--lock-mode` → hides the mode selector and prevents switching modes during runtime
- `--no-check` → skips strict pH input range validation (chemical status still shown)
Usage examples:
# Start in simulation mode
python3 app/app.py
# Start in serial mode with a specific port
python3 app/app.py --serial /dev/ttyACM0
# Start in serial mode with auto-port detection and locked UI
python3 app/app.py --serial --lock-mode
The mode selector can be hidden by pressing 'm' in normal mode.
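For readers curious how flags like these can be parsed, here is a minimal `argparse` sketch. It is not the code in `app/app.py`: the `"auto"` sentinel, the mutually exclusive group, and the helper names are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the dashboard flags described in this README."""
    parser = argparse.ArgumentParser(prog="app.py")
    # Assumption: --sim and --serial cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--sim", action="store_true",
                      help="start in SIM mode (also the default)")
    # nargs="?" accepts --serial with or without a port;
    # const="auto" is a hypothetical marker meaning "auto-detect".
    mode.add_argument("--serial", nargs="?", const="auto", default=None,
                      metavar="PORT", help="start in SERIAL mode")
    parser.add_argument("--lock-mode", action="store_true",
                        help="hide the mode selector")
    parser.add_argument("--no-check", action="store_true",
                        help="skip strict pH range validation")
    return parser


def resolve_mode(args: argparse.Namespace) -> str:
    """Return 'SERIAL' when --serial was given, otherwise 'SIM'."""
    return "SERIAL" if args.serial is not None else "SIM"
```

The `nargs="?"` choice is what lets a bare `--serial` and `--serial <PORT>` share one flag.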
The terminal scripts (`scripts/sim_script.py` and `scripts/ser_script.py`) support optional flags to provide rainfall data:
- `--location <LAT> <LON>`: Fetches 30-day rainfall data from an online weather API (Open-Meteo) based on the provided latitude and longitude.
  - If the API call fails, the script falls back to the `--rainfall_data` value if provided; otherwise it uses a default value (100.0 mm) and prints a warning.
- `--rainfall_data <VALUE>`: Directly specifies the 30-day rainfall value in millimeters.
  - If both `--location` and `--rainfall_data` are provided, `--location` is attempted first. If it fails, `--rainfall_data` is used as a fallback.
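The fallback order described above can be sketched as a small function. The name `resolve_rainfall`, the injected `fetch` callable, and the source labels are hypothetical; using the default when neither flag is given is also an assumption, since only the failure case is documented here.

```python
import sys
from typing import Callable, Optional, Tuple

DEFAULT_RAINFALL_MM = 100.0  # default value stated in this README


def resolve_rainfall(
    location: Optional[Tuple[float, float]],
    rainfall_data: Optional[float],
    fetch: Callable[[float, float], float],
) -> Tuple[float, str]:
    """Return (rainfall_mm, source), trying location, then the flag value, then the default."""
    if location is not None:
        try:
            return float(fetch(*location)), "api"
        except Exception as exc:  # network errors, malformed payloads, etc.
            print(f"warning: rainfall lookup failed: {exc}", file=sys.stderr)
    if rainfall_data is not None:
        return float(rainfall_data), "flag"
    print(f"warning: using default rainfall of {DEFAULT_RAINFALL_MM} mm",
          file=sys.stderr)
    return DEFAULT_RAINFALL_MM, "default"
```

Injecting `fetch` keeps the fallback logic testable without network access.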
Usage examples:
# Simulation with location-based rainfall
python3 scripts/sim_script.py --location 34.0522 -118.2437
# Simulation with direct rainfall data
python3 scripts/sim_script.py --rainfall_data 150.0
# Serial with location-based rainfall
python3 scripts/ser_script.py --port=/dev/ttyACM0 --location 34.0522 -118.2437
# Serial with direct rainfall data
python3 scripts/ser_script.py --port=/dev/ttyACM0 --rainfall_data 150.0
Open http://127.0.0.1:5000 after starting app/app.py.
- Set your coordinates using Use Browser Location or manual latitude/longitude.
- The app fetches rolling 30-day rainfall (mm) online and feeds it to the model.
- If the location or weather fetch is unavailable, the app falls back to cached or default rainfall and shows the rainfall source status in the UI.
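A rolling 30-day rainfall total could be requested from Open-Meteo roughly as sketched below. The choice of the forecast endpoint with `past_days` and the daily `precipitation_sum` variable is an assumption about how such a query might be made, and the helper names are hypothetical; the app's actual request may differ.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameters; the app's real query may differ.
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def build_rainfall_url(lat: float, lon: float, days: int = 30) -> str:
    """Build a request URL for daily precipitation over the past `days` days."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "daily": "precipitation_sum",
        "past_days": days,
        "forecast_days": 1,
        "timezone": "auto",
    }
    return f"{OPEN_METEO_URL}?{urlencode(params)}"


def total_rainfall_mm(payload: dict, days: int = 30) -> float:
    """Sum the first `days` daily values, skipping missing (null) entries."""
    daily = payload["daily"]["precipitation_sum"][:days]
    return float(sum(value for value in daily if value is not None))


def fetch_rainfall_mm(lat: float, lon: float, days: int = 30,
                      timeout: float = 10.0) -> float:
    """Fetch and total the rainfall; raises on network or payload errors."""
    with urlopen(build_rainfall_url(lat, lon, days), timeout=timeout) as resp:
        return total_rainfall_mm(json.load(resp), days)
```

Splitting URL construction and payload handling from the network call makes it easier to fall back to cached or default values when the request fails.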