Sample data publishing #62

Open
oshadmon wants to merge 23 commits into rs-improvements from os-dev

oshadmon (Collaborator) commented Apr 4, 2025

  1. The README explains how AnyLog works when publishing data.
  2. mnist and winniio include sample data for POST and PUT, respectively.
  3. You can view the expected behavior on root@74.207.235.89 (your machine) and through the Remote-CLI.
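
To make the POST/PUT distinction in item 2 concrete, here is a minimal sketch that only builds the two requests and does not send them. The header names (`type`, `dbms`, `table`, `mode`, `command`, `topic`), the `User-Agent` value, the address, and the database, table, and topic names are all assumptions for illustration; check them against the README in this PR before use.

```python
import json
from urllib.request import Request

# Placeholder address of a node's REST interface (assumption, not from this PR).
NODE = "127.0.0.1:32149"


def build_put_request(conn, dbms, table, rows):
    """PUT: the target database and table are named directly in the headers."""
    headers = {
        "type": "json",          # assumed header names; verify against the README
        "dbms": dbms,
        "table": table,
        "mode": "streaming",
        "Content-Type": "text/plain",
    }
    body = json.dumps(rows).encode("utf-8")
    return Request(f"http://{conn}", data=body, headers=headers, method="PUT")


def build_post_request(conn, topic, rows):
    """POST: rows are sent to a topic; the node maps the topic to a table."""
    headers = {
        "command": "data",       # assumed header names; verify against the README
        "topic": topic,
        "User-Agent": "AnyLog/1.23",
        "Content-Type": "text/plain",
    }
    body = json.dumps(rows).encode("utf-8")
    return Request(f"http://{conn}", data=body, headers=headers, method="POST")


# Hypothetical sample rows; the real sample data lives in the mnist and winniio directories.
rows = [{"timestamp": "2025-04-04T04:39:00Z", "value": 21.5}]
put_req = build_put_request(NODE, "sample_db", "sample_table", rows)
post_req = build_post_request(NODE, "sample-topic", rows)
print(put_req.get_method(), post_req.get_method())
```

Passing either object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live node.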

Signed-off-by: Ori Shadmon <oshadmon@gmail.com> (13 commits)
@oshadmon oshadmon requested a review from royshadmon April 4, 2025 04:39
@oshadmon oshadmon self-assigned this Apr 4, 2025
royshadmon and others added 10 commits April 8, 2025 12:55
…note that the paths in the env_files/mnist are Windows machine examples
updating edgelake image tags
clarification in readme
* improvements to code, including file handling

* gpu laptop updates
fixing merge conflict to enable data handler to choose whether to train on GPU if available and defaults to CPU if not

* providing option in data handler for GPU use

* fixing mongo file get and including the same code in aggregator

* cleaning aggregator code

* merging code, integrating GPU usage if it exists, better error handling on file read between AnyLog nodes, updated examples for mnist.env files, code cleaning

* undoing accidental changes to EdgeLake dir
…ining node db validation on init (#71)

* Marked where+which IBM imports are used

* Marked all IBM DataHandler and ModelUpdate instances

* Removed need for IBM DataHandler for Winniio model

* Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* [Forgot to push local_model_update.py] Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* Updated IBM's Fusion Model to support our version of ModelUpdate (removes the need for manual changes)

* Created a base aggregation model class + federated average agg. model to replace IBM's IterAvgFusionHandler

* Removed temp ibm_fusion_handler.py

* Removed temp ibm_fusion_handler.py

* MNist works with Keras instead of PyTorch, removed IBM dependency

* Removed IBM federated learning, upgraded platform to Python 3.11/requirements, merged w/updated README

* Revert "Removed IBM federated learning & upgraded platform to Python 3.11"

* Revert "Revert "Removed IBM federated learning & upgraded platform to Python 3.11""

* Upgraded platform to run on Python 3.12 + requirements; updated README

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Fixing FastAPI naming error

* Adding ability of logger to env's; added decision tree PDF

* Update winniio.env to have debugger support

* Create mnist.env example

* Update .gitignore to allow mnist.env

* Update .gitignore to allow both .env to be unwritten

* Fixed pathing for logs and winniio bug fix

* Fixed errors for release 1; formatting

* final release 1 updates. refactoring directory names.  everything works on both mnist and winniio dataset

* adding env files back to main

* cleaning codebase

* Implemented index key into initialization + README; added missing torchvision requirement; fixed typos

* added `blockchain get index` ability; added node_type to policy definitions; tested w/`blockchain get [query]`

* Added db validation check when a node starts up; adjusted logger for node_server / node

* Converted Flask to FastAPI for continuous training code (originally written in Flask by Chahel); fixed merge conflicts

* Update README.md for continue-training command

* Fixed file write paths to include/sorted by index

* Fixed round 1 bug for continuous training; added {index}-r to the blockchain to hold most recent aggregated model file

* updating README and small code cleaning to PR

---------

Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: royshadmon <16313057+royshadmon@users.noreply.github.com>
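
For readers unfamiliar with the "federated average" aggregation that the commit list above says replaced IBM's IterAvgFusionHandler, here is a minimal sketch of the general technique. It is not the code from this PR: the function name, the plain-list weight layout, and the optional sample-count weighting are assumptions chosen so the example runs without any ML framework.

```python
from typing import List, Optional, Sequence

Layer = List[float]      # one flattened layer of model weights
Weights = List[Layer]    # all layers of one node's model


def federated_average(
    updates: Sequence[Weights],
    sample_counts: Optional[Sequence[int]] = None,
) -> Weights:
    """Average per-node weights, optionally weighted by each node's sample count."""
    if not updates:
        raise ValueError("need at least one node update")
    if sample_counts is None:
        sample_counts = [1] * len(updates)   # plain (unweighted) mean
    if len(sample_counts) != len(updates):
        raise ValueError("one sample count per node update is required")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")

    averaged: Weights = []
    for layer_idx in range(len(updates[0])):
        layer = [0.0] * len(updates[0][layer_idx])
        for weights, count in zip(updates, sample_counts):
            for i, value in enumerate(weights[layer_idx]):
                layer[i] += value * count / total
        averaged.append(layer)
    return averaged


# Two hypothetical nodes, each with a single two-weight layer.
node_a = [[1.0, 2.0]]
node_b = [[3.0, 4.0]]
print(federated_average([node_a, node_b]))
print(federated_average([node_a, node_b], sample_counts=[1, 3]))
```

Weighting by sample count is one common variant; an unweighted mean is what the function computes when no counts are given.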
…init / minParams customization & checker + README) (#74)

* Marked where+which IBM imports are used

* Marked all IBM DataHandler and ModelUpdate instances

* Removed need for IBM DataHandler for Winniio model

* Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* [Forgot to push local_model_update.py] Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* Updated IBM's Fusion Model to support our version of ModelUpdate (removes the need for manual changes)

* Created a base aggregation model class + federated average agg. model to replace IBM's IterAvgFusionHandler

* Removed temp ibm_fusion_handler.py

* Removed temp ibm_fusion_handler.py

* MNist works with Keras instead of PyTorch, removed IBM dependency

* Removed IBM federated learning, upgraded platform to Python 3.11/requirements, merged w/updated README

* Revert "Removed IBM federated learning & upgraded platform to Python 3.11"

* Revert "Revert "Removed IBM federated learning & upgraded platform to Python 3.11""

* Upgraded platform to run on Python 3.12 + requirements; updated README

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Fixing FastAPI naming error

* Adding ability of logger to env's; added decision tree PDF

* Update winniio.env to have debugger support

* Create mnist.env example

* Update .gitignore to allow mnist.env

* Update .gitignore to allow both .env to be unwritten

* Fixed pathing for logs and winniio bug fix

* Fixed errors for release 1; formatting

* final release 1 updates. refactoring directory names.  everything works on both mnist and winniio dataset

* adding env files back to main

* cleaning codebase

* Implemented index key into initialization + README; added missing torchvision requirement; fixed typos

* added `blockchain get index` ability; added node_type to policy definitions; tested w/`blockchain get [query]`

* Added db validation check when a node starts up; adjusted logger for node_server / node

* Converted Flask to FastAPI for continuous training code (originally written in Flask by Chahel); fixed merge conflicts

* Update README.md for continue-training command

* Fixed file write paths to include/sorted by index

* Fixed round 1 bug for continuous training; added {index}-r to the blockchain to hold most recent aggregated model file

* updating README and small code cleaning to PR

* Adjust API calls to support dynamic nodes; minParams also dynamically adjusted for newly added nodes in middle of training

* Adjusted minParams to prevent stalling when it is greater than the number of active nodes

* Update README.md for dynamic nodes

* Update README.md for pathing

* Enabled threading for node initialization (originally from Nikolas' code); fixed bugs and loggers

* Added new endpoint to update minParams (available after initialization); removed dynamic minParams when new nodes are added during training; updated README w/new endpoint

* updating indexing to write files to a directory named the index, not filename including the index name. also adding index parameter to the continue-training functionality. changes also require updates to env files, so please take note

* updating README with the continue-training rest call

* removing unused variable from env files

* Added index component to `/start-training`; updated README

* when adding a new node to the training process, it starts at the most recent round

* removing commented code

---------

Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: royshadmon <16313057+royshadmon@users.noreply.github.com>
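
The minParams stall fix mentioned in the commit list above can be illustrated with a small sketch. This is an assumption about the behavior, not the PR's implementation: the function names are hypothetical, and the idea shown is simply capping the requested threshold at the number of active nodes so a round cannot wait for updates that will never arrive.

```python
def effective_min_params(requested: int, active_nodes: int) -> int:
    """Cap the requested minParams at the number of currently active nodes."""
    if requested < 1:
        raise ValueError("minParams must be at least 1")
    if active_nodes < 1:
        raise ValueError("at least one active node is required")
    return min(requested, active_nodes)


def ready_to_aggregate(received: int, requested: int, active_nodes: int) -> bool:
    """True once enough node updates have arrived to aggregate this round."""
    return received >= effective_min_params(requested, active_nodes)


# With minParams=5 but only 3 active nodes, 3 updates are enough to proceed.
print(effective_min_params(5, 3))
print(ready_to_aggregate(received=3, requested=5, active_nodes=3))
print(ready_to_aggregate(received=2, requested=5, active_nodes=3))
```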
…ltaneously / Aggregator Direct-Inference / Docker containerization / Pre-pulling policies) (#84)

* Marked where+which IBM imports are used

* Marked all IBM DataHandler and ModelUpdate instances

* Removed need for IBM DataHandler for Winniio model

* Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* [Forgot to push local_model_update.py] Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* Updated IBM's Fusion Model to support our version of ModelUpdate (removes the need for manual changes)

* Created a base aggregation model class + federated average agg. model to replace IBM's IterAvgFusionHandler

* Removed temp ibm_fusion_handler.py

* Removed temp ibm_fusion_handler.py

* MNist works with Keras instead of PyTorch, removed IBM dependency

* Removed IBM federated learning, upgraded platform to Python 3.11/requirements, merged w/updated README

* Revert "Removed IBM federated learning & upgraded platform to Python 3.11"

* Revert "Revert "Removed IBM federated learning & upgraded platform to Python 3.11""

* Upgraded platform to run on Python 3.12 + requirements; updated README

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Fixing FastAPI naming error

* Adding ability of logger to env's; added decision tree PDF

* Update winniio.env to have debugger support

* Create mnist.env example

* Update .gitignore to allow mnist.env

* Update .gitignore to allow both .env to be unwritten

* Fixed pathing for logs and winniio bug fix

* Fixed errors for release 1; formatting

* final release 1 updates. refactoring directory names.  everything works on both mnist and winniio dataset

* adding env files back to main

* cleaning codebase

* Implemented index key into initialization + README; added missing torchvision requirement; fixed typos

* added `blockchain get index` ability; added node_type to policy definitions; tested w/`blockchain get [query]`

* Added db validation check when a node starts up; adjusted logger for node_server / node

* Converted Flask to FastAPI for continuous training code (originally written in Flask by Chahel); fixed merge conflicts

* Update README.md for continue-training command

* some docker stuff done, working on make commands

* Fixed file write paths to include/sorted by index

* Fixed round 1 bug for continuous training; added {index}-r to the blockchain to hold most recent aggregated model file

* created a new branch from pre-main where im working on creating the docker images and containerizing each node type

* created a new branch from pre-main where im working on creating the docker images and containerizing each node type

* updating README and small code cleaning to PR

* Adjust API calls to support dynamic nodes; minParams also dynamically adjusted for newly added nodes in middle of training

* Adjusted minParams to prevent stalling when it is greater than the number of active nodes

* Update README.md for dynamic nodes

* Update README.md for pathing

* working on letting app2.py work with platform_cmponents

* Enabled threading for node initialization (originally from Nikolas' code); fixed bugs and loggers

* Added new endpoint to update minParams (available after initialization); removed dynamic minParams when new nodes are added during training; updated README w/new endpoint

* Checks if index is unique; modularized initialization portion of aggregator and nodes

* app2.py correctly starts aggregator server

* updating indexing to write files to a directory named the index, not filename including the index name. also adding index parameter to the continue-training functionality. changes also require updates to env files, so please take note

* updating README with the continue-training rest call

* removing unused variable from env files

* Added index component to `/start-training`; updated README

* when adding a new node to the training process, it starts at the most recent round

* removing commented code

* Adding to-do's; bug fixes

* pc work from the previous meeting 5/1

* Added aggregator and node modularity (except for different data handlers); implemented simultaneous training processes on the same servers but only works if the same model is training on all processes; indicated which model is running during progression (minus the progress bar); ensured training is multithreaded and dynamic; index is now required for all current endpoints; fixed README

* Update README.md for module path

Will be changed later for simplicity of user

* testing docker build on pc

* 5/5/25 starting on make functionality, still having some issues with envs and running the container with correct files

* Tested multiple (different) data handlers running concurrently; fixed README; passed db through initialization command; moved DB check in node_server from lifespan() to initialization --> marking the end of testing for training of multiple models

* Adjusted some logger messages

* Converted module_path to module_file to make it easier on user (appended file name to end of path found in .env)

* fixing directories on default .env files

* 5/6 work, docker build runs aggregator correctly, need to access from host browser

* 5-8 merging pre-main into containerization

* Update README.md (forgot commas on command examples)

* aggregator and nodes deployed as docker containers using a single docker compose file, hanging on training

* updating env files

* Revert "Converted module_path to module_file to make it easier on user (appended file name to end of path found in .env)"

This reverts commit 39a335a.

* Reworked "converting module_path to module_path";passed db_name; fixed README

* Update docker-compose.yaml

* updated readme to match pre-main

* updated readme to include containerization

* updated readme to include containerization

* Added message at the end of training process; checked that module_path existed; checked if node actually initialized; started aggregator direct inference; some code cleaning

* Added direct_inference to aggregator (takes in list of test data and list of labels/predictions); enabled aggregator to have an fl_model (in both data handlers); updated direct_inference in the mnist data handler (will update for winniio later)

* Updated agg. direct_inference for winniio; direct inference input now allows list of elements of any type (conversions and validations are done within the data_handler); updated README

* updated README w/WINNIIO direct_inference example

* added logs to node_server.py, updated dockers to include the cuda path

* Commented plan for updating listen_for_update_agg()

* Updated aggregator to preemptively pull node model links in listen_for_update_agg() (new function for reading/fetching node model links)

* replaced absolute path in docker-compose with a relative path (no longer needs to be updated), removed cuda usage in the containerized apis

* removed unused app.py, renamed app2.py to app.py

* pulled node_server.py from pre-main

* Update README.md

* updated README to include running specific apis only

* updated README to include taking down all the apis

* fix for continue-training that keeps end_round state

* updating gitignore to prevent default env files in both EdgeLake and edgefl from being overwritten

---------

Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: royshadmon <16313057+royshadmon@users.noreply.github.com>
Co-authored-by: Miguel61823 <mmascare@ucsc.edu>
Co-authored-by: David Wu <122853894+DDublue@users.noreply.github.com>
Co-authored-by: Miguel61823 <146488686+Miguel61823@users.noreply.github.com>
* Marked where+which IBM imports are used

* Marked all IBM DataHandler and ModelUpdate instances

* Removed need for IBM DataHandler for Winniio model

* Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* [Forgot to push local_model_update.py] Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* Updated IBM's Fusion Model to support our version of ModelUpdate (removes the need for manual changes)

* Created a base aggregation model class + federated average agg. model to replace IBM's IterAvgFusionHandler

* Removed temp ibm_fusion_handler.py

* Removed temp ibm_fusion_handler.py

* MNist works with Keras instead of PyTorch, removed IBM dependency

* Removed IBM federated learning, upgraded platform to Python 3.11/requirements, merged w/updated README

* Revert "Removed IBM federated learning & upgraded platform to Python 3.11"

* Revert "Revert "Removed IBM federated learning & upgraded platform to Python 3.11""

* Upgraded platform to run on Python 3.12 + requirements; updated README

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Fixing FastAPI naming error

* Adding ability of logger to env's; added decision tree PDF

* Update winniio.env to have debugger support

* Create mnist.env example

* Update .gitignore to allow mnist.env

* Update .gitignore to allow both .env to be unwritten

* Fixed pathing for logs and winniio bug fix

* Fixed errors for release 1; formatting

* final release 1 updates. refactoring directory names.  everything works on both mnist and winniio dataset

* adding env files back to main

* cleaning codebase

* Implemented index key into initialization + README; added missing torchvision requirement; fixed typos

* added `blockchain get index` ability; added node_type to policy definitions; tested w/`blockchain get [query]`

* Added db validation check when a node starts up; adjusted logger for node_server / node

* Converted Flask to FastAPI for continuous training code (originally written in Flask by Chahel); fixed merge conflicts

* Update README.md for continue-training command

* some docker stuff done, working on make commands

* Fixed file write paths to include/sorted by index

* Fixed round 1 bug for continuous training; added {index}-r to the blockchain to hold most recent aggregated model file

* created a new branch from pre-main where im working on creating the docker images and containerizing each node type

* created a new branch from pre-main where im working on creating the docker images and containerizing each node type

* updating README and small code cleaning to PR

* Adjust API calls to support dynamic nodes; minParams also dynamically adjusted for newly added nodes in middle of training

* Adjusted minParams to prevent stalling when it is greater than the number of active nodes

* Update README.md for dynamic nodes

* Update README.md for pathing

* working on letting app2.py work with platform_cmponents

* Enabled threading for node initialization (originally from Nikolas' code); fixed bugs and loggers

* Added new endpoint to update minParams (available after initialization); removed dynamic minParams when new nodes are added during training; updated README w/new endpoint

* Checks if index is unique; modularized initialization portion of aggregator and nodes

* app2.py correctly starts aggregator server

* updating indexing to write files to a directory named the index, not filename including the index name. also adding index parameter to the continue-training functionality. changes also require updates to env files, so please take note

* updating README with the continue-training rest call

* removing unused variable from env files

* Added index component to `/start-training`; updated README

* when adding a new node to the training process, it starts at the most recent round

* removing commented code

* Adding to-do's; bug fixes

* pc work from the previous meeting 5/1

* Added aggregator and node modularity (except for different data handlers); implemented simultaneous training processes on the same servers but only works if the same model is training on all processes; indicated which model is running during progression (minus the progress bar); ensured training is multithreaded and dynamic; index is now required for all current endpoints; fixed README

* Update README.md for module path

Will be changed later for simplicity of user

* testing docker build on pc

* 5/5/25 starting on make functionality, still having some issues with envs and running the container with correct files

* Tested multiple (different) data handlers running concurrently; fixed README; passed db through initialization command; moved DB check in node_server from lifespan() to initialization --> marking the end of testing for training of multiple models

* Adjusted some logger messages

* Converted module_path to module_file to make it easier on user (appended file name to end of path found in .env)

* fixing directories on default .env files

* 5/6 work, docker build runs aggregator correctly, need to access from host browser

* 5-8 merging pre-main into containerization

* Update README.md (forgot commas on command examples)

* aggregator and nodes deployed as docker containers using a single docker compose file, hanging on training

* updating env files

* Revert "Converted module_path to module_file to make it easier on user (appended file name to end of path found in .env)"

This reverts commit 39a335a.

* Reworked "converting module_path to module_path";passed db_name; fixed README

* Update docker-compose.yaml

* updated readme to match pre-main

* updated readme to include containerization

* updated readme to include containerization

* Added message at the end of training process; checked that module_path existed; checked if node actually initialized; started aggregator direct inference; some code cleaning

* Added direct_inference to aggregator (takes in list of test data and list of labels/predictions); enabled aggregator to have an fl_model (in both data handlers); updated direct_inference in the mnist data handler (will update for winniio later)

* Updated agg. direct_inference for winniio; direct inference input now allows list of elements of any type (conversions and validations are done within the data_handler); updated README

* updated README w/WINNIIO direct_inference example

* added logs to node_server.py, updated dockers to include the cuda path

* Commented plan for updating listen_for_update_agg()

* Updated aggregator to preemptively pull node model links in listen_for_update_agg() (new function for reading/fetching node model links)

* replaced absolute path in docker-compose with a relative path (no longer needs to be updated), removed cuda usage in the containerized apis

* removed unused app.py, renamed app2.py to app.py

* pulled node_server.py from pre-main

* Update README.md

* updated README to include running specific apis only

* updated README to include taking down all the apis

* fix for continue-training that keeps end_round state

* updating gitignore to prevent default env files in both EdgeLake and edgefl from being overwritten

* Added db script and .env files for chest xray bbox model

* Added chest xray bbox data handler (not yet tested) and updated .env's + requirements.txt

* Renamed bbox data handler for consistency

* Tested training of BBox data handler (need to reduce the agg/model file sizes because nodes take too long copying; also still need to test inference); updated README w/Kaggle setup

* Reduced model size of the chest xrays bbox data handler to fix the slow copying (invalid load key bug); inference tested and working

* updating env files and cleaning saved images (#96)

---------

Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: royshadmon <16313057+royshadmon@users.noreply.github.com>
Co-authored-by: Miguel61823 <mmascare@ucsc.edu>
Co-authored-by: David Wu <122853894+DDublue@users.noreply.github.com>
Co-authored-by: Miguel61823 <146488686+Miguel61823@users.noreply.github.com>
Signed-off-by: Ori Shadmon <oshadmon@gmail.com> (2 commits)
3 participants