Feature/data management #466
Open
Shashankss1205 wants to merge 3 commits into
b550e24 to 9a11d65
…save_data
- inspect_data: rich metadata (mtype, scitype, shape, freq, cutoff, missing values, head, summary_stats)
- split_data: temporal train/test split with test_size (fraction) or fh (horizon count)
- transform_data: unified action='format' (auto-fix freq/dupes/NaN) or action='convert' (mtype conversion)
- save_data: persist data handles to CSV/Parquet/JSON files
- Added 15 unit tests covering all tools and edge cases
- Wired all 4 tools into server.py (Tool schemas + call_tool dispatcher)
- Updated tools/__init__.py exports
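The commit describes transform_data(action='format') only as "auto-fix freq/dupes/NaN". The sketch below illustrates those three fixes on a plain list of (timestamp, value) pairs using only the standard library. format_series is a hypothetical name, and keeping the last duplicate and forward-filling gaps are assumptions; the PR does not say which fill strategy the tool's pandas-based executor uses.

```python
import math
from collections import Counter
from datetime import date


def format_series(points):
    """Deduplicate, regularize and fill a list of (timestamp, value) pairs.

    Hypothetical stdlib sketch of the ideas behind transform_data(action="format");
    the actual tool operates on pandas objects through its executor.
    """
    # Deduplicate: a later entry for the same timestamp overwrites an earlier one.
    latest = {}
    for ts, value in points:
        latest[ts] = value
    stamps = sorted(latest)
    if len(stamps) < 2:
        return [(ts, latest[ts]) for ts in stamps]

    # Infer the frequency as the most common gap between consecutive timestamps.
    gaps = Counter(later - earlier for earlier, later in zip(stamps, stamps[1:]))
    step = gaps.most_common(1)[0][0]

    # Rebuild a regular grid from the first to the last timestamp. Timestamps
    # that do not fall on this grid are dropped; missing grid points and NaN
    # values are forward-filled (a leading NaN stays NaN).
    result = []
    previous = math.nan
    ts = stamps[0]
    while ts <= stamps[-1]:
        value = latest.get(ts, math.nan)
        if math.isnan(value):
            value = previous
        result.append((ts, value))
        previous = value
        ts += step
    return result


raw = [
    (date(2024, 1, 1), 1.0),
    (date(2024, 1, 2), 2.0),
    (date(2024, 1, 2), 2.5),       # duplicate timestamp: the later value wins
    (date(2024, 1, 4), 4.0),       # 2024-01-03 is missing entirely
    (date(2024, 1, 5), math.nan),  # explicit missing value
]
for ts, value in format_series(raw):
    print(ts, value)
# 2024-01-03 is filled with 2.5 and 2024-01-05 with 4.0
```

On real data a pandas-based version would more likely rely on pd.infer_freq and DataFrame.asfreq than on hand-rolled grid logic; the list-based form just makes each step explicit.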
9a11d65 to 5d5afe9
Add accurate LLM-facing descriptions for inspect_data, split_data, transform_data, and save_data.
Remove the format_time_series MCP tool, now subsumed by transform_data(action='format').
Fix fh list splitting to use max(fh) steps, rename the returned test_size to n_test, and add fh validation.

Co-authored-by: Cursor <cursoragent@cursor.com>
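The fh fix in this commit can be illustrated with a small standard-library sketch of how a split_data-style tool might turn test_size or fh into n_test and then split without shuffling. resolve_n_test and temporal_split are hypothetical helpers that follow the semantics described in this PR (fh as a horizon count, max(fh) for lists); rounding a fractional test_size up is an assumption borrowed from scikit-learn's convention, and the real tool delegates the split itself to sktime's temporal_train_test_split().

```python
import math


def resolve_n_test(n_obs, test_size=None, fh=None):
    """Return how many trailing observations to hold out (hypothetical helper).

    Exactly one of test_size (a fraction) or fh (an int horizon or a list of
    steps ahead) must be given.
    """
    if (test_size is None) == (fh is None):
        raise ValueError("pass exactly one of test_size or fh")
    if test_size is not None:
        if not 0 < test_size < 1:
            raise ValueError("test_size must be a fraction strictly between 0 and 1")
        # Round up, as scikit-learn's train_test_split does for a float test_size.
        n_test = math.ceil(test_size * n_obs)
    else:
        steps = [fh] if isinstance(fh, int) else list(fh)
        if not steps or any(not isinstance(s, int) or s < 1 for s in steps):
            raise ValueError("fh must be a positive int or a non-empty list of positive ints")
        # fh=[1, 3, 12] needs 12 held-out steps, i.e. max(fh).
        n_test = max(steps)
    if n_test >= n_obs:
        raise ValueError(f"n_test={n_test} leaves no training data (n_obs={n_obs})")
    return n_test


def temporal_split(values, n_test):
    """Split without shuffling: the last n_test observations form the test set."""
    # n_test >= 1 is guaranteed by resolve_n_test, so values[:-n_test] is safe.
    return values[:-n_test], values[-n_test:]


y = list(range(100))
n_test = resolve_n_test(len(y), fh=[1, 3, 12])
train, test = temporal_split(y, n_test)
print(n_test, len(train), len(test))  # 12 88 12
```

Validating fh before slicing keeps a bad horizon (empty list, zero, or longer than the series) from silently producing an empty train or test handle.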
PR: Add Data Management Tools
Reference Issues/PRs
Fixes #465. Stacked on top of #463.
What does this implement/fix? Explain your changes.
Adds 4 new MCP tools that complete the Data Management group of the ideal architecture:
- inspect_data: mtype(), check_is_mtype(), get_cutoff(), scitype detection
- split_data: test_size (fraction) or fh (horizon count); temporal_train_test_split()
- transform_data: action="format" (auto-fix freq/dupes/NaN) or action="convert" (mtype conversion); convert_to(), frequency inference, NaN filling, deduplication
- save_data

inspect_data returns: mtype, scitype, shape, columns, dtypes, index_names, freq, cutoff, n_missing, head (first 5 rows), and summary_stats.

split_data creates two new data handles (train and test) and reports the cutoff timestamp. It supports both fractional (test_size=0.2) and horizon-based (fh=12) splitting.

transform_data subsumes the existing format_time_series tool as action="format" and adds a new action="convert" mode that calls sktime.datatypes.convert_to() for mtype conversion.

save_data combines y and X into a single DataFrame and writes it to disk in CSV (default), Parquet, or JSON format.

Files created:
- src/sktime_mcp/tools/inspect_data.py
- src/sktime_mcp/tools/split_data.py
- src/sktime_mcp/tools/transform_data.py
- src/sktime_mcp/tools/save_data.py
- tests/test_data_management.py (15 unit tests)

Files modified:
- src/sktime_mcp/server.py: Tool schemas and call_tool dispatcher routing
- src/sktime_mcp/tools/__init__.py: updated exports

Does your contribution introduce a new dependency? If yes, which one?
No. All tools use existing dependencies (pandas, sktime).

What should a reviewer concentrate their feedback on?
- inspect_data: verify that the mtype/scitype detection fallback logic is robust
- split_data: review the temporal splitting strategy and handle registration
- transform_data: confirm that the convert_to() integration handles edge cases
- save_data: verify the pathlib.Path usage and format dispatch
- format_time_series: the tool remains available for backward compatibility (transform_data(action="format") delegates to the same executor method)

Any other comments?
All 189 tests pass cleanly under make check (format + lint + pytest). The existing format_time_series tool is preserved for backward compatibility: transform_data wraps the same executor logic with a cleaner interface.

PR checklist
For all contributions
- I've added unit tests and made sure they pass locally (make check).
- I've added the tool to the online documentation in docs/source/.
- I've updated the existing example scripts or provided a new one to showcase how my tool works in examples/.