Skip to content

Expose the cached bdf.parquet and bdf.csv files via url for external applications#1772

Draft
be-smith wants to merge 8 commits into
mainfrom
bes/exposing_echem_cache
Draft

Expose the cached bdf.parquet and bdf.csv files via url for external applications#1772
be-smith wants to merge 8 commits into
mainfrom
bes/exposing_echem_cache

Conversation

@be-smith

@be-smith be-smith commented May 26, 2026

Copy link
Copy Markdown
Member

This PR adds a parquet_url to the echem cycle block data, alongside the existing bdf_url, containing a link to the cached .bdf.parquet file. This is smaller and faster to load than the CSV and is more useful for external tools (e.g. datalab-plot).

It also adds a new route GET /items/<item_id>/blocks/<block_id>/bdf?format=parquet|csv that serves either cache file directly. If the cache doesn't yet exist (e.g. the block was added but never processed), the route generates it on demand before serving. This gives external analysis tools a single clean endpoint to pull processed echem data without needing to know the internal file structure. This route can also be used in the future for analysis blocks to fetch the fastest form of the echem data - similar to #1737

@cypress

cypress Bot commented May 26, 2026

Copy link
Copy Markdown

datalab    Run #5001

Run Properties:  status check passed Passed #5001  •  git commit f49684268f ℹ️: Merge 082b2006cd7dc823357df8b6c298c337809b4170 into 019fc0478b29555d59d0e8d14d1e...
Project datalab
Branch Review bes/exposing_echem_cache
Run status status check passed Passed #5001
Run duration 07m 55s
Commit git commit f49684268f ℹ️: Merge 082b2006cd7dc823357df8b6c298c337809b4170 into 019fc0478b29555d59d0e8d14d1e...
Committer Ben Smith
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 478
View all changes introduced in this branch ↗︎

…n identically names files from each directoru
@codecov

codecov Bot commented May 28, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.35484% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.30%. Comparing base (b15e12e) to head (082b200).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
pydatalab/src/pydatalab/routes/v0_1/files.py 74.41% 11 Missing ⚠️
pydatalab/src/pydatalab/apps/echem/blocks.py 57.89% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
+ Coverage   78.81%   79.30%   +0.48%     
==========================================
  Files          83       83              
  Lines        7223     7401     +178     
==========================================
+ Hits         5693     5869     +176     
- Misses       1530     1532       +2     
Files with missing lines Coverage Δ
pydatalab/src/pydatalab/apps/echem/blocks.py 80.40% <57.89%> (-0.04%) ⬇️
pydatalab/src/pydatalab/routes/v0_1/files.py 64.56% <74.41%> (+5.04%) ⬆️

... and 8 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the echem “cycle” block output to expose a cached parquet BDF export via parquet_url and adds a dedicated endpoint for external tools to download either the cached parquet or CSV BDF representation for a given item/block.

Changes:

  • Add parquet_url alongside bdf_url when processing echem cycle blocks.
  • Add GET /items/<item_id>/blocks/<block_id>/bdf?format=parquet|csv to serve (and generate on-demand) cached BDF exports.
  • Add server tests covering parquet_url population and on-demand cache generation/serving.

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 3 comments.

File Description
pydatalab/src/pydatalab/apps/echem/blocks.py Produces parquet cache path and populates parquet_url in cycle block web data.
pydatalab/src/pydatalab/routes/v0_1/files.py Adds a new route to serve cached parquet/CSV and generate caches on-demand.
pydatalab/tests/server/test_echem_block.py Adds coverage for parquet_url and the new BDF cache download endpoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +319 to +327
pydatalab.mongo.flask_mongo.db.items.update_one(
{"item_id": item_id, f"blocks_obj.{block_id}": {"$exists": True}},
{
"$set": {
f"blocks_obj.{block_id}.bdf_url": block.data.get("bdf_url"),
f"blocks_obj.{block_id}.parquet_url": block.data.get("parquet_url"),
}
},
)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ml-evs this fix is simple enough, the solution still generates the cache file on read only. I personally think that populating the url so that the cache is visible is probably the right thing to do if we actually make the cache file. What are your thoughts?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I have implemented the fix so the urls won't be updated

Comment thread pydatalab/src/pydatalab/routes/v0_1/files.py
Comment thread pydatalab/src/pydatalab/routes/v0_1/files.py
@be-smith be-smith requested a review from ml-evs May 28, 2026 18:53
return jsonify({"status": "success"}), 200


@FILES.route("/items/<string:item_id>/blocks/<string:block_id>/bdf", methods=["GET"])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FILES.route("/files/<string:file_id>/<string:filename>/formats", methods=["GET"])

Maybe we think two routes generally for blocks? The above could return e.g.,

{"data": {"formats": ["bdf+csv", "bdf+parquet"]}

then a second route

@FILES.route("/files/<string:file_id>/<string:filename>/formats/<string:format>", methods=["GET"])

that can match on those keys and let you download the file.

@ml-evs ml-evs left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed lets put this on hold for now, but extract the changes to the echem block into another PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants