Expose the cached bdf.parquet and bdf.csv files via url for external applications by be-smith · Pull Request #1772 · datalab-org/datalab

be-smith · 2026-05-26T11:58:58Z

This PR adds a parquet_url to the echem cycle block data, alongside the existing bdf_url, containing a link to the cached .bdf.parquet file. This is smaller and faster to load than the CSV and is more useful for external tools (e.g. datalab-plot).

It also adds a new route, GET /items/<item_id>/blocks/<block_id>/bdf?format=parquet|csv, that serves either cache file directly. If the cache doesn't exist yet (e.g. the block was added but never processed), the route generates it on demand before serving. This gives external analysis tools a single clean endpoint to pull processed echem data without needing to know the internal file structure. The route could also be used in future by analysis blocks to fetch the fastest form of the echem data, similar to #1737.
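As a rough illustration of how an external tool might call this route, here is a minimal client-side sketch using only the Python standard library. The route path and the `format` query parameter come from the description above; the helper names (`bdf_cache_url`, `download_bdf_cache`), the example base URL, and the absence of any authentication are assumptions made for illustration, not part of this PR.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# The two formats named in the PR description.
ALLOWED_FORMATS = ("parquet", "csv")


def bdf_cache_url(api_url: str, item_id: str, block_id: str, fmt: str = "parquet") -> str:
    """Build the URL of the BDF cache route for one item/block (hypothetical helper)."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    # Percent-encode the IDs so unusual characters cannot alter the path.
    path = f"/items/{quote(item_id, safe='')}/blocks/{quote(block_id, safe='')}/bdf"
    return f"{api_url.rstrip('/')}{path}?{urlencode({'format': fmt})}"


def download_bdf_cache(api_url: str, item_id: str, block_id: str,
                       fmt: str = "parquet", timeout: float = 30.0) -> bytes:
    """Fetch the raw cache file bytes (hypothetical helper).

    Assumption: no authentication is shown here; a real deployment will
    most likely require credentials to be sent with the request.
    """
    with urlopen(bdf_cache_url(api_url, item_id, block_id, fmt), timeout=timeout) as response:
        return response.read()
```

For example, `bdf_cache_url("https://datalab.example.org", "my-item", "my-block", "csv")` would point at the CSV form of the cache; the returned parquet bytes could then be handed to whichever dataframe library the external tool already uses.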

cypress · 2026-05-26T12:40:40Z

datalab Run #5001

Project	`datalab`
Branch Review	`bes/exposing_echem_cache`
Run status	`Passed #5001`
Run duration	`07m 55s`
Commit	`f49684268f ℹ️: Merge 082b2006cd7dc823357df8b6c298c337809b4170 into 019fc0478b29555d59d0e8d14d1e...`
Committer	`Ben Smith`

Test results
Failures	`0`
Flaky	`0`
Pending	`0`
Skipped	`0`
Passing	`478`

codecov · 2026-05-28T13:25:31Z

Codecov Report

❌ Patch coverage is 69.35484% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.30%. Comparing base (b15e12e) to head (082b200).
⚠️ Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
pydatalab/src/pydatalab/routes/v0_1/files.py	74.41%	11 Missing ⚠️
pydatalab/src/pydatalab/apps/echem/blocks.py	57.89%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1772      +/-   ##
==========================================
+ Coverage   78.81%   79.30%   +0.48%     
==========================================
  Files          83       83              
  Lines        7223     7401     +178     
==========================================
+ Hits         5693     5869     +176     
- Misses       1530     1532       +2

Files with missing lines	Coverage Δ
pydatalab/src/pydatalab/apps/echem/blocks.py	`80.40% <57.89%> (-0.04%)`	⬇️
pydatalab/src/pydatalab/routes/v0_1/files.py	`64.56% <74.41%> (+5.04%)`	⬆️

... and 8 files with indirect coverage changes

Copilot

Pull request overview

This PR extends the echem “cycle” block output to expose a cached parquet BDF export via parquet_url and adds a dedicated endpoint for external tools to download either the cached parquet or CSV BDF representation for a given item/block.

Changes:

Add parquet_url alongside bdf_url when processing echem cycle blocks.
Add GET /items/<item_id>/blocks/<block_id>/bdf?format=parquet|csv to serve (and generate on-demand) cached BDF exports.
Add server tests covering parquet_url population and on-demand cache generation/serving.
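The "serve, and generate on demand" behaviour listed above is a get-or-create cache pattern. The sketch below shows that pattern in isolation with the standard library only. The function name `ensure_cache` and the `generate` callback are hypothetical; the actual route in this PR is not reproduced here and may be structured differently.

```python
from pathlib import Path
from typing import Callable


def ensure_cache(cache_path: Path, generate: Callable[[Path], None]) -> Path:
    """Return cache_path, calling generate() first only if the file is missing.

    `generate` is a hypothetical callback that writes the cache content to
    the path it is given (for example, exporting processed echem data).
    """
    if not cache_path.exists():
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary sibling file and rename it into place, so a
        # reader does not pick up a half-written cache file.
        tmp_path = cache_path.with_name(cache_path.name + ".tmp")
        generate(tmp_path)
        tmp_path.replace(cache_path)
    return cache_path
```

A second call with the same path skips `generate` entirely, which is what makes repeated downloads cheap once the cache exists.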

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 3 comments.

File	Description
pydatalab/src/pydatalab/apps/echem/blocks.py	Produces parquet cache path and populates `parquet_url` in cycle block web data.
pydatalab/src/pydatalab/routes/v0_1/files.py	Adds a new route to serve cached parquet/CSV and generate caches on-demand.
pydatalab/tests/server/test_echem_block.py	Adds coverage for `parquet_url` and the new BDF cache download endpoint.

be-smith · 2026-05-28T18:28:35Z

+            pydatalab.mongo.flask_mongo.db.items.update_one(
+                {"item_id": item_id, f"blocks_obj.{block_id}": {"$exists": True}},
+                {
+                    "$set": {
+                        f"blocks_obj.{block_id}.bdf_url": block.data.get("bdf_url"),
+                        f"blocks_obj.{block_id}.parquet_url": block.data.get("parquet_url"),
+                    }
+                },
+            )


@ml-evs this fix is simple enough; the solution still generates the cache file on read only. Personally, I think that if we actually create the cache file, populating the URLs so that the cache is visible is probably the right thing to do. What are your thoughts?

Currently I have implemented the fix so that the URLs won't be updated.

ml-evs · 2026-05-29T09:40:34Z

    return jsonify({"status": "success"}), 200
+
+
+@FILES.route("/items/<string:item_id>/blocks/<string:block_id>/bdf", methods=["GET"])


@FILES.route("/files/<string:file_id>/<string:filename>/formats", methods=["GET"])

Maybe we should think about two general routes for blocks? The one above could return, e.g.,

{"data": {"formats": ["bdf+csv", "bdf+parquet"]}}

then a second route

@FILES.route("/files/<string:file_id>/<string:filename>/formats/<string:format>", methods=["GET"])

that can match on those keys and let you download the file.
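To make the two-route idea concrete, here is a framework-free sketch of the logic each route might delegate to: one function producing the format listing, and one resolving a format key to a cache file. The format keys are taken from the comment above. Everything else is an assumption for illustration: the `FORMAT_SUFFIXES` mapping, the function names, and the convention that the cache file sits next to the source file with its extension swapped.

```python
from pathlib import Path

# Hypothetical mapping from format key to cache-file suffix.
FORMAT_SUFFIXES = {
    "bdf+csv": ".bdf.csv",
    "bdf+parquet": ".bdf.parquet",
}


def list_formats() -> dict:
    """Payload for the first route (".../formats"): the available format keys."""
    return {"data": {"formats": sorted(FORMAT_SUFFIXES)}}


def resolve_format(source_file: Path, fmt: str) -> Path:
    """Logic for the second route (".../formats/<format>").

    Maps a format key to the cache file to serve. An unknown key raises
    ValueError, which a route handler could translate into a 4xx response.
    Assumption: the cache lives beside the source file, e.g.
    "run1.mpr" -> "run1.bdf.parquet".
    """
    try:
        suffix = FORMAT_SUFFIXES[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}") from None
    return source_file.with_name(source_file.stem + suffix)
```

Keeping the format keys in a single mapping means the listing route and the download route cannot drift apart: a new export format would only need one new entry.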

ml-evs

As discussed, let's put this on hold for now, but extract the changes to the echem block into another PR.

be-smith added 5 commits May 22, 2026 15:54

Added parquet path to block data for Cycleblock

d2d61ef

Added endpoint for getting either the .csv or .parquet cache. Will generate the cache if not already present

fdd7b46

Minor bug fix

78a8fd6

Ensure urls are updated in mongodb - no false generation of the cache if we already generated it through this route

fe49b9a

Added tests for the route and making sure bdf_url and parquet_url are populated.

5b3fa38

Added __init__.py to tests/server and tests/app to distinguish between identically named files from each directory

e66b41d

ml-evs requested a review from Copilot May 28, 2026 13:46

Copilot started reviewing on behalf of ml-evs May 28, 2026 13:47

Copilot AI reviewed May 28, 2026

be-smith added 2 commits May 28, 2026 19:51

Add BDF cache endpoint validation: check source file is attached to item and filename has an expected cache extension

88aa55f

Add tests for BDF cache endpoint rejecting unexpected or unattached file URLs

082b200

be-smith requested a review from ml-evs May 28, 2026 18:53

ml-evs reviewed May 29, 2026

be-smith wants to merge 8 commits into main from bes/exposing_echem_cache

3 participants
