
fix: make rocket fin example run on MacOS #541

Open
nikolasborrel wants to merge 9 commits into main from nikolasborrel_fix_rocket_fin_example

@nikolasborrel

@nikolasborrel nikolasborrel commented Mar 24, 2026

Relevant issue or PR

N/A

Description of changes

Demo notebook (optimization_os.ipynb):

  • Pre-serve bars_3d as a persistent container and pass it via URL reference ({"type": "url", ...}) instead of {"type": "image", ...}, preventing a new container from being spawned on every optimization iteration
  • Serve design_tess with design_tess = Tesseract.from_image("sdf_fd", network=NETWORK, network_alias="sdf_fd")

Requirements (remove strict version pins across all showcase tesseracts):

  • sdf_fd, bars_3d: upgrade pyvista (unpinned) to fix a vtk wheel incompatibility with Python 3.11
  • Updated pinned requirements
  • Updated to tesseract-core==1.6.0

Platform configs:

  • tesseract_config.yaml: change target_platform from linux/x86_64 to native to avoid QEMU emulation on Apple Silicon
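For context on the target_platform change: Docker has to fall back to QEMU emulation whenever the image's CPU architecture differs from the host's, for example an x86_64 image on an arm64 Apple Silicon Mac. A rough sketch of that comparison, assuming Docker-style `os/arch` strings; `needs_emulation` and its alias table are hypothetical and not how Tesseract itself resolves platforms:

```python
import platform
from typing import Optional

# Map the different spellings of the same CPU architecture onto one name.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def needs_emulation(target_platform: str, host_machine: Optional[str] = None) -> bool:
    """Return True if running an image built for target_platform would need emulation."""
    if target_platform == "native":
        return False  # the image is built for whatever the host is
    host = (host_machine or platform.machine()).lower()
    parts = target_platform.lower().split("/")  # "linux/x86_64" -> ["linux", "x86_64"]
    target = parts[1] if len(parts) > 1 else parts[0]
    return _ARCH_ALIASES.get(host, host) != _ARCH_ALIASES.get(target, target)
```

On an Apple Silicon host, `needs_emulation("linux/x86_64", "arm64")` is True, while `"native"` never requires emulation.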

Testing done

  • Ran the notebook manually

NOTE: jax_fem has suddenly started producing NaNs. The inputs look correct to me; perhaps some dependencies have been updated...

File "/python-env/lib/python3.12/site-packages/jax_fem/solver.py", line 415, in solver
    assert np.all(np.isfinite(res_val)), f"res_val contains NaN, stop the program!"
AssertionError: res_val contains NaN, stop the program!
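A quick way to test the "inputs look correct" assumption is to count non-finite values in each input right before the solver call. A minimal sketch, assuming the inputs can be converted to NumPy float arrays; `report_non_finite` is a hypothetical helper, not part of jax_fem or tesseract-core:

```python
import numpy as np


def report_non_finite(**arrays):
    """Map each argument name to its number of NaN/inf entries; clean inputs are omitted.

    Hypothetical debugging helper, not part of jax_fem or tesseract-core.
    """
    problems = {}
    for name, value in arrays.items():
        arr = np.asarray(value, dtype=float)
        n_bad = int(np.count_nonzero(~np.isfinite(arr)))
        if n_bad:
            problems[name] = n_bad
    return problems


if __name__ == "__main__":
    rho = np.ones((4, 4))
    rho[1, 2] = np.nan
    # Only the array containing a NaN is reported: {'rho': 1}
    print(report_non_finite(rho=rho, loads=np.zeros(3)))
```

If every input comes back clean but res_val still contains NaNs, that points at the solver or a changed dependency rather than at the inputs.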

@PasteurBot
Contributor

PasteurBot commented Mar 24, 2026

CLA signatures required

Thank you for your PR, we really appreciate it! Like many open-source projects, we ask that all contributors sign our Contributor License Agreement before we can accept your contribution. This only needs to be done once per contributor. You can do so by commenting the following on this pull request:


@PasteurBot I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
✅ [dionhaefner](https://github.com/dionhaefner)
@nikolasborrel
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

@nikolasborrel nikolasborrel force-pushed the nikolasborrel_fix_rocket_fin_example branch from f0a8c1e to 6cca239 on March 24, 2026 13:50
@nikolasborrel nikolasborrel requested a review from andrinr March 24, 2026 13:50
@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.16%. Comparing base (61a8140) to head (f40d6d8).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #541   +/-   ##
=======================================
  Coverage   77.16%   77.16%           
=======================================
  Files          32       32           
  Lines        4418     4418           
  Branches      728      728           
=======================================
  Hits         3409     3409           
  Misses        714      714           
  Partials      295      295           


@PasteurBot
Contributor

PasteurBot commented Mar 24, 2026

Benchmark Results

Benchmarks use a no-op Tesseract to measure pure framework overhead.

🚀 0 faster, ⚠️ 0 slower, ✅ 36 unchanged

✅ No significant performance changes detected.

Full results
| Benchmark | Baseline | Current | Change | Status |
|---|---|---|---|---|
| api/apply_1,000 | 0.777ms | 0.778ms | +0.2% | ✅ |
| api/apply_100,000 | 0.775ms | 0.780ms | +0.7% | ✅ |
| api/apply_10,000,000 | 0.774ms | 0.784ms | +1.2% | ✅ |
| cli/apply_1,000 | 1717.849ms | 1768.373ms | +2.9% | ✅ |
| cli/apply_100,000 | 1717.264ms | 1760.658ms | +2.5% | ✅ |
| cli/apply_10,000,000 | 1760.412ms | 1815.597ms | +3.1% | ✅ |
| decoding/base64_1,000 | 0.037ms | 0.037ms | +1.1% | ✅ |
| decoding/base64_100,000 | 0.898ms | 0.898ms | -0.0% | ✅ |
| decoding/base64_10,000,000 | 99.110ms | 99.925ms | +0.8% | ✅ |
| decoding/binref_1,000 | 0.200ms | 0.202ms | +1.3% | ✅ |
| decoding/binref_100,000 | 0.238ms | 0.246ms | +3.2% | ✅ |
| decoding/binref_10,000,000 | 10.760ms | 10.731ms | -0.3% | ✅ |
| decoding/json_1,000 | 0.106ms | 0.107ms | +0.7% | ✅ |
| decoding/json_100,000 | 8.975ms | 9.157ms | +2.0% | ✅ |
| decoding/json_10,000,000 | 1069.577ms | 1088.502ms | +1.8% | ✅ |
| encoding/base64_1,000 | 0.041ms | 0.040ms | -1.7% | ✅ |
| encoding/base64_100,000 | 0.147ms | 0.146ms | -0.4% | ✅ |
| encoding/base64_10,000,000 | 25.042ms | 25.573ms | +2.1% | ✅ |
| encoding/binref_1,000 | 0.304ms | 0.301ms | -1.0% | ✅ |
| encoding/binref_100,000 | 0.475ms | 0.480ms | +0.9% | ✅ |
| encoding/binref_10,000,000 | 18.720ms | 18.655ms | -0.3% | ✅ |
| encoding/json_1,000 | 0.154ms | 0.153ms | -0.8% | ✅ |
| encoding/json_100,000 | 13.141ms | 13.264ms | +0.9% | ✅ |
| encoding/json_10,000,000 | 1412.205ms | 1414.121ms | +0.1% | ✅ |
| http/apply_1,000 | 3.686ms | 3.764ms | +2.1% | ✅ |
| http/apply_100,000 | 10.499ms | 10.089ms | -3.9% | ✅ |
| http/apply_10,000,000 | 771.874ms | 779.930ms | +1.0% | ✅ |
| roundtrip/base64_1,000 | 0.090ms | 0.090ms | -0.3% | ✅ |
| roundtrip/base64_100,000 | 1.056ms | 1.053ms | -0.3% | ✅ |
| roundtrip/base64_10,000,000 | 125.623ms | 125.896ms | +0.2% | ✅ |
| roundtrip/binref_1,000 | 0.527ms | 0.526ms | -0.3% | ✅ |
| roundtrip/binref_100,000 | 0.726ms | 0.724ms | -0.2% | ✅ |
| roundtrip/binref_10,000,000 | 30.676ms | 30.172ms | -1.6% | ✅ |
| roundtrip/json_1,000 | 0.273ms | 0.275ms | +0.6% | ✅ |
| roundtrip/json_100,000 | 20.145ms | 19.840ms | -1.5% | ✅ |
| roundtrip/json_10,000,000 | 2499.219ms | 2482.646ms | -0.7% | ✅ |
  • Runner: Linux 6.17.0-1008-azure x86_64

@nikolasborrel
Author

I have read the CLA Document and I hereby sign the CLA

@andrinr
Contributor

andrinr commented Mar 24, 2026

Nice @nikolasborrel!

I am not sure about unpinning all the requirements; in my experience, some of the packages used, such as pyvista, can be very fragile. I would be OK with it if the respective tesseracts were built as part of the CI, similar to t-jax.

> Pre-serve bars_3d as a persistent container and pass it via URL reference ({"type": "url", ...}) instead of {"type": "image", ...}, preventing a new container being spawned on every optimization iteration

That makes a lot of sense, but are you sure this is the case currently? I'll summon the king of hot @linusseelinger

> Add tesseract_core.teardown(tear_all=True) before serving containers to clean up stale containers from previous runs

Not entirely sure about that one either. To me this seems to be the user's responsibility, though it is surely helpful. But what if the user has other tesseracts running for other use cases?

@dionhaefner dionhaefner self-assigned this Mar 25, 2026
@andrinr
Contributor

andrinr commented Mar 25, 2026

@nikolasborrel I don't think tesseracts have a timeout, so I don't think this should be necessary:

    # Individual requests can run for minutes (e.g. Jacobian FD steps), which is fine;
    # the server keeps an active connection open as long as needed. However, uvicorn
    # closes *idle* keep-alive connections after 5s. Between requests, the JAX gradient
    # computation (differentiating through the 36x128x64x64 Jacobian) can take well
    # over 5s, leaving the connection idle long enough for the server to close it.
    # The next request on the stale connection then fails with ConnectionResetError(54).
    # Setting Connection: close makes each request use a fresh TCP connection, avoiding
    # the issue entirely.

@dionhaefner
Contributor

To me this PR highlights at least 4 (functional / UX) bugs:

  1. logger.log accepts only strings, not arbitrary Python objects (how was this triggered though?).
  2. Containers aren't cleaned up properly when passing images as TesseractReference.
  3. HTTP sessions can time out.
  4. Docker path issues.

To address these properly we'd need independent reproducers for each. This is not all on you @nikolasborrel, but we do need more input at least on (1) and (4) to figure out why you ran into these issues. (I could reproduce (2) already, haven't looked into (3) yet.)

On top of that there are the changes to the demo itself, that is, upgrading/unpinning all dependencies and general cleanup/optimization. Those can be valuable too, but are probably best reviewed together once the bugs ☝️ are squashed.

@nikolasborrel
Author

nikolasborrel commented Mar 25, 2026

> uvicorn

I wrote a minimal example to reproduce this but could not, so I think you are right. I thought the issue was caused by uvicorn (used under the hood by Tesseract) having a default keep-alive timeout of 5s, as described here: https://uvicorn.dev/settings/#timeouts

Setting Connection: close resolved the issues I had, but the cause must be something else. I will investigate further, stay tuned!

UPDATE: cannot reproduce this issue anymore

@nikolasborrel
Author

> To me this PR highlights at least 4 (functional / UX) bugs:
>
>   1. logger.log accepts only strings, not arbitrary Python objects (how was this triggered though?).

I can no longer reproduce this and have reverted the change. I had this issue before I removed the pinned versions and switched from a released version of tesseract-core to the repo code.

>   2. Containers aren't cleaned up properly when passing images as TesseractReference.

I was under the impression that this was intended and that the solution is to reference a running Tesseract through http (I will not investigate this further).
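For reference, the leak comes down to object lifetimes, as the later fix in #543 describes: `atexit.register(self.teardown)` stores a bound method, which keeps `self` alive until the interpreter exits, so a loop that creates a fresh image-based reference on every iteration never releases any of them. A minimal stdlib-only sketch of the difference; `stop_container` and both classes are hypothetical stand-ins, not tesseract-core code:

```python
import atexit
import weakref


def stop_container(name, log):
    """Hypothetical stand-in for the engine's container teardown."""
    log.append(name)


class AtexitRef:
    """Registers a bound method with atexit, so atexit keeps `self` alive."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        atexit.register(self.teardown)  # strong reference until interpreter exit

    def teardown(self):
        stop_container(self.name, self.log)


class FinalizeRef:
    """weakref.finalize holds only the callback and its arguments, not `self`."""

    def __init__(self, name, log):
        # Passing self or a bound method here would recreate the leak.
        self._finalizer = weakref.finalize(self, stop_container, name, log)


if __name__ == "__main__":
    leaked, cleaned = [], []
    for i in range(3):
        AtexitRef(f"atexit-{i}", leaked)  # not torn down until interpreter exit
        FinalizeRef(f"finalize-{i}", cleaned)  # torn down once unreferenced (CPython)
    print(leaked, cleaned)
```

`weakref.finalize` also runs any still-pending finalizers at interpreter exit, so it keeps the "clean up eventually" guarantee that `atexit` was providing.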

>   3. HTTP sessions can time out.

I can no longer reproduce this, so I have reverted the "Disable HTTP keep-alive" change.

>   4. Docker path issues.

The reason is that shell startup files are not sourced when running the notebook (specifically on macOS), so /usr/local/bin was missing from PATH. I have simplified the cell code.

The remaining issues are therefore 2) and possibly 3).

@dionhaefner
Contributor

Re 1: I can reproduce it (just add logger.info({}) anywhere). I'm just wondering when that would occur naturally, since we currently only use loggers internally. Regardless it could make sense to be defensive and cast messages to strings instead of crashing.
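The reproducer boils down to a formatter that escapes `record.msg` directly, which fails as soon as the message is not a string. A minimal sketch of the defensive version, assuming rich-style markup; `escape_markup` is a simplified stand-in for `rich.markup.escape`, and `EscapingFormatter` is hypothetical, not tesseract-core's RichLogger:

```python
import logging


def escape_markup(text: str) -> str:
    """Simplified stand-in for rich.markup.escape: neutralise '[' so it is not read as a tag."""
    return text.replace("[", "\\[")


class EscapingFormatter(logging.Formatter):
    """Hypothetical formatter that escapes messages for rich-style markup.

    Calling str() first mirrors escape(str(record.msg)) from #543 and what
    vanilla logging does in LogRecord.getMessage().
    """

    def format(self, record: logging.LogRecord) -> str:
        # Work on a copy so other handlers still see the original message.
        record = logging.makeLogRecord(record.__dict__)
        record.msg = escape_markup(str(record.msg))
        return super().format(record)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(EscapingFormatter("%(levelname)s %(message)s"))
    log = logging.getLogger("sketch")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info({})  # logged as "INFO {}" instead of raising
    log.info("[bold]x[/bold]")  # brackets are escaped, not interpreted
```

Only the message template is escaped here; `%`-style arguments are interpolated afterwards by `getMessage()` and are left untouched.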

Re 4: Why do you not have /usr/local/bin on PATH by default? I do on OSX 🤔

@nikolasborrel
Author

> Re 4: Why do you not have /usr/local/bin on PATH by default? I do on OSX 🤔

It turned out to be a shell issue on my side and should not happen in general on OSX, so I have removed this check. However, the error message raised when Docker is not available is not very clear; we could improve it.
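A sketch of what a clearer error could look like, assuming the check runs before any Docker call is made; `require_executable` is a hypothetical helper built on `shutil.which`, not an existing tesseract-core function:

```python
import os
import shutil
from typing import Optional


def require_executable(name: str = "docker", path: Optional[str] = None) -> str:
    """Return the full path to `name`, or raise an error that says what to check.

    `path` defaults to the PATH of the current process, as in shutil.which.
    """
    found = shutil.which(name, path=path)
    if found is None:
        searched = path if path is not None else os.environ.get("PATH", "")
        raise RuntimeError(
            f"Could not find the '{name}' executable on PATH ({searched!r}). "
            "Is Docker installed? If it works in a terminal but not in a notebook, "
            "the notebook kernel may have been started with a different PATH."
        )
    return found
```

Printing the searched PATH in the message would have made the missing /usr/local/bin visible immediately.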

dionhaefner added a commit that referenced this pull request Mar 26, 2026
dionhaefner added a commit that referenced this pull request Mar 27, 2026
…ance when HTTP sessions time out (#543)

#### Relevant issue or PR

Stuff that cropped up in #541

#### Description of changes

**Bug 1: `logger.info({})` crashes RichLogger**
- `PrefixFormatter.format()` called `escape(record.msg)` without `str()`
conversion first
- Fix: `escape(str(record.msg))`, matching vanilla logging behavior

**Bug 2: `TesseractReference` type="image" leaks containers in loops**
- `atexit.register(self.teardown)` held a strong ref to `self`,
preventing GC
- Fix: replace with `weakref.finalize(self, engine.teardown,
container_name)`

**Bug 3: Stale HTTP keep-alive connections cause `ConnectionError`**
- Race between urllib3's `is_connection_dropped` check and uvicorn
closing idle connections
- Fix: catch `ConnectionError` in `HTTPClient._request` and retry once


#### Testing done

New tests reproducing the bugs, fail first (before pushing fixes) then
pass.
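The Bug 3 fix follows a common pattern for stale keep-alive connections: retry a failed request exactly once, so a connection the server closed while idle gets replaced, while a genuinely unreachable server still fails fast. A minimal, library-agnostic sketch as a decorator; `retry_once` is hypothetical and not the actual `HTTPClient._request` code, and blindly re-sending a request assumes it is safe to repeat:

```python
import functools


def retry_once(exc_types=(ConnectionError,)):
    """Decorator: if the call raises one of `exc_types`, call it exactly once more.

    The built-in ConnectionError covers ConnectionResetError and friends. A
    requests-based client would need requests.exceptions.ConnectionError
    instead, which does not inherit from the built-in one.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_types:
                # The server may have closed the pooled keep-alive connection
                # between the liveness check and the send; the second attempt
                # should then use a fresh connection. A second failure propagates.
                return func(*args, **kwargs)

        return wrapper

    return decorator
```

Retrying only once keeps the failure mode for a server that is really down unchanged apart from one extra attempt.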
Contributor

@dionhaefner dionhaefner Mar 27, 2026


Let's keep all deps pinned please, just update them to current versions. (This gives us at least some level of confidence that the demo keeps working.)

Contributor


Commenting here because GH rich diff doesn't allow me to add comments inline.

  1. What's the logging setup for? I can't see any log messages in the notebook so seems redundant?
  2. If you're up for it, you could try using from_image for design_tess as well since we're now passing in the bars tess via URL. Might require adding them to a shared network though (see doc: document how to use tesseract serve --network parameter #530).
  3. Could add a tqdm or rich progress bar to the optimization loop?

@nikolasborrel nikolasborrel force-pushed the nikolasborrel_fix_rocket_fin_example branch from 92c2fce to fdb7912 on April 1, 2026 22:46
@nikolasborrel nikolasborrel force-pushed the nikolasborrel_fix_rocket_fin_example branch from fdb7912 to 438982f on April 1, 2026 22:57

Labels

None yet

Projects

None yet


4 participants