Updatecode by ritikaugale · Pull Request #2 · StructuralEquationModels/VoxelWiseSEM.jl

ritikaugale · 2026-06-03T13:28:08Z

No description provided.

Maximilian-Stefan-Ernst · 2026-06-09T07:15:39Z

+#    aws s3 sync \
+#    --no-sign-request \
+#    s3://openneuro.org/ds000224 \
+#    ./data/ds000224 \
+#    --exclude "*" \
+#    --include "sub-MSC01/ses-struct*/anat/*T1w*" \
+#    --include "sub-MSC01/ses-struct*/anat/*T2w*" \
+#    --include "sub-MSC02/ses-struct*/anat/*T1w*" \
+#    --include "sub-MSC02/ses-struct*/anat/*T2w*" \
+#    --include "participants.tsv" \
+#    --include "dataset_description.json"


We should find a variant that does not rely on a Python package. Since the dataset is licensed as CC0, we could just redistribute the relevant files as Julia artifacts (https://pkgdocs.julialang.org/v1/artifacts/).
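
A rough sketch of what the artifact route could look like, using `Pkg.Artifacts`. The artifact name `msc_subset`, the temporary `Artifacts.toml` location, and the placeholder file contents are assumptions for illustration only; a real setup would copy the actual T1w/T2w files into the artifact directory, archive it (for example with `archive_artifact`), host the tarball, and record its download URL.

```julia
using Pkg.Artifacts

# Hypothetical location; in the package this would be the Artifacts.toml
# in the repository root.
artifacts_toml = joinpath(mktempdir(), "Artifacts.toml")

# Create a content-addressed artifact directory and fill it.
# Placeholder content only: the real files would be copied in here.
hash = create_artifact() do dir
    write(joinpath(dir, "participants.tsv"), "participant_id\nsub-MSC01\nsub-MSC02\n")
end

# Record the artifact under a (hypothetical) name in Artifacts.toml.
bind_artifact!(artifacts_toml, "msc_subset", hash)

# Resolve the on-disk directory of the artifact.
dataset_dir = artifact_path(hash)
```

Inside the package itself, the tutorial could then resolve the directory by name through the `artifact"..."` string macro instead of calling the AWS CLI.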

Maximilian-Stefan-Ernst · 2026-06-09T07:27:23Z

+ref_path = joinpath(
+    dataset_dir,
+    measurements[1, :subject],
+    measurements[1, :session],
+    "anat",
+    measurements[1, :file]
+)


Maybe we should save the whole file path, starting from the dataset root directory, in measurements.

I have updated generate_measurements to save the full BIDS relative path under the :file column.
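
A minimal sketch of the idea, using plain named tuples instead of the package's measurements table so that it runs without extra dependencies. The helper `bids_relpath` and the file name are hypothetical and not taken from the PR.

```julia
# Hypothetical helper: build the path relative to the dataset root once,
# when the measurements are generated.
bids_relpath(subject, session, file) = joinpath(subject, session, "anat", file)

# Stand-in for one row of `measurements`; the file name is made up.
measurements = [
    (
        subject = "sub-MSC01",
        session = "ses-struct01",
        file = bids_relpath("sub-MSC01", "ses-struct01", "sub-MSC01_ses-struct01_T1w.nii.gz"),
    ),
]

dataset_dir = joinpath("data", "ds000224")

# With the relative path stored in :file, the lookup needs a single joinpath.
ref_path = joinpath(dataset_dir, measurements[1].file)
```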

Maximilian-Stefan-Ernst · 2026-06-09T09:36:40Z


    n_row = maximum(measurements.subject_number)
    n_col = maximum(measurements.session_number)
+    n_voxels = isempty(coordinates) ? 0 : maximum(coordinates.voxel)


Since coordinates indexes the voxels by their original id, this produces a large array where all voxels preceding the ones in coordinates are missing.

I changed the first dimension of vw_data to nrow(coordinates) rather than the maximum voxel ID, and added a voxel_idx column to coordinates that maps each voxel to its index in the compact array.

So the first dimension of the array now equals the number of voxels in coordinates.
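
To illustrate the compact layout discussed in this thread, here is a small sketch in plain Julia. The voxel ids and the `Dict`-based lookup are assumptions chosen for the example; the PR stores the mapping as a column of coordinates instead.

```julia
# Original voxel ids of the voxels inside the mask (made-up values).
voxel_ids = [1043, 1044, 2087]

# Map each original id to its position in the compact array.
voxel_idx = Dict(id => i for (i, id) in enumerate(voxel_ids))

n_row, n_col = 2, 2  # subjects × sessions, as in the tutorial

# First dimension is the number of voxels, not the largest voxel id.
vw_data = Array{Union{Missing,Float64}}(missing, length(voxel_ids), n_row, n_col)

# Write one value for voxel 2087, subject 1, session 2.
vw_data[voxel_idx[2087], 1, 2] = 0.5
```

Sizing by `maximum(voxel_ids)` would instead allocate 2087 × 2 × 2 entries here, almost all of them missing.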

Maximilian-Stefan-Ernst · 2026-06-09T09:42:45Z

+# For this tutorial the array shape will be (max_voxel_index, 2, 2)
+# where 2 subjects and 2 sessions each contribute one T1w scan.
+
+vw_data = voxel_wise_data(dataset_dir, measurements, coordinates)


At the moment, this generates an array consisting mostly of missings - see my comment in voxel_wise_data.jl.

I made changes: the array is now sized exactly to the number of active brain voxels, so there are no unnecessary missings in the array.

Maximilian-Stefan-Ernst · 2026-06-09T09:46:30Z

+
+# Save the log. condition_filename turns the named tuple into a filename string,
+# e.g. (modality="T1w",) → "modality_T1w.jld2"
+save_log(log, (modality = "T1w",))


This is throwing an error:

ERROR: MethodError: no method matching names(::@NamedTuple{modality::String})
The function names exists, but no method is defined for this combination of argument types.
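
Base Julia defines no `names` method for a `NamedTuple`; the field names are available through `keys` (or `propertynames`). The following is only a sketch of how a filename could be built that way. The function name `condition_filename_sketch` is hypothetical, and this is not the package's actual `condition_filename`.

```julia
# Hypothetical stand-in for condition_filename: joins "key_value" pairs
# of a named tuple and appends the extension.
function condition_filename_sketch(condition::NamedTuple; ext = ".jld2")
    parts = ["$(k)_$(v)" for (k, v) in pairs(condition)]
    return join(parts, "_") * ext
end

fname = condition_filename_sketch((modality = "T1w",))
# fname == "modality_T1w.jld2"
```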

Maximilian-Stefan-Ernst · 2026-06-09T09:47:56Z

+save_voxel_wise_data(vw_data, "data/vw_data.jld2")
+
+############################################################################################
+# STEP 3 — Preprocessing and logging


Maybe we could:

- add an outlier removal step
- use a few more voxels, not only in the middle of the brain, so that outliers etc. are actually detected and removed in this tutorial

I added an outlier removal step and expanded the mask to 11×11×11 (1,331 voxels), so that the outlier detection step works.
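
The thread does not say which outlier rule is used. As one possible illustration, the sketch below flags values by a modified z-score based on the median absolute deviation (MAD). The function name, the threshold of 3.5, and the sample intensities are all assumptions and not taken from the PR.

```julia
using Statistics

# Flag values whose modified z-score exceeds `threshold`.
# 0.6745 rescales the MAD so the score is comparable to a standard z-score
# under normality.
function flag_outliers_mad(x::AbstractVector{<:Real}; threshold = 3.5)
    med = median(x)
    mad = median(abs.(x .- med))
    mad == 0 && return falses(length(x))  # no spread: nothing to flag
    return abs.(0.6745 .* (x .- med) ./ mad) .> threshold
end

# Made-up intensities for one voxel across scans; the last one is extreme.
intensities = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
flags = flag_outliers_mad(intensities)
cleaned = intensities[.!flags]
```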

Maximilian-Stefan-Ernst · 2026-06-09T10:04:39Z

+# For this tutorial we create a small
+# 5×5×5 voxel mask in the centre of the volume so the pipeline runs quickly on any machine.
+


We can create this mask and distribute it as an artefact together with the data - this way, it does not have to be created during the tutorial
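
For reference, a centred cubic mask of this kind could be generated once, offline, along the following lines before being shipped with the data. The function name and the volume dimensions are hypothetical; the edge length of 11 matches the 11×11×11 mask mentioned elsewhere in this review.

```julia
# Build a boolean mask that is true inside a cube of side `edge`
# centred in a volume of size `dims`. Assumes edge <= minimum(dims).
function centered_cube_mask(dims::NTuple{3,Int}, edge::Int)
    mask = falses(dims)
    ranges = map(dims) do d
        start = (d - edge) ÷ 2 + 1
        start:(start + edge - 1)
    end
    mask[ranges...] .= true
    return mask
end

# Hypothetical volume size; the real one comes from the reference image.
mask = centered_cube_mask((64, 64, 48), 11)
n_voxels = count(mask)
```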

Maximilian-Stefan-Ernst · 2026-06-09T10:17:43Z

+    # transpose to (n_sessions × n_subjects) as expected by the SEM
+    model_vox = replace_observed(
+        model;
+        data          = voxel_matrix',


Maybe we should have the data in the correct shape from the beginning, so we don't have to transpose?
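
One way to picture this: if the array is laid out as (voxel, session, subject) rather than (voxel, subject, session), each voxel slice already has the sessions × subjects shape. The sketch below uses made-up numbers, and `permutedims` only stands in for building the array in that order from the start.

```julia
# Toy data in the current layout: (voxel, subject, session) = (2, 3, 2).
vw = reshape(collect(1.0:12.0), 2, 3, 2)

# Reordered layout: (voxel, session, subject).
vw_sessions_first = permutedims(vw, (1, 3, 2))

# The slice for voxel 1 is now sessions × subjects without a transpose.
voxel_matrix = @view vw_sessions_first[1, :, :]
```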

Maximilian-Stefan-Ernst · 2026-06-09T10:29:22Z

+println("\nresults (first 5 rows):")
+println(first(results, 5))


The results are super unstable because we only use data from 2 participants and 2 sessions - I think we should use a bit more data to at least have sensible results in this tutorial

Co-authored-by: Maximilian Ernst <34346372+Maximilian-Stefan-Ernst@users.noreply.github.com>

ritikaugale added 3 commits June 1, 2026 11:16

refactor: update step 1 and 2 processing

d046ec4

feat:Addpackage analysis file

82c14fe

docs:Add tutorial script

73ab00a

ritikaugale requested a review from Maximilian-Stefan-Ernst June 3, 2026 13:28

ritikaugale assigned Maximilian-Stefan-Ernst Jun 3, 2026

fix: update gitignore and generate_measurement.jl

f3695ef