206 dimensionality reduction on proteins by ferbsx · Pull Request #390 · cschlaffner/PROTzilla

ferbsx · 2026-04-29T10:43:33Z

Description

fixes #206
Added dimensionality reduction on protein level.

Changes

Before: dimensionality reduction was always performed on the sample level (hard-coded default, not customisable).
Now: added a dropdown to choose the level on which the dimensionality reduction is applied.

Options:

Sample [default]
Protein ID

Adapted scatter plot requirements to allow plotting of the results based on Protein ID.
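To make the two options concrete, here is a minimal sketch of what switching the level amounts to: the long-format table is pivoted so that the chosen level becomes the rows that get embedded. The helper `to_wide`, the values column `Intensity`, and the toy data are assumptions for illustration, not the code in this PR.

```python
import pandas as pd

# Toy long-format intensity table; column names follow the PR, values are made up.
intensity_df = pd.DataFrame(
    {
        "Sample": ["S1", "S1", "S1", "S2", "S2", "S2"],
        "Protein ID": ["P1", "P2", "P3", "P1", "P2", "P3"],
        "Intensity": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)


def to_wide(df, level, values_name="Intensity"):
    """Pivot so that each row is one item of the chosen level (hypothetical helper)."""
    if level not in ("Sample", "Protein ID"):
        raise ValueError(f"Unsupported level: {level!r}")
    # The other identifier column becomes the feature axis.
    other = "Protein ID" if level == "Sample" else "Sample"
    return pd.pivot(df, index=level, columns=other, values=values_name)


by_sample = to_wide(intensity_df, "Sample")       # rows: samples, columns: proteins
by_protein = to_wide(intensity_df, "Protein ID")  # rows: proteins, columns: samples
print(by_sample.shape, by_protein.shape)
```

Whichever level ends up as the row index is what each point in the resulting scatter plot represents.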

Testing

Existing tests were updated to pass with the new structure. New tests were added for all functionalities.

PR checklist

Development

If necessary, I have updated the documentation (README, docstrings, etc.)
If necessary, I have created / updated tests.

Mergeability

main-branch has been merged into local branch to resolve conflicts
The tests and linter have passed AFTER local merge [only the GSEA tests are failing, which has been discussed]
The backend code has been formatted with black
The frontend code has been formatted with pnpm format and checked with pnpm lint

Code review

I have self-reviewed my code.
At least one other developer reviewed and approved the changes


hendraet

Works as intended, but needs some minor refinements.

However, in its current state, the PR might not be as useful as imagined because I underspecified the issue. What would be needed to increase usefulness:

Hover annotations for each data point in the scatter plot. Currently, we lose all information about Samples/Protein IDs in the scatter plot, which would be especially helpful in the protein case. (Could be an easy fix, currently we just deliberately exclude this information from the plot, but it is passed to the function)
Having no metadata available to color proteins in a scatter plot is also less than ideal. However, we currently cannot import any metadata that does not include a Sample column, so one could probably only do it in a hacky way.

hendraet · 2026-05-13T05:59:46Z

                ),
+                DropdownField(
+                    name="sample_name",
+                    label="Choose the column that contains the sample information",


Labelling this as "sample" and naming the variable sample_name might confuse users: it is too close to the actual "Sample" column, and it may not be clear that "Protein ID" can also be a valid "sample column" in this case.

hendraet · 2026-05-13T06:29:38Z


    :return: returns a dictionary containing a list with a plotly figure and/or a list of messages
    """
    if isinstance(metadata_df, pd.DataFrame):


If you want to plot proteins instead of samples, you currently cannot connect a metadata dataframe; otherwise, this if branch and its error are triggered. I feel like we should make it more transparent to users that they shouldn't connect metadata in this case.

hendraet · 2026-05-13T06:37:47Z

-    return pd.pivot(
-        intensity_df, index="Sample", columns="Protein ID", values=values_name
-    )
+    return pd.pivot(intensity_df, index=index, columns=columns, values=values_name)


Maybe we should add some guardrails to make sure that index and columns are not the same, because this will lead to a cryptic error message. (This should ideally never happen, but if I had a penny for every time I didn't check assumptions because they could never possibly happen...)
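A guardrail along these lines might be enough. This is only a sketch: the function name `long_to_wide`, the default for `values_name`, and the error messages are assumptions, not the existing code.

```python
import pandas as pd


def long_to_wide(intensity_df, index="Sample", columns="Protein ID", values_name="Intensity"):
    """Pivot a long-format table after checking the arguments (hypothetical helper)."""
    # Fail early with a readable message instead of relying on the pandas error.
    if index == columns:
        raise ValueError(
            f"index and columns must differ, but both are {index!r}."
        )
    missing = [c for c in (index, columns, values_name) if c not in intensity_df.columns]
    if missing:
        raise ValueError(f"Columns missing from the dataframe: {missing}")
    return pd.pivot(intensity_df, index=index, columns=columns, values=values_name)


toy_df = pd.DataFrame(
    {
        "Sample": ["S1", "S1", "S2", "S2"],
        "Protein ID": ["P1", "P2", "P1", "P2"],
        "Intensity": [1.0, 2.0, 3.0, 4.0],
    }
)
wide_df = long_to_wide(toy_df, index="Protein ID", columns="Sample")
```

Calling `long_to_wide(toy_df, index="Sample", columns="Sample")` then raises a `ValueError` that names the problem directly.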

hendraet · 2026-05-13T06:46:02Z

    input_df: pd.DataFrame,
    metadata_df: pd.DataFrame | None = None,
    metadata_column: str | None = None,
+    sample_name: str = "Sample",


See comment above, I feel like sample_name is misleading.
However, it seems to be used only for metadata processing. Since metadata is required to include a "Sample" column (a different problem), and using "Protein ID" together with metadata leads to errors anyway, one could probably also revert the sample_name parameter entirely.

hendraet · 2026-05-13T06:58:43Z

+    "df_name,n_components,method",
+    [
+        ("dimension_reduction_df", 2, TSNEMethod.exact.value),
+        ("dimension_reduction_four_proteins_df", 2, TSNEMethod.exact.value),


Would like to have all of these tests also test 3 components
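As a sketch of the pattern, covering both values could look like the following. `reduce_dimensions` is a stand-in (PCA via SVD) so the example runs on its own; it is not the t-SNE function under test, and the test name and data are made up.

```python
import numpy as np
import pytest


def reduce_dimensions(matrix, n_components):
    """Stand-in for the method under test: PCA via SVD, not PROTzilla's implementation."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


@pytest.mark.parametrize("n_components", [2, 3])
def test_output_has_requested_components(n_components):
    rng = np.random.default_rng(0)
    matrix = rng.normal(size=(6, 4))  # 6 rows to embed, 4 features
    embedding = reduce_dimensions(matrix, n_components)
    assert embedding.shape == (6, n_components)
```

In the real tests, `n_components` would simply get the extra value `3` in the existing parameter tuples.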

hendraet · 2026-05-13T06:59:27Z

                "The column selected for annotation is not present in the corresponding metadata dataframe.",
            )

+    if sample_name not in input_df.columns:


Would like to see a test that checks the raising of this ValueError
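Something like the following would do, sketched here against a stand-in function. `check_sample_column` and its error message are assumptions that only mirror the check quoted above; the real test would call the scatter plot function instead.

```python
import pandas as pd
import pytest


def check_sample_column(input_df, sample_name="Sample"):
    """Hypothetical stand-in that mirrors the check from the diff."""
    if sample_name not in input_df.columns:
        raise ValueError(f"Column '{sample_name}' not found in the input dataframe.")


def test_missing_sample_column_raises():
    input_df = pd.DataFrame({"Component1": [0.1], "Component2": [0.2]})
    # The dataframe has no "Protein ID" column, so the check must raise.
    with pytest.raises(ValueError, match="not found"):
        check_sample_column(input_df, sample_name="Protein ID")
```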

hendraet · 2026-05-13T07:00:57Z

        )


+@pytest.mark.parametrize("n_components", [2])


Either use more than one value for n_components or don't parametrize at all.

ferbsx and others added 8 commits April 29, 2026 12:31

add option to select protein IDs as the dimensionality reduction value

ef746c1

update test for dimentionsality reduction to inclue new funtion format

de1f012

change default dimension reduction value in dropdown

5470a5a

adapt wording for test_tsne_perplexity

06e708b

black formatting

c0dc7da

adding tests

215ec10

updating test based on changes made to underlying function

953b986

minor typo

156274c

jorisfu added the hackathon label (Viable issue for the April 2026 PROTzilla hackathon) Apr 29, 2026

Yanjo96 added 3 commits April 30, 2026 08:18

adding Dropdown for selecting the sample name for scatter plot

46f5cee

adding parameter to make required optional for choices from metadata

1c28001

remove hardcoded Sample from scatter_plot to allow other values like …

5e23b4e

…Protein ID

Yanjo96 marked this pull request as ready for review April 30, 2026 06:37

ferbsx requested a review from hendraet April 30, 2026 06:38

ferbsx assigned ferbsx and Yanjo96 Apr 30, 2026

ferbsx linked an issue Apr 30, 2026 that may be closed by this pull request

Dimensionality Reduction on Proteins #206

Open

3 tasks

Elena-kal requested review from tE3m May 6, 2026 09:34

hendraet reviewed May 13, 2026

ferbsx wants to merge 11 commits into dev from 206-dimensionality-reduction-on-proteins


4 participants
