Alek050 · Mar 24, 2026
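
The kernelspec cleanup in this change can be illustrated with a minimal, standard-library-only sketch. It assumes a notebook is plain JSON with a top-level `metadata` object. The helper name `normalize_kernelspec` and the in-memory example notebook are hypothetical and not part of the PR; the script in the diff uses `nbformat` and always rewrites every file, whereas this sketch also reports whether anything changed.

```python
import json

# Same kernelspec values that the PR's clean_kernelspecs.py writes.
DEFAULT_KERNELSPEC = {
    "name": "python3",
    "display_name": "Python 3",
    "language": "python",
}


def normalize_kernelspec(notebook: dict) -> bool:
    """Set the default kernelspec in place; return True if anything changed.

    Hypothetical helper for illustration only.
    """
    metadata = notebook.setdefault("metadata", {})
    if metadata.get("kernelspec") == DEFAULT_KERNELSPEC:
        return False
    metadata["kernelspec"] = dict(DEFAULT_KERNELSPEC)
    return True


# Hypothetical in-memory notebook mirroring the conda-specific kernelspec
# that appears in finalbook_part1.ipynb in this diff.
example_nb = json.loads("""
{
  "cells": [],
  "metadata": {
    "kernelspec": {
      "display_name": "Python [conda env:big_data_environment]",
      "language": "python",
      "name": "conda-env-big_data_environment-py"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
""")

changed = normalize_kernelspec(example_nb)        # first pass rewrites the kernelspec
kernel_name = example_nb["metadata"]["kernelspec"]["name"]
changed_again = normalize_kernelspec(example_nb)  # second pass is a no-op
```

Returning a flag like this would let a CI step skip rewriting notebooks that already use the generic `python3` kernel; whether that is worth the extra logic is a design choice, not something the PR requires.
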
diff --git a/.github/scripts/clean_kernelspecs.py b/.github/scripts/clean_kernelspecs.py
@@ -0,0 +1,14 @@
+import nbformat
+import glob
+
+for nb_path in glob.glob("**/*.ipynb", recursive=True):
+    with open(nb_path, encoding="utf-8") as f:
+        nb = nbformat.read(f, as_version=4)
+    nb['metadata']['kernelspec'] = {
+        "name": "python3",
+        "display_name": "Python 3",
+        "language": "python"
+    }
+    with open(nb_path, 'w', encoding="utf-8") as f:
+        nbformat.write(nb, f)
+
diff --git a/.github/workflows/build_jb.yml b/.github/workflows/build_jb.yml
@@ -24,5 +24,8 @@ jobs:
         run: |
           pip install -r practicals_jn_book/requirements.txt
 
+      - name: Clean notebook kernelspecs
+        run: python .github/scripts/clean_kernelspecs.py
+
       - name: Build documentation (only on macos-latest)
         run: jupyter-book build practicals_jn_book --all -W
diff --git a/big_data_environment.yml b/big_data_environment.yml
@@ -3,7 +3,7 @@ channels:
   - conda-forge
 dependencies:
   - python>=3.9
-  - pandas>=2.2
+  - pandas>=3.0.1
   - numpy>=2.2
   - openpyxl>=3.1
   - pyarrow>=19.0

diff --git a/practicals_jn_book/requirements.txt b/practicals_jn_book/requirements.txt
@@ -1,8 +1,9 @@
-pandas>=2.2
+pandas>=3.0.1
 numpy>=2.2
 scikit-learn==1.6.1
 seaborn==0.13.2
 scipy>=1.15
 matplotlib==3.10.0
 jupyter-book==1.0
 pyarrow>=19.0
+nbformat
diff --git a/practicals_jn_book/week_1/finalbook_part1.ipynb b/practicals_jn_book/week_1/finalbook_part1.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 11,
    "metadata": {
     "tags": [
      "hide-input"
@@ -13,9 +13,9 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "My Python version is: 3.11.1\n",
-      "My Numpy version is: 1.26.4\n",
-      "My Pandas version is: 2.2.2\n"
+      "My Python version is: 3.13.12\n",
+      "My Numpy version is: 2.4.2\n",
+      "My Pandas version is: 3.0.1\n"
      ]
     }
    ],
@@ -86,8 +86,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 12,
    "metadata": {
+    "scrolled": true,
     "tags": [
      "hide-input"
     ]
@@ -171,7 +172,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 13,
    "metadata": {
     "scrolled": true,
     "tags": [
@@ -198,7 +199,7 @@
       "4     Dennis Cornelius\n",
       "5          Brett Gibbs\n",
       "6           John Haack\n",
-      "Name: Name, dtype: object \n",
+      "Name: Name, dtype: str \n",
       "\n",
       "                Name\n",
       "0  Andrzej Stanaszek\n",
@@ -248,7 +249,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 14,
    "metadata": {
     "scrolled": true,
     "tags": [
@@ -294,91 +295,6 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Copy\n",
-    "\n",
-    "We already briefly mentioned that slicing in Pandas (and most other Python objects) returns a view not a copy. This might be a little counter intuitive if you're not familiar with general purpose programming languages (MATLAB is not). We do not want to mess with our lifter_df, so we will make a new dataframe for this assignment with three columns, ranging from 1-10:\n",
-    "```python\n",
-    "df1 = pd.DataFrame({\"X\": list(range(10)), \"Y\": list(range(10)), \"Z\": list(range(10))})\n",
-    "```\n",
-    "\n",
-    "### Assignment 3\n",
-    "\n",
-    "- **Make a slice of the first five rows using .iloc or .loc and assign it to a new variable.**\n",
-    "\n",
-    "- **Select all samples with .iloc or .loc and set all samples in the new variable to 0 and print the DataFrame.**\n",
-    "\n",
-    "- **Now print the original DataFrame. What do you notice?**\n",
-    "\n",
-    "You should get something like this:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 25,
-   "metadata": {
-    "scrolled": true,
-    "tags": [
-     "hide-input"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Sliced df:\n",
-      "    X  Y  Z\n",
-      "0  0  0  0\n",
-      "1  0  0  0\n",
-      "2  0  0  0\n",
-      "3  0  0  0\n",
-      "4  0  0  0 \n",
-      "\n",
-      "Original df:\n",
-      "    X  Y  Z\n",
-      "0  0  0  0\n",
-      "1  0  0  0\n",
-      "2  0  0  0\n",
-      "3  0  0  0\n",
-      "4  0  0  0\n",
-      "5  5  5  5\n",
-      "6  6  6  6\n",
-      "7  7  7  7\n",
-      "8  8  8  8\n",
-      "9  9  9  9\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/var/folders/d6/sgv22vx10fb8mj7yrljzpkch0000gn/T/ipykernel_9819/483482325.py:3: SettingWithCopyWarning: \n",
-      "A value is trying to be set on a copy of a slice from a DataFrame\n",
-      "\n",
-      "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
-      "  df2.loc[:] = 0\n"
-     ]
-    }
-   ],
-   "source": [
-    "df1 = pd.DataFrame({\"X\": list(range(10)), \"Y\": list(range(10)), \"Z\": list(range(10))})\n",
-    "df2 = df1.iloc[:5, :]\n",
-    "df2.loc[:] = 0\n",
-    "print(\"Sliced df:\\n\", df2, \"\\n\") # \\n gives you an empty line after your print statement for readability\n",
-    "print(\"Original df:\\n\", df1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "```{warning}\n",
-    "Oh no, you've not only altered df1, but also df2. This is because the slicing operation gave you a view into the DataFrame, but not a copy of the data. Again, this saves a lot of memory, but it can mess up your data! Luckily Pandas gives us a warning when we try to do this!\n",
-    "```\n",
-    "\n",
-    "To prevent this problem you can use the ``.copy()`` method which returns you a copy and not a view.\n",
-    "\n",
-    "Note: whether Pandas returns a copy or a view is actually a pretty delicate topic, but just assume you get a view and use ``.copy()`` when you plan on changing the contents of the DataFrame.\n",
     "\n",
     "## Accessors\n",
     "\n",
@@ -398,7 +314,7 @@
     "\n",
     "Cleaning up strings is a common operation in data science. Always check (your column names) for unwanted whitespace!\n",
     "\n",
-    "### Assignment 4\n",
+    "### Assignment 3\n",
     "\n",
     "Consider an entry like this: {\"Name\": \"ALEXEY Kuzmin\", \"Age\": 34, \"Totalkg\": 527.25}. We can add it to the dataframe using the [concat](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) method: \n",
     "\n",
@@ -423,7 +339,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 15,
    "metadata": {
     "tags": [
      "hide-input"
@@ -521,7 +437,7 @@
        "7      Alexey Kuzmin  34.0   527.25"
       ]
      },
-     "execution_count": 26,
+     "execution_count": 15,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -538,7 +454,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Assignment 5\n",
+    "### Assignment 4\n",
     "\n",
     "````{margin}\n",
     "```{admonition} Tip\n",
@@ -556,7 +472,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 16,
    "metadata": {
     "tags": [
      "hide-input"
@@ -672,7 +588,7 @@
        "7      Alexey Kuzmin    Alexey      Kuzmin  34.0   527.25"
       ]
      },
-     "execution_count": 27,
+     "execution_count": 16,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -713,7 +629,7 @@
     "df_lifters = df_lifters.dropna().sort_values(by=\"Totalkg\", ascending=False)\n",
     "```\n",
     "\n",
-    "### Assignment 6\n",
+    "### Assignment 5\n",
     "\n",
     "- **First sort all the data by Totalkg score, make sure the Totalkg is on top of your DataFrame. Print out the DataFrame. What do you notice?**\n",
     "\n",
@@ -728,7 +644,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 17,
    "metadata": {
     "tags": [
      "hide-input"
@@ -798,9 +714,9 @@
   "celltoolbar": "Tags",
   "hide_input": false,
   "kernelspec": {
-   "display_name": "big_data_environment",
+   "display_name": "Python [conda env:big_data_environment]",
    "language": "python",
-   "name": "python3"
+   "name": "conda-env-big_data_environment-py"
   },
   "language_info": {
    "codemirror_mode": {
@@ -812,7 +728,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.13.2"
+   "version": "3.13.12"
   },
   "toc": {
    "base_numbering": 1,