EFS-OpenSource
diff --git a/docs/source/data.rst b/docs/source/data.rst
Lines changed: 41 additions & 0 deletions
diff --git a/examples/classification.ipynb b/examples/classification.ipynb
Lines changed: 46 additions & 23 deletions
@@ -5,6 +5,47 @@ The Thetis evaluation library only requires the output of an AI model on a dedic
 Thus, the application requires the ground truth target labels (not the data itself!) and the according AI predictions.
 We give a detailed overview about the required data format in the following.
 
+Description
+-----------
+Thetis also lets you add a description of your AI solution, which is
+required to create the technical documentation in accordance with
+Article 11 and Annex IV of the AI Act. These details **cannot** be
+filled in automatically and must be provided manually by the user to
+ensure that the documentation is complete and transparent.
+
+Any description field (listed below) that is left empty also remains
+empty in the technical documentation. Note that all keys must still be
+present in the dictionary, even for fields you intend to leave empty in
+the report. In that case, use an empty string (:code:`""`) as the value.
+
+Thetis expects the description as a dictionary with the following keys (given in parentheses):
+
+* title of your AI solution ("title"),
+* model provider ("issuer"),
+* internal contact person ("contact_intern"),
+* external contact person ("contact_extern"),
+* purpose of the model ("purpose"),
+* dependencies ("requirements"),
+* forms of distribution ("forms"),
+* hardware details ("hardware") and
+* UI description ("ui").
+
+An example Python dictionary could look like this:
+
+.. code-block:: python
+
+    description: dict[str, str] = {
+        "title": "Income Prediction (Demo)",
+        "issuer": "XYZ Demo Solutions GmbH",
+        "contact_intern": "Jon Doe",
+        "contact_extern": "Jane Doe",
+        "purpose": "The goal of this AI system is to estimate, based on demographic data, whether a person's income ...",
+        "requirements": "The system relies heavily on specific software versions and hardware requirements ...",
+        "forms": "The AI system is provided as a REST API, which can be operated in containerized environments (Docker, Kubernetes). ...",
+        "hardware": "The AI system is operated on powerful servers with ...",
+        "ui": "The system's user interface is designed so that insurance companies ...",
+    }
+
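Because every key must be present even when its field stays empty, a small pre-flight check can catch a missing or misspelled key before the description is passed to `thetis(...)`. The following is a minimal sketch, assuming Python 3.9+ (for the built-in generic annotations such as `dict[str, str]` that the example above already uses). `REQUIRED_KEYS` and `validate_description` are hypothetical names introduced for illustration; they are not part of the Thetis API and only mirror the key list documented in data.rst.

```python
# Hypothetical helper, NOT part of the Thetis API: it only mirrors the
# description keys documented in data.rst.
REQUIRED_KEYS: tuple[str, ...] = (
    "title",
    "issuer",
    "contact_intern",
    "contact_extern",
    "purpose",
    "requirements",
    "forms",
    "hardware",
    "ui",
)


def validate_description(description: dict[str, str]) -> None:
    """Raise if a documented key is missing or a value is not a string."""
    missing = [key for key in REQUIRED_KEYS if key not in description]
    if missing:
        raise ValueError(f"missing description keys: {missing}")
    for key, value in description.items():
        if not isinstance(value, str):
            raise TypeError(f"value for {key!r} must be a str, got {type(value).__name__}")


# Start from a template in which every field is an empty string
# (left empty in the report), then fill in the fields you know.
description: dict[str, str] = dict.fromkeys(REQUIRED_KEYS, "")
description["title"] = "Income Prediction (Demo)"
description["issuer"] = "XYZ Demo Solutions GmbH"

validate_description(description)  # passes: all keys present, all values are str
```

Starting from `dict.fromkeys(REQUIRED_KEYS, "")` guarantees that every key exists with an empty string, so any field you do not overwrite stays empty in the report, as described above.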
 Binary classification
 ---------------------
 In the case of binary classification, Thetis expects two instances of a `Pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__ :code:`pd.DataFrame`
 
@@ -53,23 +53,21 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
+   "cell_type": "code",
    "source": [
     "import logging\n",
-    "import os\n",
     "import sys\n",
     "\n",
-    "\n",
     "# Configure root logger as catch-all logging config\n",
     "logger = logging.getLogger(\"Thetis\")\n",
     "logger.setLevel(logging.INFO)\n",
     "handler = logging.StreamHandler(sys.stderr)\n",
     "handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))\n",
     "logger.addHandler(handler)"
-   ]
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
    "cell_type": "markdown",
@@ -92,9 +90,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
    "source": [
     "import pandas as pd\n",
     "from sklearn.datasets import fetch_openml\n",
@@ -115,7 +111,9 @@
     "categorical_columns = [\"workclass\", \"occupation\"]\n",
     "df_train_cleared[categorical_columns] = df_train_cleared[categorical_columns].apply(lambda col: pd.Categorical(col).codes)\n",
     "df_test_cleared[categorical_columns] = df_test_cleared[categorical_columns].apply(lambda col: pd.Categorical(col).codes)"
-   ]
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
    "cell_type": "markdown",
@@ -133,9 +131,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
    "source": [
     "from sklearn.ensemble import RandomForestClassifier\n",
     "\n",
@@ -146,7 +142,9 @@
     "# finally, make predictions on the validation dataset\n",
     "confidence = classifier.predict_proba(pd.get_dummies(df_test_cleared))\n",
     "labels = classifier.predict(pd.get_dummies(df_test_cleared))"
-   ]
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
    "cell_type": "markdown",
@@ -168,14 +166,14 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
    "source": [
     "# use sensitive attributes during safety evaluation\n",
     "annotations = pd.DataFrame({\"target\": target_test, \"race\": df_test[\"race\"], \"sex\": df_test[\"sex\"]})\n",
     "predictions = pd.DataFrame({\"labels\": labels, \"confidence\": confidence[:, 1]}, index=annotations.index)"
-   ]
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
    "cell_type": "markdown",
@@ -190,9 +188,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
    "source": [
     "# optional: store prediction and ground truth data on disk\n",
     "annotations.to_csv(\"adult_annotations.csv\")\n",
@@ -202,7 +198,9 @@
     "# important: specify \"index_col\" since Thetis matches the predictions/annotations by their indices\n",
     "loaded_annotations = pd.read_csv(\"adult_annotations.csv\", index_col=0)\n",
     "loaded_predictions = pd.read_csv(\"adult_predictions.csv\", index_col=0)"
-   ]
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
    "cell_type": "markdown",
@@ -225,28 +223,53 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
+   "cell_type": "markdown",
+   "source": "Add a dictionary containing the AI solution information required for an AI Act-compliant technical report:"
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "description: dict[str, str] = {\n",
+    "    \"title\": \"Income Prediction (Demo)\",\n",
+    "    \"issuer\": \"XYZ Demo Solutions GmbH\",\n",
+    "    \"contact_intern\": \"Jon Doe\",\n",
+    "    \"contact_extern\": \"Jane Doe\",\n",
+    "    \"purpose\": \"The goal of this AI system is to estimate, based on demographic data, whether a person's income ...\",\n",
+    "    \"requirements\": \"The system relies heavily on specific software versions and hardware requirements ...\",\n",
+    "    \"forms\": \"The AI system is provided as a REST API, which can be operated in containerized environments (Docker, Kubernetes). ...\",\n",
+    "    \"hardware\": \"The AI system is operated on powerful servers with ...\",\n",
+    "    \"ui\": \"The system's user interface is designed so that ...\",\n",
+    "}"
+   ],
    "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
    "source": [
     "from thetis import thetis\n",
     "\n",
     "\n",
     "result = thetis(\n",
     "   config=\"demo_config_classification.yaml\",\n",
+    "   description=description,\n",
     "   annotations=annotations,\n",
     "   predictions=predictions,\n",
     "   output_dir=\"./output\",\n",
     "   license_file_path=\"demo_license_classification.dat\"\n",
     ")"
-   ]
+   ],
+   "outputs": [],
+   "execution_count": null
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
+   "cell_type": "code",
    "outputs": [],
+   "execution_count": null,
    "source": [
     "from IPython.display import IFrame\n",
     "IFrame(\"./output/report.pdf\", width=800, height=1024)"