Commit 8b172d2

Pull request #26: Release/0.2.3
Merge in FO00039/thetis-public-github from release/0.2.3 to main
1 parent 9793317 commit 8b172d2

9 files changed

Lines changed: 210 additions & 77 deletions

docs/source/data.rst

Lines changed: 41 additions & 0 deletions
@@ -5,6 +5,47 @@ The Thetis evaluation library only requires the output of an AI model on a dedic
 Thus, the application requires the ground truth target labels (not the data itself!) and the corresponding AI predictions.
 We give a detailed overview of the required data format in the following.
 
+Description
+-----------
+Thetis also lets you add a description of your AI solution,
+which is required to create technical documentation in accordance
+with Article 11 and Annex IV of the AI Act. These details **cannot** be
+filled in automatically and must be provided manually by the user to ensure
+the documentation's completeness and transparency.
+
+If certain fields in the description (as described below) are not filled in,
+they also remain empty in the technical documentation. Note that all keys must
+be specified in the dictionary, even if you intend to leave them empty in the
+report. In this case, use an empty string as the value.
+
+Thetis expects the description as a dictionary with the following entries
+(expected keys are given in parentheses):
+* title of your AI solution ("title"),
+* model provider ("issuer"),
+* internal contact person ("contact_intern"),
+* external contact person ("contact_extern"),
+* purpose of the model ("purpose"),
+* dependencies ("requirements"),
+* forms of distribution ("forms"),
+* hardware details ("hardware") and
+* UI description ("ui").
+
+An example Python dictionary could look like this:
+
+.. code-block:: python
+
+    description: dict[str, str] = {
+        "title": "Income Prediction (Demo)",
+        "issuer": "XYZ Demo Solutions GmbH",
+        "contact_intern": "Jon Doe",
+        "contact_extern": "Jane Doe",
+        "purpose": "The goal of this AI system is to estimate, based on demographic data, whether a person's income ...",
+        "requirements": "The system relies heavily on specific software versions and hardware requirements ...",
+        "forms": "The AI system is provided as a REST API, which can be operated in containerized environments (Docker, Kubernetes). ...",
+        "hardware": "The AI system is operated on powerful servers with ...",
+        "ui": "The system's user interface is designed so that insurance companies ...",
+    }
+
 Binary classification
 ---------------------
 In the case of binary classification, Thetis expects two instances of a `Pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__ :code:`pd.DataFrame`
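
Note on the Description section added in this hunk: the text requires every key to be present, with an empty string for fields that should stay blank in the report. A minimal sketch of how a caller might guarantee that before handing the dictionary to Thetis is shown here. The helper `complete_description` and the constant `REQUIRED_KEYS` are hypothetical names introduced for illustration; they are not part of the Thetis API, and the key list is simply copied from the bullet list in the documentation.

```python
# Keys copied from the "Description" section of docs/source/data.rst.
REQUIRED_KEYS = (
    "title",
    "issuer",
    "contact_intern",
    "contact_extern",
    "purpose",
    "requirements",
    "forms",
    "hardware",
    "ui",
)


def complete_description(partial: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper (not part of Thetis): fill missing keys with ""."""
    unknown = set(partial) - set(REQUIRED_KEYS)
    if unknown:
        # Reject misspelled keys early instead of silently dropping them.
        raise KeyError(f"unexpected description keys: {sorted(unknown)}")
    # Build the result in the documented key order, defaulting to "".
    return {key: partial.get(key, "") for key in REQUIRED_KEYS}


description = complete_description({
    "title": "Income Prediction (Demo)",
    "issuer": "XYZ Demo Solutions GmbH",
})
```

Rejecting unknown keys is a design choice of this sketch: a misspelled key such as "contact_internal" would otherwise be dropped and its field would silently stay empty in the report.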

examples/classification.ipynb

Lines changed: 46 additions & 23 deletions
@@ -53,23 +53,21 @@
 ]
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
-"outputs": [],
+"cell_type": "code",
 "source": [
 "import logging\n",
-"import os\n",
 "import sys\n",
 "\n",
-"\n",
 "# Configure root logger as catch-all logging config\n",
 "logger = logging.getLogger(\"Thetis\")\n",
 "logger.setLevel(logging.INFO)\n",
 "handler = logging.StreamHandler(sys.stderr)\n",
 "handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))\n",
 "logger.addHandler(handler)"
-]
+],
+"outputs": [],
+"execution_count": null
 },
 {
 "cell_type": "markdown",
@@ -92,9 +90,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
 "metadata": {},
-"outputs": [],
 "source": [
 "import pandas as pd\n",
 "from sklearn.datasets import fetch_openml\n",
@@ -115,7 +111,9 @@
 "categorical_columns = [\"workclass\", \"occupation\"]\n",
 "df_train_cleared[categorical_columns] = df_train_cleared[categorical_columns].apply(lambda col: pd.Categorical(col).codes)\n",
 "df_test_cleared[categorical_columns] = df_test_cleared[categorical_columns].apply(lambda col: pd.Categorical(col).codes)"
-]
+],
+"outputs": [],
+"execution_count": null
 },
 {
 "cell_type": "markdown",
@@ -133,9 +131,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
 "metadata": {},
-"outputs": [],
 "source": [
 "from sklearn.ensemble import RandomForestClassifier\n",
 "\n",
@@ -146,7 +142,9 @@
 "# finally, make predictions on the validation dataset\n",
 "confidence = classifier.predict_proba(pd.get_dummies(df_test_cleared))\n",
 "labels = classifier.predict(pd.get_dummies(df_test_cleared))"
-]
+],
+"outputs": [],
+"execution_count": null
 },
 {
 "cell_type": "markdown",
@@ -168,14 +166,14 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
 "metadata": {},
-"outputs": [],
 "source": [
 "# use sensitive attributes during safety evaluation\n",
 "annotations = pd.DataFrame({\"target\": target_test, \"race\": df_test[\"race\"], \"sex\": df_test[\"sex\"]})\n",
 "predictions = pd.DataFrame({\"labels\": labels, \"confidence\": confidence[:, 1]}, index=annotations.index)"
-]
+],
+"outputs": [],
+"execution_count": null
 },
 {
 "cell_type": "markdown",
@@ -190,9 +188,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
 "metadata": {},
-"outputs": [],
 "source": [
 "# optional: store prediction and ground truth data on disk\n",
 "annotations.to_csv(\"adult_annotations.csv\")\n",
@@ -202,7 +198,9 @@
 "# important: specify \"index_col\" since Thetis matches the predictions/annotations by their indices\n",
 "loaded_annotations = pd.read_csv(\"adult_annotations.csv\", index_col=0)\n",
 "loaded_predictions = pd.read_csv(\"adult_predictions.csv\", index_col=0)"
-]
+],
+"outputs": [],
+"execution_count": null
 },
 {
 "cell_type": "markdown",
@@ -225,28 +223,53 @@
 ]
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
+"cell_type": "markdown",
+"source": "Add a dictionary containing the AI solution information required for an AI Act-compliant technical report:"
+},
+{
+"metadata": {},
+"cell_type": "code",
+"source": [
+"description: dict[str, str] = {\n",
+"    \"title\": \"Income Prediction (Demo)\",\n",
+"    \"issuer\": \"XYZ Demo Solutions GmbH\",\n",
+"    \"contact_intern\": \"Jon Doe\",\n",
+"    \"contact_extern\": \"Jane Doe\",\n",
+"    \"purpose\": \"The goal of this AI system is to estimate, based on demographic data, whether a person's income ...\",\n",
+"    \"requirements\": \"The system relies heavily on specific software versions and hardware requirements ...\",\n",
+"    \"forms\": \"The AI system is provided as a REST API, which can be operated in containerized environments (Docker, Kubernetes). ...\",\n",
+"    \"hardware\": \"The AI system is operated on powerful servers with ...\",\n",
+"    \"ui\": \"The system's user interface is designed so that ...\",\n",
+"}"
+],
 "outputs": [],
+"execution_count": null
+},
+{
+"metadata": {},
+"cell_type": "code",
 "source": [
 "from thetis import thetis\n",
 "\n",
 "\n",
 "result = thetis(\n",
 "    config=\"demo_config_classification.yaml\",\n",
+"    description=description,\n",
 "    annotations=annotations,\n",
 "    predictions=predictions,\n",
 "    output_dir=\"./output\",\n",
 "    license_file_path=\"demo_license_classification.dat\"\n",
 ")"
-]
+],
+"outputs": [],
+"execution_count": null
 },
 {
-"cell_type": "code",
-"execution_count": null,
 "metadata": {},
+"cell_type": "code",
 "outputs": [],
+"execution_count": null,
 "source": [
 "from IPython.display import IFrame\n",
 "IFrame(\"./output/report.pdf\", width=800, height=1024)"

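Note on the notebook's CSV round trip: the cell comment states that Thetis matches predictions and annotations by their indices, which is why `index_col=0` is passed to `pd.read_csv`. The sketch below illustrates the difference using only pandas and an in-memory buffer. The toy values and the index labels are made up for illustration and do not come from the Adult dataset used in the notebook.

```python
import io

import pandas as pd

# Toy annotations with a non-default index (values are made up).
annotations = pd.DataFrame(
    {"target": [1, 0, 1], "sex": ["Female", "Male", "Female"]},
    index=[10, 42, 7],
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
annotations.to_csv(buffer)

# With index_col=0 the first CSV column is restored as the index.
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

# Without index_col, pandas creates a fresh 0..n-1 index and keeps the
# original index as an ordinary column named "Unnamed: 0".
buffer.seek(0)
without_index = pd.read_csv(buffer)

print(with_index.index.tolist())     # [10, 42, 7]
print(without_index.index.tolist())  # [0, 1, 2]
```

If the index is not restored, rows of the two DataFrames would be paired by position-based labels rather than by the original sample identifiers, so any filtering or reordering applied to only one of the files would go unnoticed.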