diff --git a/medcat-trainer/docs/admin_setup.md b/medcat-trainer/docs/admin_setup.md index e25f739d1..96bf88e7b 100644 --- a/medcat-trainer/docs/admin_setup.md +++ b/medcat-trainer/docs/admin_setup.md @@ -1,25 +1,48 @@ # Administrator Setup -1\. The container runs a vanilla [django](https://www.djangoproject.com/) app, that upon initially loaded -will create a defaulted administrator user with details: +This page covers first-login admin hardening and user setup. -
-    username: admin
-    password: admin
-
+## 1) Configure bootstrap admin credentials -2\. We strongly recommend creating a new admin user before using the trainer in 'production' and storing sensitive -clinical documents on the trainer. To add a new user navigate to select `http://localhost:8001/admin/` and select 'Users'. +Before first startup in production-like environments, set: -![](_static/img/users-select.png) +- `MCTRAINER_BOOTSTRAP_ADMIN_USERNAME` +- `MCTRAINER_BOOTSTRAP_ADMIN_EMAIL` +- `MCTRAINER_BOOTSTRAP_ADMIN_PASSWORD` -3\. Select 'Add User' and complete the form with a new username / password. +If not set, MedCATtrainer defaults to `admin` / `admin`, which is not suitable +for production. -![](_static/img/add-new-users.png) +## 2) Sign in and create operational admin users -4\. Once created, select the new user, and tick the 'Staff Status' or 'Superuser Status' to allow the user to -access the admin app. +You can manage users from: -5\. Remove the default admin user by navigating to step 2, select the user and the action +- **Project Admin UI** (`/project-admin`) for day-to-day project operations +- **Django Admin** (`/admin`) for full platform administration -![](_static/img/remove-default-user.png) +In Django admin (`/admin`), create at least one dedicated administrator account +and grant: + +- `Staff status` for admin access +- `Superuser status` for full unrestricted access + +## 3) Create annotator users + +Create users for annotators and add them to project membership lists. +Annotators do not need staff/superuser flags. + +## 4) Remove or rotate bootstrap credentials + +After creating named administrator accounts: + +- remove the default bootstrap account if it is no longer needed, or +- rotate its password and store credentials securely. 
+ +## 5) If using OIDC + +When `USE_OIDC=1`, user permissions are mapped from IdP roles: + +- `medcattrainer_superuser` -> Django superuser + staff +- `medcattrainer_staff` -> Django staff + +Ensure role assignment is correct in Keycloak before onboarding users. diff --git a/medcat-trainer/docs/advanced_usage.md b/medcat-trainer/docs/advanced_usage.md index 60bb35b3d..f1c2687e8 100644 --- a/medcat-trainer/docs/advanced_usage.md +++ b/medcat-trainer/docs/advanced_usage.md @@ -1,3 +1,78 @@ # Advanced Usage -- ReST API Usage for bulk dataset / project creation: available in: docs/API_Examples.ipynb \ No newline at end of file +This page covers API-first workflows and power-user features. + +## Notebook examples + +The repository includes notebook examples: + +- `notebook_docs/API_Examples.ipynb` +- `notebook_docs/Processing_Annotations.ipynb` + +These are useful for bulk project creation, export processing, and automation. + +## REST API basics + +Base path: `/api/` + +Common endpoints: + +- `GET/POST /api/project-annotate-entities/` +- `GET/POST /api/datasets/` +- `GET/POST /api/modelpacks/` +- `GET/POST /api/concept-dbs/` +- `GET/POST /api/vocabs/` + +### Authentication + +- Local auth token: `POST /api/api-token-auth/` +- OIDC bearer token (if enabled): send `Authorization: Bearer ` + +## Project Admin API endpoints + +Modern project-admin UI uses dedicated endpoints: + +- `GET /api/project-admin/projects/` +- `POST /api/project-admin/projects/create/` +- `GET/PUT/DELETE /api/project-admin/projects//` +- `POST /api/project-admin/projects//clone/` +- `POST /api/project-admin/projects//reset/` + +## Metrics APIs + +- `GET/POST /api/metrics-job/` (list jobs / submit new report) +- `DELETE /api/metrics-job//` (remove report job) +- `GET/PUT /api/metrics//` (fetch report / rename report) + +Only compatible projects should be combined (same underlying model +configuration) when generating reports. 
+ +## Concept exploration and filter export + +Use the **Concepts** view (`/model-explore`) to: + +- browse hierarchy paths, +- choose parent concepts, +- generate and export JSON filter lists for project CUI filters. + +Related API endpoints: + +- `POST /api/generate-concept-filter/` +- `POST /api/generate-concept-filter-json/` +- `GET /api/model-concept-children//` + +## Remote model service projects + +Projects can use a remote MedCAT model service instead of local model loading +by setting: + +- `use_model_service = true` +- `model_service_url = ` + +Operational note: train-on-submit updates are not applied for remote model +service projects. + +## Python client + +For scripting and CI pipelines, see [client.md](client.md) and the `mctclient` +package. diff --git a/medcat-trainer/docs/annotator_guide.md b/medcat-trainer/docs/annotator_guide.md index 981b3e1ed..32e659031 100644 --- a/medcat-trainer/docs/annotator_guide.md +++ b/medcat-trainer/docs/annotator_guide.md @@ -1,72 +1,93 @@ # Annotation Interface -The annotation interface can be split into 5 sections. +The annotator view is designed for fast review and correction of model +predictions. ![](_static/img/main-annotation-interface.png) -## Section 1 - Document Summary List -A list of documents to be completed in this project. Currently selected documents are highlighted in blue -left border. Submitted documents are marked with a ![tick_mark](_static/img/tick_mark.png). - -## Section 2 - Clinical Text -The selected documents text, highlighted with each concept recognised by the configured MedCAT model. -Highlighted spans of text indicate status of the annotation: -- Grey: A User has *not reviewed* this span that has been recognised and linked by MedCAT to a CDB concept. -- Blue: A User has reviewed the span and marked it as ***correct*** in terms of its linked MedCAT concept. -- Red: A User has reviewed the span and marked it as **incorrect** in terms of its linked MedCAT concept. 
-- Dark Red: A User has reviewed the span and marked it to **terminate**, meaning the text span should never again - link to this text span, this informs MedCAT that -- Turqoise: A User has reviewed the span and marked it as an **alternative** linked concept. The user has used the - 'Concept Picker' to choose the correct concept that should be linked. - -### Additional Annotations -MedCAT may miss text spans that are acronyms, abbreviations or misspellings of concepts. Missing annotations can be -added to the text by directly highlighting the text span, right clicking, selecting 'Add Term', searching for -concept (via ID, or name), and selecting Add Term: - -![](_static/img/add-annotation-text.png) -> ![](_static/img/add-annotation-menu.png) -> ![](_static/img/add-annotation-concept-pick.png) - -Select: -- Add Term: to add this annotation to the text span and link the selected concept -- Cancel: (Shortcut esc): to cancel adding the annotation to the text. - -![](_static/img/add-annotation-select-concept.png) - -## Section 3 - Action Bar - -### Concept Navigation -Navigating between the list of concepts as they appear in the document: -- Action buttons, left and right -- Left and right arrow keys on keyboard -- Directly clicking on the concept within the text. - - -### Concept Status Buttons -A concept can be marked with only one status. Status is recorded but only sent to MedCAT for -training on **submit** of the document and if the projects configured with "Train Model On Submit" is ticked. - -### Submit Button -Submit is disabled until all concepts have been reviewed and marked with a status. Clicking submit will produce -a submission confirmation dialog with an annotation summary. Confirming submission will send all new annotations -to MedCATTrainer middle tier, and re-train the MedCAT model. 
The following document will be selected and annotated -by the newly trained MedCAT model - -## Section 4 - Header Toolbar -Lists the current name of the document under review and the number of remaining documents to annotate in this project -action buttons for: -- ![](_static/img/summary-button.png): Summary of current annotations. f A similar view is shown before confirmation of submission of the annotations -- ![](_static/img/help-button.png): Help dialog, showing shortcuts for document & concept navigation, concept annotation and submission. -- ![](_static/img/reset-button.png): Reset document. If an annotation is incorrectly added, or incorrectly submitted resetting the document will - clear all previous annotations and their status. - -## Section 5 - Concept Summary -Lists the current selected concepts details. - -|Concept Detail| Description | -|--------------| ------------| -|Annotated Text| The text span linked to the concept| -|Name | The linked concept name from within the MedCAT CDB| -|Type ID | The higher level group of concepts that this concept sits under. This may be 'N/A' depending if your CDB has Type IDs or not.| -|Concept ID | The unique identifier for this linked concept from the MedCAT CDB.| -|Accuracy | The MedCAT found accuracy of the linked concept for this span. Text spans will have an accuracy 1.0, if they are uniquely identified by that name in the CDB| -|Description | The MedCAT associated description of the concept. SNOMED-CT does not provide descriptions of concepts, only alternative names whereas UMLS does provide descriptions| \ No newline at end of file +## 1) Document list + +The left panel shows documents in the project dataset. + +- Current document is highlighted. +- Prepared documents (model predictions generated) are marked. +- Submitted/validated documents are marked as complete. + +## 2) Clinical text + +The center panel displays document text with detected concept spans. 
+ +Select spans by clicking them directly, then apply one status from the task bar. + +### Supported concept statuses + +- **Correct** +- **Incorrect** +- **Terminate** (if enabled for project) +- **Alternative** (choose a different concept) +- **Irrelevant** (if enabled for project) + +Only one status can be active per concept at a time. + +### Adding missing annotations + +If the model missed a mention: + +1. Highlight text in the document. +2. Right-click and choose **Add Term**. +3. Search/select a concept in the concept picker. +4. Confirm to create the annotation. + +Projects with `add_new_entities` enabled can also create brand-new concepts. + +Overlapping annotations are supported. + +## 3) Task bar and submission + +The task bar contains status buttons and the **Submit** button. + +- Submit is enabled only when required tasks are completed for all concepts. +- On submit, a confirmation dialog shows an annotation summary. +- If project `train_model_on_submit` is enabled, submitted annotations are used + for incremental model updates (except remote model-service projects). + +## 4) Header actions + +Top-right actions: + +- **Summary**: open document annotation summary. +- **Help**: keyboard shortcuts and project guidance. +- **Reset document**: re-prepare current document and clear document-level + annotation state. + +## 5) Right sidebar (concept details) + +The sidebar shows details for the currently selected concept, including: + +- concept name/CUI +- type IDs/semantic type (if available) +- synonyms and description +- confidence score + +If enabled by project settings, a **Comment** field is also available. + +## Meta annotations and relations + +Depending on project configuration: + +- **Meta Annotation Tasks** appear for relevant concept statuses. +- **Relation** tab appears to create/edit relations between annotated entities. 
+ +## Keyboard shortcuts + +| Shortcut | Action | +|---|---| +| Up / Down | Previous / next document | +| Left / Right (or Space) | Previous / next concept | +| `1` | Correct | +| `2` | Incorrect | +| `3` | Terminate (if enabled) | +| `4` | Alternative | +| `5` | Irrelevant (if enabled) | +| Enter | Submit / confirm submit | +| Esc | Close active modal/cancel active add-term flow | diff --git a/medcat-trainer/docs/client.md b/medcat-trainer/docs/client.md index d5d131325..78b7b0b8c 100644 --- a/medcat-trainer/docs/client.md +++ b/medcat-trainer/docs/client.md @@ -1,88 +1,78 @@ +# MedCATtrainer Python Client ---- +`mctclient` provides a Python wrapper over MedCATtrainer REST APIs for +automation and batch workflows. -# MedCATtrainer Client +## Install -A Python client for interacting with a MedCATTrainer web application instance. This package allows you to manage datasets, concept databases, vocabularies, model packs, users, projects, and more via Python code or the command line. - -## Features - -- Manage datasets, concept databases, vocabularies, and model packs -- Create and manage users and projects -- Retrieve and upload project annotations -- Command-line interface (CLI) for automation - -## Installation - -```sh +```bash pip install mctclient ``` -Or, if installing from source: +## Authenticate -```sh -cd client -python -m build -pip install dist/*.whl -``` - -## Python Usage +The client uses username/password API-token auth. 
-```sh +```bash export MCTRAINER_USERNAME= export MCTRAINER_PASSWORD= ``` +## Minimal example + ```python -from mctclient import MedCATTrainerSession, MCTDataset, MCTConceptDB, MCTVocab, MCTModelPack, MCTMetaTask, MCTRelTask, MCTUser, MCTProject +from mctclient import ( + MedCATTrainerSession, + MCTDataset, + MCTConceptDB, + MCTVocab, + MCTModelPack, + MCTUser, +) -# Connect to your MedCATTrainer instance session = MedCATTrainerSession(server="http://localhost:8001") -# List all projects +# Inspect existing resources projects = session.get_projects() -for project in projects: - print(project) +model_packs = session.get_model_packs() -# Create a new dataset -dataset = session.create_dataset(name="My Dataset", dataset_file="path/to/data.csv") +# Upload dataset +ds = session.create_dataset(name="Demo Dataset", dataset_file="data.csv") -# Create a new user -user = session.create_user(username="newuser", password="password123") +# Create user (optional) +annotator = session.create_user(username="annotator_1", password="strong-password") -# Create a new project +# Create project using model pack OR cdb+vocab project = session.create_project( - name="My Project", - description="A new annotation project", - members=[user], - dataset=dataset + name="Demo Project", + description="Automated setup", + members=[annotator], + dataset=ds, + modelpack=model_packs[0], ) ``` -### MedCATTrainerSession Methods +## Common methods -- `create_project(name, description, members, dataset, cuis=[], cuis_file=None, concept_db=None, vocab=None, cdb_search_filter=None, modelpack=None, meta_tasks=[], rel_tasks=[])` +- `create_project(...)` - `create_dataset(name, dataset_file)` - `create_user(username, password)` - `create_medcat_model(cdb, vocab)` - `create_medcat_model_pack(model_pack)` - `get_users()` - `get_models()` +- `get_concept_dbs()` +- `get_vocabs()` - `get_model_packs()` - `get_meta_tasks()` - `get_rel_tasks()` - `get_projects()` - `get_datasets()` - `get_project_annos(projects)` +- 
`upload_projects_export(...)` -Each method returns the corresponding object or a list of objects. - -## License - -This project is licensed under the Apache 2.0 License. - -## Contributing - -Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change. - +## Notes +- `create_project` expects **either** `modelpack` **or** `concept_db + vocab`. +- Wrapper objects (`MCTDataset`, `MCTConceptDB`, etc.) can often be passed by + object or resolved by name. diff --git a/medcat-trainer/docs/demo_page.md b/medcat-trainer/docs/demo_page.md index e4f653725..3fd397c22 100644 --- a/medcat-trainer/docs/demo_page.md +++ b/medcat-trainer/docs/demo_page.md @@ -1,23 +1,35 @@ -# Demo -For demonstration purposes and general testing of a current model a stripped back version of the annotator is provided -via the 'Demo' tab of the main screen. +# Demo / Try Model + +The **Try Model** view (`/demo`) is a lightweight sandbox for testing model +behavior without creating a full annotation project. ![](_static/img/demo_tab.png) -This presents a similar looking annotation screen as a real project, but does not -force usage of a particular dataset, setup of filters, and other project settings. This view does not allow for 'annotating' -identified concepts (or adding new concepts) but allows for users to get a feel for what an existing MedCAT model is -capable of annotating in via an interactive model +## What it is for + +- quick model sanity checks +- ad-hoc text exploration +- testing concept filters before project setup + +It does **not** persist annotation decisions like a project workflow does. + +## Workflow + +1. Select a **Model Pack**. +2. Optionally add CUI filters: + - pick concepts from the concept picker, or + - paste a comma-separated CUI list. +3. Optionally enable **Include sub-concepts**. +4. Enter or paste free text. 
-![](_static/img/demo_interface.png) +The text is annotated automatically after a short pause or when focus leaves the +editor. -1\. A form to: -- Select the appropriate project model to view concept annotations for. -- Clinical text to annotate and display in 2. -- CUI and TypeID filters can be used to only show concepts of interest in 2. For example for a UMLS CDB this could be - T047 for "Disease or Syndrome". CUI and TypeID filters are combined if entries are included in both form inputs. +## Output panels -2\. Example clinical text is displayed here, with text spans highlighted in blue. Click any annotation to show linked -concept DB details in 3. +- **Main text panel**: highlighted entities in context. +- **Concept Summary**: details for the selected entity. +- **Meta Annotations Summary**: model-predicted meta annotation values (if + available in the model pack). -3\. Linked concept details from selected concepts from the 2. \ No newline at end of file +Double-clicking the rendered text switches back to edit mode. diff --git a/medcat-trainer/docs/installation.md b/medcat-trainer/docs/installation.md index 270e47801..2eda1c2a9 100644 --- a/medcat-trainer/docs/installation.md +++ b/medcat-trainer/docs/installation.md @@ -1,137 +1,131 @@ # Installation -MedCATtrainer is a docker-compose packaged Django application. - -## Download from Dockerhub -Clone the repo, run the default docker-compose file and default env var: -```shell -$ git clone https://github.com/CogStack/cogstack-nlp -$ cd cogstack-nlp/medcat-trainer -$ docker-compose up -``` -This will use the pre-built docker images available on DockerHub. If your internal firewall does on permit access to DockerHub, you can build directly from source. 
+MedCATtrainer is packaged as a Docker Compose deployment with three core +services: + +- `medcattrainer` (Django API + background workers) +- `nginx` (serves UI and proxies API) +- `solr` (concept search index for concept lookup) + +## Prerequisites + +- Docker Engine +- Docker Compose v2 (`docker compose` command) + +## Quick start (prebuilt images) -To check logs of the MedCATtrainer running containers ```bash -$ docker logs | grep "\[medcattrainer\]" -$ docker logs | grep "\[bg-process\]" -$ docker logs | grep "\[db-backup\]" +git clone https://github.com/CogStack/cogstack-nlp +cd cogstack-nlp/medcat-trainer +docker compose up -d ``` -## MedCAT v0.x models -If you have MedCAT v0.x models, and want to use the trainer please use the following docker-compose file: -This refences the latest built image for the trainer that is still compatible with [MedCAT v0.x.](https://pypi.org/project/medcat/0.4.0.6/) and under. -```shell -$ docker-compose -f docker-compose-mc0x.yml up -``` +Open the app at `http://localhost:8001` (unless you changed `MCTRAINER_PORT`). + +Useful logs: -## Build images from source -The above commands runs the latest release of MedCATtrainer, if you'd prefer to build the Docker images from source, use -```shell -$ docker-compose -f docker-compose-dev.yml up +```bash +docker compose logs -f medcattrainer +docker compose logs -f nginx +docker compose logs -f solr ``` -The webapp Python dependencies are managed with **uv** and **pyproject.toml** (see `medcat-trainer/webapp/pyproject.toml`). To install locally for development: +## Build from source (development) -```shell -$ cd medcat-trainer/webapp -$ uv sync --no-install-project -$ uv run python api/manage.py runserver +```bash +docker compose -f docker-compose-dev.yml up --build ``` -To add or update dependencies, `uv add && uvlock`; commit `uv.lock` for reproducible Docker builds. +This uses the local `webapp/` source tree and is the recommended setup for +development work. 
-To change environment variables, such as the exposed host ports and language of spaCy model, use: -```shell -$ cp .env-example .env -# Set local configuration in .env -``` +## Legacy MedCAT v0.x support -## Troubleshooting -If the build fails with an error code 137, the virtual machine running the docker -daemon does not have enough memory. Increase the allocated memory to containers in the docker daemon -settings CLI or associated docker GUI. +If you still need the legacy MedCAT v0.x-compatible stack: -On MAC: https://docs.docker.com/docker-for-mac/#memory +```bash +docker compose -f docker-compose-mc0x.yml up -d +``` -On Windows: https://docs.docker.com/docker-for-windows/#resources +## Environment configuration -### (Optional) SMTP Setup +Runtime settings are mainly defined in: -For password resets and other emailing services email environment variables are required to be set up. +- `envs/env` (non-prod defaults) +- `envs/env-prod` (production-oriented defaults) -Personal email accounts can be set up by users to do this, or you can contact someone in CogStack for a cogstack no email credentials. +Host-level Compose variables (for example port overrides) can be set by copying +`.env-example` to `.env` and editing values. -The environment variables required are listed in [Environment Variables.](#optional-environment-variables) +### Common environment variables -Environment Variables are located in envs/env or envs/env-prod, when those are set webapp/frontend/.env must change "VITE_APP_EMAIL" to 1. +| Variable | Description | +|---|---| +| `MCTRAINER_PORT` | Host port for the web UI/API (default `8001`). | +| `SOLR_PORT` | Host port for Solr admin (default `8983`). | +| `MEDCAT_CONFIG_FILE` | MedCAT config file path inside the container. | +| `LOAD_EXAMPLES` | Load example model pack + dataset + project on startup (`1`/`0`). | +| `REMOTE_MODEL_SERVICE_TIMEOUT` | Timeout (seconds) for remote model-service calls. 
| +| `MCTRAINER_BOOTSTRAP_ADMIN_USERNAME` | Bootstrap admin username (default `admin`). | +| `MCTRAINER_BOOTSTRAP_ADMIN_EMAIL` | Bootstrap admin email. | +| `MCTRAINER_BOOTSTRAP_ADMIN_PASSWORD` | Bootstrap admin password (change in real deployments). | -### (Optional) Environment Variables -Environment variables are used to configure the app: +### SMTP (optional, for password reset emails) -|Parameter|Description| -|---------|-----------| -|MEDCAT_CONFIG_FILE|MedCAT config file as described [here](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/medcat/config/config.py)| -|BEHIND_RP| If you're running MedCATtrainer, use 1, otherwise this defaults to 0 i.e. False| -|MCTRAINER_PORT|The port to run the trainer app on| -|EMAIL_USER|Email address which will be used to send users emails regarding password resets| -|EMAIL_PASS|The password or authentication key which will be used with the email address| -|EMAIL_HOST|The hostname of the SMTP server which will be used to send email (default: mail.cogstack.org)| -|EMAIL_PORT|The port that the SMTP server is listening to, common numbers are 25, 465, 587 (default: 465)| -|MCTRAINER_BOOTSTRAP_ADMIN_USERNAME|Username for the default admin user created on initial startup (default: admin)| -|MCTRAINER_BOOTSTRAP_ADMIN_EMAIL|Email address for the default admin user created on initial startup (default: admin@example.com)| -|MCTRAINER_BOOTSTRAP_ADMIN_PASSWORD|Password for the default admin user created on initial startup (default: admin)| +Set: -Set these and re-run the docker-compose file. +- `EMAIL_USER` +- `EMAIL_PASS` +- `EMAIL_HOST` +- `EMAIL_PORT` -You'll need to `docker stop` the running containers if you have already run the install. +If SMTP is not configured, password reset workflows will fail. -## OIDC Authentication +## OIDC (Keycloak) authentication (optional) -You can enable OIDC (OpenID Connect) authentication for the MedCAT Trainer. 
To do so, you must configure the following environment variables: +Set `USE_OIDC=1` and provide: -| Variable | Example | Description | -|-----------------------------------|-------------------------------------------|----------------------------------------------------| -| `USE_OIDC` | `1` | Enable OIDC (1=enabled, 0=traditional auth) | -| `KEYCLOAK_URL` | `https://auth.example.org` | Keycloak base URL | -| `KEYCLOAK_REALM` | `cogstack` | Keycloak realm name | -| `KEYCLOAK_LOGOUT_REDIRECT_URI` | `https://cogstack-launchpad.example.org/` | Where to go after logout | -| `KEYCLOAK_INTERNAL_SERVICE_URL` | `http://keycloak.8080` | Keycloak internal service URL | -| `KEYCLOAK_FRONTEND_CLIENT_ID` | `cogstack-medcattrainer-frontend` | Keycloak Frontend client ID (for token validation) | -| `KEYCLOAK_BACKEND_CLIENT_ID` | `cogstack-medcattrainer-backend` | Keycloak Backend client ID | -| `KEYCLOAK_BACKEND_CLIENT_SECRET` | `***secret***` | Keycloak Backend client secret | +| Variable | Description | +|---|---| +| `KEYCLOAK_URL` | Public Keycloak URL (frontend redirect/login). | +| `KEYCLOAK_REALM` | Keycloak realm name. | +| `KEYCLOAK_LOGOUT_REDIRECT_URI` | URL to redirect users to on logout. | +| `KEYCLOAK_INTERNAL_SERVICE_URL` | Backend-reachable Keycloak URL. | +| `KEYCLOAK_FRONTEND_CLIENT_ID` | Public frontend client ID. | +| `KEYCLOAK_BACKEND_CLIENT_ID` | Confidential backend client ID. | +| `KEYCLOAK_BACKEND_CLIENT_SECRET` | Backend client secret. 
| -#### Advanced Optional OIDC Settings +Optional token refresh tuning: -| Variable | Default | Description | -|-----------------------------------|---------|-----------------------------------------------------------------------------------| -| `KEYCLOAK_TOKEN_MIN_VALIDITY` | `30` | The interval in seconds between each refresh attempt | -| `KEYCLOAK_TOKEN_REFRESH_INTERVAL` | `20` | Minimum time in seconds the token should remain valid before triggering a refresh | +- `KEYCLOAK_TOKEN_MIN_VALIDITY` (default `30`) +- `KEYCLOAK_TOKEN_REFRESH_INTERVAL` (default `20`) -You can either use the Gateway Auth stack available in cogstack-ops or deploy your own Keycloak instance. +Role mapping: -#### Roles -Currently, there are two roles that can be assigned to users: +- `medcattrainer_superuser` -> Django superuser/staff +- `medcattrainer_staff` -> Django staff -| Keycloak Role | Django Permission | Capabilities | -|---------------|-------------------|--------------| -| `medcattrainer_superuser` | `is_superuser=True`, `is_staff=True` | Full admin access, Django admin, all projects | -| `medcattrainer_staff` | `is_staff=True` | Staff-level access, can manage assigned projects | -| (no role) | Regular user | Can only access assigned projects, no admin | +## PostgreSQL support (optional) +SQLite is default. For larger deployments, set: -### (Optional) Postgres Database Support -MedCAT trainer defaults to a local SQLite database, which is suitable for single-user or small-scale setups. +| Variable | Description | +|---|---| +| `DB_ENGINE` | `sqlite3` or `postgresql` | +| `DB_NAME` | Database name | +| `DB_USER` | Database user | +| `DB_PASSWORD` | Database password | +| `DB_HOST` | Database host/service | +| `DB_PORT` | Database port (default `5432`) | -For larger deployments, or to support multiple replicas of the app for example in Kubernetes, you may want to run a postgresql database. +An example compose file is available at +`docker-compose-example-postgres.yml`. 
-You can optionally use a postgresql database instead by setting the following env variables. +## Troubleshooting -|Parameter|Description| -|---------|-----------| -|DB_ENGINE|Database engine to use. Either `sqlite3` or `postgresql`. Defaults to `sqlite3` if not set.| -|DB_NAME|Name of the database to connect to.| -|DB_USER|Username to authenticate with the database.| -|DB_PASSWORD|Password to authenticate with the database.| -|DB_HOST|Hostname of the database server (for Postgres, typically the service name in Docker/Kubernetes).| -|DB_PORT|Port the database server is listening on. Defaults to `5432` for Postgres.| \ No newline at end of file +- **Exit code 137 during build/start**: container memory is too low. + Increase Docker memory allocation. +- **Cannot log in with default admin**: verify bootstrap admin env vars and + startup logs. +- **Concept picker empty**: confirm Solr is running and concepts were imported + for the selected CDB. diff --git a/medcat-trainer/docs/main.md b/medcat-trainer/docs/main.md index e1b2966f7..e06bd377f 100644 --- a/medcat-trainer/docs/main.md +++ b/medcat-trainer/docs/main.md @@ -1,7 +1,47 @@ - # Medical oncept Annotation Tool Trainer - -MedCATTrainer is an interface for building, improving and customising a given Named Entity Recognition -and Linking (NER+L) model (MedCAT) for biomedical domain text. +# Medical Concept Annotation Tool Trainer -MedCATTrainer was presented at EMNLP/IJCNLP 2019. +MedCATtrainer is a web application for creating, validating, and improving +MedCAT concept annotation models on biomedical or clinical text. + +It supports both classic active-learning workflows (train on submit) and +review-only workflows (collect annotations without changing the model). + +## What you can do with MedCATtrainer + +- Build annotation projects from CSV/XLSX datasets. +- Use either: + - a **Model Pack** (recommended), or + - a **Concept DB + Vocabulary** pair. 
+- Optionally use a **remote MedCAT model service** for document preparation. +- Collect concept-level labels: + - Correct + - Incorrect + - Alternative concept + - Terminate + - Irrelevant +- Collect optional **meta annotations** and **relation annotations**. +- Use **Project Groups** for multi-annotator setups. +- Run **metrics reports** across one or more compatible projects. +- Explore concept hierarchies and export concept filters. + +## Typical workflow + +1. Install and configure MedCATtrainer. +2. Create users and upload model artifacts (Model Pack or CDB/Vocab). +3. Create a project and assign annotators. +4. Annotate and submit documents. +5. Export annotations and evaluate with the metrics tools. + +## Documentation map + +- [Installation](installation.md) +- [Administrator Setup](admin_setup.md) +- [Annotation Project Creation and Management](project_admin.md) +- [Project Groups](project_group_admin.md) +- [Annotator Guide](annotator_guide.md) +- [Meta Annotations](meta_annotations.md) +- [Demo / Try Model](demo_page.md) +- [Advanced Usage](advanced_usage.md) +- [Maintenance](maintenance.md) +- [Python Client](client.md) diff --git a/medcat-trainer/docs/maintenance.md b/medcat-trainer/docs/maintenance.md index b21af39b1..6b25dcf4e 100644 --- a/medcat-trainer/docs/maintenance.md +++ b/medcat-trainer/docs/maintenance.md @@ -1,61 +1,64 @@ -# Maintanence +# Maintenance -MedCATtrainer is actively maintained. To ensure you receive the latest -security patches of the software and its dependencies you should regularly -be upgrading to the latest release. +Keep MedCATtrainer regularly updated to receive dependency and security fixes. -The latest stable releases update the `docker-compose.yml` and `docker-compose-prod.yml` files. 
+## Upgrade workflow -To update these docker compose files, either copy them directly from the [repo](https://github.com/CogStack/cogstack-nlp/tree/main/medcat-trainer) -or update the cloned files via: +```bash +cd medcat-trainer +git pull +docker compose pull +docker compose up -d +``` + +For production compose: -```shell -$ cd MedCATtrainer -$ git pull -$ docker-compose up -# alternatively for prod releases use: -$ docker-compose -f docker-compose-prod.yml up +```bash +docker compose -f docker-compose-prod.yml pull +docker compose -f docker-compose-prod.yml up -d ``` -MedCATtrainer follows [Semver](https://semver.org/), so patch and minor release should always be backwards compatible, -whereas major releases, e.g. v1.x vs 2.x versions signify breaking changes. +Database migrations are applied automatically on container startup. -Neccessary Django DB migrations will automatically applied between releases, which should largely be invisible to an end admin -or annotation user. Nevertheless, migrating ORM / DB models, then rolling back a release can cause issues if values are defaulted -or removed from a later version. +## Operational checks -## Backup and Restore +- Application/API health: `GET /api/health/` +- Container logs: `docker compose logs -f medcattrainer` +- Concept search availability: verify Solr container and project concept import + status. -### Backup -Before updating to a new release, a backup will be created in the `DB_BACKUP_DIR`, as configured in `envs/env`. -A further crontab runs the same backup script at 10pm every night. 
This does not cause any downtime and will look like -this in the logs: -```shell -medcattrainer-medcattrainer-db-backup-1 | Found backup dir location: /home/api/db-backup and DB_PATH: /home/api/db/db.sqlite3 -medcattrainer-medcattrainer-db-backup-1 | Backed up existing DB to /home/api/db-backup/db-backup-2023-09-26__23-26-01.sqlite3 -medcattrainer-medcattrainer-db-backup-1 | To restore this backup use $ ./restore.sh /home/api/db-backup/db-backup-2023-09-26__23-26-01.sqlite3 -``` +## Backup and restore (SQLite deployments) + +The backup scripts are SQLite-focused (`DB_ENGINE=sqlite3`). + +### Automatic backups + +- A backup is taken on startup before migrations. +- A scheduled backup job also runs regularly. +- Backup location is controlled by: + - `DB_PATH` + - `DB_BACKUP_DIR` + +### Restore process -A backup is also automatically performed each time the service starts, and any migrations are performed, in the events of a new release -introducing a breaking change and corrupting a DB. - -### Restore -If a DB is corrupted or needs to be restored to an existing backed up db use the following commands, whilst the service is running: - -```shell -$ docker ps -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -a2489b0c681b cogstacksystems/medcat-trainer-nginx:v2.11.2 "/docker-entrypoint.…" 4 days ago Up 4 days 80/tcp, 0.0.0.0:8001->8000/tcp, :::8001->8000/tcp medcattrainer-nginx-1 -20fed153d798 solr:8 "docker-entrypoint.s…" 4 days ago Up 4 days 0.0.0.0:8983->8983/tcp, :::8983->8983/tcp mct_solr -2b250a0975fe cogstacksystems/medcat-trainer:v2.11.2 "/home/run.sh" 4 days ago Up 4 days medcattrainer-medcattrainer-1 -$ docker exec -it 2b250a0975fe bash -root@2b250a0975fe:/home/api# cd .. 
-$ restore_db.sh db-backup-2023-09-25__23-21-39.sqlite3 # run the restore.sh script -Found backup dir location: /home/api/db-backup, found db path: home/api/db/db.sqlite3 -DB file to restore: db-backup-2023-09-25__23-21-39.sqlite3 -Found db-backup-2023-09-25__23-21-39.sqlite3 - y to confirm backup: y # you'll need tp confirm this is the correct file to restore. -Restored db-backup-2023-09-25__23-21-39.sqlite3 to /home/db/db.sqlite3 +1. Enter the running `medcattrainer` container. +2. Run restore script: + +```bash +/home/scripts/restore_db.sh ``` -The `restore_db.sh` script will automatically restore the latest db file, if no file is specified. +If no filename is provided, the latest backup is selected. + +The script prompts for confirmation before overwriting the active DB. + +## Release compatibility + +MedCATtrainer follows semantic versioning: + +- patch/minor versions are expected to be backward compatible, +- major versions may include breaking changes. + +Avoid rollback after schema migrations unless you have tested rollback +procedures and verified data compatibility. diff --git a/medcat-trainer/docs/meta_annotations.md b/medcat-trainer/docs/meta_annotations.md index bafcd8a19..069c6512b 100644 --- a/medcat-trainer/docs/meta_annotations.md +++ b/medcat-trainer/docs/meta_annotations.md @@ -1,42 +1,45 @@ # Meta Annotations -MedCAT is also able to learn project & context specific annotations that overlay on top of the base layer of concept annotations. +Meta annotations are extra labels attached to concept annotations, useful for +context-specific tasks such as temporality, experiencer, assertion, or +hypothetical mentions. -Example use cases of these annotations could be to train models to predict if: +Examples: -- all disease concepts were **experienced** by the patient, a relative, or N/A. -- all symptom concepts are **temporally** reference present day, or are historical. 
-- all drug concepts are mentions of patients consuming drugs rather than **hypothetical** mentions. -- a complaint for a patient is **primary** or **secondary**. +- `Temporality`: Past / Present / Future +- `Experiencer`: Patient / Family / Other +- `Assertion`: Affirmed / Negated / Possible -MedCATTrainer is configurable (via the administrator app), to allow for the collection of these meta annotations. We -currently have not integrated the active learning components of the concept recognition. +## Configure meta tasks -## Meta Annotation Configuration +Create and manage tasks in Django admin (`/admin`): -To create a new Meta Annotation Task and attach to an existing project: +1. Create **Meta Task Values** (the allowed label options). +2. Create a **Meta Task**: + - name + - values + - default value (optional) + - description + - ordering +3. Attach selected tasks to your project (`Project annotate entities`). -1\. Enter your project configuration settings via the admin page (http://localhost:8001/admin/) +## Model-pack predictions vs manual labels -![](_static/img/select-existing-project.png) +If your project uses a model pack that includes MetaCAT models: -2\. At the bottom of the form, select the + icon to bring up the new Meta Annotation Task Form. +- predicted meta labels may be shown in the annotation UI, +- annotators can validate or override predictions. -![](_static/img/add-new-meta-task.png) +If no prediction is available, annotators can still assign labels manually. -3\. Complete the form and add additional meta task values if required for your task via the '+' icon and the 'values' input. -Values are enumerated options for your specific task. These can be re-used across projects or be project specific. -Ensure the default is one of the corresponding values available. Descriptions appear alongside the tasks in interface -and in full in the help dialog. 
+## Annotator behavior -![](_static/img/meta-task-form.png) +In the annotation screen, meta tasks appear in the sidebar for eligible concept +statuses (for example Correct/Alternative flows). -4\. Select desired Meta Annotation tasks for the project by holding down (ctrl / cmd) and clicking the meta tasks, -then select 'Save' to save the project changes. +Task values can be toggled/updated and are stored as `MetaAnnotation` records. -![](_static/img/select-tasks.png) +## Reporting -5\. Meta Annotations now appear in the interface for that project under the concept summary. Meta-annotations -only appear for concepts that are correct. - -![](_static/img/meta-tasks-interface.png) \ No newline at end of file +Metrics reports include a **Meta Annotations** tab when meta annotation data is +present, including macro/micro performance summaries by task. diff --git a/medcat-trainer/docs/project_admin.md b/medcat-trainer/docs/project_admin.md index ecf0e994d..7ce879bb5 100644 --- a/medcat-trainer/docs/project_admin.md +++ b/medcat-trainer/docs/project_admin.md @@ -1,205 +1,114 @@ -# Annotation Project Creation +# Annotation Project Creation and Management -## Creating an Annotation Project -Annotation projects are used to inspect, validate and improve concepts recognised & linked by MedCAT. -They can also be used to collect annotations for defined MetaCAT models tasks, and coming soon RelCAT, or relation annotation models. +MedCATtrainer supports two management surfaces: -Using the admin page, a configured admin or superuser can create, edit and delete annotation projects. +- **Project Admin UI** (`/project-admin`) for most project operations. +- **Django Admin** (`/admin`) for advanced actions and low-level data management. -1\. Navigate to `http://localhost:8001/admin/` or the `http://:/admin/` in which you've deployed the Trainer, and select 'Project annotate entities'. 
+## Create a project (Project Admin UI) -![Main Menu list](_static/img/project_annotate_entities.png) +1. Open `/project-admin`. +2. Go to the **Projects** tab and select **Create New Project**. +3. Fill in: + - **Basic information** (name, dataset, description, guideline link) + - **Model configuration** + - **Annotation settings** + - **Concept filter (optional)** + - **Members** +4. Save. -2\. 'Add Project Annotate Entities' +### Model configuration options -![Add Project Annotate Entities button](_static/img/add_project_annotate_entities.png) -project_admin -3\. Complete the new annotation project form. The table below provides details the purpose of each field: +Pick exactly one of: -|Parameter|Description| -|---------|-----------| -|Name|# Name of the project that appears on the landing page| -|Description| Example projects', # Description as it appears on the landing page| -|Annotation guideline link|A link to a GoogleDoc, MS Sharepoint document etc. that provides this projects annotation guidelines| -|Members | **list** of users that have access to this project, select the '+' to create new users | -|Dataset | The set of documents to be annotated. The dataset tabular schema is described below. | -|Validated Documents| Ignore this list. Use of this list is described in the forthcoming advanced administrator user guide| -|Cuis | (Optional) A list of comma separated Concept Unique Identifiers (CUIs). Use this to only show precise concepts in this project | -|CUI File | (Optional) A JSON formatted list of CUIs. Can be useful if the project should be setup to annotate large CUI lists extracted gathered from introspection of a CDB. **Will be merged with the above 'Cuis' list**| -|Concept DB | A MedCAT Concept Database. This should be the resulting file from a call to the function medcat.cdb.CDB.save_dict('name_of_cdb.dat'). Clicking the '+' icon here opens a dialog to upload a CDB file. | -|vocab | A MedCAT Vocabulary. 
This should be the resulting file from a call to the function medcat.cdb.utils.Vocab.save_dict('name_of_vocab.dat'). Clicking the '+' icon here opens a dialog to upload a vocab file.| -|cdb_search_filter|CDB ID used to lookup concepts during addition of annotations to a document| -|Require Entity Validation| (Default: True) With this option ticked, annotations in the interface, that are made by MedCAT will appear 'grey' indicating they have not been validated. Document submission is dependent upon all 'grey' annotations to be marked by a user. Unticked ensures all annotations are marked 'valid' by default| -|Train Model On Submit| (Default: True) With this option ticked, each document submission trains the configured MedCAT instance with the marked, and added if any, annotations from this document. Unticked, ensures the MedCAT model does not train between submissions.| -|Add New Entities|(Default: False) With this option ticked, allows users to add entirely new concepts to the existing MedCAT CDB. False ensures this option is not available to users.| -|Restrict Concept Lookup|(Default: False) With this option ticked, restricts the concept lookup (add annotation / alternative concept) to only include those CUIs listed in the above filters (either from CUI / TUI list or uploade 'CUI File' list| -|Terminate Available|(Default: True) With this option ticked, the option to terminate an annotated concept will appear| -|Irrelevant Available|(Default: False) With this option ticked, the option to mark an annotated concept as 'irrlevant' will appear| -|Enable entity annotation comments|(Default: False) With this option ticked, the option to leave a comment for each annotation will appear| -|Tasks| Select from the list 'Meta Annotation' tasks that will appear once a given annotation has been marked correct.| -|Relations|Select from the list of 'Relation Annotation' tasks that will appear for a given concept.| +1. **Model Pack** (recommended), or +2. 
**Concept DB + Vocabulary** pair. -Datasets can be uploaded in CSV or XLSX format. Example: +You may also enable: -| name | text | -|-------|------------------------| -| Doc 1 | Example document text | -| Doc 2 | More example text | +- **Remote model service** (`use_model_service`) and provide + `model_service_url`. -The **name** column should be the ID (identifier) and unique for that dataset, the **text** column is the text to be annotated. -Example datasets are supplied under docs/example_data/*.csv +Notes: -4\. Click 'Save' to store the new project. +- Remote model service projects do not support interim train-on-submit updates. +- You cannot set Model Pack and CDB/Vocab at the same time. -### Project Home Screen +### Key project settings -Navigate to the home screen (`http://localhost:8001/` or `http://:/` depending on your deployment), login with your username and password setup previously. +| Setting | Description | +|---|---| +| `require_entity_validation` | If enabled, model suggestions must be explicitly reviewed before submit. | +| `train_model_on_submit` | If enabled, validated annotations are used for incremental training on submit. | +| `add_new_entities` | Allows users to add brand-new concepts. | +| `restrict_concept_lookup` | Restricts concept search to project CUI filters. | +| `terminate_available` | Shows terminate action in annotation toolbar. | +| `irrelevant_available` | Shows irrelevant action in annotation toolbar. | +| `enable_entity_annotation_comments` | Enables free-text comments per annotation. | +| `tasks` | Meta annotation tasks available in the annotator UI. | +| `relations` | Relation labels available for relation annotation. | +| `project_locked` | Locks project from further annotation edits. | +| `project_status` | Annotating / Complete / Discontinued. 
| -![](_static/img/login.png) +## Dataset format -Select your new project to begin annotating documents +Upload CSV or XLSX with at least: -![](_static/img/available-projects.png) +| name | text | +|---|---| +| unique-doc-id | document text to annotate | -#### Admin Options -Admin users have extra options on the home screen: +`name` should be unique per dataset. -1. Concepts Imported - specifies if this projects configured CDB Search Filter has been indexed. This is described in detail below. -2. Model Loaded - MedCAT models are loaded into memory (a python dict) once a project is loaded. Loading many models with larger trainer deployments can occupy a lot of memory. **Note** Clearing cached models may affect other projects using the same model instance. -3. Save Model - Write the in memory model to disk - to save the current in memory model state. This option is generally not advised as full model training should be done outside the trainer instance ideally. +## Project list operations -### Notes -- Example Concept and Vocab databses are freely available on MedCAT [github](https://github.com/CogStack/cogstack-nlp/tree/main/medcat-v2). -Note. UMLS and SNOMED-CT are licensed products so only these smaller trained concept / vocab databases are made available currently. -- More documentation on the creation of UMLS / SNOMED-CT CDBs from respective source data will be released soon. -- Tasks allow for the creation of meta-annotations and their associated set of values an annotator can use. -An example 'meta-annotation' could be 'Temporality'. Values could then be 'Past', 'Present', 'Future'. -- Please NOTE Firefox and IE are currently not supported**. Please use Chrome or Safari. +From the home **Projects** table: -## Concept Picker - CDB Concept Import -The concept picker is used to: -- Pick alternative concepts for an existing recognised span - ![](_static/img/pick-alternative-concept.png) -- Pick a concept during the 'Add Term' process. 
- ![](_static/img/add-annotation-concept.png) +- Open and annotate a project. +- Run document preparation in the background. +- View model-loaded state and clear model cache. +- Save current model state. +- Select compatible projects and submit a metrics report. -The available list of concepts is populated via a MedCAT CDB and indexed in a [solr](https://solr.apache.org/) search index to enable fast type-ahead style search. +## Concept lookup index (Solr import) -SNOMED-CT / UMLS built databases can contain thousands if not millions of concepts so this process is executed -in asynchronous task to ensure the admin page and app are still available for use. +Concept picker search requires CDB concepts to be imported into Solr. -**This process should only be done once for each concept universe (i.e. SNOMED-CT, UMLS are 2 distinct concept universes)** -per deployment or if the underlying MedCAT CDB changes Concepts will be indexed by there CUI, so importing different -CDB instances that reference the same concept universe will only import -the concepts that are in the set difference. +1. Open `/admin`. +2. Go to **Concept Dbs**. +3. Select one or more CDBs. +4. Run the **Import concepts** action. -To make these concepts available to a (or any project): +After import, the project list shows whether concepts are indexed for the selected +`cdb_search_filter`. +## Clone, reset, and delete -1\. Open the admin app. (http://localhost:8001/admin/) -2\. Select 'Concept Dbs' -![](_static/img/select-concept-dbs.png) +### In Project Admin UI -3\. Select the Concept DB entry, and choose the action 'Import concepts', then press the 'Go' button. -![](_static/img/import-concepts.png) +- **Clone**: duplicate project configuration under a new name. +- **Reset**: remove annotations and clear prepared/validated document state. +- **Delete**: permanently remove the project. 
-Once the concept imports are complete the solr search services will contain 'collections' that are used by a Django view -for fast type ahead searching. If you're an admin the project home screen will show if a project has had the selected -'CDB Search Filter' imported into solr. -![](_static/img/concepts-imported-status.png) +### In Django Admin -The Solr admin interface is available on the default port 8983. User guide [here](https://solr.apache.org/guide/solr/latest/getting-started/solr-admin-ui.html) +Equivalent bulk actions are available under **Project annotate entities**. -### Concept Collection Maintenance -The solr search service is designed to index all concepts and their metadata across any number of MedCAT Concept Databases. -By default it is run on the same host as the MedCATtrainer django backend, making it fast to tear down, and upload concept -collections, even if the entirety of SNOMED CT or UMLS is indexed. +## Downloading annotations -To update an index, first delete the outdated concepts from solr via the django admin panel: -1\. Open the admin app. (http://localhost:8001/admin/) +From Django admin (`/admin` -> **Project annotate entities**), use bulk actions +to export annotations: -2\. Select 'Concept Dbs' +- with source text +- without source text +- without source text but with document names -3\. Select the Concept DB entry, and choose the action 'Delete ', then press the 'Go' button. -![](_static/img/delete-indexed-concepts.png) +Notebook examples for downstream processing are in: -This will drop the corresponding collection in the solr search service. This can be also be performed in the solr admin UI by default port 8983. +- `notebook_docs/Processing_Annotations.ipynb` -## Downloading Annotations -Project annotations can be downloaded with or without the source text, especially important if the source text is -particularly sensitive and should be not be shared. +## Saving and downloading model artifacts -1\. Open the admin app. 
(http://localhost:8001/admin/) - -2\. Select 'Project annotate entities', -![Main Menu list](_static/img/project_annotate_entities.png) - -3\. Select the project(s) to download the annotations for and select the appropriate action for w/ or w/o source text, -then press the 'Go' button. This will download all annotations, the meta-annotations (if any) for all projects selected. -Annotations - -4\. An example jupyter notebook is provided under docs/Processing_Annotations.ipynb. - -![](_static/img/download-annos.png) - -## Clone Project -Cloning Projects is a easy & fast method to create copies of configured projects. This includes the dataset, CDB / vocab -reference, meta annotation settings etc. Annotations themselves will not be copied across. - -1\. Open the admin app. (http://localhost:8001/admin/), and select 'Project annotate entities' (same as above for downloading) - -2\. Select the project(s) to clone, select the 'Clone projects', then press the 'Go' button. -![](_static/img/clone-projects.png) - -NB: Cloning projects will use the same CDB instance. If you're double annotating datasets to then calculate agreement scores (IIA, Cohen's Kappa etc.) -then uncheck "Train Model On Submit" for each of the projects to ensure the model is not trained by each annotator. -If you do want 'online training' of the model, use separate instances of the same model. You can directly upload multiple -instances of the same CDB file appropriately named to achieve this. - -## Reset Project -**Use with caution. Resetting projects deletes all annotations and resets a project to its state upon initial creation.** - -1\. Open the admin app. (http://localhost:8001/admin/), and select 'Project annotate entities' -(same as above for downloading) - -2\. Select the project(s) to reset, then press the 'Go' button. -![](_static/img/reset-projects.png) - - -## Save Models -We strongly suggest models are not saved within MedCATtrainer then directly used. 
Instead, we suggest you use the collected -annotations from projects to train and test a new MedCAT model. - -However, to save the current state of the model you can use: - -An API call - \:\/save-models/ that can be used to save the current state of -a model. This will overwrite the current CDB file. - -Alternatively, login with an 'admin', (i.e. staff or superuser) account and hit the save model button associated with the project. - - -## Download Models - -1\. Open the admin app. (http://localhost:8001/admin/), and select 'Concept dbs'. - -2\. Click the CDB item you would like to download. - -4\. Click the CDB file, you will be prompted to save down the new CDB file. This file will be of the same format you -have used previously, i.e. you've called medcat.cdb.save_dict(''). - -The saved MedCAT models can be used in any instance a regular MedCAT model may be used. I.e. in a jupyter notebook, -part of a web service, or further fine-tuning in another MedCATTrainer instance. - -The Trainer currently does not support inspection / training / storage of the meta annotation models. These will be -integrated in a forthcoming release. - -![](_static/img/save_cdb.png) - -5\. To load the new dictionary use medcat.cdb.load_dict('') - -## Annotation Guidelines -Annotation guidelines can assist guiding annotators when annotating texts for a MedCATTrainer project. - -Example Guidelines can be found [here](https://docs.google.com/document/d/1xxelBOYbyVzJ7vLlztP2q1Kw9F5Vr1pRwblgrXPS7QM/edit?usp=sharing). - -An initial guideline can be refined using specific examples from your dataset in a pilot project containing a handful of documents. +For online-learning projects, admins can save current model state from the +project list. In general, offline retraining from exported annotations is still +recommended for production model releases. 
diff --git a/medcat-trainer/docs/project_group_admin.md b/medcat-trainer/docs/project_group_admin.md index 19a351f21..d15e88049 100644 --- a/medcat-trainer/docs/project_group_admin.md +++ b/medcat-trainer/docs/project_group_admin.md @@ -1,50 +1,61 @@ +# Annotation Project Groups -# Annotation Project Group Creation -Annotation projects often involve more than one annotator. +Project Groups help coordinate multi-annotator projects at scale. -Project Group instances allow the creation and management of a group of annotation projects from one screen. +They allow admins to define shared configuration once and apply it across a set +of associated annotation projects. -## Annotation Project Group Creation +## When to use Project Groups -Creating a Project Group is similar to regular [Annotation Project Creation](project_admin.md), but differs in a few -key ways. +Use a group when you need: -They can be used to group existing projects together, or to create a set of Annotation Projects. +- one project per annotator over the same dataset/configuration, +- centralized settings management for those projects, or +- grouped visibility in the home screen. -## Key Differences from Regular Project Creation +## Creating a Project Group -When completing a Project Group form, **Create Associated Projects** is a key parameter: +Create groups from Django admin (`/admin` -> **Project groups**). -### Create Associated Projects: True -If checked will create an Annotation Project for each Annotator selected in the list. All selected Admins will be included as 'annotators' on each Project created. This saves -the current steps of creating a 'template' projec then cloning, renaming and re-permissioning each project which happens with -regular project creation for multiple annotators. Each Project will be called ** - **. +The key option is **Create Associated Projects**. 
+### `create_associated_projects = true` -### Create Associated Projects: False -If False the only important parameters will be ProjectGroup Name and description. All other parameters will be ignored. The expectation here is -that the projects that are to be grouped already exist, and each Project will be added to the new Project Group manually. +On initial save, MedCATtrainer automatically creates one +`ProjectAnnotateEntities` per selected annotator. -## Best Practise -Project Groups provide a convenience method for the creating managing grorups of Annotation Projects. Changes such as CUI filters, and projec settings changed once -in the group will flow down into the associated Annotation Projects. +Naming pattern: +- ` - ` +Administrators are added to each generated project, plus the corresponding +annotator. -## Using Annotation Project Groups -Regular, non-admin Users of MedCATTrainer, i.e. regular annotators, will not see the option to view Project Groups. +### `create_associated_projects = false` -Admin users will see an action bar as shown: -![Project Groups Available](_static/img/project-groups-view-available.png) +No child projects are created automatically. Use this if you already have +projects and want to group them manually. -Selecting this view will show all available Project Groups to the logged in user. -![Project Groups view](_static/img/project-groups-view.png) +## Updating a group -Selecting a group now opens a lightbox with the list of projects in this group: -![Project Group Contents](_static/img/projects-in-group.png) +When associated projects exist and remain aligned, group-level edits propagate +to child project settings (for example CUI filters, model settings, tasks). +If child projects were manually added/removed outside the expected structure, +automatic propagation may fail and projects should then be edited individually. 
-## Other Benefits of Project Groups +## Using groups in the UI -Further enhancements will allow metric further comparisson between projects in a group, gamification, standard annotation metric reporting (e.g. IIA / Cohen's Kappa statistics etc.) +Admins can switch between **Single Projects** and **Project Groups** from the +home page and inspect projects within each group. + +Regular annotators typically work with individual projects and may not need the +group view. + +## Best practices + +- For inter-annotator agreement studies, disable `train_model_on_submit` in + all group projects. +- Keep naming conventions consistent for easy report comparison. +- Prefer one shared configuration template per annotation campaign.