GPU memory does not get released with `max_loaded_models` #92

When I run the example code below while monitoring the GPU with `watch -n .3 nvidia-smi`, GPU memory usage keeps increasing with each newly loaded model and is never released.

Did I miss something here?

```python
import gc
import time

import torch
from easynmt import EasyNMT

# Keep at most one translation model loaded at a time
model = EasyNMT("opus-mt", max_loaded_models=1)

model.translate("Hallo, das ist ein Satz.", target_lang="en", source_lang="de")
model.translate("Hallo, das ist ein Satz.", target_lang="fr", source_lang="de")

# Try to force the memory of the unloaded model to be freed
time.sleep(3)
gc.collect()
torch.cuda.empty_cache()
time.sleep(3)

model.translate("Hallo, das ist ein Satz.", target_lang="nl", source_lang="de")
model.translate("Hallo, das ist ein Satz.", target_lang="it", source_lang="de")
```
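
To make the growth easier to report than by eyeballing `nvidia-smi`, the used memory could be logged after each `translate` call. The following is only a minimal sketch: the helper names `parse_memory_used` and `gpu_memory_used_mib` are hypothetical, and it assumes `nvidia-smi` is on the `PATH` and reports plain integer MiB values for every GPU.

```python
import subprocess
from typing import List


def parse_memory_used(output: str) -> List[int]:
    """Parse nvidia-smi CSV output: one MiB value per GPU, one per line."""
    # Assumption: every non-empty line is a plain integer (no "[N/A]" entries).
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def gpu_memory_used_mib() -> List[int]:
    """Return the used memory in MiB for each GPU, as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_memory_used(result.stdout)
```

Calling `print(gpu_memory_used_mib())` after each `model.translate(...)` line would give one number per step. Note that `nvidia-smi` also counts memory held by PyTorch's caching allocator, so comparing against `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` may help tell cached memory apart from memory that is still referenced.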
