Confusion about encryption #18

Hello,
Thanks for making your repository publicly available.

I'm a bit confused, and even after reading the whitepaper and the codebase I'm still at a loss.

I tried this example:

```python
import torch

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Initialize model and tokenizer
model_name = "nesaorg/distilbert-sentiment-encrypted"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

print("Test input 1:")
inputs = tokenizer("I feel much safer using the app now that two-factor authentication has been added", return_tensors="pt")
print(inputs)

print("Test input 2:")
inputs = tokenizer("I do not feel much safer now", return_tensors="pt")
print(inputs)
```
Output:

```
Test input 1:
{'input_ids': tensor([[  101, 21666,  7721, 27061,   310, 22734,  1482, 17557, 18129, 18575,
         19416, 19357, 16407, 19291,   709, 10564,  6508,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Test input 2:
{'input_ids': tensor([[  101, 21666,   842,  5552,  7721, 27061,   310, 18129,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
```
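You can see the leak without knowing anything about the tokenizer's vocabulary. Compare the two printed ID sequences: the IDs they share are exactly the words the two sentences have in common ("I", "feel", "much", "safer", "now"). The IDs below are copied from the output above. I dropped the leading 101 and trailing 102, which appear in both sequences and are presumably the usual `[CLS]`/`[SEP]` markers. The variable names are only illustrative.

```python
# Token IDs copied from the printed output (leading 101 and trailing 102 dropped).
input_1 = [21666, 7721, 27061, 310, 22734, 1482, 17557, 18129, 18575,
           19416, 19357, 16407, 19291, 709, 10564, 6508]
input_2 = [21666, 842, 5552, 7721, 27061, 310, 18129]

# IDs present in both inputs, in the order they occur in input 2.
seen_in_1 = set(input_1)
shared = [t for t in input_2 if t in seen_in_1]
print(shared)  # [21666, 7721, 27061, 310, 18129]
```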

I'm not sure I understand where the encryption comes from. My understanding is that the model's inputs should be protected from the model provider. However, the tokenization above shows a fixed one-to-one mapping between plaintext tokens and their IDs (e.g. 'I' -> 21666, 'feel' -> 7721). This follows from how the HF tokenizer works, but it means the model provider can trivially recover the user's supposedly private plaintext. Even if the tokenizer were kept secret from the server, a simple statistical (frequency-analysis) approach could recover the token mappings.
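Here is a toy sketch of the statistical attack I have in mind. A secret tokenizer that maps each word to a fixed ID is a substitution cipher, so ranking observed IDs by frequency and pairing them with known word frequencies recovers the mapping. The word counts, the ID range, and `frequency_attack` are all hypothetical. The counts are deliberately distinct, so recovery here is exact. With real traffic the match would only be approximate, though it would sharpen with more data and with n-gram statistics.

```python
import random
from collections import Counter

# Hypothetical word frequencies; chosen distinct so rank matching is unambiguous.
word_counts = {"the": 9, "i": 8, "feel": 7, "safer": 6, "app": 5,
               "now": 4, "much": 3, "not": 2, "added": 1}

rng = random.Random(0)

# The user's "private" text: each word repeated by its count, then shuffled.
plaintext = [w for w, c in word_counts.items() for _ in range(c)]
rng.shuffle(plaintext)

# Secret tokenizer (hypothetical): a random one-to-one map from words to IDs.
vocab = list(word_counts)
secret_ids = rng.sample(range(1000, 30000), len(vocab))
secret_map = dict(zip(vocab, secret_ids))

# The server only ever sees the ID stream.
observed = [secret_map[w] for w in plaintext]

def frequency_attack(observed_ids, reference_counts):
    """Guess id -> word by pairing the k-th most frequent ID with the k-th most frequent word."""
    id_ranked = [i for i, _ in Counter(observed_ids).most_common()]
    word_ranked = sorted(reference_counts, key=reference_counts.get, reverse=True)
    return dict(zip(id_ranked, word_ranked))

guess = frequency_attack(observed, word_counts)
recovered = [guess[i] for i in observed]
print(recovered == plaintext)  # True
```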

Moreover, the model itself appears to be a standard DistilBERT architecture, just with different weights. So I'm confused by this example: where is the encryption actually applied?
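One hypothesis would explain "standard architecture, different weights": a secret permutation of token IDs can be absorbed into the embedding matrix by moving each row to its permuted position. The resulting model computes exactly the same function on the scrambled IDs. I'm assuming this is roughly what happened; I haven't confirmed it from the repository. The toy model below (`embedding`, `head`, `score`, `perm`) is purely hypothetical and stands in for DistilBERT's embedding layer plus everything downstream of it.

```python
import random

rng = random.Random(42)
VOCAB, DIM = 50, 4

# Toy "plain" model: an embedding table plus a fixed linear scoring head.
embedding = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
head = [rng.uniform(-1, 1) for _ in range(DIM)]

def score(token_ids, table):
    """Mean-pool the looked-up embeddings, then take a dot product with the head."""
    pooled = [sum(table[t][d] for t in token_ids) / len(token_ids) for d in range(DIM)]
    return sum(p * h for p, h in zip(pooled, head))

# Hypothetical "encryption": a secret permutation of the token IDs...
perm = list(range(VOCAB))
rng.shuffle(perm)

# ...absorbed into the weights by moving embedding row i to row perm[i].
permuted_embedding = [None] * VOCAB
for old_id, new_id in enumerate(perm):
    permuted_embedding[new_id] = embedding[old_id]

plain_ids = [3, 17, 8, 25, 17]                # what an ordinary tokenizer would emit
scrambled_ids = [perm[i] for i in plain_ids]  # what the permuted tokenizer would emit

# Same outputs: the permutation is just a relabeling, not a cryptographic transform.
print(score(plain_ids, embedding) == score(scrambled_ids, permuted_embedding))  # True
```

If this is what the checkpoint does, the IDs sent to the server are still a fixed substitution of the plaintext. The attacks sketched above would apply directly.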
