Secure record linkage and data transfer using private set intersection.
This repository contains two applications: a web-based one that allows peer-to-peer exchanges, and a command-line one that uses SFTP as an intermediary.
- Install Node.js and NPM
- Run
  npm install . -w packages/base-lib -w apps/web
- Run
  npm run -w packages/base-lib build
- Run
  npm run -w apps/web dev
- Visit http://localhost:3000
See apps/web for more details.
The command-line app has a pre-built Docker image that can be used.
To link a file:
- Install Docker Desktop.
- Within the Docker terminal (or in a Windows/Mac/Linux terminal window), run:
docker pull vdorie/psi-link:latest
docker run \
--rm --mount type=bind,src=WORK_PATH,dst=/work \
vdorie/psi-link:latest \
sftp://SFTP_USER:SFTP_PASSWORD@SFTP_HOST:SFTP_PORT/SFTP_PATH \
INPUT_FILE OUTPUT_FILE
Replace each placeholder as follows:
- WORK_PATH - relative or absolute path on your host machine where the input data is and where the output should be written
- SFTP_USER, SFTP_PASSWORD, SFTP_HOST, SFTP_PORT - standard SFTP connection information
- SFTP_PATH - path from the root of the SFTP server where both parties can read and write; the exchange will happen here
- INPUT_FILE - the file containing your data, placed in WORK_PATH
- OUTPUT_FILE - unless an absolute path is specified, the output file will be written in WORK_PATH
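If you script the exchange, one way to fill in the placeholders is with shell variables. The minimal sketch below assumes a POSIX-compatible shell; the variable names simply mirror the placeholders, and every value is hypothetical. Single quotes keep the shell from interpreting characters such as ! in the password, and quoting "$url" passes the assembled connection string to the program as a single argument. The final command is a dry run that only prints what would be executed.

```shell
#!/bin/sh
# Hypothetical values: replace them with the details agreed with your partner.
SFTP_USER='user'
SFTP_PASSWORD='passw!rd'   # single quotes stop the shell from interpreting special characters
SFTP_HOST='example.org'
SFTP_PORT='22'
SFTP_PATH='psi'
WORK_PATH="$PWD"
INPUT_FILE='input.csv'
OUTPUT_FILE='output.csv'

# Assemble the connection string in the same shape as the template above.
url="sftp://${SFTP_USER}:${SFTP_PASSWORD}@${SFTP_HOST}:${SFTP_PORT}/${SFTP_PATH}"

# Dry run: print the command rather than executing it; drop "echo" to run the exchange.
echo docker run --rm --mount "type=bind,src=${WORK_PATH},dst=/work" \
  vdorie/psi-link:latest "$url" "$INPUT_FILE" "$OUTPUT_FILE"
```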
The only content accessible to the container will be that in WORK_PATH, so it is recommended that you make a new directory and place in it the file you wish to transfer.
The output file will contain the association table mapping between the two partners' data. It is formatted as a CSV with columns our_row_id and their_row_id. Each is a 0-based index into the respective dataset, giving the correspondence between rows in the two datasets.
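As a worked example of reading the association table, the sketch below uses awk to print, for each matched pair, the partner's row index followed by the corresponding row of your own input. It rests on assumptions that psi-link does not document here: both files start with a single header row, row index 0 refers to the first row after the header, and no field contains a quoted comma. The helper name matched_rows and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: matched_rows OUR_INPUT ASSOCIATION_TABLE
# For every line of the association table, print the partner's row id followed
# by the corresponding row of our input. Assumes (not confirmed by psi-link)
# that both files have a single header row, that row id 0 is the first data
# row, and that no field contains a quoted comma.
matched_rows() {
  awk -F, '
    FNR == 1 { next }                       # skip the header row of each file
    NR == FNR { row[FNR - 2] = $0; next }   # first file: store our rows by 0-based id
    ($1 in row) { print $2 "," row[$1] }    # second file: our_row_id,their_row_id
  ' "$1" "$2"
}
```

For example, matched_rows input.csv output.csv > matched.csv writes the matched rows to a new file. Loading your input first and then streaming the association table keeps the table's order and still works if a row appears in more than one pair.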
For more information, see apps/cli.
Special characters in passwords can be interpreted incorrectly by your shell. To avoid this, enclose the whole connection string in single quotes or escape the problematic characters. For example, the following two commands both run an exchange from the current directory, which is mounted by passing $PWD (the present working directory):
docker run --rm --mount type=bind,src=$PWD,dst=/work vdorie/psi-link:latest \
'sftp://user:passw!rd@example.org/psi' input.csv output.csv
or
docker run --rm --mount type=bind,src=$PWD,dst=/work vdorie/psi-link:latest \
sftp://user:passw\!rd@example.org/psi input.csv output.csv
Connection parameters can also be specified individually as command-line flags to the script. Among others, they include:
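To check how your shell will unescape a connection string before contacting the server, you can temporarily replace the Docker part of the command (everything up to and including vdorie/psi-link:latest) with a small function that prints each argument it receives. This is a generic shell technique rather than a psi-link feature, and show_args is a hypothetical name.

```shell
#!/bin/sh
# Print each argument on its own line, in brackets, exactly as a program would receive it.
show_args() {
  printf '[%s]\n' "$@"
}

# Both quoting styles from the examples above should yield the same connection string.
show_args 'sftp://user:passw!rd@example.org/psi' input.csv output.csv
show_args sftp://user:passw\!rd@example.org/psi input.csv output.csv
```

Each call prints three lines, the first being [sftp://user:passw!rd@example.org/psi]. If the password shown differs from the one you intended, adjust the quoting before running the real command.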
- --server-port - port number of the server
- --server-username - username for authentication
- --server-password - password for password-based user authentication; use @path to read it from a file
- --server-private-key - buffer or string that contains a private key for either key-based or host-based user authentication (OpenSSH format); use @path to read it from a file
- --server-passphrase - for an encrypted private key, the passphrase used to decrypt it; use @path to read it from a file
Prefixing a value with @ specifies that it should be read from a file. For example, to have the script read a password from the file passwd in the working directory, run:
docker run --rm --mount type=bind,src=$PWD,dst=/work vdorie/psi-link:latest \
sftp://user@example.org/psi \
--server-password=@passwd \
input.csv output.csv
Note that Docker prevents the container from accessing any path on your host system that isn't explicitly mounted, so if you wish to use a pre-existing private key, the program cannot read it from ~/.ssh by default. In that case, either add a read-only mount for the key folder or copy the key to the working directory.
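The sketch below prepares both kinds of secret file for the container, assuming a POSIX-compatible shell. write_password_file stores a password in ./passwd, readable only by you, using printf so that no trailing newline is added (whether the program would strip one is not documented, so this avoids the question). psi_link_with_key is a hypothetical wrapper that adds a read-only mount for the folder holding an existing key and points --server-private-key at it; the mount target /keys is an arbitrary choice, and it is an assumption that the program accepts an absolute container path after @.

```shell
#!/bin/sh
# Write the password to ./passwd without a trailing newline; umask 077 in a
# subshell makes a newly created file readable and writable by its owner only.
write_password_file() {
  ( umask 077 && printf '%s' "$1" > passwd )
}

# Hypothetical wrapper: psi_link_with_key KEY_FILE URL INPUT_FILE OUTPUT_FILE [FLAGS...]
# Mounts the key's folder read-only at /keys (an arbitrary choice) alongside the
# working directory, so the key does not have to be copied out of ~/.ssh.
psi_link_with_key() {
  key_dir=$(cd "$(dirname "$1")" && pwd)   # absolute path of the folder holding the key
  key_name=$(basename "$1")
  url=$2
  shift 2
  docker run --rm \
    --mount "type=bind,src=$PWD,dst=/work" \
    --mount "type=bind,src=$key_dir,dst=/keys,readonly" \
    vdorie/psi-link:latest \
    "$url" "--server-private-key=@/keys/$key_name" "$@"
}
```

For example, write_password_file 'passw!rd' prepares the passwd file used above, and psi_link_with_key ~/.ssh/id_ed25519 sftp://user@example.org/psi input.csv output.csv runs an exchange with an existing key (the key file name is hypothetical). An encrypted key's passphrase can be passed as an extra --server-passphrase flag at the end.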
Docker for Windows requires that the Windows Subsystem for Linux be installed. Docker will ask you to install this the first time it starts up.
To execute commands, launch a terminal from within Docker Desktop by clicking on the >_ icon on the lower-right of the application's status bar.
Paths can be given to Docker using standard Windows-style backslashes, with one exception: a trailing backslash at the very end of a path can prevent Docker from recognizing where the string ends. It is safe to remove it, as the path will still be treated as a directory.
Additionally, the line-continuation markers used in the examples above (the \ at the end of each line) do not parse correctly there, so put each command on a single line instead. For example:
docker run --rm --mount type=bind,src='C:\Users\me\Documents\psi-link',dst=/work vdorie/psi-link:latest sftp://user:password@example.org/psi input.csv output.csv
The docker run command has two parts. The first part contains instructions purely for Docker, telling it what to run and how:
docker run --rm --mount type=bind,src=WORK_PATH,dst=/work vdorie/psi-link:latest
This instructs Docker to:
- run a container (docker run)
- remove the container when finished, deleting any intermediate artifacts (--rm)
- mount a path on the host computer at /work inside the container, where it can be read from and written to (--mount type=bind,src=WORK_PATH,dst=/work)
- use the latest tag of the vdorie/psi-link image for the container (vdorie/psi-link:latest)
The second part is the invocation of the psi-link script and includes any command line options you wish to use. In the first example above it is:
sftp://SFTP_USER:SFTP_PASSWORD@SFTP_HOST:SFTP_PORT/SFTP_PATH INPUT_FILE OUTPUT_FILE
However, you can place anything here that you want to pass on to the program. For example, to have it print all of its options, run:
docker run --rm vdorie/psi-link:latest --help
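Because the first part of the command stays the same between runs, you may find it convenient to wrap it in a shell function so that only the second part has to be typed. The sketch below assumes a POSIX-compatible shell (the Docker Desktop terminal on Windows may behave differently, as noted above), and psi_link is a hypothetical name. Quoting "$@" forwards each argument unchanged, so a connection string needs to be quoted only once, when you call the function.

```shell
#!/bin/sh
# First part (Docker): run a throwaway container with the current directory mounted at /work.
# Second part (psi-link): whatever arguments the function receives, passed through as-is.
psi_link() {
  docker run --rm --mount "type=bind,src=$PWD,dst=/work" vdorie/psi-link:latest "$@"
}
```

With this defined, psi_link --help prints the program's options, and psi_link 'sftp://user:passw!rd@example.org/psi' input.csv output.csv runs the exchange from the earlier example.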