Moving data
Moving or copying files on a local machine or between remote servers is a skill well worth practicing.
Most programs for copying files use the following syntax:
<Copy-program> <source> <destination>
where <Copy-program> is the name of the program used for copying the data (e.g. cp, scp, wget or rsync), <source> is the file that should be copied, and <destination> is where the data should be copied to.
cp is the standard command for copying files from one part of a filesystem to another. Even though you have most likely used it before, it is worth checking the manual page (man cp) to learn how to copy directories recursively (cp -R) or how to preserve file attributes (cp -p).
scp is used for copying files between computers over an encrypted connection (scp = secure copy). The syntax is the same as for cp, except that <source> and <destination> may contain a username and the address of a remote computer. Here is an example:
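The two cp options mentioned above can be tried safely in a throwaway directory. The following is a minimal sketch; the directory and file names (project, raw, alignment.fst and the copies) are made up for this illustration.

```shell
# Work in a temporary directory so that no real data is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/raw"
echo ">seq1" > "$workdir/project/raw/alignment.fst"

# -R copies a directory together with everything below it.
cp -R "$workdir/project" "$workdir/project_backup"

# Give the original an old modification time so the effect of -p is visible.
touch -t 202001011200 "$workdir/project/raw/alignment.fst"

# -p preserves the modification time and permissions of the original.
cp -p "$workdir/project/raw/alignment.fst" "$workdir/copy_with_attrs.fst"

# Without -p, the copy gets the current time as its modification time.
cp "$workdir/project/raw/alignment.fst" "$workdir/plain_copy.fst"

# Compare the timestamps of the two copies.
ls -l "$workdir"/*.fst
```

In the listing, copy_with_attrs.fst should carry the old date while plain_copy.fst shows the time at which the copy was made.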
topel@Slartibartfasts:~$ scp alignment.fst mtop@albiorix.bioenv.gu.se:.
Here, the source file is alignment.fst and the destination is mtop@albiorix.bioenv.gu.se:. (including the final dot). Note the similarity to the username and address of Albiorix you used when connecting with ssh. One small but important difference is the two characters :. at the end of the address. The string to the left of the : character is the address of the computer I'm copying the file to, and everything to the right of it is the path on the remote filesystem where I want to save the copy. Here, the path is condensed to a single ., which represents my home directory, the first place I end up when logging in to Albiorix.
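To make the structure of a remote destination explicit, the sketch below splits the string from the example into its three parts using plain shell parameter expansion. It runs locally and does not contact Albiorix; the variable names are chosen only for this illustration.

```shell
# The destination used in the scp example.
remote="mtop@albiorix.bioenv.gu.se:."

# Everything before the first @ is the username.
user="${remote%%@*}"

# Everything between the @ and the first : is the address of the computer.
rest="${remote#*@}"
host="${rest%%:*}"

# Everything after the first : is the path on the remote filesystem.
path="${remote#*:}"

echo "user: $user"
echo "host: $host"
echo "path: $path"

# Other remote paths follow the same pattern (not run here):
#   scp alignment.fst mtop@albiorix.bioenv.gu.se:data/   # relative to the remote home directory
#   scp alignment.fst mtop@albiorix.bioenv.gu.se:/tmp/   # absolute path on the remote machine
```

A remote path that does not start with / is interpreted relative to the home directory on the remote machine, which is why a single . ends up there.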
The simplest way of moving data from Albiorix (or another remote server) is to put the file that should be moved in your home directory like this:
[mtop@compute-0-8 mats]$ cp Result.pdf ~
In this example, the file called Result.pdf is copied to your home directory; the character ~ is a shortcut for that directory. The next step is to open a new terminal window on your local computer and run the following command:
topel@Slartibartfasts:~$ scp mtop@albiorix.bioenv.gu.se:Result.pdf .
As you know, scp uses the syntax scp <source> <destination>. Here, the <source> consists of the username and address of Albiorix (mtop@albiorix.bioenv.gu.se), followed by a : and the name of the file to copy. The destination is abbreviated to a single ., which represents the current working directory on your local machine (most likely your home directory there).
scp is a very powerful and often-used command for moving data between computers, but it suffers from one flaw that makes it unsuitable for moving large files. If the transfer is interrupted (which it is likely to be if large files are moved and/or the network is unstable), scp is unable to restart where it stopped, and will instead start from the beginning again. If you want to move large files, it is therefore better to use the program rsync that only sends the differences between a set of files over the network. This also makes rsync a useful tool for creating and updating backups of your data. The main part of your files will only be copied once, and only the changes you make to the data will be copied next time you run rsync.
Here is an example command that can serve as a starting point:
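topel@Slartibartfasts:~$ rsync -hav <TARGET> <DESTINATION>
Here, <TARGET> is the file that should be transferred, i.e. the source (this can be on another network-attached computer), and <DESTINATION> is where it should be copied to. The options -h, -a and -v select human-readable numbers, archive mode (recursive copying that preserves file attributes) and verbose output, respectively.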
The behaviour of rsync can be modified with a variety of options, and you can read more about these in the rsync manual page (man rsync).
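The way rsync skips files that are already up to date can be observed locally before trying it over the network. The following is a minimal sketch that assumes rsync is installed; the directory and file names (data, backup, a.txt, b.txt) are made up for this illustration.

```shell
if command -v rsync >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/data" "$demo/backup"
    echo "first version" > "$demo/data/a.txt"
    echo "unchanged" > "$demo/data/b.txt"

    # First run: both files are copied.
    # The trailing slash on the source means "the contents of data".
    rsync -hav "$demo/data/" "$demo/backup/"

    # Change one file and run the same command again.
    # This time the verbose output should list a.txt but not b.txt.
    echo "second version" > "$demo/data/a.txt"
    rsync -hav "$demo/data/" "$demo/backup/"
else
    echo "rsync is not installed; skipping the demo" >&2
fi

# Over the network (not run here), --partial keeps a half-transferred file
# so that the next run can continue from it instead of starting over:
#   rsync -hav --partial alignment.fst mtop@albiorix.bioenv.gu.se:.
```

Note that without --partial, rsync deletes an incompletely transferred file when the transfer is interrupted, so the option is worth adding when moving large files over an unstable connection.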
Downloading data from the web can be done using several different programs. The most common method is to direct a regular web browser to a URL - from there, the download either starts automatically, or you have to save the content displayed manually. This can sometimes be a complicated procedure (e.g. have a look at this file, and imagine the steps required to download it to a remote server). A simpler method is to use the wget program, which will download the file to your working directory in a single step:
[mtop@albiorix files]$ wget --no-check-certificate https://raw.githubusercontent.com/DeWitP/Bioinformatic_Pipelines/master/sequences.fasta
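The sketch below shows where wget will put the file and how an interrupted download could be continued. The RUN_DOWNLOAD switch is a hypothetical variable used only in this sketch, so that nothing is fetched unless you ask for it. The --no-check-certificate option from the example above turns off verification of the server's certificate; it is left out here on the assumption that the certificates on your system are up to date.

```shell
url="https://raw.githubusercontent.com/DeWitP/Bioinformatic_Pipelines/master/sequences.fasta"

# By default, wget names the local file after the last component of the URL.
outfile=$(basename "$url")
echo "wget will save the download as: $outfile"

# Set RUN_DOWNLOAD=1 to actually fetch the file.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    # -c continues a partially downloaded file instead of starting from the beginning.
    wget -c "$url"

    # Quick sanity check: every record in a FASTA file starts with a '>' line.
    grep -c '^>' "$outfile"
fi
```

To save the file under a different name, wget accepts -O followed by the file name you want.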