This repository was archived by the owner on Mar 30, 2020. It is now read-only.

Finding files created by executable #16

Description

@seandavi

I am working with a software package that takes the name of an output directory as a parameter and then writes a number of files into that directory. I cannot see a way to declare those files as outputs, because dockerflow appears to ignore the directory prefix (index/) in the path passed to outputFile. Here is what the output directory structure looks like.

index/
├── hash.bin
├── header.json
├── indexing.log
├── quasi_index.log
├── rsd.bin
├── sa.bin
├── txpInfo.bin
└── versionInfo.json
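Declaring these files individually means one outputFile entry per file. As a sketch in plain java.nio.file (no dockerflow involved; the class ListOutputs and method relativeOutputPaths are hypothetical names), the relative paths that would need declaring can be collected from a local run of the tool:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListOutputs {
    // Returns every regular file under outputDir as a path relative to the
    // parent of outputDir (e.g. "index/versionInfo.json"), sorted by name.
    static List<String> relativeOutputPaths(Path outputDir) throws IOException {
        Path base = outputDir.toAbsolutePath().getParent();
        try (Stream<Path> files = Files.walk(outputDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> base.relativize(p.toAbsolutePath()).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to "index", the directory name used in this report.
        Path dir = Paths.get(args.length > 0 ? args[0] : "index");
        for (String p : relativeOutputPaths(dir)) {
            System.out.println(p);
        }
    }
}
```

Run against the directory above, this would print the eight index/... paths in sorted order.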

Here is the code I am trying to use (with just one of the files above as an example):

	static Task salmonIndex = TaskBuilder.named("salmonIndex")
			.inputFile("fasta")
			.outputFile("indexVersion", "index/versionInfo.json")
			.docker("seandavi/salmon")
			.preemptible(true)
			.diskSize("20")
			.memory(14)
			.cpu(2)
			.script("salmon index --index=index --transcripts=${fasta}")
			.build();

	static WorkflowArgs workflowArgs = ArgsBuilder.of()
			.input("fasta", "${fasta}")
			.output("indexVersion", "${salmonIndex.indexVersion}")
			.build();
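Extending this to the whole directory needs one outputFile(name, path) call per file. The sketch below generates those (name, path) pairs; the naming scheme (index/versionInfo.json becomes index_versionInfo_json) and the class OutputDeclarations are assumptions for illustration, not anything dockerflow requires. The builder call appears only in a comment. Each declared path still carries the index/ prefix, which is the part this report says appears to be ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputDeclarations {
    // Builds one (outputName -> path) pair per file. Hypothetical naming:
    // path separators and dots become underscores.
    static Map<String, String> outputsFor(List<String> relativePaths) {
        Map<String, String> outputs = new LinkedHashMap<>();
        for (String path : relativePaths) {
            String name = path.replaceAll("[/.]", "_");
            outputs.put(name, path);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // The files from the directory listing in this report.
        List<String> files = Arrays.asList(
                "index/hash.bin", "index/header.json", "index/indexing.log",
                "index/quasi_index.log", "index/rsd.bin", "index/sa.bin",
                "index/txpInfo.bin", "index/versionInfo.json");
        Map<String, String> outputs = outputsFor(files);
        // Each pair would then be passed to the task builder (sketch only):
        //   builder = builder.outputFile(name, path);
        outputs.forEach((name, path) -> System.out.println(name + " -> " + path));
    }
}
```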

And here is the error I am getting. Note that gsutil cp fails because the index/ part of the path appears to be dropped: the delocalizer looks for /mnt/data/100346066-versionInfo.json, and no such file exists.

(d81d2f3a5be0ea0c): java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException:
com.google.cloud.dataflow.sdk.util.UserCodeException:
com.google.cloud.genomics.dockerflow.runner.TaskException: Operation
operations/ENbjmKaCKxj6h6-zxaC7zokBIL3n-N_FEioPcHJvZHVjdGlvblF1ZXVl failed. Details: 10:
Failed to delocalize files: failed to copy the following files: "/mnt/data/100346066-versionInfo.json
-> gs://gbseqdata/ockerflow_example/ch1/salmonIndex/index/versionInfo.json (cp failed:
gsutil -q -m cp -L /var/log/google-genomics/out.log /mnt/data/100346066-versionInfo.json
gs://gbseqdata/ockerflow_example/ch1/salmonIndex/index/versionInfo.json, command failed:
CommandException: No URLs matched: /mnt/data/100346066-versionInfo.json\nCommandException: 1 file/object could not be transferred.\n)" at
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162
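Reading the log, the declared output index/versionInfo.json was expected locally at /mnt/data/100346066-versionInfo.json: only the file name survives, while salmon wrote the file under index/. The sketch below reproduces that observed mapping so the mismatch is easy to see. It is inferred from this single log line only; the /mnt/data prefix and the numeric id are copied from the log, and observedLocalPath is a hypothetical helper, not dockerflow's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DelocalizePath {
    // Hypothetical reconstruction of the mapping seen in the log:
    // "<declared path>" -> "/mnt/data/<id>-<file name only>".
    // Any directory component of the declared path (here "index/") is dropped.
    static String observedLocalPath(String declaredPath, String id) {
        Path fileName = Paths.get(declaredPath).getFileName();
        return "/mnt/data/" + id + "-" + fileName;
    }

    public static void main(String[] args) {
        String declared = "index/versionInfo.json";
        // Where the delocalizer looked for the file, per the log.
        System.out.println("delocalizer looks for: " + observedLocalPath(declared, "100346066"));
        // Where "salmon index --index=index" writes it, relative to the script's working directory.
        System.out.println("salmon writes:         " + declared);
    }
}
```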

I am likely just misunderstanding something here, but I thought I would go ahead and ask.
