- Standardized operator-manual structure is now the active house style for manual refinements.
- `description` should describe when to use the tool, not summarize the workflow.
- Real CLI behavior often differs from autogenerated assumptions; wrapper quirks belong in `Guardrails`.
- Many bedtools wrappers reject GNU-style `--help`/`--version` and behave better with `-h`.
- `bedToIgv` writes the IGV batch script to stdout; `-path` controls the snapshot directory inside IGV, not the script file destination.
- `tagBam` writes tagged BAM to stdout and requires a payload mode such as `-labels`, `-names`, or `-scores`.
- `subtractBed -wo`/`-wb` changes output semantics into diagnostic layouts; those outputs are not plain trimmed interval files.
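The `tagBam` guardrail above (stdout output plus a mandatory payload mode) can be sketched as a small pre-flight check. This is a minimal illustration, not part of bedtools: the helper names are hypothetical, and the `-i` and `-files` flags are recalled from the tool's usage text rather than from these notes, so confirm them with `tagBam -h` before relying on them.

```python
import subprocess

# Payload modes named in the notes above; tagBam needs at least one of them.
PAYLOAD_MODES = ("-labels", "-names", "-scores")


def has_payload_mode(argv):
    """Return True if the argument list selects a tagBam payload mode."""
    return any(arg in PAYLOAD_MODES for arg in argv)


def tagbam_argv(bam, annotation_files, labels):
    """Build a tagBam argument list that uses the -labels payload mode.

    Assumption: -i and -files are the input flags (check `tagBam -h`).
    """
    if len(annotation_files) != len(labels):
        raise ValueError("need exactly one label per annotation file")
    return ["tagBam", "-i", bam, "-files", *annotation_files, "-labels", *labels]


def run_to_file(argv, out_path):
    """Run a stdout-emitting wrapper and capture its stdout in out_path."""
    with open(out_path, "wb") as handle:
        subprocess.run(argv, stdout=handle, check=True)
```

Because the tagged BAM goes to stdout, a caller would pair `tagbam_argv(...)` with `run_to_file(...)` instead of expecting an output-file flag.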
- CPU-specific wrappers mostly share the same operational semantics; differences are primarily binary build/ISA related.
- Compressed read inputs still require `--readFilesCommand zcat` or equivalent.
- Genome FASTA files for index generation must remain uncompressed.
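The two input rules above can be encoded as a hedged argument builder. The function names are hypothetical, only the `.gz` case is handled (other compressors would need their own "equivalent" command), and compression is guessed from the file suffix, which is an assumption of this sketch.

```python
def star_align_argv(genome_dir, read_files):
    """Build a STAR alignment argument list, adding zcat for gzipped reads."""
    argv = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", *read_files]
    # Assumption: a .gz suffix reliably marks gzip-compressed reads.
    if any(name.endswith(".gz") for name in read_files):
        argv += ["--readFilesCommand", "zcat"]
    return argv


def check_genome_fasta(paths):
    """Reject compressed FASTA before index generation is attempted."""
    compressed = [p for p in paths if p.endswith((".gz", ".bz2"))]
    if compressed:
        raise ValueError(f"decompress before indexing: {compressed}")
    return list(paths)
```

For example, `star_align_argv("idx", ["r1.fq.gz", "r2.fq.gz"])` appends `--readFilesCommand zcat`, while plain `.fq` inputs leave the argument list unchanged.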
- `wgsim_eval.pl` is a multi-command evaluator, not a single-purpose command.
- `vcfutils.pl` and `samtools.pl` are command multiplexers with legacy workflows centered on older SAMtools/BCFtools conventions.
- `split-at-intron` is an EDirect shell filter that consumes a tag/value stream from stdin, not a generic genomic interval file.
- `hmmsim` studies HMM score distributions on random sequences; it is not a general biological sequence generator.
- Bedtools family has been fully standardized and structurally validated.
- STAR / STARlong plain and CPU-specific wrappers standardized and validated.
- Recent helper-script batch standardized and validated.
- ViennaRNA tools were missing only the standardized `## When To Use This Tool` and `## Common Patterns` sections; the rest of their operator-manual structure was already stable.
- `RNAsnoop` uses `--help` for usage output. The short option `-h` is a real algorithm parameter (`--minimal-stem-length`), so `RNAsnoop -h` fails with an argument error.
- The skill folder `rnapkplex` maps to the executable `RNAPKplex`; the case mismatch is real and worth documenting to avoid false "binary not found" assumptions.
- `RNAplex` is most reusable as a query-vs-target scanner with optional `RNAplfold` accessibility directories rather than as a generic cofolding tool.
- `RNAdos` is landscape-summary oriented: it counts structures per energy band rather than enumerating individual suboptimal folds.
- Bowtie2 direct executables consistently warn that the wrapper scripts (`bowtie2`, `bowtie2-build`, `bowtie2-inspect`) are preferred. Those warnings should be documented rather than treated as failures.
- `bowtie2-inspect` can force large-index inspection with `--large-index`, while the `-s` and `-l` executables pin the small/large format explicitly.
- HISAT2 core wrapper binaries follow the same pattern: direct execution works, but `hisat2`, `hisat2-build`, and `hisat2-inspect` wrapper scripts are the recommended public entry points.
- In HISAT2, spliced alignment is on by default for aligners; fragment-length controls `-I`/`-X` become relevant only with `--no-spliced-alignment`.
- `hisat2-inspect` can export embedded graph annotations (`--snp`, `--ss`, `--ss-all`, `--exon`), making it more than a simple FASTA/name inspector.
- HISAT2 helper scripts mostly reject `--version`; `hisat2_read_statistics.py` defaults to sampling 10000 reads, and `-n 0` means scan the whole file rather than zero reads.
- `hisat2_extract_snps_haplotypes_UCSC.py` writes `<base>.snp` and `<base>.haplotype`, plus `.ref.testset.fa` and `.alt.testset.fa` when `--testset` is used.
- `hisat2_extract_snps_haplotypes_VCF.py` writes `<base>.snp` and `<base>.haplotype`, and `--extra-files` additionally emits `.ref` plus `_backbone.fa`.
- `hisat2_simulate_reads.py` locally emits a non-fatal Python `SyntaxWarning` before `-h` usage text, and paired-end simulation writes `<base>.sam`, `<base>_1.fa`, and `<base>_2.fa`.
- Easel tools generally prefer
`-h`; most reject `--help` and `--version`, while `esl-mixdchlet` is a notable exception because top-level `--version` works and the command is really a subcommand dispatcher.
- `esl-construct` rebuild modes (`-x`, `-r`, `-c`, `--indi`, `--ffreq`, `--fmin`) all require `-o`, so they are edit-and-write operations rather than pure inspection.
- `esl-histplot` shows a real local docs mismatch: the installed man page says the default output is a survival plot, but the live binary emits histogram-style XY output unless `--surv` is set.
- `esl-mask` consumes a three-column `seqname start end` mask file; order must match the sequence file unless `-R` is used with an SSI index.
- `esl-selectn` is line-level reservoir sampling, not sequence-record sampling.
- `esl-seqrange` requires an SSI-indexed input file, and runtime testing confirms `procidx` is 1-based.
- `esl-shuffle` has a smaller docs mismatch: live `-h` advertises RNA as the default alphabet in `-G` mode, while the man page advises choosing `--rna`, `--dna`, or `--amino` explicitly.
- `esl-ssdraw` only draws the first alignment in a Stockholm file and cannot reuse its generated PostScript output as a fresh template.
- `esl-weight` cannot currently start in this environment because `libopenblas.so.0` is missing; local man-page and binary-string evidence confirm `-g`, `-p`, and `-b` as the documented weighting modes.
- `hmmbuild` is also blocked locally by the same missing `libopenblas.so.0`, but the man page confirms important behavior that autogenerated docs missed: `msa_file` may be `-`, `hmm_out` may not, and `-n` only works for single-alignment input.
- `hmmconvert` is a stdout-emitting format converter; `-2` is a legacy HMMER2 compatibility path, while `--outfmt 3/a` through `3/f` selects named HMMER3 ASCII revisions.
- `hmmemit` has a real short-help mismatch: usage text implies a single-HMM input, but runtime testing with a multi-model library emitted one sample per model successfully.
- `hmmlogo` does not render a final image. The real default output is plain text tables beginning with values such as `max expected height = ...` and `Residue heights`.
- `hmmpgmd` is a master-worker daemon layer in front of `phmmer`, `hmmsearch`, and `hmmscan`; its sequence database input must already be in hmmpgmd format, which the man page ties back to `esl-reformat`.
- The `hmmpgmd-shard` skill maps to the real executable `hmmpgmd_shard`. In shard mode, only sequence databases are sharded, and `--num_shards` on the master must equal the worker count.
- Subread-family help handling is inconsistent: `--help` and `--version` are not true switches for most commands, and some tools print usage only after complaining about the invalid option.
- `subread-fullscan` is a single-read diagnostic scanner, not a FASTQ aligner. Its final argument is a literal read sequence string.
- `sublong` requires a full one-block Subread index and uses `-v` as its real version flag; `--help` and `--version` print usage text but still end as unrecognized-option paths.
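Because help and version handling varies this much between tools, it can help to record what each candidate flag actually does instead of assuming. The sketch below is a hypothetical probing helper, not part of any tool described here. It closes stdin so stdin filters cannot block, and it applies a timeout. It should not be pointed at wrappers whose help flags still trigger real work.

```python
import subprocess
import sys


def probe_flags(executable, flags=("-h", "--help", "-v", "--version"), timeout=10):
    """Run `executable` once per flag and record exit code and output."""
    results = {}
    for flag in flags:
        try:
            proc = subprocess.run(
                [executable, flag],
                stdin=subprocess.DEVNULL,  # never wait on interactive input
                capture_output=True,
                text=True,
                errors="replace",  # tolerate non-UTF-8 usage banners
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            results[flag] = {"returncode": None, "stdout": "", "stderr": "timed out"}
            continue
        except OSError as exc:  # missing binary, permission problem, ...
            results[flag] = {"returncode": None, "stdout": "", "stderr": str(exc)}
            continue
        results[flag] = {
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
    return results


if __name__ == "__main__":
    # Demonstration against the Python interpreter itself.
    for flag, outcome in probe_flags(sys.executable, flags=("--version",)).items():
        print(flag, outcome["returncode"], outcome["stdout"].strip())
```

A non-zero return code combined with usage text on stderr is the pattern the notes above describe as "usage only after complaining about the invalid option".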
- `blast2sam.pl` is not a general BLAST formatter. The bundled POD and source show it is specifically a parser for legacy default-format `blastn` text output.
- `blast2sam.pl --help` works only because Perl `Getopt::Std` injects generic help/version handling; `-help` is parsed as `-h -e -l -p` and fails.
- `blast2sam.pl -s` prints the aligned query sequence, not necessarily the original raw read sequence, and `-d` emits dummy `I` quality characters so downstream SAM consumers can tolerate the record.
- `blast2sam.pl` emits headerless SAM and silently drops unaligned queries instead of outputting unmapped SAM rows.
- `tblastn_vdb` follows normal BLAST+ single-dash controls: `-help` and `-version` work, while prior autogenerated `--help`/`--version` captures were wrong for the live binary.
- Bare `tblastn_vdb` invocation fails with `Must specify at least one SRA/WGS database`, making a missing `-db` the first diagnostic check.
- `tblastn_vdb -db` expects an SRA or WGS source name rather than a standard local BLAST database path.
- `tblastn_vdb -sra_mode` has materially different search universes: `0` unaligned reads, `1` aligned reference sequences, `2` both.
- `ace2sam` writes the SAM body to stdout but emits header text on stderr, and its ACE parser expects strict block ordering plus matching `AF`/`RD` read order.
- `bowtie2sam.pl` has no true CLI help: `--help` and `--version` are treated as filenames. It also chooses a single best Bowtie hit per read-name block rather than emitting all alignments.
- `export2sam.pl` is one of the few legacy helper scripts in this cluster with real long-option help. `--read1` is mandatory, `--nofilter` keeps failed-purity reads, and `--qlogodds` is only for pre-1.3 Solexa-style quality encoding.
- `maq2sam-long` and `maq2sam-short` expose almost no self-description beyond `Usage: maq2sam <in.map> [<readGroup>]`; `--help` and `--version` are just treated as filenames.
- `novo2sam.pl` skips comment/QC/NM-style lines and silently drops alignments whose status is not `U`, so it is not a lossless Novoalign-to-SAM converter.
- `psl2sam.pl` uses `-a`/`-b`/`-q`/`-r` only for computing `AS:i` and explicitly does not emit `N` CIGAR operators for intron-like gaps.
- `soap2sam.pl` and `zoom2sam.pl` use `-p` only to interpret mate relationships; they assume mate adjacency in the input rather than repairing arbitrary reordering.
- `zoom2sam.pl` requires a manual read-length argument and emits `*` for sequence and quality fields because that information is not recovered from the handled Zoom format.
- `interpolate_sam.pl` is not a generic SAM interpolator; it builds an interpolated per-base count track from a sorted SAM file and assumes simple `M`/`I`/`D` CIGARs plus an older MAQ-style reference naming convention.
- `sam2vcf.pl` expects legacy `samtools pileup -c` input on stdin and emits VCFv3.3; `--help` works but `--version` does not.
- `disambiguate-nucleotides` is a stdin/stdout shell filter that expands IUPAC ambiguity codes, uppercases output, and silently waits for input instead of providing built-in help text.
- Most remaining ViennaRNA executables in this batch use the modern
`--help`/`--version` interface cleanly, unlike many older samtools helper scripts.
- `RNAcofold` and `RNAmultifold` both treat `&` as the strand separator and continue reading batch input until a single `@` line or EOF.
- `RNAconsensus` is a shell wrapper around a Python implementation. `--help` works, but `--version` is not implemented and errors out.
- `RNAconsensus hardcons` can consume either RNAalifold stdout or an RNAalifold dot plot, then emit per-sequence constraints suitable for piping into `RNAfold -C`.
- `RNAdistance` reads structures from stdin, and `-B[=file]` writes an aligned backtrack of matching substructures.
- `RNApvmin` cannot start in this environment because `libopenblas.so.0` is missing. Binary strings still reveal the real usage: it reads the sequence from stdin and a SHAPE file as the positional argument.
- `RNApvmin` expects SHAPE input lines in the form `[position] [nucleotide] [absolute shape reactivity]` and writes the resulting perturbation vector to stdout while optimization progress goes to stderr.
- `kinwalker --help` works, but `--version` is unrecognized and falls back to usage text.
- EDirect `archive-*` wrappers are not safe to probe with `--help` or `--version`; those switches usually still trigger the real setup / refresh path and then fail on missing helpers or `EDIRECT_LOCAL_ARCHIVE`.
- `archive-nihocc` downloads the NIH Open Citation Collection zip and warns the transfer can take hours; `daily` rebuilds index/invert layers, while `-index` additionally merges and posts.
- `archive-nlmnlp` builds a local PubMed concept archive from PubTator Central plus GeneRIF / gene metadata files, and it also needs a local `go` toolchain.
- `archive-nmcds` is a full RefSeq NM CDS archiver that creates a master accession list and CDS offset table; its cleanup flags are hierarchical, with `-zap` removing the deepest layers.
- `archive-pids` builds local `PMCID` postings from PubMed metadata; it is not a simple one-shot identifier extractor.
- `archive-pmc` and `archive-pubmed` have real operational submodes like `-missing`, `-verify`, and `-index`; they should be documented as distinct maintenance paths rather than as generic “download archive” commands.
- `archive-pubmed` changes index content when `-stem` or `-strict` is used, and mixed-year local state can force a cleanup path before reindexing.
- `archive-taxonomy` also requires a local `go` compiler and uses the same destructive cleanup ladder (`-clean` through `-zap`) seen in other archive wrappers.
- `combine-uid-lists` is just `sort -nu "$@"`: it always returns a numeric-sorted union with duplicates removed.
- `difference-uid-lists` computes the symmetric difference (`FILE1 △ FILE2`), not a directional subtraction and not an overlap.
- `exclude-uid-lists` computes `FILE1 - FILE2`, while `intersect-uid-lists` computes the shared set only.
- The UID-list helpers sort their inputs internally, so pre-sorting is optional, but original record order is lost.
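The set semantics of the four UID-list helpers can be restated in a few lines of Python. This is only an illustration of the documented behavior, not the wrappers' implementation; the function names are hypothetical, and UIDs are assumed to be integers, one per list entry.

```python
def _as_set(uids):
    """Normalize an iterable of UID strings or ints to a set of ints."""
    return {int(u) for u in uids}


def combine(*lists):
    """Numeric-sorted union with duplicates removed (combine-uid-lists)."""
    return sorted(set().union(*map(_as_set, lists)))


def difference(first, second):
    """Symmetric difference: UIDs in exactly one list (difference-uid-lists)."""
    return sorted(_as_set(first) ^ _as_set(second))


def exclude(first, second):
    """Directional subtraction FILE1 - FILE2 (exclude-uid-lists)."""
    return sorted(_as_set(first) - _as_set(second))


def intersect(first, second):
    """Shared UIDs only (intersect-uid-lists)."""
    return sorted(_as_set(first) & _as_set(second))


if __name__ == "__main__":
    a, b = ["3", "1", "2"], ["2", "5"]
    print(combine(a, b), difference(a, b), exclude(a, b), intersect(a, b))
```

As with the real helpers, every result comes back numerically sorted, so the original record order of either input is not preserved.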
- `difference-uid-lists`, `exclude-uid-lists`, and `intersect-uid-lists` have no clean help/version interface; passing `--help` or `--version` leaks through to `sort`/`comm` and can still emit missing-file noise.
- Many EDirect converters are only thin shell wrappers around `transmute -?2x`; using the wrapper by absolute path is not sufficient unless the rest of the EDirect bin directory is also on `PATH`.
- Those thin converter wrappers usually have no meaningful `--help`/`--version` output. In the current environment, most emit nothing; `json2xml` is a special case that converts `--help`/`--version` into literal XML tags.
- `fsa2xml` emits one `<FASTA>` block per FASTA record, not a single enclosing document for the whole file.
- `ini2xml`, `toml2xml`, and `yaml2xml` currently emit a `<ConfigFile>` root for simple mappings.
- `csv2xml` and `tbl2xml` are also `transmute` wrappers, but in trivial smoke tests they produced no output instead of a friendly schema error, so representative-input validation matters.
- `jsonl2xml` is a line loop: every JSONL line is independently converted to its own `<root>...</root>` fragment, so the combined stream is not a single well-formed XML document unless the caller wraps it.
- `gff2xml` is not a simple format toggle; it pipelines `tbl2xml`, `xtract`, and `transmute`, splitting the semicolon-delimited GFF `Attributes` field into nested XML tags.
- `xml2fsa` and `xml2tbl` are fixed `xtract` recipes over `INSDSeq` XML and do not pass through positional filenames, so stdin piping is the reliable invocation pattern.
- `xml2fsa` builds FASTA headers from the first available accession / id / locus field plus the definition text.
- `xml2tbl` produces a feature-centric table beginning with `>Feature <accession>` lines followed by interval and qualifier rows.
- `xml2json` is a Perl stdin-only converter based on `XML::Simple` and `JSON::PP`; in the current environment it cannot start because `XML::Simple.pm` is missing.
- `fill-aa` supports `--help` but not `--version`; it can annotate only selected event classes via `-t`, expects sorted VCF input, and falls back from the raw `-a` path to `<prefix><chrom>.fa.gz`.
- `fill-an-ac` recomputes both `AC` and `AN` from genotype columns and hard-codes diploid counting with `recalc_ac_an(2)`.
- `fill-fs` annotates only against the first ALT allele at multiallelic sites, and the `-m` mask-character switch only affects the next `-b`, `-v`, or `-c` target that follows it on the command line.
- `fill-ref-md5` depends on `tabix`, `samtools faidx`, and `md5sum`; `-d` alone only works if the dictionary already covers all chromosomes needed by the indexed
VCF.
- `filter-columns` is just a tab-delimited `awk` predicate wrapper; the expression must be passed as one quoted shell argument, and it injects `YR`/`DT` variables automatically.
- `filter-genbank` and `filter-record` are thin `transmute` wrappers whose actual filtering rules live inside `transmute`; both reject `--help`/`--version` as unrecognized arguments.
- `filter-stop-words` is line-oriented and drops stop words entirely by default; `-plus` emits `+` placeholders instead. The first command-line token is always consumed, so arbitrary replacement arguments are awkward.
- `extract_exons.py` outputs unique zero-based exon intervals and merges neighboring exons separated by 5 bp or less before printing.
- `extract_splice_sites.py` uses the same exon-merging rule, then emits unique zero-based splice junction boundaries; with `-v` it prints transcript/exon/intron summary stats to stderr.
- `gbf2facds` has only two real modes: nucleotide CDS (`-na`, default) and translated protein (`-aa`). Both reject `--help`/`--version`, and the FASTA headers are metadata-rich rather than bare accessions.
- `gbf2fsa` is just `gbf2xml | xml2fsa`, so its behavior and dependencies inherit both wrappers.
- `gbf2info` emits structured `GenBankInfo` XML with `<info>`, `<feature>`, and `<sequence>` sections, and internally remaps problematic feature names (for example `3'UTR` -> `3_UTR`) before wrapping them as XML tags.
- `gbf2ref` is a thin `transmute -g2r` wrapper; in the current build `--help`/`--version` fall through to the generic “Unable to create GenBank reference indexer” error.
- `gbf2tbl` is `gbf2xml | xml2tbl`, so it emits the same `>Feature <accession>` table structure as `xml2tbl`, just starting from GenBank flatfile input.
- `gff-sort` is a real EDirect pipeline, not a self-contained sorter. It strips all comment/directive lines, depends on `tbl2xml`, `xtract`, `transmute`, and `sort-table` being on `PATH`, and hard-codes a feature priority of gene/pseudogene -> RNA-like -> `CDS` -> `exon`/`intron` -> everything else.
- `gff2gff` is a stdin/stdout bcftools helper that appends missing `ID`, `biotype`, and `Name` fields when it can infer them from `gene_id`, `gene_type`, `gene_name`, `transcript_id`, and `transcript_type`. Even successful runs print a "Fixed N records" summary to stderr.
- `gff2gff.py` is a much more brittle legacy converter than autogenerated docs implied: it requires `gffutils` just to start, takes a scratch database path as its second positional argument, writes converted GFF3 to stdout, skips `ncRNA` groups, and assumes attributes like `Name` and `locus_tag`.
- `flattenGTF` is a Subread binary that falls through to usage text when probed with `--help` or `--version`. It writes SAF to disk, defaults to `-t exon -g gene_id`, and `-C` keeps exon edges while still producing non-overlapping output.
- `fuse-ranges` expects four tab-delimited columns where column 3 is strand and column 4 is a comma-separated `start..end` list. It merges adjacent intervals too, silently drops rows whose first field does not begin with `[1-9]`, and can emit a bogus `0 0 1` sentinel on empty/non-matching input.
- `fuse-segments` behaves similarly for simple start/end tables: it normalizes reversed coordinates, merges adjacent segments, ignores everything after column 2, still needs `sort-table` on `PATH`, and shares the same bogus `0 0 1` empty-input behavior.
- `find-in-gene` is really `find-in-gene <strand> <min> <max>` over stdin `GENE` XML, despite the shell wrapper only checking for two arguments and printing a misleading "must have start and stop position" error.
A fourth argument is accepted but unused.
- `accn-at-a-time` is just a lowercase text tokenizer: it splits on anything outside `[A-Za-z0-9_.]` and does not validate that the resulting tokens are real accessions.
- `align-columns` is tab-delimited pretty-printing around `transmute -align`; `-help` works, but `-version` is noisy because it shells out to `einfo -version`, which does not return a clean version string here.
- `between-two-genes` is a local `awk` block slicer, not a gene-lookup tool. Its arguments are regex patterns, output is inclusive of the boundary rows, and if the second boundary is missing it prints from the first match to EOF.
- `expand-current` is an operational local-archive rebuild script. It deletes previous derived index files, requires `EDIRECT_LOCAL_ARCHIVE` plus helpers like `pm-collect`, and can still stumble through a broken environment while exiting `0`.
- `gene2range` takes a chromosome name and turns `DocumentSummary` XML into sorted `GENE` XML. It emits XML, not TSV, and still depends on `xtract`, `sort-table`, and `tbl2xml` being on `PATH`.
- `join-into-groups-of` defaults to batches of `10000` and emits comma-joined lines. Because it is `xargs`-based, embedded whitespace inside identifiers is destructive.
- `just-top-hits` does not rank by score. It simply keeps the first N first-column groups from an already grouped table, preserving every row inside those retained groups.
- `amino-acid-composition` is line-based and not FASTA-aware. It will happily count letters from FASTA headers unless those lines are stripped first.
- `annot-tsv` uses `--help` for usage. The short flag `-h` is a real option for header-row specification and therefore errors without an argument.
- `asn2ref` is a compact `xtract` recipe over `Seq-entry` citation content. It emits citation XML blocks and normalizes page ranges down to the first page value.
- `cit2pmid` defaults to remote matching, supports explicit modes (`-eutils`, `-local`, `-exact`, `-verify`), treats repeated `-author` fields as first and last author, truncates page ranges to the first page, and does not implement `-help`/`-version` as metadata flags.
- `download-ncbi-software` only special-cases `magic-blast`, `datasets`/`dataformat`, and `sra-toolkit`; it also depends on `nquire`, and in the current Linux x86_64 branch the `sra-toolkit` case reaches an empty suffix path that effectively no-ops while still exiting successfully.
- `download-pmc` is a bulk operational downloader, not a metadata probe. It defaults to FTP, can switch to HTTPS, iterates both `baseline` and `incr` trees across `oa_comm`, `oa_noncomm`, and `oa_other`, and deletes tarballs that fail XML verification after retries.
- `exact-snp` wraps the real executable `exactSNP`; no-argument invocation prints usage, `-v` is the real version flag, and output is VCF even though some legacy examples still suggest `.txt`.
- `fasta-sanitize.pl` handles both FASTA and FASTQ, sanitizes only the first whitespace-delimited token in each header, and prints rename messages only when a record name actually changes.
- `get_species_taxids.sh` uses underscores in the real executable name, checks for `esearch`, `efetch`, and `esummary` before normal usage handling, and treats `-t` and `-n` as mutually exclusive output modes.
- `gm2ranges` expects at least five whitespace-delimited columns and preserves minus-strand segments as descending `stop..start` strings, making it a normalization precursor rather than a final interval file.
- `gm2segs` depends on `xtract`, `print-columns`, `sort-table`, and `fuse-segments` being on `PATH`, filters the label `BLASTN - mrna` exactly, emits `RAW`/`PLS`/`MNS`/`CMB` report blocks, and can inherit bogus `0 0 1` sentinels from `fuse-segments` on empty strand partitions.
- `pair-at-a-time` is a text-stream bigram helper, not a sequencing read-pair tool.
It lowercases text, strips punctuation, collapses consecutive duplicates via `uniq`, and can fail by absolute path if `word-at-a-time` is not also on `PATH`.
- `color-chrs.pl` writes a single `<prefix>.svg`, requires `-p`, and hard-codes human chromosomes `1..22` plus `X`; it also treats any existing path on the command line as an input file before option parsing.
- `pma2apa` converts `PubmedArticle` XML to either a tab-prefixed APA citation line or a structured `<APASet>` XML record. `--help` is not implemented, and `-ascii` can be combined with either text or XML output.
- `pma2pme` converts `PubmedArticle` XML to `Pubmed-entry` ASN.1 text by default, with `-xml` exposing the intermediate XML and `-std` switching away from the default compact Medline-style author representation.
- `nhance.sh` only special-cases four shortcut flags (`-pathway`, `-gene-to-pathway`, `-litvar`, `-citmatch`). In the current environment, active shortcut calls fail immediately with `Escape: command not found`, while `--help` or plain invocation can exit silently with no output.
- `print-columns` is just an `awk "{print ...}"` wrapper over tab-delimited stdin. The expression must be shell-quoted, `YR` and `DT` are injected automatically, and `--help` with no stdin produces no useful output.
- `print-missing-subranges` assumes ascending one-column integers, reports gaps as inclusive `start-end` spans, anchors the sequence at `1`, and does not infer any terminal upper bound after the last observed value.
- `quote-grouped-elements` drops blank lines and turns every literal space into a quoted comma separator; it does not escape embedded quotes and is only CSV-like, not a full CSV serializer.
- `qualfa2fq.pl` requires exactly two positional arguments, auto-decompresses only by `.gz` suffix, does not verify matching FASTA / QUAL record IDs, converts qualities with `score + 33`, and wraps only the quality string at 60 characters.
- `qualityScores` is the real executable behind the `quality-scores` skill. `-h`/`--help` are treated as invalid or unrecognized options before usage text is shown, `-i` and `-o` are mandatory, the default sample size is `10000`, and output is one comma-separated quality vector per sampled read.
- `guess-ploidy.py` is a pure plotting wrapper around `bcftools +guess-ploidy -v`. It requires exactly two positional arguments, skips comment lines, uses only `SEX` rows, and writes a static PNG through Matplotlib's `Agg` backend.
- `hgvs2spdi` does not read raw HGVS strings directly; it expects HGVS XML on stdin and optionally a positional transform table file with accession-to-offset mappings. Without that file, it performs live `efetch`/`gbf2xml` lookups to derive CDS-start offsets.
- `ds2pme` is the docsum analogue of `pma2pme`: it converts PubMed `DocumentSummary` XML to `Pubmed-entry` ASN.1 by default and exposes the intermediate XML with `-xml`.
- `bsmp2info` emits compact XML, not TSV.
It lowercases `harmonized_name` attributes into element names, collapses multiple links into one pipe-delimited `Link` field, and live BioSample fetches can hit NCBI 429 rate limits.
- `genRandomReads` is the real executable behind the `gen-random-reads` skill. `--help` is reported as an unrecognized option before usage text, omitting `--totalReads` defaults to one million reads with a warning, and the tool has a distinct `--summarizeFasta` mode for transcript-length inventory.
- `ct2db` has a clean modern help interface (`-h`, `--help`, `-V`) and writes extended FASTA with dot-bracket strings to stdout; `--no-pk` removes pseudoknots and `--no-modified` replaces modified bases with `N`.
- `datatool` is a large NCBI CLI with required `-m moduleFile` for real work, detailed `-help` output, and single-dash long options such as `-version`; the local binary reports `datatool: 2.24.0` from the BLAST 2.17.0 package build.
- `popt` is a ViennaRNA post-filter for `RNAsubopt -s` output. The local binary does not expose a normal help flag, but its embedded usage string explicitly states `RNAsubopt -s < seq | popt`.
- `clustalw2` uses legacy uppercase option conventions (`-HELP`, `-INFILE=...`, `-ALIGN`, `-TREE`) and a bare invocation drops into an interactive menu rather than exiting with usage. `-h` and `--help` are both wrong.
- `blst2gm` is a stdin `xtract` wrapper, not a standalone BLAST parser. It errors cleanly on empty stdin, filters only annotations labeled `BLASTN - mrna`, and emits compact tab-delimited rows with pipe-joined multi-value fields.
- `blst2tkns` is another tiny `xtract` recipe, this time over `Seq-align-set_E` XML. It is not a generic BLAST text/tabular converter, does not expose real help, requires `xtract` on `PATH`, and with dependencies available but no input it fails with `No data supplied to xtract from stdin or file`.
- `ecommon.sh` is a shared EDirect shell library, not a meaningful standalone CLI. Running it directly or with `-version` is silent, while sourcing it exposes version `24.0` and functions such as `ParseCommonArgs`, `RunWithLogging`, and `WriteEDirect`.
- `ecollect` is a UID-source normalizer and PubMed SOLR workaround wrapper, not a record fetcher. `-help` is unrecognized, `-db` is mandatory, `-count`/`-subset` are PubMed-specific query modes, and final UID output is deduplicated with `sort -n | uniq` so original order is lost.
- `pmc2info` converts PMC `<article>` XML to normalized `PMCInfo` XML, including an internal mapping of common section titles such as introduction, results, discussion, and methods. It has no real help mode and depends on both `xtract` and `transmute` being on `PATH`.
- `pmc2bioc` converts PMC `<article>` XML to BioC `collection` XML, and the source explicitly labels it `WORK IN PROGRESS`. Like `pmc2info`, it lacks real help behavior and immediately fails if `xtract`/`transmute` are unavailable.
- `nquire` is the low-level EDirect transport layer, not just an EUtils helper. `-h`/`--help` print real usage text, `-version` returns `24.0`, direct GET requests to EUtils work, but an FTP listing smoke test against `ftp.ncbi.nlm.nih.gov` failed here with `curl: (56) response reading failed`.
- `AnalyseDists` is the real executable behind the `analyse-dists` skill, and its embedded usage string even misspells itself as `AnalyseDist`. `-h`, `--help`, and `--version` all route to the same usage error, and a tiny `2 x 2` stdin matrix smoke test simply echoed the matrix back, even with `-Xn`.
- `alimask` cannot currently start because `libopenblas.so.0` is missing, so its skill has to rely on binary-string evidence.
Those strings show it requires one of `--modelrange`, `--alirange`, `--model2ali`, or `--ali2model`, that the mapping modes are `no postmsa`, that stdin requires `--informat`, and that `--hand` needs an RF line.
- `AnalyseSeqs` is the real executable behind `analyse-seqs`. `-h`, `--help`, and `--version` all print the same usage banner, the installed man page confirms stdin-driven equal-length sequence blocks terminated by `@` or `%`, and live four-sequence runs showed `-Xb`, `-Xn`, and `-Xw` creating `demo_box.ps`, `demo_nj.ps`, and `demo_wards.ps` respectively. A tiny two-sequence smoke test still exited `0` with no stdout or sidecar output, and the man page explicitly warns that only Hamming distance is well tested.
- `b2ct` has no usable help path in this environment: bare `b2ct` and `b2ct -h` were silent. The confirmed happy path is stdin, for example `printf '>test\nAAAA\n.... (0.00)\n' | b2ct`, which emitted CT rows to stdout beginning with `4 ENERGY = 0.0 test`. In contrast, positional file invocation such as `b2ct fold.out` exited `0` but produced no stdout and no sidecar file in local smoke tests. An invalid sample with mismatched sequence and structure lengths emitted `sequence and structure have unequal length`, and binary strings reveal a separate `unbalanced brackets` error path.
- `md5fa` is not a whole-file checksum tool. It emits one digest per FASTA record plus aggregate `>ordered` and `>unordered` lines. Live reordered-FASTA tests showed that swapping record order changes `>ordered` but leaves `>unordered` unchanged. `md5fa -h` is treated as a filename, and an empty FASTA emitted the normal empty MD5 for `>ordered` but an all-zero digest for `>unordered`.
- `md5sum-lite` is a minimal HTSlib-backed checksum helper that hashes files or stdin and prints `digest target`, using `-` for stdin. `md5sum-lite -h` is treated as a filename, error messages are prefixed with `md5sum:`, and no real help, version, or GNU-style verification mode was evidenced from live tests or binary strings.
- `plot-ampliconstats` is a real Perl script with built-in usage and a hard dependency on `gnuplot` 5.0+. It reads stdin if `FILE` is omitted, uses the first positional argument as a filename prefix, and source inspection shows it generates many `prefix-*.gp`, `prefix-*.png`, and `index.html` outputs. In this environment, a minimal smoke test failed immediately because `gnuplot` was not installed.
- `plot-bamstats` parses `samtools stats` output (not raw BAM and not `flagstat`) and has real options such as `-p`, `-m`, `-s`, and `-r` in the script source, but local execution is currently blocked before help or plotting because Perl cannot load `URI::Escape.pm`.
- `plot-vcfstats` accepts true `bcftools stats` `.vchk` files, not ad hoc approximations. A real one-record smoke test generated `plot.py`, `plot-vcfstats.log`, PNG panels, and `.dat` files, while a fake hand-written input failed the script's sanity check. Adding `-P` skipped the PDF stage cleanly; without it, local runs failed because `pdflatex`/`tectonic` were missing.
- `propmapped` does not appear to emit useful stdout by default. In local tests, output was captured only with `-o` and took the form `path,total,mapped,fraction`. `-f` switches to fragment counting, `-p` restricts to properly paired fragments, and `--help`/`--version` are both unrecognized even though the binary still prints `propMapped v2.1.1`.
- `rchive` is a shell dispatcher, not the real binary. It locates a platform-specific executable such as `rchive.Linux`. `--help` printed the built-in archive/index help in this environment, `-version` printed `24.0`, and `--version` instead fell through to a no-input error path.
- `ref-cache` is an HTSlib CRAM reference-caching proxy with a clean `-h` path and a useful installed man page. It requires `-d`, uses short options, defaults to the EBI upstream MD5 service unless `-U` is set, and the man page states that it exits silently if another instance is already running on the chosen port.
- `ref2pmid` is just a one-line wrapper around `transmute -r2p "$@"`. It has no standalone help/version behavior; calling it without stdin simply fails inside `transmute`.
- `refseq-nm-cds` is a heavyweight operational script, not a query helper.
Source inspection shows it defaults to human if no species is given, supports aliases such asman,mice, andfish, downloads many*.rna.gbff.gzfiles throughnquire, and writes species-specific outputs likehuman_cds.txt. Unsupported arguments such as--helpand--versionare treated as species names and trigger a noisy shell error from a straybreak.reorder-columnsis a tiny tab-onlyawkwrapper. It has no help path, uses 1-based positional column numbers, and can duplicate columns because it simply expands$Nexpressions into anawkprint list.repairalways writes BAM, can accept SAM with-S, and adds dummy mate records for singleton/unpaired reads unless-dis supplied. Local tests showed dummy mates with sequenceNand qualityA, while-dsuppressed them completely.-h/--helpare invalid-option paths that still print the usage bannerrepair Version 2.1.1.easelis a dispatcher-style HMMER/Easel front end, but in this environment the binary cannot start becauselibopenblas.so.0is missing.readelfconfirms dependencies onlibgsl.so.25,libopenblas.so.0, andlibmpi.so.40, while binary strings still expose the intended top-level interface:easel -h,easel --version,easel <cmd> -h, andeasel <cmd> [<args>...].plot-roh.pydoes not accept rawbcftools rohoutput by itself. The source requires gzipped*.txt.gzfiles containing bothGTrows and eight-columnRGrows, and explicitly says extrabcftools querygenotype lines may be needed. A minimal mixedGT/RGsample produced a3000 x 150PNG locally, while anRG-only sample crashed withIndexErroratrow[7].removeDupis a thresholded location-purge tool, not a duplicate marker. With three reads at one locus and-r 2, local testing removed all three reads.-hand--versionare invalid-option paths that still dump the usage bannerremoveDup Version 2.1.1, and BAM remains the default output unless-Sis set.run-ncbi-converteris a Perl bootstrap wrapper for downloadable NCBI converter binaries. 
It hardcodes `ftp.ncbi.nlm.nih.gov`, caches into `~/.cache/ncbi-converters` unless `NCBI_CONVERTER_DIR` is set, and treats the first positional argument as the converter basename used to construct `<name>.<platform>.gz`. There is no safe local help/version path because even `-h` immediately attempts FTP access; in this workspace that failed with `Unable to connect to FTP server: Bad file descriptor`.
- `run-roh.pl` is more than a simple loop over `bcftools roh`. It normalizes chromosome names, optionally annotates AF1KG frequencies, writes per-input `.bcf` intermediates plus `.txt.gz` and `.log` outputs, appends genotype rows via `bcftools query`, and then merges ROH presence into `outdir/merged.txt`. `--version` is not implemented.
- `skip-if-file-exists` is a newline-delimited path filter, not a conditional command launcher. It simply echoes paths whose `-f` test is false. Existing regular files are silently suppressed, while directories still pass through because the script checks only `-f`.
- `snp2hgvs` is a tiny `xtract | transmute` wrapper over dbSNP docsum XML, not a free-form SNP string converter. With real docsum input for rs104894914 it emitted structured `<HGVS>` XML containing multiple variant representations, including genomic and coding forms.
- `snp2tbl` is only three lines of shell, but the composition matters: `snp2hgvs | hgvs2spdi "$@" | spdi2tbl`. Its `-h` behavior is unsafe because argv is forwarded downstream, which locally triggered `cat: invalid option -- 'h'` before the no-input `xtract` failure.
- `sort-by-length` is just a Perl line sorter: `print sort { length($a) <=> length($b) } <>`. It does not understand FASTA records or other multi-line biological structures.
- `sort-table` is a very thin wrapper around GNU `sort` with a forced tab delimiter and an unconditional `grep '.'` prefilter, so blank lines are always dropped. `-h` is passed through as a human-numeric sort flag, not as help.
- `sort-uniq-count` does not require pre-sorted input because it sorts internally.
The wrapper always performs case-insensitive grouping via `uniq -i -c`, defaults to `sort -f`, and rewrites the result as `count<TAB>value`.
- `sort-uniq-count-rank` adds a final `sort -k 1,1nr -k "2$flags"` stage on top of `sort-uniq-count`, so counts are always ranked descending first. Its apparent help/version flags are unsafe because argv is repurposed into compact sort-flag letters.
- `spdi2tbl` is a wrapper over `xtract` on `<SPDI>` XML, followed by `sort-table`, `cut`, and `uniq`. It emits 8-column rows shaped like `rsid accession position deleted inserted class type gene`, and its class ordering is explicitly normalized as `Genomic`, `Coding`, `Protein`.
- `tbl2prod` is not an NCBI feature-table converter despite its name. It expects `spdi2tbl`-style 8-column rows, skips genomic variants, fetches the relevant nucleotide or protein record from NCBI, and emits reference (`:+`) plus altered product rows as a 3-column table after sorting/cutting away the protein ID.
- `test-edirect` is a long-running smoke/demo harness for the full EDirect stack. With no arguments it prints `EDirect 24.0`, platform info, and many titled example sections such as `INFO HELP`, `FIELD EXAMPLE`, and `LINK EXAMPLE`. `-test` is a special traced pipeline mode, while `-h`/`--version` are unrecognized.
- `test-eutils` is a smaller endpoint-health checker for E-utilities. Help text is real, `--version` is not, and `-alive` emits a mode header followed by progress dots or failure markers. In a bounded local run, `test-eutils -alive` produced `....` before the external timeout fired.
- `test-pcre` maps to the real executable `test_pcre`, which is the standard `pcre2test` CLI. `-help` and `-version` both work, and a minimal stdin script `/abc/` plus subject `abc` produced a successful `0: abc` match.
- `test-pmc-index` has no argument parser at all. It always runs a random-ID PMC title roundtrip using `xfetch -db pmc` and `xsearch -db pmc -title`.
Without `EDIRECT_LOCAL_ARCHIVE`, it still stumbles into `rchive`-level errors after printing the missing-environment warning.
- `test-pubmed-index` is even more environment-sensitive: it sources `xcommon.sh`, looks for archive/postings/data folders, exercises `xfetch`, `xsearch`, `xinfo`, `cit2pmid -local`, and a local `meshconv.xml`-based MeSH climb. With `EDIRECT_LOCAL_ARCHIVE` unset, it emits a noisy mix of repeated path errors rather than stopping cleanly.
- `word-at-a-time` is just `sed 's/[^a-zA-Z0-9]/ /g; s/^ *//' | tr 'A-Z' 'a-z' | fmt -w 1`. It strips punctuation/underscores, lowercases everything, and emits one token per line.
- `xcommon.sh` is a shared implementation library for the local-archive `x*` tools, not a meaningful standalone command. Key functions include `FindArchiveFolder`, `FindPostingsFolder`, `FindDataFolder`, `ParseStdin`, and `GetUIDs`, all of which shape the behavior of sibling wrappers.
- `xfetch` is a local archive retrieval wrapper over `rchive -fetch`/`rchive -stream`, not a remote `efetch` clone. Without `EDIRECT_LOCAL_ARCHIVE`, it can still print outer XML wrappers before failing inside `rchive`.
- `xfilter` is a local postings query helper. It tokenizes incoming UIDs with `word-at-a-time` and then calls `rchive -query`; it is not a remote `efilter` replacement.
- `xinfo` is a local postings inspector. In particular, `-fields` literally lists directories inside the postings folder. With `EDIRECT_LOCAL_ARCHIVE` unset, the failed `cd "$postingsBase"` can accidentally fall through and list the current working directory.
- `xsearch` is the local-search counterpart in the same stack. `-query` wraps hits in `ENTREZ_DIRECT` XML unless `-raw` is set, `-match`/`-exact`/`-title` delegate directly to `rchive`, `-words`/`-pairs` tokenize via `word-at-a-time` plus `filter-stop-words`, and omitting `-db` defaults to `pubmed`.
- `xlink` is a local link resolver over `rchive -link`, not a remote Entrez linker.
It accepts UIDs from stdin, `-id`, `-input`, or an upstream `ENTREZ_DIRECT` message; `-target` is mandatory; and the current `xlink.ini` only defines `[pubmed]` `CITED=pubmed`, `CITES=pubmed`, and `PMCID=pmc`.
- `xa2multi.pl` is a minimal Perl SAM filter for BWA `XA:Z` tags. It prints the original line unchanged, emits one secondary SAM record per alternate hit, copies the mismatch count into `NM:i`, reverse-complements sequence/qualities if orientation flips, and explicitly leaves `TLEN`/`ISIZE` uncomputed.
- `uniq-table` is a column-pruning AWK script, not a row deduplicator. It uses row 2 as the baseline, marks a column as interesting only when some later row differs, and therefore removes columns that are invariant from row 2 onward. Running `uniq-table -help` just exposes generic `gawk` help.
- `run_with_lock` is a compiled NCBI locking helper, but the local installation is incomplete: bare invocation fails with `Unable to exec get_lock`. Binary strings and official NCBI source still expose the real option surface: `-base`, `-getter`, `-log`, `-map`, `-reviewer`, and a standalone `!` marker that suppresses exit-status propagation. `--help` and `--version` are merely reported as unsupported options.
- `seq_cache_populate.pl` builds htslib/CRAM `REF_CACHE` trees by uppercasing and whitespace-stripping FASTA sequences, hashing them by MD5, and writing them under hex-split subdirectories. It supports direct FASTA arguments, `-find <dir>`, or stdin; reruns print `Already exists`; and `-subdirs 16` is rejected even though the error message misleadingly says “less than 15”.
- `subindel` exposes only a usage-banner interface: bare invocation prints usage, `-h` is an invalid option, and `--version` is unrecognized. The live banner documents `-i`, `-g`, `-o`, `-d`, `-I`, and `--paired-end`, while binary strings suggest the `-o` value is treated more like an output prefix (`%s.indel.vcf`) than a literal VCF filename.
- `STARlong` is a shell dispatcher over `STARlong-avx2`, `-avx`, `-sse4.1`, `-ssse3`, `-sse3`, and `-plain`. `bash -x STARlong --version` showed this host selecting `STARlong-avx2`.
Help output is the generic STAR manual, not wrapper-specific text, and reports version `2.7.11b` with earliest compatible genome index `2.7.4a`.
- `project_tree_builder` is a compiled NCBI Unix C++ tree generator with full live help and version output. The help text explicitly says the root should end with `c++`, the subtree can be either a path or a file list, and the tool still requires a `solution` argument. Local `-dryrun` testing with placeholder arguments exited silently with status `0`.
- `roh-viz` is a Perl HTML generator over `bcftools roh` plus `bcftools query`. Source inspection shows both `-i` (ROH file) and `-v` (VCF/BCF) are mandatory, only `RG` rows from the ROH file are plotted, and the built-in example/error text is wrong because `-r` is actually the regions filter, not the ROH input flag.
- `systematic-mutations` is a stdin-only bash wrapper that uppercases the first whitespace-delimited field, substitutes every position with `A`/`C`/`G`/`T` via `transmute -replace`, optionally appends the second field as `:<pattern>`, then case-insensitively deduplicates the emitted variants. Command-line flags are ignored.
- `vrfs-variances` is a stdin parser for `SITE` rows from `bcftools/vrfs`. In default mode it prints `MEAN` and `VAR2` to stderr but also emits the terminal selected `SITE` row to stdout. In `-s` mode the current code can duplicate that last site. In `-v` mode it emits only the variance vector, one value per line.
- `Biomni` is present as a local repo and `import biomni` works if the repo is added to `sys.path`, but deeper tool imports currently fail because `langchain_core` is missing.
The repo expects a dedicated `biomni_e1` environment and API-key configuration for agent workflows.
- `evo2` is present as a local repo with examples and a `phage_gen` subproject, but direct import fails because `vortex` is missing and the Docker image is not built yet.
- `RFdiffusion` is present as a local repo with `test_rfdiffusion.sh`, `design_ebola_binder.sh`, and `models/`, but the Docker image is not built yet.
- `protein-structure` in this workspace is best treated as a gateway/planning skill, not a ready predictor stack: `colabfold_batch`, `pymol`, `chimera`, `chimerax`, `foldseek`, `mmseqs`, and `fpocket` are all absent from `PATH`.
- `sequence-analysis` can safely route to real installed tools: `blastn`, `blastp`, `makeblastdb`, `bowtie2`, `bwa`, `samtools`, `hmmscan`, `hmmbuild`, `mafft`, `muscle`, `hisat2`, `featureCounts`, `prodigal`, `RNAfold`, `seqkit`, and Biopython.
- `bioinformatics-toolkit` is best documented as an umbrella router over the verified local CLI families plus repo-backed projects, not as a monolithic executable.
- `phage-design` is a local subproject under `evo2/phage_gen`, with a real Python pipeline script and config template. Its bundled Slurm launcher contains placeholder `/path/to/...` values and is not runnable as-is.
- `yeast_database` is a local learning project under `projects/yeast_genome_learning`; its primary entrypoints are the staged Bash scripts, while `scripts/pipeline.py --steps` is a compatibility helper that works locally.
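
The `skip-if-file-exists` note above (paths pass through unless a `-f` test succeeds) can be illustrated with a small Python sketch. This is not the script's own code: the function name `skip_if_file_exists` is hypothetical, and `os.path.isfile` is used as a stand-in for the shell `-f` test.

```python
import os
import tempfile


def skip_if_file_exists(paths):
    """Yield every path that is NOT an existing regular file.

    Hypothetical helper: os.path.isfile stands in for the shell `-f` test,
    so directories and missing paths are passed through unchanged.
    """
    for path in paths:
        if not os.path.isfile(path):
            yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        existing = os.path.join(tmp, "done.txt")
        open(existing, "w").close()
        missing = os.path.join(tmp, "todo.txt")
        # The existing file is suppressed; the missing file and the
        # directory itself both survive the filter.
        for kept in skip_if_file_exists([existing, missing, tmp]):
            print(kept)
```

The directory surviving the filter is the guardrail worth remembering: a caller that expects "anything already present is skipped" will be surprised by directory paths.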
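
The `word-at-a-time` pipeline quoted above can be approximated in Python to make its tokenization visible. This is a rough equivalent under stated assumptions, not a replacement: the function name is hypothetical, and any blank-line behavior of `fmt -w 1` is not modeled.

```python
import re


def word_at_a_time(text):
    """Approximate the sed | tr | fmt pipeline described in the notes.

    Every character outside [a-zA-Z0-9] becomes a space (so underscores
    and punctuation split tokens), the text is lowercased, and each
    token is returned separately. Hypothetical helper name.
    """
    tokens = []
    for line in text.splitlines():
        cleaned = re.sub(r"[^a-zA-Z0-9]", " ", line)
        tokens.extend(cleaned.lower().split())
    return tokens


if __name__ == "__main__":
    for token in word_at_a_time("BRCA1_variant (p.Cys61Gly), exon-5"):
        print(token)
    # prints: brca1, variant, p, cys61gly, exon, 5 (one per line)
```

Because underscores and hyphens are treated as separators, identifiers such as `exon-5` are split into two tokens, which matters when the output feeds `xfilter` or `xsearch -words`.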
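
The `uniq-table` behavior described above (row 2 as baseline, columns kept only if a later row differs) can be sketched as follows. This is an illustration of the stated rule only: the function name is hypothetical, the table is assumed to be rectangular, and printing row 1 with the same column selection is an assumption rather than something confirmed from the AWK source.

```python
def prune_invariant_columns(rows):
    """Drop columns that never change from row 2 onward.

    `rows` is a list of equal-length lists of strings. Row 2 (index 1)
    is the baseline; a column is "interesting" only if some later row
    differs from that baseline. Hypothetical sketch, not the AWK code.
    """
    if len(rows) < 2:
        return [list(row) for row in rows]
    baseline = rows[1]
    interesting = [
        idx
        for idx in range(len(baseline))
        if any(row[idx] != baseline[idx] for row in rows[2:])
    ]
    return [[row[idx] for idx in interesting] for row in rows]


if __name__ == "__main__":
    table = [
        ["sample", "ref", "alt", "caller"],
        ["s1", "A", "G", "bcftools"],
        ["s2", "A", "T", "bcftools"],
        ["s3", "A", "G", "bcftools"],
    ]
    for row in prune_invariant_columns(table):
        print("\t".join(row))
```

In this example the `ref` and `caller` columns disappear while all four rows remain, which is why the tool should not be mistaken for a row deduplicator.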
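
The `seq_cache_populate.pl` note above describes a normalize, hash, and split layout. A minimal Python sketch of that idea follows. The function name, the default of two subdirectory levels, and the choice of two hex characters per level are assumptions made for illustration; only the uppercase/whitespace-strip/MD5 steps and the rejection of 16 levels come from the notes.

```python
import hashlib
import os


def ref_cache_path(root, sequence, subdirs=2):
    """Return (md5_hex, path) for a sequence in a hex-split cache tree.

    Assumptions (hypothetical): each directory level consumes two hex
    characters of the digest, and `subdirs` defaults to 2. With 32 hex
    characters in an MD5 digest, 16 levels would leave no filename,
    so values of 16 or more are rejected.
    """
    if not 0 <= subdirs < 16:
        raise ValueError("subdirs must be between 0 and 15")
    # Normalize as the notes describe: strip all whitespace, uppercase.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.md5(normalized.encode("ascii")).hexdigest()
    parts = [digest[2 * i : 2 * i + 2] for i in range(subdirs)]
    parts.append(digest[2 * subdirs :])
    return digest, os.path.join(root, *parts)


if __name__ == "__main__":
    md5_hex, path = ref_cache_path("ref_cache", "acgt\nACGT nnnn\n")
    print(md5_hex)
    print(path)
```

The normalization step is what makes the digest independent of FASTA line wrapping and case, so the same reference sequence always maps to the same cache entry.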
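
The `sort-by-length` warning above (it is a plain line sorter with no FASTA awareness) is easy to demonstrate. The sketch below mirrors the quoted Perl one-liner in Python; the function name is hypothetical, and the ordering of equal-length lines follows Python's stable sort, which may not match Perl in every case.

```python
def sort_by_length(lines):
    """Sort lines by character length, shortest first.

    Hypothetical Python counterpart of the Perl one-liner in the notes;
    it treats every line independently, exactly like a line sorter.
    """
    return sorted(lines, key=len)


if __name__ == "__main__":
    fasta = [
        ">seq1 long header text",
        "ACGT",
        ">s2",
        "ACGTACGTACGT",
    ]
    for line in sort_by_length(fasta):
        print(line)
    # Headers and sequences are reordered independently, so the
    # record structure of the FASTA input is destroyed.
```

Running it on the four-line FASTA sample places `>s2` directly above the `ACGT` line that belonged to `seq1`, which shows why the tool should only be applied to one-record-per-line inputs.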