40 changes: 40 additions & 0 deletions README.markdown
@@ -85,4 +85,44 @@ Note that download_and_filter_wikidata and download_and_filter_pageviews take se

4. Commit `dvc.lock` to git.


## Uploading tree to server

1. If you are running the tree-building scripts on a different computer from the one running the web server, you will need to push the `completetree_XXXXXX.js`, `completetree_XXXXXX.js.gz`, `cut_position_map_XXXXXX.js`, `cut_position_map_XXXXXX.js.gz`, `dates_XXXXXX.js`, `dates_XXXXXX.js.gz` files onto your server, e.g. by pushing to your local GitHub repo, then pulling the latest GitHub changes to the server.

2. (15 mins) Load the CSV tables into the DB. Use the generated script `data/output_files/import_XXXXXX.sql` to truncate and repopulate the `ordered_leaves` and `ordered_nodes` tables.

```
echo "SET GLOBAL local_infile=ON;" | mysql -p OneZoom_dev
mysql --local-infile --host localhost --user onezoom --password --database OneZoom_dev < data/output_files/import_XXXXXX.sql
```

3. Check for duplicates, and for any sponsors that are no longer on the tree, using something like the following SQL commands:

```
select * from reservations left outer join ordered_leaves on reservations.OTT_ID = ordered_leaves.ott where ordered_leaves.ott is null and reservations.verified_name IS NOT NULL;
select group_concat(id), group_concat(parent), group_concat(name), count(ott) from ordered_leaves group by ott having(count(ott) > 1);
```
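
The two checks above can be tried out on toy data before touching the real database. The following is a minimal sketch, assuming an in-memory SQLite database in place of MySQL and hypothetical cut-down tables that contain only the columns named in the queries; the helper name `find_problems` is illustrative and not part of the repository.

```python
import sqlite3


def find_problems(conn):
    """Return (orphaned_reservations, duplicated_otts) for a DB connection."""
    # Verified sponsorships whose OTT no longer appears among the leaves.
    orphaned = conn.execute(
        "SELECT reservations.OTT_ID, reservations.verified_name "
        "FROM reservations LEFT OUTER JOIN ordered_leaves "
        "ON reservations.OTT_ID = ordered_leaves.ott "
        "WHERE ordered_leaves.ott IS NULL "
        "AND reservations.verified_name IS NOT NULL"
    ).fetchall()
    # OTT ids that are attached to more than one leaf.
    duplicated = conn.execute(
        "SELECT ott, COUNT(ott) FROM ordered_leaves "
        "GROUP BY ott HAVING COUNT(ott) > 1"
    ).fetchall()
    return orphaned, duplicated


# Toy schema and data (assumptions for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ordered_leaves (id INTEGER, parent INTEGER, name TEXT, ott INTEGER);
    CREATE TABLE reservations (OTT_ID INTEGER, verified_name TEXT);
    INSERT INTO ordered_leaves VALUES (1, 10, 'A', 100), (2, 10, 'B', 200), (3, 11, 'C', 200);
    INSERT INTO reservations VALUES (100, 'Alice'), (300, 'Bob'), (400, NULL);
    """
)
orphaned, duplicated = find_problems(conn)
print(orphaned)    # reservations pointing at an OTT that is not on the tree
print(duplicated)  # OTT ids shared by several leaves
```

Both result lists should be empty on a healthy import; any row returned needs manual attention.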

### Fill in additional server fields

11. (15 mins) Create example pictures for each node by percolating up. This requires the most recent `images_by_ott` table, so either do this on the main server, or (if you are doing it locally) update your `images_by_ott` to the most recent server version.

```
${OZ_DIR}/OZprivate/ServerScripts/Utilities/picProcess.py -v
```

1. (5 mins) Percolate the IUCN data up using:

```
${OZ_DIR}/OZprivate/ServerScripts/Utilities/IUCNquery.py -v
```

(Note that this both updates the IUCN data in the DB and percolates interior node info up.)

1. (10 mins) If this is a site with sponsorship (only the main OZ site), set the pricing structure using SET_PRICES.html (accessible from the management pages).
1. (5 mins; this does seem to be necessary for `ordered_nodes` and `ordered_leaves`.) Make sure the indexes are reset. See `OZprivate/ServerScripts/SQL/create_db_indexes.sql` for the SQL to do this; this may involve logging in to the SQL server (e.g. via Sequel Pro on Mac) and pasting in all the drop index and create index commands.



For detailed step-by-step documentation, see [oz_tree_build/README.markdown](oz_tree_build/README.markdown).

21 changes: 11 additions & 10 deletions dvc.lock
@@ -94,8 +94,8 @@ stages:
deps:
- path: data/OZTreeBuild/AllLife/BespokeTree/include_noAutoOTT/
hash: md5
md5: 8cb57266b725e9893505618bf366af54.dir
size: 1231351
md5: c3c1ebf2453c636e3ffdfcef58722d9c.dir
size: 1231291
nfiles: 56
params:
params.yaml:
@@ -104,8 +104,8 @@ stages:
outs:
- path: data/OZTreeBuild/AllLife/BespokeTree/include_OT_v16.1/
hash: md5
md5: cfe57e6fbd3572028ac2d83203a96fe4.dir
size: 1534894
md5: c12514e92740949250ecdb4375d6c360.dir
size: 1534814
nfiles: 55
get_open_trees_from_one_zoom:
cmd:
@@ -115,8 +115,8 @@ stages:
deps:
- path: data/OZTreeBuild/AllLife/BespokeTree/include_OT_v16.1/
hash: md5
md5: cfe57e6fbd3572028ac2d83203a96fe4.dir
size: 1534894
md5: c12514e92740949250ecdb4375d6c360.dir
size: 1534814
nfiles: 55
- path: data/OZTreeBuild/AllLife/OpenTreeParts/OT_required/
hash: md5
@@ -180,8 +180,8 @@ stages:
deps:
- path: data/OZTreeBuild/AllLife/BespokeTree/include_OT_v16.1/
hash: md5
md5: cfe57e6fbd3572028ac2d83203a96fe4.dir
size: 1534894
md5: c12514e92740949250ecdb4375d6c360.dir
size: 1534814
nfiles: 55
- path: data/OZTreeBuild/AllLife/OpenTreeParts/OpenTree_all/
hash: md5
@@ -254,8 +254,9 @@ stages:
nfiles: 7
make_js_treefiles:
cmd:
- mkdir -p data/js_output
- make_js_treefiles --outdir data/js_output
- rm -r data/js_output ; mkdir -p data/js_output
- make_js_treefiles --outdir data/js_output
data/output_files/ordered_tree_*.poly
deps:
- path: data/output_files/
hash: md5
7 changes: 5 additions & 2 deletions dvc.yaml
@@ -193,8 +193,11 @@ stages:

make_js_treefiles:
cmd:
- mkdir -p data/js_output
- make_js_treefiles --outdir data/js_output
- rm -r data/js_output ; mkdir -p data/js_output
- >-
make_js_treefiles
--outdir data/js_output
data/output_files/ordered_tree_*.poly
deps:
- data/output_files/
always_changed: true
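
As a rough illustration of what the revised `make_js_treefiles` commands do (clear the output directory, recreate it empty, then pass every matching `.poly` file), here is a minimal Python sketch. The helper names `reset_output_dir` and `find_poly_files` are hypothetical and not part of the repository; the real stage runs the shell commands shown in the diff.

```python
import glob
import shutil
import tempfile
from pathlib import Path


def reset_output_dir(outdir):
    """Delete outdir if it exists, then recreate it empty."""
    # ignore_errors=True means a missing directory is not treated as a failure.
    shutil.rmtree(outdir, ignore_errors=True)
    Path(outdir).mkdir(parents=True, exist_ok=True)
    return Path(outdir)


def find_poly_files(output_files_dir):
    """Expand the ordered_tree_*.poly pattern, as the shell would."""
    pattern = str(Path(output_files_dir) / "ordered_tree_*.poly")
    return sorted(glob.glob(pattern))


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out = reset_output_dir(Path(tmp) / "js_output")
    (out / "stale.js").write_text("old build")
    out = reset_output_dir(out)      # the stale file from a previous run is gone
    leftover = list(out.iterdir())
    (Path(tmp) / "ordered_tree_123.poly").write_text("();")
    polys = find_poly_files(tmp)
```

Clearing the directory first avoids files from an earlier build version lingering next to the new output.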
35 changes: 0 additions & 35 deletions oz_tree_build/README.markdown
@@ -45,41 +45,6 @@ Then see the section titled "Upload data to the server and check it" below.

Edit `params.yaml` to change the OpenTree version, taxonomy version, build version, etc. DVC will detect the parameter changes and re-run only the affected stages.

### Upload data to the server and check it

8. If you are running the tree building scripts on a different computer to the one running the web server, you will need to push the `completetree_XXXXXX.js`, `completetree_XXXXXX.js.gz`, `cut_position_map_XXXXXX.js`, `cut_position_map_XXXXXX.js.gz`, `dates_XXXXXX.js`, `dates_XXXXXX.js.gz` files onto your server, e.g. by pushing to your local Github repo then pulling the latest github changes to the server.
1. (15 mins) load the CSV tables into the DB, using the SQL commands printed in step 6 (at the end of the `data/output_files/ordered_output.log` file: the lines that start something like `TRUNCATE TABLE ordered_leaves; LOAD DATA LOCAL INFILE ...;` `TRUNCATE TABLE ordered_nodes; LOAD DATA LOCAL INFILE ...;`). Either do so via a GUI utility, or copy the `.csv.mySQL` files to a local directory on the machine running your SQL server (e.g. using `scp -C` for compression) and run your `LOAD DATA LOCAL INFILE` commands on the mysql command line (this may require you to start the command line utility using `mysql --local-infile`, e.g.:

```
mysql --local-infile --host db.MYSERVER.net --user onezoom --password --database onezoom_dev
```

1. Check for dups, and if any sponsors are no longer on the tree, using something like the following SQL command:

```
select * from reservations left outer join ordered_leaves on reservations.OTT_ID = ordered_leaves.ott where ordered_leaves.ott is null and reservations.verified_name IS NOT NULL;
select group_concat(id), group_concat(parent), group_concat(name), count(ott) from ordered_leaves group by ott having(count(ott) > 1)
```

### Fill in additional server fields

11. (15 mins) create example pictures for each node by percolating up. This requires the most recent `images_by_ott` table, so either do this on the main server, or (if you are doing it locally) update your `images_by_ott` to the most recent server version.

```
${OZ_DIR}/OZprivate/ServerScripts/Utilities/picProcess.py -v
```

1. (5 mins) percolate the IUCN data up using

```
${OZ_DIR}/OZprivate/ServerScripts/Utilities/IUCNquery.py -v
```

(note that this both updates the IUCN data in the DB and percolates up interior node info)

1. (10 mins) If this is a site with sponsorship (only the main OZ site), set the pricing structure using SET_PRICES.html (accessible from the management pages).
1. (5 mins - this does seem to be necessary for ordered nodes & ordered leaves). Make sure indexes are reset. Look at `OZprivate/ServerScripts/SQL/create_db_indexes.sql` for the SQL to do this - this may involve logging in to the SQL server (e.g. via Sequel Pro on Mac) and pasting all the drop index and create index commands.

### At last

15. Have a well deserved cup of tea
Original file line number Diff line number Diff line change
@@ -86,6 +86,7 @@
from dendropy import Node, Tree

from ..images_and_vernaculars.get_wiki_images import get_qid_from_taxa_data
from ..utilities.debug_util import parse_args_and_add_logging_switch
from ..utilities.file_utils import open_file_based_on_extension
from . import OTT_popularity_mapping

@@ -597,7 +598,6 @@ def output_simplified_tree(tree, taxonomy_file, outdir, version, seed, save_sql=
set_node_ages,
set_real_parent_nodes,
write_brief_newick,
write_preorder_ages,
write_preorder_to_csv,
)

@@ -606,7 +606,6 @@ def output_simplified_tree(tree, taxonomy_file, outdir, version, seed, save_sql=
Tree.prune_non_species = prune_non_species
Tree.set_node_ages = set_node_ages
Tree.set_real_parent_nodes = set_real_parent_nodes
Tree.write_preorder_ages = write_preorder_ages
Tree.remove_unifurcations_keeping_higher_taxa = remove_unifurcations_keeping_higher_taxa
Tree.write_preorder_to_csv = write_preorder_to_csv
Tree.group_genera_in_polytomies = group_genera_in_polytomies
@@ -664,8 +663,6 @@ def output_simplified_tree(tree, taxonomy_file, outdir, version, seed, save_sql=
tree.seed_node.write_brief_newick(condensed_newick)
with open(os.path.join(outdir, f"ordered_tree_{version}.poly"), "w+") as condensed_poly:
tree.seed_node.write_brief_newick(condensed_poly, "{}")
with open(os.path.join(outdir, f"ordered_dates_{version}.js"), "w+") as json_dates:
tree.write_preorder_ages(json_dates, format="json")

# these are the extra columns output to the leaf csv file
leaf_extras = OrderedDict()
@@ -720,19 +717,22 @@ def output_simplified_tree(tree, taxonomy_file, outdir, version, seed, save_sql=
from shutil import copyfile
from subprocess import call

# make CSV files that can be imported into mySQL (subs \\N for null values)
logging.info(" > saving extra file copies in mySQL format: import them using:")
for tab in ["_leaves", "_nodes"]:
fn = os.path.join(outdir, "ordered" + tab + f"_{version}" + ".csv")
sqlfile = fn + ".mySQL"
copyfile(fn, sqlfile)
call(["perl", "-pi", "-e", r"s/,(?=(,|\n))/,\\N/g", sqlfile])
logging.info(
f"sql> TRUNCATE TABLE ordered{tab}; "
f"LOAD DATA LOCAL INFILE '{sqlfile}' REPLACE INTO TABLE `ordered{tab}` "
f"FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
f"IGNORE 1 LINES ({open(fn).readline().rstrip()}) SET id = NULL;"
)
with open(os.path.join(outdir, f"import_{version}.sql"), "w", encoding="utf-8") as sql_f:
# make CSV files that can be imported into mySQL (subs \\N for null values)
logging.info(" > saving extra file copies in mySQL format: import them using:")
for tab in ["_leaves", "_nodes"]:
fn = os.path.join(outdir, "ordered" + tab + f"_{version}" + ".csv")
sqlfile = fn + ".mySQL"
copyfile(fn, sqlfile)
call(["perl", "-pi", "-e", r"s/,(?=(,|\n))/,\\N/g", sqlfile])
sql_f.writelines(
[
f"TRUNCATE TABLE ordered{tab};\n"
f"LOAD DATA LOCAL INFILE '{sqlfile}' REPLACE INTO TABLE `ordered{tab}` \n"
f" FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' \n"
f" IGNORE 1 LINES ({open(fn).readline().rstrip()}) SET id = NULL;\n"
]
)
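
The `perl` one-liner in this hunk rewrites empty CSV fields as `\N`, which MySQL's `LOAD DATA` reads as NULL. A minimal Python sketch of the same substitution follows, assuming it is applied one line at a time; `mark_nulls` is a hypothetical name and the repository itself shells out to `perl` rather than using this function.

```python
import re

# A comma that is immediately followed by another comma or by a newline
# marks an empty field; the lookahead leaves the following character in place.
EMPTY_FIELD = re.compile(r",(?=(,|\n))")


def mark_nulls(line):
    """Insert \\N after each comma that precedes an empty field."""
    # A function is used as the replacement so the backslash is taken literally.
    return EMPTY_FIELD.sub(lambda match: ",\\N", line)


example = mark_nulls("a,,b,\n")
print(repr(example))
```

As with the original pattern, an empty first field (a line that starts with a comma) is left untouched, because only the text after a comma is inspected.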


def display_WD_ott_stats(OTT_ptrs):
@@ -940,12 +940,6 @@ def switch_otts_to_qids(taxa_data_file, tree):
def process_all(args):
random_seed_addition = 42
start = time.time()
if args.verbosity == 0:
logging.basicConfig(stream=sys.stderr, level=logging.WARNING)
elif args.verbosity == 1:
logging.basicConfig(stream=sys.stderr, level=logging.INFO, format="%(message)s")
elif args.verbosity >= 2:
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.info(f"OneZoom data generation started on {time.asctime(time.localtime(time.time()))}")
skip_popularity = (
args.popularity_file is None
@@ -1138,15 +1132,8 @@ def main():
type=str,
help="JSON file with persisted data about taxa, typically used for the extinct tree",
)
parser.add_argument(
"--verbosity",
"-v",
action="count",
default=0,
help="verbosity: output extra non-essential info",
)

args = parser.parse_args()
args = parse_args_and_add_logging_switch(parser)
process_all(args)
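
The body of `parse_args_and_add_logging_switch` is not shown in this diff. Judging only from the lines removed here (the `--verbosity` argument and the three `logging.basicConfig` branches), a helper of that kind might look roughly like the sketch below. The name `parse_args_with_verbosity` and the optional `argv` parameter are assumptions made for illustration, not the repository's actual implementation.

```python
import argparse
import logging
import sys


def parse_args_with_verbosity(parser, argv=None):
    """Add a -v/--verbosity switch, parse the arguments, and configure logging."""
    parser.add_argument(
        "--verbosity",
        "-v",
        action="count",
        default=0,
        help="verbosity: output extra non-essential info",
    )
    args = parser.parse_args(argv)

    # Same mapping as the code removed from process_all():
    # 0 -> WARNING, 1 -> INFO (bare messages), 2 or more -> DEBUG.
    config = {"stream": sys.stderr}
    if args.verbosity == 0:
        config["level"] = logging.WARNING
    elif args.verbosity == 1:
        config["level"] = logging.INFO
        config["format"] = "%(message)s"
    else:
        config["level"] = logging.DEBUG
    logging.basicConfig(**config)
    return args


demo_args = parse_args_with_verbosity(argparse.ArgumentParser(), ["-v"])
```

Centralising the switch this way lets every command-line script share one logging setup instead of repeating the same branches.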


64 changes: 0 additions & 64 deletions oz_tree_build/taxon_mapping_and_popularity/dendropy_extras.py
@@ -240,70 +240,6 @@ def remove_unifurcations_keeping_higher_taxa(self):
return n_deleted


def write_preorder_ages(self, node_dates_fh, leaf_dates_fh=None, format="tsv"): # noqa A002
"""
Write the dates to one or two files. If no second file is given, only write leaves if
the format is 'json'. The main file is for nodes: any absent dates should be treated
as unknown. The leaves file should be tiny: most leaves should not have a date, and
be treated as extant (0 Ma), unless they have an extinction_date set.

Format can equal 'json', 'csv', or 'tsv'
"""
if format == "json":
start = "{"
end = ["}"]
sep = '":'
join = ['"', '"']

if format == "tsv":
sep = "\t"
end = [""]
start = ""
join = ["", ""]

if format == "csv":
sep = ","
end = [""]
start = ""
join = ["", ""]

leaf_num = 0
node_num = 0
if leaf_dates_fh or format == "json":
if leaf_dates_fh is None:
leaf_dates_fh = node_dates_fh
leaf_dates_fh.write('var tree_date = {"leaves":')
join = ['"', '"']
end = ['},"nodes":', "}}"]

leaf_dates_fh.write(start)
for leaf in self.leaf_node_iter():
# for compactness, we should probably write this in binary, as a series of
# (4-byte int, float); for the moment write it as text format, to be gzipped
leaf_num += 1
if (getattr(leaf, "age", None) is not None) and (leaf.age > 0):
leaf_dates_fh.write(join[0] + str(leaf_num) + sep + str(leaf.age))
if format == "json":
# after first value, start putting initial commas (avoids trailing comma)
join[0] = ',"'
else:
join[0] = "\n"
leaf_dates_fh.write(end[0])
leaf_dates_fh.flush()

node_dates_fh.write(start)
for node in self.preorder_internal_node_iter():
node_num += 1
if getattr(node, "age", None) is not None:
node_dates_fh.write(join[-1] + str(node_num) + sep + str(node.age))
if format == "json":
join[-1] = ',"'
else:
join[-1] = "\n"
node_dates_fh.write(end[-1])
node_dates_fh.flush()


def write_preorder_to_csv(
self,
leaf_file,
Original file line number Diff line number Diff line change
@@ -22,7 +22,9 @@
""" # noqa E501

import argparse
import collections
import json
import logging
import os
import re
import sys
@@ -31,6 +33,8 @@

from dendropy import Tree

logger = logging.getLogger(__name__)

unambiguous = 0
synonyms = 0
unidentified = 0
@@ -318,6 +322,9 @@ def lookup_OTT(name_node_dict, context):

if len(remainder):
names = [(n.label).replace("_", " ") for n in remainder]
duplicates = [item for item, count in collections.Counter(names).items() if count > 1]
if len(duplicates) > 0:
logging.error(f"File {f} has multiple nodes labelled: {duplicates}")
lookup_OTT(dict(zip(names, remainder)), context_name)
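
The new check matters because `dict(zip(names, remainder))` silently keeps only the last node for any repeated label. The following small sketch shows both the `Counter`-based detection and the collapse it guards against; the labels and node placeholders are made-up illustrative data, and `find_duplicates` is a hypothetical wrapper around the list comprehension added in this hunk.

```python
import collections


def find_duplicates(names):
    """Return the labels that occur more than once, in first-seen order."""
    return [item for item, count in collections.Counter(names).items() if count > 1]


# Made-up labels and placeholder node objects.
names = ["Homo sapiens", "Pan troglodytes", "Homo sapiens"]
nodes = ["node_a", "node_b", "node_c"]

duplicates = find_duplicates(names)
name_to_node = dict(zip(names, nodes))

print(duplicates)    # labels that appear on more than one node
print(name_to_node)  # one entry per label; later nodes overwrite earlier ones
```

Without the logged error, the first "Homo sapiens" node would simply drop out of the lookup with no indication that anything was lost.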

if args.savein: