5 changes: 3 additions & 2 deletions .gitignore
Expand Up @@ -52,7 +52,7 @@ ARM/
ARM64/
Debug/
Generated[!!-~]Files/
Release/
Release/**
__pycache__/
_site/
arm/
Expand All @@ -63,7 +63,7 @@ doc/
lib/
!tools/cldr/lib
out/
release/
release/**
target/
!docs/processes/release/
tmp/
Expand All @@ -81,6 +81,7 @@ pkgdataMakefile
rules.mk
.DS_Store
.flattened-pom.xml
dependency-reduced-pom.xml

!icu4c/source/samples/csdet/Makefile

1 change: 1 addition & 0 deletions .mvn/maven.config
Expand Up @@ -8,3 +8,4 @@

# Do not display transfer progress when downloading or uploading
--no-transfer-progress
-Dfile.encoding=UTF-8
55 changes: 43 additions & 12 deletions docs/processes/cldr-icu.md
Expand Up @@ -47,12 +47,14 @@ for a given version is downloading the zipped sources for the common (`core.zip`
and tools (`tools.zip`) directory subtrees from the Data column in
[CLDR Releases/Downloads](https://cldr.unicode.org/index/downloads)

Besides a standard JDK 11+, the process also requires [ant](https://ant.apache.org) and
[maven](https://maven.apache.org) plus the xml-apis.jar from the
[Apache xalan package](https://xalan.apache.org/xalan-j/downloads.html) _(Is this
latter requirement still true?)_.
Besides a standard JDK 11+, the process also requires [Ant](https://ant.apache.org),
[Maven](https://maven.apache.org), and [Python](https://www.python.org).

If you do CLDR development you can configure maven as documented at
WARNING: the Ant scripts will soon be REMOVED.
PLEASE execute all the steps using the new Python workflow.
REPORT any problems you encounter, and switch back to Ant only if you have no other choice.

If you do CLDR development you can configure Maven as documented at
[CLDR Maven setup](http://cldr.unicode.org/development/maven) (non-Eclipse version).

But for the CLDR to ICU data conversion, or for regular ICU development this is not needed.
Expand Down Expand Up @@ -106,12 +108,12 @@ ticket and a separate PR:

There are several environment variables that need to be defined.

1. Java-, ant-, and maven-related variables
1. Java-, Ant- (TO REMOVE), Maven-, and Python-related variables

* `JAVA_HOME`: Path to JDK (a directory, containing e.g. `bin/java`, `bin/javac`,
etc.); on many systems this can be set using the output of `/usr/libexec/java_home`.

* `ANT_OPTS`: You may want to set `-Xmx8192m` to give Java more memory; otherwise
* `ANT_OPTS`: (TO REMOVE) You may want to set `-Xmx8192m` to give Java more memory; otherwise
it may run out of heap.

* `MAVEN_ARGS`: You may want to set `--no-transfer-progress` to reduce the noise
Expand Down Expand Up @@ -145,9 +147,10 @@ There are several environment variables that need to be defined.

## 1 Environment variables

1a. Java, ant, and maven variables, adjust for your system
1a. Java, Ant (TO REMOVE), Maven, and Python variables, adjust for your system
```sh
export JAVA_HOME=$(/usr/libexec/java_home)
# TO REMOVE
export ANT_OPTS="-Xmx8192m"
export MAVEN_ARGS="--no-transfer-progress"
```
Expand All @@ -172,13 +175,19 @@ export ICU4J_ROOT=$ICU_DIR/icu4j
export TOOLS_ROOT=$ICU_DIR/tools
```

1d. Directory for logs/notes (create if does not exist)
1d. Python variables
```sh
export PYTHONPATH=$ICU_DIR/tools/py
export PYTHONDONTWRITEBYTECODE=1
```
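
Before running the Python steps, it can help to confirm that `PYTHONPATH` actually makes the ICU helper package importable. The following is a minimal sketch using only the standard library; the function name `missing_modules` is hypothetical, and the only ICU-specific assumption is the package name `libs`, which `build.py` imports from `<icu_root>/tools/py`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found on sys.path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec raises if a parent package is missing or malformed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "libs" is the package that build.py imports from <icu_root>/tools/py.
    not_found = missing_modules(["libs"])
    if not_found:
        print(f"Not importable: {not_found}. Is PYTHONPATH set to <icu_root>/tools/py?")
    else:
        print("PYTHONPATH looks usable.")
```

If the check reports `libs` as not importable, re-export `PYTHONPATH` as shown in step 1d and try again.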

1e. Directory for logs/notes (create if it does not exist)
```sh
export NOTES=...(some directory)...
mkdir -p $NOTES
```

1e. The name of the icu data directory for Java (for example `icudt74b`)
1f. The name of the icu data directory for Java (for example `icudt74b`)
```sh
export ICU_DATA_VER=icudt(version)b
```
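
The directory name follows the pattern in the placeholder above: `icudt`, then the ICU major version, then `b`. As a small illustration (the helper name `icu_data_ver` is hypothetical and not part of the ICU tools):

```python
def icu_data_ver(major_version: int) -> str:
    """Build the ICU4J data directory name, e.g. 74 -> 'icudt74b'."""
    if major_version <= 0:
        raise ValueError("major_version must be a positive integer")
    return f"icudt{major_version}b"


print(icu_data_ver(74))
```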
Expand Down Expand Up @@ -248,6 +257,22 @@ mvn clean install -pl :cldr-all,:cldr-code -DskipTests -DskipITs

5a. Generate the CLDR production data.

**// NEW PROCESS, Python. Please use this!**

This process uses Python with ICU4C's `data/build.py`

* Run `python build.py --cleanprod` to clean out the production data directory
(usually `$CLDR_TMP_DIR/production`); this is required if any CLDR data has changed.
Note that `--proddata` performs the same cleanup itself before regenerating the data.

```sh
cd $ICU4C_DIR/source/data
python build.py --proddata
```

**// NEW PROCESS - END**

**// TO REMOVE - Don't execute if the above step works.**

This process uses ant with ICU4C's `data/build.xml`

* Running `ant cleanprod` is necessary to clean out the production data directory
Expand All @@ -261,6 +286,7 @@ ant cleanprod
ant setup
ant proddata 2>&1 | tee $NOTES/cldr-newData-proddataLog.txt
```
**// TO REMOVE - END**
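
The Ant variant above captures its output with `2>&1 | tee`, while `build.py` passes a `logfile=` argument to its own `icuproc.run_with_logging` helper, whose implementation is not shown in this change. A minimal sketch of the same idea using only the standard library might look like this; `run_and_tee` is a hypothetical name and is not the ICU helper.

```python
import subprocess
import sys
from pathlib import Path


def run_and_tee(args, logfile):
    """Run `args`, echo combined stdout/stderr, and also write it to `logfile`."""
    log_path = Path(logfile)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1 in the shell
            text=True,
            encoding="utf-8",
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show progress on the console
            log.write(line)  # keep a copy for later inspection
        return proc.wait()
```

The return value is the exit code of the child process, so a caller can stop the workflow when a step fails instead of continuing with incomplete data.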

> Note, for CLDR development, at this point tests are sometimes run on the
production data, see
Expand Down Expand Up @@ -299,8 +325,13 @@ java -jar target/cldr-to-icu-1.0-SNAPSHOT-jar-with-dependencies.jar --cldrDataDi

5c. Update the CLDR testData files needed by ICU4C/J tests, ensuring
they are representative of the newest CLDR data.

```sh
cd $ICU_DIR/tools/cldr
# NEW PROCESS, Python. Please use this!
python build.py --copy-cldr-testdata

# TO REMOVE. Don't execute if the above step works.
ant copy-cldr-testdata
```

Expand Down Expand Up @@ -453,7 +484,7 @@ cd $ICU4J_ROOT

## 13 Rebuild ICU4J with new data, run tests

13a. Run the tests using the maven build
13a. Run the tests using the Maven build
```sh
cd $ICU4J_ROOT
mvn clean
Expand Down Expand Up @@ -488,7 +519,7 @@ Running a specific test is the same as above:
mvn install --pl :core -DICU.exhaustive=10 -Dtest=ExhaustiveNumberTest
```

## 14 Investigate and fix maven check test failures
## 14 Investigate and fix Maven check test failures

Fix test cases and repeat from step 13, or fix CLDR data and repeat from
step 4, as appropriate, until there are no more failures in ICU4C or ICU4J.
155 changes: 155 additions & 0 deletions icu4c/source/data/build.py
@@ -0,0 +1,155 @@
#!/usr/bin/env -S python3 -B
#
# Copyright (C) 2026 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
"""Generates data in cldr-staging/production from the cldr main repo"""

import argparse
import os
import sys
import datetime
import subprocess

try:
    from libs import icudirs
    from libs import icufs
    from libs import iculog
    from libs import icuproc
except (ModuleNotFoundError, ImportError) as e:
    print("Make sure you define PYTHONPATH pointing to the ICU modules:")
    print("  export PYTHONPATH=<icu_root>/tools/py")
    print("On Windows:")
    print("  set PYTHONPATH=<icu_root>\\tools\\py")
    sys.exit(1)


basedir = "."
cldr_tmp_dir = None
cldr_prod_dir = None
cldrtools_jar = None
notes_dir: str = "./notes"


def _init():
    """Initialization. Check that the folders exist, that cldr-code.jar exists, etc."""
    iculog.subtitle("init()")
    iculog.info(str(datetime.datetime.now()))

    cldr_dir = icudirs.cldr_dir()

    cldrtools_dir = os.path.join(cldr_dir, "tools")
    iculog.info(f"cldr_dir:{cldr_dir}")
    iculog.info(f"cldrtools_dir:{cldrtools_dir}")
    if not os.path.isdir(cldrtools_dir):
        iculog.failure(
            "Please make sure that the CLDR tools directory"
            " is checked out into CLDR_DIR"
        )

    dir_to_check = f"{cldrtools_dir}/cldr-code/target/classes"
    if not os.path.isdir(dir_to_check):
        iculog.failure(f"Can't find {dir_to_check}. Please build cldr-code.jar.")

    global cldrtools_jar
    cldrtools_jar = f"{cldrtools_dir}/cldr-code/target/cldr-code.jar"
    if not os.path.isfile(cldrtools_jar):
        # Report the path that was actually checked (the jar, not the classes dir).
        iculog.failure(f"Can't find {cldrtools_jar}. Please build cldr-code.jar.")

    global cldr_tmp_dir
    cldr_tmp_dir = icudirs.cldr_prod_dir()
    global cldr_prod_dir
    cldr_prod_dir = f"{cldr_tmp_dir}/production/"

    global notes_dir
    notes_dir = os.environ.get("NOTES", "./notes")

    subprocess.run("mvn -version", encoding="utf-8", shell=True, check=True)
    iculog.info(f"cldr tools dir: {cldrtools_dir}")
    iculog.info(f"cldr tools jar: {cldrtools_jar}")
    iculog.info(f"CLDR_TMP_DIR: {cldr_tmp_dir} ")
    iculog.info(f"cldr.prod_dir (production data): {cldr_prod_dir}")
    iculog.info(f"notes_dir: {notes_dir}")


def cleanprod():
    """Remove the data in cldr-staging/production"""
    iculog.title("cleanprod()")
    icufs.rmdir(f"{cldr_prod_dir}/common")
    icufs.rmdir(f"{cldr_prod_dir}/keyboards")


def restoreprod():
    """Restore the git version of data in cldr-staging/production"""
    iculog.title("restoreprod()")
    if not cldr_prod_dir:
        iculog.failure("cldr_prod_dir not configured")
        return
    old_dir = icufs.pushd(cldr_prod_dir)
    icufs.rmdir("common")
    icuproc.run_with_logging(
        "git checkout -- common",
        logfile=os.path.join(notes_dir, "cldr-newData-restorecommonLog.txt"),
    )
    icufs.rmdir("keyboards")
    icuproc.run_with_logging(
        "git checkout -- keyboards",
        logfile=os.path.join(notes_dir, "cldr-newData-restorekeyboardsLog.txt"),
    )
    icufs.popd(old_dir)


def proddata():
    """Generates data in cldr-staging/production"""
    cleanprod()
    iculog.title("proddata()")
    iculog.info(f"Rebuilding {cldr_prod_dir} - takes a while!")
    # setup prod data
    icuproc.run_with_logging(
        "java"
        f" -cp {cldrtools_jar}"
        " org.unicode.cldr.tool.GenerateProductionData"
        " -v",
        logfile=os.path.join(notes_dir, "cldr-newData-proddataLog.txt"),
    )


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-c",
        "--cleanprod",
        help="Remove the data in cldr-staging/production",
        action="store_true",
    )
    parser.add_argument(
        "-p",
        "--proddata",
        help="Rebuilds files in cldr-staging/production",
        action="store_true",
    )
    parser.add_argument(
        "-r",
        "--restore",
        help="Restore (from git) the files removed by cleanprod",
        action="store_true",
    )
    cmd = parser.parse_args()

    # Only the first matching flag is acted on.
    if cmd.cleanprod:
        _init()
        cleanprod()
    elif cmd.proddata:
        _init()
        proddata()
    elif cmd.restore:
        _init()
        restoreprod()
    else:
        parser.print_help()

    return 0


if __name__ == "__main__":
    sys.exit(main())
11 changes: 11 additions & 0 deletions tools/cldr/README.md
Expand Up @@ -5,10 +5,21 @@

## CLDR test data

The Python [build.py](build.py) file takes care of copying some CLDR
test data directories to both the ICU4C and ICU4J source trees. To add
more directories to the list, modify the `cldr_test_data` fileset.
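
The copy step described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of `tools/cldr/build.py`: the function name `copy_test_data` and its parameters are hypothetical, and the real script may select and place directories differently.

```python
import shutil
from pathlib import Path


def copy_test_data(src_root, dest_roots, subdirs):
    """Copy each directory in `subdirs` from `src_root` into every root in `dest_roots`."""
    copied = []
    for dest_root in dest_roots:
        for subdir in subdirs:
            src = Path(src_root) / subdir
            dest = Path(dest_root) / subdir
            if not src.is_dir():
                raise FileNotFoundError(f"Missing CLDR test data directory: {src}")
            # Replace any stale copy so files removed upstream do not linger.
            if dest.exists():
                shutil.rmtree(dest)
            shutil.copytree(src, dest)
            copied.append(dest)
    return copied
```

In this sketch, adding another directory to the copied set means appending its name to the `subdirs` list passed by the caller.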

ANT-TO-REMOVE-START

WARNING: Ant support WILL BE REMOVED.
Only use this (and report) if the step above fails.

The ant [build.xml](build.xml) file takes care of copying some CLDR
test data directories to both the ICU4C and ICU4J source trees. To add
more directories to the list, modify the `cldrTestData` fileset.

ANT-TO-REMOVE-END

## cldr-to-icu

The cldr-to-icu directory contains tools to convert from CLDR's XML