The Dockerfile plus scripts in terra-rstudio-anvil builds rstudio#518
vjcitn wants to merge 17 commits into DataBiosphere:master
Explicit build of rstudio on the sudo-enabled base
Pull request overview
Adds a new terra-rstudio-anvil image build (based on terra-base) with scripts to compile/install R, install and run RStudio Server under s6-overlay (/init entrypoint), plus a large set of optional install scripts (pandoc/quarto, tidyverse/verse, python/jupyter, geospatial, CUDA, etc.). Also rewrites terra-jupyter-bioconductor’s Dockerfile to use the same R + RStudio installation approach.
Changes:
- Introduce terra-rstudio-anvil/Dockerfile and a suite of scripts/ to install/configure R, RStudio Server, and supporting tools under s6-overlay.
- Add init/auth helpers (init_userconf.sh, init_set_env.sh, pam-helper.sh) and various optional component installers (python/jupyter, tidyverse/verse, geospatial, CUDA, texlive, shiny, etc.).
- Replace terra-jupyter-bioconductor/Dockerfile base and build steps to match the new R/RStudio build flow.
Reviewed changes
Copilot reviewed 38 out of 39 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| terra-rstudio-anvil/Dockerfile | Builds the new RStudio-based image (R from source + RStudio + s6 init + pandoc/quarto). |
| terra-rstudio-anvil/scripts/bin/install2.r | Adds install2.r helper (littler-based) for installing R packages. |
| terra-rstudio-anvil/scripts/config_R_cuda.sh | Configures CUDA/NVBLAS env + nvblas.conf for R/RStudio. |
| terra-rstudio-anvil/scripts/default_user.sh | Creates the default rstudio user and baseline RStudio prefs. |
| terra-rstudio-anvil/scripts/init_set_env.sh | Propagates container env into R’s Renviron.site for RStudio/Shiny. |
| terra-rstudio-anvil/scripts/init_userconf.sh | Runtime user/password/rootless configuration and auth handling. |
| terra-rstudio-anvil/scripts/install_cuda-10.1.sh | Installs CUDA 10.1 runtime/devel deps and TF1 CUDA 10.0 libs. |
| terra-rstudio-anvil/scripts/install_cuda-11.1.sh | Installs CUDA 11.1 components and related libraries. |
| terra-rstudio-anvil/scripts/install_geospatial.sh | Installs OS geospatial deps + R geospatial packages. |
| terra-rstudio-anvil/scripts/install_julia.sh | Installs Julia + R bindings (JuliaCall/JuliaConnectoR). |
| terra-rstudio-anvil/scripts/install_jupyter.sh | Installs Jupyter + IRkernel + tex support. |
| terra-rstudio-anvil/scripts/install_nvtop.sh | Builds/installs nvtop from source. |
| terra-rstudio-anvil/scripts/install_pandoc.sh | Installs or symlinks pandoc from bundled or upstream sources. |
| terra-rstudio-anvil/scripts/install_pyenv.sh | Installs pyenv and sets up PATH integration. |
| terra-rstudio-anvil/scripts/install_python.sh | Installs python + venv + reticulate integration. |
| terra-rstudio-anvil/scripts/install_quarto.sh | Installs or symlinks Quarto CLI (bundled or upstream). |
| terra-rstudio-anvil/scripts/install_R_ppa.sh | Installs R via CRAN Ubuntu repo (PPA-style). |
| terra-rstudio-anvil/scripts/install_R_source.sh | Compiles and installs R from source. |
| terra-rstudio-anvil/scripts/install_rstudio.sh | Downloads/installs RStudio Server and sets up s6 service scripts. |
| terra-rstudio-anvil/scripts/install_s6init.sh | Installs s6-overlay. |
| terra-rstudio-anvil/scripts/install_shiny_server.sh | Installs Shiny Server and configures s6 service. |
| terra-rstudio-anvil/scripts/install_tensorflow.sh | Installs R keras (and python dependency via install_python). |
| terra-rstudio-anvil/scripts/install_texlive.sh | Installs TeX Live via installer and configures PATH exposure. |
| terra-rstudio-anvil/scripts/install_tf1_cuda_10_0.sh | Installs CUDA 10.0 libs needed by TF 1.15.x. |
| terra-rstudio-anvil/scripts/install_tidyverse.sh | Installs tidyverse + common DB backends and deps. |
| terra-rstudio-anvil/scripts/install_verse.sh | Installs “verse” style deps (LaTeX tooling, blogdown/bookdown stack, etc.). |
| terra-rstudio-anvil/scripts/install_wgrib2.sh | Builds/installs wgrib2 from upstream tarball. |
| terra-rstudio-anvil/scripts/pam-helper.sh | PAM helper for password validation based on env vars. |
| terra-rstudio-anvil/scripts/rsession.sh | Launch wrapper for rsession in standalone/server mode. |
| terra-rstudio-anvil/scripts/setup_R.sh | Sets CRAN defaults, installs OpenBLAS switch, installs littler. |
| terra-rstudio-anvil/scripts/tests/examples_tf.R | Example TensorFlow/Keras MNIST training script. |
| terra-rstudio-anvil/scripts/tests/nvblas.R | NVBLAS benchmark-like test script using LD_PRELOAD. |
| terra-rstudio-anvil/scripts/experimental/batch_user_creation.sh | Experimental batch user creation via env var. |
| terra-rstudio-anvil/scripts/experimental/cuda10.2-tf.sh | Experimental CUDA/TensorRT install snippet. |
| terra-rstudio-anvil/scripts/experimental/install_dev_osgeo.sh | Experimental build-from-source PROJ/GDAL/GEOS + R geospatial packages. |
| terra-rstudio-anvil/scripts/experimental/install_geospatial_unstable.sh | Experimental ubuntugis-unstable + geospatial install. |
| terra-rstudio-anvil/scripts/experimental/install_R_binary.sh | Experimental R binary install script. |
| terra-rstudio-anvil/scripts/experimental/install_rl.sh | Experimental RL venv install script. |
| terra-jupyter-bioconductor/Dockerfile | Replaces previous base image with R-from-source + RStudio install flow. |
N <- 2^14
M <- matrix(rnorm(N*N), nrow=N, ncol=N)
M %*% M
})
N <- 2^14 creates a 16,384 x 16,384 dense matrix (~2+ GiB just for the data) and then multiplies it, which is very likely to OOM or thrash on typical laptops/CI runners. Consider reducing the default size, adding a CLI/env flag to control N, and/or guarding so the benchmark only runs when explicitly requested.
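The size estimate in this comment can be checked with a few lines of shell arithmetic. This is only a back-of-the-envelope sketch: it assumes 8-byte doubles and counts just the raw data for one operand and one product, ignoring any temporary copies R may make.

```shell
#!/bin/sh
# Rough memory estimate for an N x N matrix of doubles (8 bytes per element).
n=16384                                   # 2^14, the N used in the reviewed snippet
bytes=$(( n * n * 8 ))                    # raw data for one dense matrix
gib=$(( bytes / 1024 / 1024 / 1024 ))     # same figure in GiB
echo "one ${n}x${n} matrix: ${bytes} bytes (${gib} GiB)"
echo "operand plus product: at least $(( 2 * gib )) GiB"
```

Lowering `n` to 4096 in the same calculation gives 128 MiB per matrix, which is the kind of default a laptop or CI runner can handle.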
## load /etc/environment vars first:
for line in $( cat /etc/environment ) ; do export $line > /dev/null; done
The generated /etc/services.d/rstudio/run uses for line in $(cat /etc/environment) which word-splits on whitespace and will corrupt values containing spaces (and can mis-handle quoting). A safer pattern is to source the file (e.g., set -a; . /etc/environment; set +a) before exec'ing rserver.
Suggested change:
- ## load /etc/environment vars first:
- for line in $( cat /etc/environment ) ; do export $line > /dev/null; done
+ ## load /etc/environment vars first in a safe way:
+ set -a
+ . /etc/environment
+ set +a
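The difference between the two patterns is easy to reproduce outside the container. The sketch below uses a temporary file with a made-up variable (`GREETING`) as a stand-in for /etc/environment; it counts how many words the `for` loop sees for a single assignment and then loads the same file with `set -a`.

```shell
#!/bin/sh
# Temporary stand-in for /etc/environment (contents are hypothetical).
envfile=$(mktemp)
printf '%s\n' 'GREETING="hello world"' > "$envfile"

# Original pattern: unquoted $(cat ...) splits on whitespace,
# so one assignment arrives as two separate words.
words=0
for line in $(cat "$envfile"); do
  words=$(( words + 1 ))
  echo "export would receive: $line"
done

# Suggested pattern: let the shell parse the file; set -a exports
# every variable assigned while it is active.
set -a
. "$envfile"
set +a
sourced_result=$GREETING

echo "words seen by the loop: $words"
echo "value after sourcing:   [$sourced_result]"
rm -f "$envfile"
```

One caveat: /etc/environment is read by pam_env and is not guaranteed to be valid shell, so sourcing it works only while the file sticks to plain `KEY=value` or `KEY="value"` lines.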
fi
}

echo "PYTHON_CONFIGURE_OPTS=${PYTHON_CONFIGURE_OPTS}" >>"${R_HOME}/etc/R_environ"
This appends to ${R_HOME}/etc/R_environ, which is not a standard R configuration file name (typically Renviron / Renviron.site). As written, PYTHON_CONFIGURE_OPTS likely won’t be picked up by R/reticulate/pyenv. Write to the correct file (and keep naming consistent with other scripts here using Renviron.site).
Suggested change:
- echo "PYTHON_CONFIGURE_OPTS=${PYTHON_CONFIGURE_OPTS}" >>"${R_HOME}/etc/R_environ"
+ echo "PYTHON_CONFIGURE_OPTS=${PYTHON_CONFIGURE_OPTS}" >>"${R_HOME}/etc/Renviron.site"
install.packages('keras', repos='http://cran.us.r-project.org')
library(keras)
This uses an http:// CRAN mirror. Package installation should use HTTPS to avoid downgrade/MitM risks (and to match the rest of the image, which defaults to https://cloud.r-project.org).
Following suggestion of using 3.0.0, but noting existence of 3.3.2 at this time. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@LizBaldo Is there anything else I need to do to have this tested? I don't know what has to happen to have the container open a port in Terra so that RStudio can start when requested.
I need to investigate what is happening on the Terra side; it might be a quick Leonardo change that is required. I have set aside time to look into that on my end early next week and will keep you posted. Are you potentially open to me contributing to this PR and making changes?
Absolutely welcome all comments and changes!!
@vjcitn your approach is sound, but there are two blocking issues that will prevent you from running this on Terra:
ENTRYPOINT not overridden
terra-base:1.0.0 defines:
ENTRYPOINT ["/etc/jupyter/bin/jupyter", "notebook"]
The PR sets:
CMD ["/init"]
In Docker, when a child image does not override ENTRYPOINT, CMD is passed as arguments to the parent's entrypoint. The container therefore runs /etc/jupyter/bin/jupyter notebook /init, and RStudio Server never starts. The fix is to override the entrypoint explicitly: ENTRYPOINT ["/init"]
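The rule can be modelled in a few lines of shell. This is an illustration only, not Docker itself: `effective_command` is a hypothetical helper that joins an exec-form ENTRYPOINT and CMD the way the container's argv is assembled. The second call passes an empty CMD because setting ENTRYPOINT in a child image resets an inherited CMD.

```shell
#!/bin/sh
# Models the exec-form rule: the process argv is ENTRYPOINT followed by CMD.
effective_command() {
  entrypoint=$1
  cmd=$2
  # Join the two parts, adding a space only when both are non-empty.
  printf '%s\n' "${entrypoint}${entrypoint:+${cmd:+ }}${cmd}"
}

# PR as submitted: entrypoint inherited from terra-base, CMD ["/init"] added.
inherited=$(effective_command "/etc/jupyter/bin/jupyter notebook" "/init")
# With the fix: ENTRYPOINT ["/init"] and no CMD.
overridden=$(effective_command "/init" "")

echo "CMD only:            $inherited"
echo "ENTRYPOINT override: $overridden"
```

On a built image the real values can be read with `docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' <image>`, where `<image>` is whatever tag was built locally.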
Auth not disabled at build time
terra-base disables Jupyter auth unconditionally at build time via jupyter_notebook_config.py (token = '', password = ''). terra-jupyter-r inherits this with no additional auth handling. Terra's Leonardo proxy expects the same pattern for RStudio — auth-none=1 baked into rserver.conf — since it manages authentication itself.
The fix is to write auth-none=1 directly into rserver.conf in install_rstudio.sh at build time, matching how terra-base handles Jupyter auth:
echo "rsession-which-r=${R_BIN}" > /etc/rstudio/rserver.conf
echo "lock-type=advisory" >> /etc/rstudio/file-locks
echo "auth-none=1" >> /etc/rstudio/rserver.conf
The DISABLE_AUTH conditional, disable_auth_rserver.conf, and pam-helper.sh can then be removed as they serve no purpose in the Terra context.
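A minimal sketch of the build-time write, run against a scratch directory rather than /etc/rstudio so it can be tried anywhere. `CONF_DIR`, the `R_BIN` value, and the `ensure_line` helper are assumptions made for the demo and are not part of the PR; the helper simply keeps the setting from being appended twice if the script is re-run.

```shell
#!/bin/sh
# CONF_DIR is a scratch stand-in for /etc/rstudio (assumption for the demo).
CONF_DIR=$(mktemp -d)
R_BIN=/usr/local/bin/R              # hypothetical path to the R binary
conf="$CONF_DIR/rserver.conf"

echo "rsession-which-r=${R_BIN}" > "$conf"

# Append a line only if that exact line is not already in the file.
ensure_line() {
  grep -qxF "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

ensure_line "auth-none=1" "$conf"
ensure_line "auth-none=1" "$conf"   # second call is a no-op
cat "$conf"
```

In install_rstudio.sh the same two writes would target /etc/rstudio/rserver.conf directly, as in the snippet above.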
[Minor] www-proxy-prefix not configured
Terra's Leonardo proxy accesses RStudio at a subpath (e.g. /proxy/8787/), not at /. Without www-proxy-prefix set in rserver.conf, static assets and internal redirects will break. The existing Terra RStudio images (e.g. anvil-rstudio-bioconductor) handle this with a startup script that writes the prefix dynamically based on environment variables injected by Leonardo.
RUN /rocker_scripts/install_rstudio.sh

EXPOSE 8787
CMD ["/init"]
This needs to be ENTRYPOINT ["/init"]. In Docker, when both ENTRYPOINT and CMD are present and the child image doesn't override ENTRYPOINT, CMD becomes arguments to the parent's entrypoint. The container actually runs:
/etc/jupyter/bin/jupyter notebook /init
This is the main reason why RStudio did not start on Terra.
ENV S6_VERSION="v2.1.0.2"
ENV RSTUDIO_VERSION="2026.01.1+403"
ENV DEFAULT_USER="rstudio"
ENV DEFAULT_USER="rstudio" sets DEFAULT_USER, not USER.
So ENV USER=jupyter from terra-base is still in the container's environment at runtime. In init_userconf.sh, USER=${DEFAULT_USER} overrides it locally within that script, but that change doesn't persist to the container environment unless explicitly written to /etc/environment.
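The scoping behaviour described here can be seen with a child shell standing in for init_userconf.sh. The two values are taken from the discussion; everything else is a demo scaffold.

```shell
#!/bin/sh
# Stand-ins for the image environment (values taken from the discussion).
USER=jupyter; export USER
DEFAULT_USER=rstudio; export DEFAULT_USER

# The child process plays the role of init_userconf.sh:
# it reassigns USER, but only inside its own process.
child_view=$(sh -c 'USER=${DEFAULT_USER}; echo "$USER"')

# Back in the parent, USER still has the inherited value.
parent_view=$USER

echo "inside the script: $child_view"
echo "after the script:  $parent_view"
```

The assignment is visible only inside the child process, which is why any service started separately still sees the value inherited from terra-base unless the new value is written somewhere that service reads.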
Removed pam-helper.sh script from Dockerfile.
Removed copying of pam-helper.sh to rstudio-server bin.
A secret was used in an ARG or ENV ... allowing this for now
Leonardo's detectTool pulls the Docker image from the registry at runtime creation time and looks for JUPYTER_HOME or RSTUDIO_HOME in the image's environment variables. So you need to add ENV RSTUDIO_HOME=/home/rstudio in your Dockerfiles.
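Before pushing, it may help to check locally which of those two variables an image actually carries. The sketch below is a hypothetical helper, not Leonardo's code: `tool_vars` just filters `KEY=value` lines on stdin, and `sample_env` is made-up input in which the `JUPYTER_HOME` value is a guess.

```shell
#!/bin/sh
# Hypothetical helper: keep only the variables the tool detection is said to read.
tool_vars() {
  grep -E '^(JUPYTER_HOME|RSTUDIO_HOME)=' || true
}

# Made-up environment listing; the JUPYTER_HOME value is an assumption.
sample_env='USER=jupyter
JUPYTER_HOME=/etc/jupyter
RSTUDIO_HOME=/home/rstudio'

found=$(printf '%s\n' "$sample_env" | tool_vars)
echo "$found"
```

For a real image, the input can come from `docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' <image> | tool_vars`. If both variables show up, which one wins is decided on the Leonardo side and is not something this check can tell you.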
… rstudio in a jetstream vm. However, the user identity is jupyter and it starts in /home/jupyter. When we try to use this as a custom env under "rstudio" terra env control, the build process jumps to the jupyter type environment build process
@LizBaldo I have now committed changes to the Dockerfile and the script for installing RStudio that work on a Jetstream system by setting -p 8787:8787 and pointing a browser to localhost:8787. The dockerhub image is to match environment variable setting as closely as I can. If you would like me to add a screencast on the jump from rstudio env to jupyter env, let me know.
and runs on a laptop provided --entrypoint /init is used