
The Dockerfile plus scripts in terra-rstudio-anvil builds RStudio #518

Open
vjcitn wants to merge 17 commits into DataBiosphere:master from vjcitn:sudobase


@vjcitn

vjcitn commented Mar 25, 2026

...and runs on a laptop, provided --entrypoint /init is passed to docker run.

Copilot AI review requested due to automatic review settings March 25, 2026 23:29

Copilot AI left a comment


Pull request overview

Adds a new terra-rstudio-anvil image build (based on terra-base) with scripts to compile/install R, install and run RStudio Server under s6-overlay (/init entrypoint), plus a large set of optional install scripts (pandoc/quarto, tidyverse/verse, python/jupyter, geospatial, CUDA, etc.). Also rewrites terra-jupyter-bioconductor’s Dockerfile to use the same R + RStudio installation approach.

Changes:

  • Introduce terra-rstudio-anvil/Dockerfile and a suite of scripts/ to install/configure R, RStudio Server, and supporting tools under s6-overlay.
  • Add init/auth helpers (init_userconf.sh, init_set_env.sh, pam-helper.sh) and various optional component installers (python/jupyter, tidyverse/verse, geospatial, CUDA, texlive, shiny, etc.).
  • Replace terra-jupyter-bioconductor/Dockerfile base and build steps to match the new R/RStudio build flow.

Reviewed changes

Copilot reviewed 38 out of 39 changed files in this pull request and generated 18 comments.

Summary per file:

  • terra-rstudio-anvil/Dockerfile: Builds the new RStudio-based image (R from source + RStudio + s6 init + pandoc/quarto).
  • terra-rstudio-anvil/scripts/bin/install2.r: Adds install2.r helper (littler-based) for installing R packages.
  • terra-rstudio-anvil/scripts/config_R_cuda.sh: Configures CUDA/NVBLAS env + nvblas.conf for R/RStudio.
  • terra-rstudio-anvil/scripts/default_user.sh: Creates the default rstudio user and baseline RStudio prefs.
  • terra-rstudio-anvil/scripts/init_set_env.sh: Propagates container env into R’s Renviron.site for RStudio/Shiny.
  • terra-rstudio-anvil/scripts/init_userconf.sh: Runtime user/password/rootless configuration and auth handling.
  • terra-rstudio-anvil/scripts/install_cuda-10.1.sh: Installs CUDA 10.1 runtime/devel deps and TF1 CUDA 10.0 libs.
  • terra-rstudio-anvil/scripts/install_cuda-11.1.sh: Installs CUDA 11.1 components and related libraries.
  • terra-rstudio-anvil/scripts/install_geospatial.sh: Installs OS geospatial deps + R geospatial packages.
  • terra-rstudio-anvil/scripts/install_julia.sh: Installs Julia + R bindings (JuliaCall/JuliaConnectoR).
  • terra-rstudio-anvil/scripts/install_jupyter.sh: Installs Jupyter + IRkernel + tex support.
  • terra-rstudio-anvil/scripts/install_nvtop.sh: Builds/installs nvtop from source.
  • terra-rstudio-anvil/scripts/install_pandoc.sh: Installs or symlinks pandoc from bundled or upstream sources.
  • terra-rstudio-anvil/scripts/install_pyenv.sh: Installs pyenv and sets up PATH integration.
  • terra-rstudio-anvil/scripts/install_python.sh: Installs python + venv + reticulate integration.
  • terra-rstudio-anvil/scripts/install_quarto.sh: Installs or symlinks Quarto CLI (bundled or upstream).
  • terra-rstudio-anvil/scripts/install_R_ppa.sh: Installs R via CRAN Ubuntu repo (PPA-style).
  • terra-rstudio-anvil/scripts/install_R_source.sh: Compiles and installs R from source.
  • terra-rstudio-anvil/scripts/install_rstudio.sh: Downloads/installs RStudio Server and sets up s6 service scripts.
  • terra-rstudio-anvil/scripts/install_s6init.sh: Installs s6-overlay.
  • terra-rstudio-anvil/scripts/install_shiny_server.sh: Installs Shiny Server and configures s6 service.
  • terra-rstudio-anvil/scripts/install_tensorflow.sh: Installs R keras (and python dependency via install_python).
  • terra-rstudio-anvil/scripts/install_texlive.sh: Installs TeX Live via installer and configures PATH exposure.
  • terra-rstudio-anvil/scripts/install_tf1_cuda_10_0.sh: Installs CUDA 10.0 libs needed by TF 1.15.x.
  • terra-rstudio-anvil/scripts/install_tidyverse.sh: Installs tidyverse + common DB backends and deps.
  • terra-rstudio-anvil/scripts/install_verse.sh: Installs “verse” style deps (LaTeX tooling, blogdown/bookdown stack, etc.).
  • terra-rstudio-anvil/scripts/install_wgrib2.sh: Builds/installs wgrib2 from upstream tarball.
  • terra-rstudio-anvil/scripts/pam-helper.sh: PAM helper for password validation based on env vars.
  • terra-rstudio-anvil/scripts/rsession.sh: Launch wrapper for rsession in standalone/server mode.
  • terra-rstudio-anvil/scripts/setup_R.sh: Sets CRAN defaults, installs OpenBLAS switch, installs littler.
  • terra-rstudio-anvil/scripts/tests/examples_tf.R: Example TensorFlow/Keras MNIST training script.
  • terra-rstudio-anvil/scripts/tests/nvblas.R: NVBLAS benchmark-like test script using LD_PRELOAD.
  • terra-rstudio-anvil/scripts/experimental/batch_user_creation.sh: Experimental batch user creation via env var.
  • terra-rstudio-anvil/scripts/experimental/cuda10.2-tf.sh: Experimental CUDA/TensorRT install snippet.
  • terra-rstudio-anvil/scripts/experimental/install_dev_osgeo.sh: Experimental build-from-source PROJ/GDAL/GEOS + R geospatial packages.
  • terra-rstudio-anvil/scripts/experimental/install_geospatial_unstable.sh: Experimental ubuntugis-unstable + geospatial install.
  • terra-rstudio-anvil/scripts/experimental/install_R_binary.sh: Experimental R binary install script.
  • terra-rstudio-anvil/scripts/experimental/install_rl.sh: Experimental RL venv install script.
  • terra-jupyter-bioconductor/Dockerfile: Replaces previous base image with R-from-source + RStudio install flow.


Comment thread terra-rstudio-anvil/scripts/install_nvtop.sh Outdated
Comment on lines +6 to +9
N <- 2^14
M <- matrix(rnorm(N*N), nrow=N, ncol=N)
M %*% M
})

Copilot AI Mar 25, 2026


N <- 2^14 creates a 16,384 x 16,384 dense matrix (16,384^2 doubles x 8 bytes = 2 GiB for the data alone, with the product needing another 2 GiB) and then multiplies it, which is very likely to OOM or thrash on typical laptops/CI runners. Consider reducing the default size, adding a CLI/env flag to control N, and/or guarding so the benchmark only runs when explicitly requested.
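A minimal bash sketch of the suggested guard, assuming two hypothetical environment variables that nothing in the PR defines: RUN_NVBLAS_BENCH (opt-in switch) and NVBLAS_BENCH_N (matrix size). The R side would then read the size with Sys.getenv() instead of hard-coding 2^14. The memory arithmetic is the same N^2 x 8 bytes per matrix used above.

```bash
#!/usr/bin/env bash
# Sketch: RUN_NVBLAS_BENCH and NVBLAS_BENCH_N are hypothetical variables,
# not something the PR defines.
set -euo pipefail

# Bytes for one dense N x N matrix of 8-byte doubles.
matrix_bytes() {
  local n=$1
  echo $(( n * n * 8 ))
}

# True only when the caller explicitly opts in to the heavy benchmark.
bench_requested() {
  [ "${RUN_NVBLAS_BENCH:-0}" = "1" ]
}

# Laptop-friendly default size, overridable from the environment.
export NVBLAS_BENCH_N="${NVBLAS_BENCH_N:-2048}"

if bench_requested; then
  mib=$(( $(matrix_bytes "$NVBLAS_BENCH_N") / 1024 / 1024 ))
  echo "running benchmark with N=${NVBLAS_BENCH_N} (~${mib} MiB per matrix)"
  # e.g. Rscript <path-to>/nvblas.R   (the R script reads N via Sys.getenv())
else
  echo "skipping NVBLAS benchmark (set RUN_NVBLAS_BENCH=1 to run it)"
fi
```

With the default of 2048, each matrix is 32 MiB rather than 2 GiB, which keeps an accidental run harmless on a laptop.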

Comment thread terra-rstudio-anvil/scripts/install_cuda-11.1.sh Outdated
Comment thread terra-rstudio-anvil/scripts/install_geospatial.sh Outdated
Comment thread terra-rstudio-anvil/scripts/init_set_env.sh
Comment on lines +99 to +100
## load /etc/environment vars first:
for line in $( cat /etc/environment ) ; do export $line > /dev/null; done

Copilot AI Mar 25, 2026


The generated /etc/services.d/rstudio/run uses for line in $(cat /etc/environment) which word-splits on whitespace and will corrupt values containing spaces (and can mis-handle quoting). A safer pattern is to source the file (e.g., set -a; . /etc/environment; set +a) before exec'ing rserver.

Suggested change
## load /etc/environment vars first:
for line in $( cat /etc/environment ) ; do export $line > /dev/null; done
## load /etc/environment vars first in a safe way:
set -a
. /etc/environment
set +a
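The difference between the two patterns can be reproduced with a scratch file in place of the real /etc/environment. One assumption is worth stating: sourcing only works if the file uses shell-compatible syntax, since an unquoted value containing a space would be executed as a command rather than assigned.

```bash
#!/usr/bin/env bash
# Compare the loop from the generated run script with the set -a approach,
# using a throwaway file instead of the real /etc/environment.
set -euo pipefail

envfile=$(mktemp)
trap 'rm -f "$envfile"' EXIT
# A quoted value containing a space, plus an ordinary path value.
printf '%s\n' 'GREETING="hello world"' 'R_LIBS_SITE=/usr/local/lib/R/site-library' >"$envfile"

# Original pattern: $(cat ...) splits on whitespace, so GREETING is cut at the
# space and keeps a literal leading quote; the leftover word 'world"' is
# rejected by export as an invalid identifier.
unsafe=$(
  for line in $(cat "$envfile"); do export $line 2>/dev/null || true; done
  printf '%s' "${GREETING-}"
)

# Suggested pattern: set -a exports every assignment made while the file is
# sourced, and the shell honours the quoting.
safe=$(
  set -a
  . "$envfile"
  set +a
  printf '%s' "$GREETING"
)

printf 'unsafe: [%s]\nsafe:   [%s]\n' "$unsafe" "$safe"
```

Running it prints unsafe: ["hello] and safe: [hello world].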

Comment thread terra-jupyter-bioconductor/Dockerfile
Comment thread terra-rstudio-anvil/scripts/install_rstudio.sh
fi
}

echo "PYTHON_CONFIGURE_OPTS=${PYTHON_CONFIGURE_OPTS}" >>"${R_HOME}/etc/R_environ"

Copilot AI Mar 25, 2026


This appends to ${R_HOME}/etc/R_environ, which is not a standard R configuration file name (typically Renviron / Renviron.site). As written, PYTHON_CONFIGURE_OPTS likely won’t be picked up by R/reticulate/pyenv. Write to the correct file (and keep naming consistent with other scripts here using Renviron.site).

Suggested change
echo "PYTHON_CONFIGURE_OPTS=${PYTHON_CONFIGURE_OPTS}" >>"${R_HOME}/etc/R_environ"
echo "PYTHON_CONFIGURE_OPTS=${PYTHON_CONFIGURE_OPTS}" >>"${R_HOME}/etc/Renviron.site"
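A minimal sketch of the same fix that also avoids duplicate lines when the install script is re-run. append_renviron_site is a hypothetical helper name; it treats the key as a plain variable name (regex metacharacters are not escaped) and assumes R_HOME is already set, as it is in the line above.

```bash
#!/usr/bin/env bash
# Sketch: append KEY=VALUE to R's site-wide Renviron.site only once.
# append_renviron_site is hypothetical, not part of the PR.
set -euo pipefail

append_renviron_site() {
  local key=$1 value=$2
  local file="${R_HOME:?R_HOME must be set}/etc/Renviron.site"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # Skip if the key is already defined, so repeated runs do not pile up
  # duplicate lines.
  if ! grep -q "^${key}=" "$file"; then
    printf '%s=%s\n' "$key" "$value" >>"$file"
  fi
}
```

Usage in the script would be append_renviron_site PYTHON_CONFIGURE_OPTS "${PYTHON_CONFIGURE_OPTS}".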

Comment on lines +3 to +4
install.packages('keras', repos='http://cran.us.r-project.org')
library(keras)

Copilot AI Mar 25, 2026


This uses an http:// CRAN mirror. Package installation should use HTTPS to avoid downgrade/MitM risks (and to match the rest of the image, which defaults to https://cloud.r-project.org).
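To catch any other plain-http mirrors, a lint step along these lines could scan the scripts. It is a sketch under the assumption that mirrors are passed in the repos='...' form seen above; find_http_repos is hypothetical, and the pattern will not catch a mirror set some other way (for example options(repos = c(CRAN = "http://..."))).

```bash
#!/usr/bin/env bash
# Sketch of a lint step: list R/shell files that point repos= at a
# plain-http CRAN mirror. find_http_repos is a hypothetical helper.
set -euo pipefail

find_http_repos() {
  local dir=$1
  # -r recurse, -n line numbers, -E extended regex. Matches repos='http://...'
  # or repos="http://..." with optional spaces around '='.
  # '|| true' keeps a clean tree (grep exit 1) from failing the caller.
  grep -rnE --include='*.R' --include='*.r' --include='*.sh' \
    "repos[[:space:]]*=[[:space:]]*['\"]http://" "$dir" || true
}
```

Running find_http_repos terra-rstudio-anvil/scripts would print each offending file:line:text, and an empty result means every matched repos= argument already uses HTTPS.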

Copilot uses AI. Check for mistakes.
Author


@copilot apply changes based on this feedback

vjcitn and others added 5 commits March 26, 2026 05:30
Following suggestion of using 3.0.0, but noting existence of 3.3.2 at this time.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@vjcitn
Author

vjcitn commented Apr 2, 2026

@LizBaldo Is there anything else I need to do to have this tested? I don't know what has to happen to have the container open a port in Terra so that RStudio can start when requested.

@LizBaldo
Collaborator

LizBaldo commented Apr 2, 2026

> @LizBaldo Is there anything else I need to do to have this tested? I don't know what has to happen to have the container open a port in Terra so that RStudio can start when requested.

I need to investigate what is happening on the Terra side, it might be a quick Leonardo change that is required. I set aside time to look into that on my end early next week and will keep you posted. Are you potentially open to me contributing to this PR and making changes?

@vjcitn
Author

vjcitn commented Apr 2, 2026

Absolutely welcome all comments and changes!!

Collaborator

@LizBaldo left a comment


@vjcitn your approach is sound, but there are two blocking issues that will prevent you from running this on Terra:

ENTRYPOINT not overridden

terra-base:1.0.0 defines:
ENTRYPOINT ["/etc/jupyter/bin/jupyter", "notebook"]

The PR sets:
CMD ["/init"]

In Docker, when a child image does not override ENTRYPOINT, CMD is passed as arguments to the parent's entrypoint. The container therefore runs /etc/jupyter/bin/jupyter notebook /init, and RStudio Server never starts. The fix is to override the entrypoint explicitly: ENTRYPOINT ["/init"]
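The rule can be illustrated with a toy model in bash. This is not Docker itself, only a sketch of the exec-form behaviour: the started process is ENTRYPOINT's words followed by CMD's words. effective_command is a hypothetical helper. Docker's documentation also notes that setting ENTRYPOINT in a child image resets an inherited CMD to empty, which is why ENTRYPOINT ["/init"] alone yields just /init.

```bash
#!/usr/bin/env bash
# Toy model of Docker's exec-form rule, not Docker itself: the process that
# starts is ENTRYPOINT's words followed by CMD's words as extra arguments.
set -euo pipefail

# Usage: effective_command ENTRYPOINT_WORDS... -- CMD_WORDS...
# Prints the resulting argv joined with spaces.
effective_command() {
  local -a argv=()
  local seen_sep=0 word
  for word in "$@"; do
    # The first "--" separates ENTRYPOINT from CMD and is not part of argv.
    if [ "$seen_sep" -eq 0 ] && [ "$word" = "--" ]; then
      seen_sep=1
      continue
    fi
    argv+=("$word")
  done
  echo "${argv[*]}"
}

# Parent ENTRYPOINT kept, child only sets CMD ["/init"]:
effective_command /etc/jupyter/bin/jupyter notebook -- /init
# prints: /etc/jupyter/bin/jupyter notebook /init

# Child sets ENTRYPOINT ["/init"]; the inherited CMD is reset to empty:
effective_command /init --
# prints: /init
```

To check a built image directly, docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' <image> prints both fields.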

Auth not disabled at build time

terra-base disables Jupyter auth unconditionally at build time via jupyter_notebook_config.py (token = '', password = ''). terra-jupyter-r inherits this with no additional auth handling. Terra's Leonardo proxy expects the same pattern for RStudio — auth-none=1 baked into rserver.conf — since it manages authentication itself.

The fix is to write auth-none=1 directly into rserver.conf in install_rstudio.sh at build time, matching how terra-base handles Jupyter auth:

  echo "rsession-which-r=${R_BIN}" > /etc/rstudio/rserver.conf
  echo "lock-type=advisory" >> /etc/rstudio/file-locks
  echo "auth-none=1" >> /etc/rstudio/rserver.conf

The DISABLE_AUTH conditional, disable_auth_rserver.conf, and pam-helper.sh can then be removed as they serve no purpose in the Terra context.
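The build-time step above can be sketched as a small function that is safe to re-run. RSTUDIO_CONF_DIR is a hypothetical override added only so the step can be tried against a scratch directory (the default still targets /etc/rstudio), and R_BIN is assumed to be set as in the snippet above. write_rstudio_conf is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of the proposed build-time RStudio config step.
# RSTUDIO_CONF_DIR is a hypothetical override; the real target is /etc/rstudio.
set -euo pipefail

write_rstudio_conf() {
  local conf_dir=${RSTUDIO_CONF_DIR:-/etc/rstudio}
  local r_bin=${R_BIN:?R_BIN must point at the R binary}
  mkdir -p "$conf_dir"
  # '>' truncates rserver.conf, so re-running leaves exactly these two lines;
  # auth-none=1 leaves authentication to the Leonardo proxy.
  printf '%s\n' "rsession-which-r=${r_bin}" "auth-none=1" >"${conf_dir}/rserver.conf"
  # file-locks is appended to, as in the original, but only once.
  if ! grep -qx 'lock-type=advisory' "${conf_dir}/file-locks" 2>/dev/null; then
    echo "lock-type=advisory" >>"${conf_dir}/file-locks"
  fi
}
```

In install_rstudio.sh this would be called once, after R_BIN is known.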

[Minor] www-proxy-prefix not configured

Terra's Leonardo proxy accesses RStudio at a subpath (e.g. /proxy/8787/), not at /. Without www-proxy-prefix set in rserver.conf, static assets and internal redirects will break. The existing Terra RStudio images (e.g. anvil-rstudio-bioconductor) handle this with a startup script that writes the prefix dynamically based on environment variables injected by Leonardo.

RUN /rocker_scripts/install_rstudio.sh

EXPOSE 8787
CMD ["/init"]
Collaborator


This needs to be ENTRYPOINT ["/init"]. In Docker, when both ENTRYPOINT and CMD are present and the child image doesn't override ENTRYPOINT, CMD becomes arguments to the parent's entrypoint. The container actually runs:

/etc/jupyter/bin/jupyter notebook /init

This is the main reason RStudio did not start on Terra.


ENV S6_VERSION="v2.1.0.2"
ENV RSTUDIO_VERSION="2026.01.1+403"
ENV DEFAULT_USER="rstudio"
Collaborator


ENV DEFAULT_USER="rstudio" sets DEFAULT_USER, not USER.

So ENV USER=jupyter from terra-base is still in the container's environment at runtime. In init_userconf.sh, USER=${DEFAULT_USER} overrides it locally within that script, but that change doesn't persist to the container environment unless it is explicitly written to /etc/environment.
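The point can be reproduced with plain shell processes: a variable reassigned in a child process never reaches its parent. In this sketch a scratch file stands in for /etc/environment, and the parent's USER=jupyter stands in for what terra-base's ENV provides.

```bash
#!/usr/bin/env bash
# A child process cannot modify its parent's environment, which is why
# USER=${DEFAULT_USER} inside init_userconf.sh does not change the
# container-wide USER. A scratch file stands in for /etc/environment.
set -euo pipefail

# Stand-in for what ENV USER=jupyter in terra-base provides.
export USER=jupyter

# Child process, analogous to init_userconf.sh reassigning USER.
bash -c 'USER=rstudio; export USER; echo "inside child: $USER"'
echo "after child: $USER"   # still jupyter

# Persisting the value means writing it where later processes look, e.g. the
# rstudio run script that loads /etc/environment before starting rserver.
envfile=$(mktemp)
trap 'rm -f "$envfile"' EXIT
echo "USER=rstudio" >>"$envfile"
persisted=$(set -a; . "$envfile"; set +a; printf '%s' "$USER")
echo "process that loads the file sees: $persisted"
```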

vjcitn added 4 commits April 6, 2026 14:53
Removed pam-helper.sh script from Dockerfile.
Removed copying of pam-helper.sh to rstudio-server bin.
A secret was used in an ARG or ENV ... allowing this for now
@LizBaldo
Collaborator

LizBaldo commented Apr 7, 2026

Leonardo's detectTool pulls the Docker image from the registry at runtime creation time and looks for JUPYTER_HOME or RSTUDIO_HOME in the image's environment variables.

So you need to add ENV RSTUDIO_HOME=/home/rstudio to your Dockerfiles.
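To confirm which marker a pushed image actually carries, its environment can be dumped with docker inspect and filtered. This is a sketch: tool_markers is a hypothetical helper, and it only lists the variables; it does not reproduce Leonardo's own detection logic or precedence. If both markers show up (for example, one inherited from a base image), it would be worth checking which one Leonardo prefers.

```bash
#!/usr/bin/env bash
# Report which Leonardo tool markers appear in an image's environment.
# Lists JUPYTER_HOME / RSTUDIO_HOME only; does not model Leonardo's choice.
set -euo pipefail

# Reads KEY=VALUE lines on stdin and prints the marker entries in input order.
tool_markers() {
  grep -E '^(JUPYTER_HOME|RSTUDIO_HOME)=' || true
}

# Example (needs Docker and the image available locally):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' \
#     vjcitn/rstuanv:0.0.6 | tool_markers
```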

vjcitn added 3 commits April 7, 2026 13:24
… rstudio

in a Jetstream VM. However, the user identity is jupyter and it starts in /home/jupyter.
When we try to use this as a custom env under the "rstudio" Terra env control, the build
process jumps to the Jupyter-type environment build process.
@vjcitn
Author

vjcitn commented May 6, 2026

@LizBaldo I have now committed changes to the Dockerfile and the RStudio install script that work on a Jetstream system when the container is run with -p 8787:8787 and a browser is pointed at localhost:8787. The Docker Hub image is vjcitn/rstuanv:0.0.6. When I try to use this as a custom environment in the RStudio control, the build process jumps over to Jupyter environment building. I have used

docker inspect us.gcr.io/broad-dsp-gcr-public/anvil-rstudio-bioconductor:3.21.0 | \
  python3 -m json.tool | grep -A 300 '"Env"'

to match the environment variable settings as closely as I can. If you would like me to add a screencast of the jump from the RStudio environment to the Jupyter environment, let me know.
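A narrower comparison than reading the full JSON is to diff only the variable names of the two images. missing_env_keys is a hypothetical helper; it expects each image's KEY=VALUE lines in a file, which docker inspect's --format template can produce (shown in the comments, which assume Docker and both images are available).

```bash
#!/usr/bin/env bash
# Compare environment variable *names* between a reference image and a
# candidate, given each image's KEY=VALUE lines in a file.
set -euo pipefail

# Print keys present in $1 (reference) but absent from $2 (candidate).
# LC_ALL=C keeps sort and comm on the same collation order.
missing_env_keys() {
  LC_ALL=C comm -23 \
    <(cut -d= -f1 "$1" | LC_ALL=C sort -u) \
    <(cut -d= -f1 "$2" | LC_ALL=C sort -u)
}

# Example:
#   img_env() { docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1"; }
#   img_env us.gcr.io/broad-dsp-gcr-public/anvil-rstudio-bioconductor:3.21.0 >ref.env
#   img_env vjcitn/rstuanv:0.0.6 >cand.env
#   missing_env_keys ref.env cand.env
```

Swapping the two arguments lists the opposite direction: variables the candidate defines that the reference image does not.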



3 participants