diff --git a/README.md b/README.md index 7528905d..156737fa 100644 --- a/README.md +++ b/README.md @@ -14,8 +14,9 @@ Solid Queue can be used with SQL databases such as MySQL, PostgreSQL, or SQLite, - [Dashboard UI Setup](#dashboard-ui-setup) - [Incremental adoption](#incremental-adoption) - [High performance requirements](#high-performance-requirements) +- [Workers, dispatchers, and scheduler](#workers-dispatchers-and-scheduler) + - [Fork vs. async mode](#fork-vs-async-mode) - [Configuration](#configuration) - - [Workers, dispatchers, and scheduler](#workers-dispatchers-and-scheduler) - [Queue order and priorities](#queue-order-and-priorities) - [Queues specification and performance](#queues-specification-and-performance) - [Threads, processes, and signals](#threads-processes-and-signals) @@ -179,9 +180,7 @@ end Solid Queue was designed for the highest throughput when used with MySQL 8+, MariaDB 10.6+, or PostgreSQL 9.5+, as they support `FOR UPDATE SKIP LOCKED`. You can use it with older versions, but in that case, you might run into lock waits if you run multiple workers for the same queue. You can also use it with SQLite on smaller applications. -## Configuration - -### Workers, dispatchers, and scheduler +## Workers, dispatchers, and scheduler We have several types of actors in Solid Queue: @@ -190,7 +189,17 @@ We have several types of actors in Solid Queue: - The _scheduler_ manages [recurring tasks](#recurring-tasks), enqueuing jobs for them when they're due. - The _supervisor_ runs workers and dispatchers according to the configuration, controls their heartbeats, and stops and starts them when needed. -Solid Queue's supervisor will fork a separate process for each supervised worker/dispatcher/scheduler. +### Fork vs. async mode + +By default, Solid Queue runs in `fork` mode. This means the supervisor will fork a separate process for each supervised worker/dispatcher/scheduler. 
This provides the best isolation and performance, but can use more memory. Alternatively, you can run all workers, dispatchers and schedulers in the same process as the supervisor, in different threads, using `async` mode. You can choose this mode by running `bin/jobs` as: + +``` +bin/jobs --mode async +``` + +You can also set the environment variable `SOLID_QUEUE_SUPERVISOR_MODE` to `async`. If you use the `async` mode, the `processes` option in the configuration described below will be ignored. + +## Configuration By default, Solid Queue will try to find your configuration under `config/queue.yml`, but you can set a different path using the environment variable `SOLID_QUEUE_CONFIG` or by using the `-c/--config_file` option with `bin/jobs`, like this: @@ -254,7 +263,7 @@ Here's an overview of the different options: - `threads`: this is the max size of the thread pool that each worker will have to run jobs. Each worker will fetch this number of jobs from their queue(s), at most and will post them to the thread pool to be run. By default, this is `3`. Only workers have this setting. It is recommended to set this value less than or equal to the queue database's connection pool size minus 2, as each worker thread uses one connection, and two additional connections are reserved for polling and heartbeat. -- `processes`: this is the number of worker processes that will be forked by the supervisor with the settings given. By default, this is `1`, just a single process. This setting is useful if you want to dedicate more than one CPU core to a queue or queues with the same configuration. Only workers have this setting. +- `processes`: this is the number of worker processes that will be forked by the supervisor with the settings given. By default, this is `1`, just a single process. This setting is useful if you want to dedicate more than one CPU core to a queue or queues with the same configuration. Only workers have this setting. 
**Note**: this option will be ignored if [running in `async` mode](#fork-vs-async-mode). - `concurrency_maintenance`: whether the dispatcher will perform the concurrency maintenance work. This is `true` by default, and it's useful if you don't use any [concurrency controls](#concurrency-controls) and want to disable it or if you run multiple dispatchers and want some of them to just dispatch jobs without doing anything else. @@ -334,7 +343,7 @@ queues: back* Workers in Solid Queue use a thread pool to run work in multiple threads, configurable via the `threads` parameter above. Besides this, parallelism can be achieved via multiple processes on one machine (configurable via different workers or the `processes` parameter above) or by horizontal scaling. -The supervisor is in charge of managing these processes, and it responds to the following signals: +The supervisor is in charge of managing these processes, and it responds to the following signals when running in its own process via `bin/jobs` or with [the Puma plugin](#puma-plugin) with the default `fork` mode: - `TERM`, `INT`: starts graceful termination. The supervisor will send a `TERM` signal to its supervised processes, and it'll wait up to `SolidQueue.shutdown_timeout` time until they're done. If any supervised processes are still around by then, it'll send a `QUIT` signal to them to indicate they must exit. - `QUIT`: starts immediate termination. The supervisor will send a `QUIT` signal to its supervised processes, causing them to exit immediately. @@ -603,6 +612,20 @@ that you set in production only. This is what Rails 8's default Puma config look **Note**: phased restarts are not supported currently because the plugin requires [app preloading](https://github.com/puma/puma?tab=readme-ov-file#cluster-mode) to work. +### Running as a fork or asynchronously + +By default, the Puma plugin will fork additional processes for each worker and dispatcher so that they run in different processes. 
This provides the best isolation and performance, but can use more memory. + +Alternatively, workers and dispatchers can be run within the same Puma process(es). To do so, configure the plugin as: + +```ruby +plugin :solid_queue +solid_queue_mode :async +``` + +Note that in this case, the `processes` configuration option will be ignored. See also [Fork vs. async mode](#fork-vs-async-mode). + + ## Jobs and transactional integrity :warning: Having your jobs in the same ACID-compliant database as your application data enables a powerful yet sharp tool: taking advantage of transactional integrity to ensure some action in your app is not committed unless your job is also committed and vice versa, and ensuring that your job won't be enqueued until the transaction within which you're enqueuing it is committed. This can be very powerful and useful, but it can also backfire if you base some of your logic on this behaviour, and in the future, you move to another active job backend, or if you simply move Solid Queue to its own database, and suddenly the behaviour changes under you. Because this can be quite tricky and many people shouldn't need to worry about it, by default Solid Queue is configured in a different database as the main app. 
diff --git a/lib/puma/plugin/solid_queue.rb b/lib/puma/plugin/solid_queue.rb index 434b8f65..38806277 100644 --- a/lib/puma/plugin/solid_queue.rb +++ b/lib/puma/plugin/solid_queue.rb @@ -1,5 +1,13 @@ require "puma/plugin" +module Puma + class DSL + def solid_queue_mode(mode = :fork) + @options[:solid_queue_mode] = mode.to_sym + end + end +end + Puma::Plugin.create do attr_reader :puma_pid, :solid_queue_pid, :log_writer, :solid_queue_supervisor @@ -7,35 +15,73 @@ def start(launcher) @log_writer = launcher.log_writer @puma_pid = $$ - in_background do - monitor_solid_queue + if launcher.options[:solid_queue_mode] == :async + start_async(launcher) + else + start_forked(launcher) end + end - if Gem::Version.new(Puma::Const::VERSION) < Gem::Version.new("7") - launcher.events.on_booted do - @solid_queue_pid = fork do - Thread.new { monitor_puma } - SolidQueue::Supervisor.start + private + def start_forked(launcher) + in_background do + monitor_solid_queue + end + + if Gem::Version.new(Puma::Const::VERSION) < Gem::Version.new("7") + launcher.events.on_booted do + @solid_queue_pid = fork do + Thread.new { monitor_puma } + SolidQueue::Supervisor.start(mode: :fork) + end end + + launcher.events.on_stopped { stop_solid_queue_fork } + launcher.events.on_restart { stop_solid_queue_fork } + else + launcher.events.after_booted do + @solid_queue_pid = fork do + Thread.new { monitor_puma } + start_solid_queue(mode: :fork) + end + end + + launcher.events.after_stopped { stop_solid_queue_fork } + launcher.events.before_restart { stop_solid_queue_fork } end + end - launcher.events.on_stopped { stop_solid_queue } - launcher.events.on_restart { stop_solid_queue } - else - launcher.events.after_booted do - @solid_queue_pid = fork do - Thread.new { monitor_puma } - SolidQueue::Supervisor.start + def start_async(launcher) + if Gem::Version.new(Puma::Const::VERSION) < Gem::Version.new("7") + launcher.events.on_booted do + start_solid_queue(mode: :async, standalone: false) + end + + 
launcher.events.on_stopped { solid_queue_supervisor&.stop } + + launcher.events.on_restart do + solid_queue_supervisor&.stop + start_solid_queue(mode: :async, standalone: false) + end + else + launcher.events.after_booted do + start_solid_queue(mode: :async, standalone: false) + end + + launcher.events.after_stopped { solid_queue_supervisor&.stop } + + launcher.events.before_restart do + solid_queue_supervisor&.stop + start_solid_queue(mode: :async, standalone: false) end end + end - launcher.events.after_stopped { stop_solid_queue } - launcher.events.before_restart { stop_solid_queue } + def start_solid_queue(**options) + @solid_queue_supervisor = SolidQueue::Supervisor.start(**options) end - end - private - def stop_solid_queue + def stop_solid_queue_fork Process.waitpid(solid_queue_pid, Process::WNOHANG) log "Stopping Solid Queue..." Process.kill(:INT, solid_queue_pid) if solid_queue_pid @@ -48,7 +94,7 @@ def monitor_puma end def monitor_solid_queue - monitor(:solid_queue_dead?, "Detected Solid Queue has gone away, stopping Puma...") + monitor(:solid_queue_fork_dead?, "Detected Solid Queue has gone away, stopping Puma...") end def monitor(process_dead, message) @@ -62,7 +108,7 @@ def monitor(process_dead, message) end end - def solid_queue_dead? + def solid_queue_fork_dead? if solid_queue_started? 
Process.waitpid(solid_queue_pid, Process::WNOHANG) end diff --git a/lib/solid_queue/app_executor.rb b/lib/solid_queue/app_executor.rb index 0580213f..f315f61a 100644 --- a/lib/solid_queue/app_executor.rb +++ b/lib/solid_queue/app_executor.rb @@ -17,5 +17,15 @@ def handle_thread_error(error) SolidQueue.on_thread_error.call(error) end end + + def create_thread(&block) + Thread.new do + Thread.current.name = name + block.call + rescue Exception => exception + handle_thread_error(exception) + raise + end + end end end diff --git a/lib/solid_queue/async_supervisor.rb b/lib/solid_queue/async_supervisor.rb new file mode 100644 index 00000000..0484c9ec --- /dev/null +++ b/lib/solid_queue/async_supervisor.rb @@ -0,0 +1,50 @@ +# frozen_string_literal: true + +module SolidQueue + class AsyncSupervisor < Supervisor + after_shutdown :terminate_gracefully, unless: :standalone? + + def stop + super + @thread&.join + end + + private + def supervise + if standalone? then super + else + @thread = create_thread { super } + end + end + + def check_and_replace_terminated_processes + terminated_threads = process_instances.select { |thread_id, instance| !instance.alive? } + terminated_threads.each { |thread_id, instance| replace_thread(thread_id, instance) } + end + + def replace_thread(thread_id, instance) + SolidQueue.instrument(:replace_thread, supervisor_pid: ::Process.pid) do |payload| + payload[:thread] = instance + + error = Processes::ThreadTerminatedError.new(instance.name) + release_claimed_jobs_by(instance, with_error: error) + + start_process(configured_processes.delete(thread_id)) + end + end + + def perform_graceful_termination + process_instances.values.each(&:stop) + + Timer.wait_until(SolidQueue.shutdown_timeout, -> { all_processes_terminated? }) + end + + def perform_immediate_termination + exit! + end + + def all_processes_terminated? + process_instances.values.none?(&:alive?) 
+ end + end +end diff --git a/lib/solid_queue/cli.rb b/lib/solid_queue/cli.rb index 7bfe555b..a2b5ba5e 100644 --- a/lib/solid_queue/cli.rb +++ b/lib/solid_queue/cli.rb @@ -8,6 +8,10 @@ class Cli < Thor desc: "Path to config file (default: #{Configuration::DEFAULT_CONFIG_FILE_PATH}).", banner: "SOLID_QUEUE_CONFIG" + class_option :mode, type: :string, default: "fork", enum: %w[ fork async ], + desc: "Whether to fork processes for workers and dispatchers (fork) or to run these in the same process as the supervisor (async) (default: fork).", + banner: "SOLID_QUEUE_SUPERVISOR_MODE" + class_option :recurring_schedule_file, type: :string, desc: "Path to recurring schedule definition (default: #{Configuration::DEFAULT_RECURRING_SCHEDULE_FILE_PATH}).", banner: "SOLID_QUEUE_RECURRING_SCHEDULE" diff --git a/lib/solid_queue/configuration.rb b/lib/solid_queue/configuration.rb index b0083a17..94169ca7 100644 --- a/lib/solid_queue/configuration.rb +++ b/lib/solid_queue/configuration.rb @@ -56,6 +56,14 @@ def error_messages end end + def mode + @options[:mode].to_s.inquiry + end + + def standalone? + mode.fork? || @options[:standalone] + end + private attr_reader :options @@ -84,6 +92,8 @@ def ensure_correctly_sized_thread_pool def default_options { + mode: ENV["SOLID_QUEUE_SUPERVISOR_MODE"] || :fork, + standalone: true, config_file: Rails.root.join(ENV["SOLID_QUEUE_CONFIG"] || DEFAULT_CONFIG_FILE_PATH), recurring_schedule_file: Rails.root.join(ENV["SOLID_QUEUE_RECURRING_SCHEDULE"] || DEFAULT_RECURRING_SCHEDULE_FILE_PATH), only_work: false, @@ -110,7 +120,12 @@ def skip_recurring_tasks? def workers workers_options.flat_map do |worker_options| - processes = worker_options.fetch(:processes, WORKER_DEFAULTS[:processes]) + processes = if mode.fork? 
+ worker_options.fetch(:processes, WORKER_DEFAULTS[:processes]) + else + 1 + end + processes.times.map { Process.new(:worker, worker_options.with_defaults(WORKER_DEFAULTS)) } end end diff --git a/lib/solid_queue/dispatcher.rb b/lib/solid_queue/dispatcher.rb index 1583e1dd..461ce803 100644 --- a/lib/solid_queue/dispatcher.rb +++ b/lib/solid_queue/dispatcher.rb @@ -3,6 +3,7 @@ module SolidQueue class Dispatcher < Processes::Poller include LifecycleHooks + attr_reader :batch_size after_boot :run_start_hooks diff --git a/lib/solid_queue/fork_supervisor.rb b/lib/solid_queue/fork_supervisor.rb new file mode 100644 index 00000000..c3c87dbe --- /dev/null +++ b/lib/solid_queue/fork_supervisor.rb @@ -0,0 +1,68 @@ +# frozen_string_literal: true + +module SolidQueue + class ForkSupervisor < Supervisor + private + + def perform_graceful_termination + term_forks + + Timer.wait_until(SolidQueue.shutdown_timeout, -> { all_processes_terminated? }) do + reap_terminated_forks + end + end + + def perform_immediate_termination + quit_forks + end + + def term_forks + signal_processes(process_instances.keys, :TERM) + end + + def quit_forks + signal_processes(process_instances.keys, :QUIT) + end + + def check_and_replace_terminated_processes + loop do + pid, status = ::Process.waitpid2(-1, ::Process::WNOHANG) + break unless pid + + replace_fork(pid, status) + end + end + + def reap_terminated_forks + loop do + pid, status = ::Process.waitpid2(-1, ::Process::WNOHANG) + break unless pid + + if (terminated_fork = process_instances.delete(pid)) && (!status.exited? 
|| status.exitstatus > 0) + error = Processes::ProcessExitError.new(status) + release_claimed_jobs_by(terminated_fork, with_error: error) + end + + configured_processes.delete(pid) + end + rescue SystemCallError + # All children already reaped + end + + def replace_fork(pid, status) + SolidQueue.instrument(:replace_fork, supervisor_pid: ::Process.pid, pid: pid, status: status) do |payload| + if terminated_fork = process_instances.delete(pid) + payload[:fork] = terminated_fork + error = Processes::ProcessExitError.new(status) + release_claimed_jobs_by(terminated_fork, with_error: error) + + start_process(configured_processes.delete(pid)) + end + end + end + + def all_processes_terminated? + process_instances.empty? + end + end +end diff --git a/lib/solid_queue/processes/runnable.rb b/lib/solid_queue/processes/runnable.rb index 33b441f6..c6e002e4 100644 --- a/lib/solid_queue/processes/runnable.rb +++ b/lib/solid_queue/processes/runnable.rb @@ -7,20 +7,26 @@ module Runnable attr_writer :mode def start - boot - - if running_async? - @thread = create_thread { run } - else + run_in_mode do + boot run end end def stop super - wake_up - @thread&.join + + # When not supervised, block until the thread terminates for backward + # compatibility with code that expects stop to be synchronous. + # When supervised, the supervisor controls the shutdown timeout. + unless supervised? + @thread&.join + end + end + + def alive? + !running_async? || @thread&.alive? end private @@ -30,6 +36,18 @@ def mode (@mode || DEFAULT_MODE).to_s.inquiry end + def run_in_mode(&block) + case + when running_as_fork? + fork(&block) + when running_async? + @thread = create_thread(&block) + @thread.object_id + else + block.call + end + end + def boot SolidQueue.instrument(:start_process, process: self) do run_callbacks(:boot) do @@ -74,16 +92,5 @@ def running_async? def running_as_fork? mode.fork? 
end - - - def create_thread(&block) - Thread.new do - Thread.current.name = name - block.call - rescue Exception => exception - handle_thread_error(exception) - raise - end - end end end diff --git a/lib/solid_queue/processes/thread_terminated_error.rb b/lib/solid_queue/processes/thread_terminated_error.rb new file mode 100644 index 00000000..7f1d6f7a --- /dev/null +++ b/lib/solid_queue/processes/thread_terminated_error.rb @@ -0,0 +1,11 @@ +# frozen_string_literal: true + +module SolidQueue + module Processes + class ThreadTerminatedError < RuntimeError + def initialize(name) + super("Thread #{name} terminated unexpectedly") + end + end + end +end diff --git a/lib/solid_queue/supervisor.rb b/lib/solid_queue/supervisor.rb index ef9c79d6..ae17ec95 100644 --- a/lib/solid_queue/supervisor.rb +++ b/lib/solid_queue/supervisor.rb @@ -13,17 +13,21 @@ def start(**options) configuration = Configuration.new(**options) if configuration.valid? - new(configuration).tap(&:start) + klass = configuration.mode.fork? ? ForkSupervisor : AsyncSupervisor + klass.new(configuration).tap(&:start) else abort configuration.errors.full_messages.join("\n") + "\nExiting..." end end end + delegate :mode, :standalone?, to: :configuration + def initialize(configuration) @configuration = configuration - @forks = {} + @configured_processes = {} + @process_instances = {} super end @@ -43,8 +47,12 @@ def stop run_stop_hooks end + def kind + "Supervisor(#{mode})" + end + private - attr_reader :configuration, :forks, :configured_processes + attr_reader :configuration, :configured_processes, :process_instances def boot SolidQueue.instrument(:start_process, process: self) do @@ -62,11 +70,13 @@ def supervise loop do break if stopped? - set_procline - process_signal_queue + if standalone? + set_procline + process_signal_queue + end unless stopped? 
- reap_and_replace_terminated_forks + check_and_replace_terminated_processes interruptible_sleep(1.second) end end @@ -77,30 +87,23 @@ def supervise def start_process(configured_process) process_instance = configured_process.instantiate.tap do |instance| instance.supervised_by process - instance.mode = :fork + instance.mode = mode end - pid = fork do - process_instance.start - end + process_id = process_instance.start - configured_processes[pid] = configured_process - forks[pid] = process_instance + configured_processes[process_id] = configured_process + process_instances[process_id] = process_instance end - def set_procline - procline "supervising #{supervised_processes.join(", ")}" + def check_and_replace_terminated_processes end def terminate_gracefully - SolidQueue.instrument(:graceful_termination, process_id: process_id, supervisor_pid: ::Process.pid, supervised_processes: supervised_processes) do |payload| - term_forks - - Timer.wait_until(SolidQueue.shutdown_timeout, -> { all_forks_terminated? }) do - reap_terminated_forks - end + SolidQueue.instrument(:graceful_termination, process_id: process_id, supervisor_pid: ::Process.pid, supervised_processes: configured_processes.keys) do |payload| + perform_graceful_termination - unless all_forks_terminated? + unless all_processes_terminated? 
payload[:shutdown_timeout_exceeded] = true terminate_immediately end @@ -108,84 +111,37 @@ def terminate_gracefully end def terminate_immediately - SolidQueue.instrument(:immediate_termination, process_id: process_id, supervisor_pid: ::Process.pid, supervised_processes: supervised_processes) do - quit_forks - end - end - - def shutdown - SolidQueue.instrument(:shutdown_process, process: self) do - run_callbacks(:shutdown) do - stop_maintenance_task - end + SolidQueue.instrument(:immediate_termination, process_id: process_id, supervisor_pid: ::Process.pid, supervised_processes: configured_processes.keys) do + perform_immediate_termination end end - def sync_std_streams - STDOUT.sync = STDERR.sync = true + def perform_graceful_termination + raise NotImplementedError end - def supervised_processes - forks.keys + def perform_immediate_termination + raise NotImplementedError end - def term_forks - signal_processes(forks.keys, :TERM) + def all_processes_terminated? + raise NotImplementedError end - def quit_forks - signal_processes(forks.keys, :QUIT) - end - - def reap_and_replace_terminated_forks - loop do - pid, status = ::Process.waitpid2(-1, ::Process::WNOHANG) - break unless pid - - replace_fork(pid, status) - end - end - - def reap_terminated_forks - loop do - pid, status = ::Process.waitpid2(-1, ::Process::WNOHANG) - break unless pid - - if (terminated_fork = forks.delete(pid)) && (!status.exited? 
|| status.exitstatus > 0) - handle_claimed_jobs_by(terminated_fork, status) - end - - configured_processes.delete(pid) - end - rescue SystemCallError - # All children already reaped - end - - def replace_fork(pid, status) - SolidQueue.instrument(:replace_fork, supervisor_pid: ::Process.pid, pid: pid, status: status) do |payload| - if terminated_fork = forks.delete(pid) - payload[:fork] = terminated_fork - handle_claimed_jobs_by(terminated_fork, status) - - start_process(configured_processes.delete(pid)) + def shutdown + SolidQueue.instrument(:shutdown_process, process: self) do + run_callbacks(:shutdown) do + stop_maintenance_task end end end - # When a supervised fork crashes or exits we need to mark all the - # executions it had claimed as failed so that they can be retried - # by some other worker. - def handle_claimed_jobs_by(terminated_fork, status) - wrap_in_app_executor do - if registered_process = SolidQueue::Process.find_by(name: terminated_fork.name) - error = Processes::ProcessExitError.new(status) - registered_process.fail_all_claimed_executions_with(error) - end - end + def set_procline + procline "supervising #{configured_processes.keys.join(", ")}" end - def all_forks_terminated? - forks.empty? + def sync_std_streams + STDOUT.sync = STDERR.sync = true end end end diff --git a/lib/solid_queue/supervisor/maintenance.rb b/lib/solid_queue/supervisor/maintenance.rb index 1b6b5204..d92569d5 100644 --- a/lib/solid_queue/supervisor/maintenance.rb +++ b/lib/solid_queue/supervisor/maintenance.rb @@ -32,5 +32,16 @@ def fail_orphaned_executions ClaimedExecution.orphaned.fail_all_with(Processes::ProcessMissingError.new) end end + + # When a supervised process crashes or exits we need to mark all the + # executions it had claimed as failed so that they can be retried + # by some other worker. 
+ def release_claimed_jobs_by(terminated_process, with_error:) + wrap_in_app_executor do + if registered_process = SolidQueue::Process.find_by(name: terminated_process.name) + registered_process.fail_all_claimed_executions_with(with_error) + end + end + end end end diff --git a/lib/solid_queue/supervisor/signals.rb b/lib/solid_queue/supervisor/signals.rb index fe0960d5..7bee107d 100644 --- a/lib/solid_queue/supervisor/signals.rb +++ b/lib/solid_queue/supervisor/signals.rb @@ -6,8 +6,8 @@ module Signals extend ActiveSupport::Concern included do - before_boot :register_signal_handlers - after_shutdown :restore_default_signal_handlers + before_boot :register_signal_handlers, if: :standalone? + after_shutdown :restore_default_signal_handlers, if: :standalone? end private diff --git a/lib/solid_queue/timer.rb b/lib/solid_queue/timer.rb index ca16466d..19e691f5 100644 --- a/lib/solid_queue/timer.rb +++ b/lib/solid_queue/timer.rb @@ -4,18 +4,18 @@ module SolidQueue module Timer extend self - def wait_until(timeout, condition, &block) + def wait_until(timeout, condition) if timeout > 0 deadline = monotonic_time_now + timeout while monotonic_time_now < deadline && !condition.call sleep 0.1 - block.call + yield if block_given? end else while !condition.call sleep 0.5 - block.call + yield if block_given? end end end diff --git a/test/dummy/config/puma.rb b/test/dummy/config/puma.rb deleted file mode 100644 index d4f1ae10..00000000 --- a/test/dummy/config/puma.rb +++ /dev/null @@ -1,44 +0,0 @@ -# Puma can serve each request in a thread from an internal thread pool. -# The `threads` method setting takes two numbers: a minimum and maximum. -# Any libraries that use thread pools should be configured to match -# the maximum value specified for Puma. Default is set to 5 threads for minimum -# and maximum; this matches the default thread size of Active Record. 
-# -max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 } -min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count } -threads min_threads_count, max_threads_count - -# Specifies the `worker_timeout` threshold that Puma will use to wait before -# terminating a worker in development environments. -# -worker_timeout 3600 if ENV.fetch("RAILS_ENV", "development") == "development" - -# Specifies the `port` that Puma will listen on to receive requests; default is 3000. -# -port ENV.fetch("PORT") { 3000 } - -# Specifies the `environment` that Puma will run in. -# -environment ENV.fetch("RAILS_ENV") { "development" } - -# Specifies the `pidfile` that Puma will use. -pidfile ENV.fetch("PIDFILE") { "tmp/pids/server.pid" } - -# Specifies the number of `workers` to boot in clustered mode. -# Workers are forked web server processes. If using threads and workers together -# the concurrency of the application would be max `threads` * `workers`. -# Workers do not work on JRuby or Windows (both of which do not support -# processes). -# -# workers ENV.fetch("WEB_CONCURRENCY") { 2 } - -# Use the `preload_app!` method when specifying a `workers` number. -# This directive tells Puma to first boot the application and load code -# before forking the application. This takes advantage of Copy On Write -# process behavior so workers use less memory. -# -# preload_app! - -# Allow puma to be restarted by `bin/rails restart` command. -plugin :tmp_restart -plugin :solid_queue diff --git a/test/dummy/config/puma.rb b/test/dummy/config/puma.rb new file mode 120000 index 00000000..f923a826 --- /dev/null +++ b/test/dummy/config/puma.rb @@ -0,0 +1 @@ +puma_fork.rb \ No newline at end of file diff --git a/test/dummy/config/puma_async.rb b/test/dummy/config/puma_async.rb new file mode 100644 index 00000000..beb65259 --- /dev/null +++ b/test/dummy/config/puma_async.rb @@ -0,0 +1,46 @@ +# Puma can serve each request in a thread from an internal thread pool. 
+# The `threads` method setting takes two numbers: a minimum and maximum. +# Any libraries that use thread pools should be configured to match +# the maximum value specified for Puma. Default is set to 5 threads for minimum +# and maximum; this matches the default thread size of Active Record. +# +max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 } +min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count } +threads min_threads_count, max_threads_count + +# Specifies the `worker_timeout` threshold that Puma will use to wait before +# terminating a worker in development environments. +# +worker_timeout 3600 if ENV.fetch("RAILS_ENV", "development") == "development" + +# Specifies the `port` that Puma will listen on to receive requests; default is 3000. +# +port ENV.fetch("PORT") { 3000 } + +# Specifies the `environment` that Puma will run in. +# +environment ENV.fetch("RAILS_ENV") { "development" } + +# Specifies the `pidfile` that Puma will use. +pidfile ENV.fetch("PIDFILE") { "tmp/pids/server.pid" } + +# Specifies the number of `workers` to boot in clustered mode. +# Workers are forked web server processes. If using threads and workers together +# the concurrency of the application would be max `threads` * `workers`. +# Workers do not work on JRuby or Windows (both of which do not support +# processes). +# +# workers ENV.fetch("WEB_CONCURRENCY") { 2 } + +# Use the `preload_app!` method when specifying a `workers` number. +# This directive tells Puma to first boot the application and load code +# before forking the application. This takes advantage of Copy On Write +# process behavior so workers use less memory. +# +# preload_app! + +# Allow puma to be restarted by `bin/rails restart` command. 
+plugin :tmp_restart +plugin :solid_queue + +solid_queue_mode :async diff --git a/test/dummy/config/puma_fork.rb b/test/dummy/config/puma_fork.rb new file mode 100644 index 00000000..4cdbbfd1 --- /dev/null +++ b/test/dummy/config/puma_fork.rb @@ -0,0 +1,46 @@ +# Puma can serve each request in a thread from an internal thread pool. +# The `threads` method setting takes two numbers: a minimum and maximum. +# Any libraries that use thread pools should be configured to match +# the maximum value specified for Puma. Default is set to 5 threads for minimum +# and maximum; this matches the default thread size of Active Record. +# +max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 } +min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count } +threads min_threads_count, max_threads_count + +# Specifies the `worker_timeout` threshold that Puma will use to wait before +# terminating a worker in development environments. +# +worker_timeout 3600 if ENV.fetch("RAILS_ENV", "development") == "development" + +# Specifies the `port` that Puma will listen on to receive requests; default is 3000. +# +port ENV.fetch("PORT") { 3000 } + +# Specifies the `environment` that Puma will run in. +# +environment ENV.fetch("RAILS_ENV") { "development" } + +# Specifies the `pidfile` that Puma will use. +pidfile ENV.fetch("PIDFILE") { "tmp/pids/server.pid" } + +# Specifies the number of `workers` to boot in clustered mode. +# Workers are forked web server processes. If using threads and workers together +# the concurrency of the application would be max `threads` * `workers`. +# Workers do not work on JRuby or Windows (both of which do not support +# processes). +# +# workers ENV.fetch("WEB_CONCURRENCY") { 2 } + +# Use the `preload_app!` method when specifying a `workers` number. +# This directive tells Puma to first boot the application and load code +# before forking the application. This takes advantage of Copy On Write +# process behavior so workers use less memory. 
+# +# preload_app! + +# Allow puma to be restarted by `bin/rails restart` command. +plugin :tmp_restart +plugin :solid_queue + +solid_queue_mode :fork diff --git a/test/integration/async_processes_lifecycle_test.rb b/test/integration/async_processes_lifecycle_test.rb new file mode 100644 index 00000000..1d22a2c9 --- /dev/null +++ b/test/integration/async_processes_lifecycle_test.rb @@ -0,0 +1,222 @@ +# frozen_string_literal: true + +require "test_helper" + +class AsyncProcessesLifecycleTest < ActiveSupport::TestCase + self.use_transactional_tests = false + + setup do + @pid = run_supervisor_as_fork(mode: :async, workers: [ { queues: :background }, { queues: :default, threads: 5 } ]) + + wait_for_registered_processes(3, timeout: 3.second) + assert_registered_workers_for(:background, :default, supervisor_pid: @pid) + end + + teardown do + terminate_process(@pid) if process_exists?(@pid) + end + + test "enqueue jobs in multiple queues" do + 6.times { |i| enqueue_store_result_job("job_#{i}") } + 6.times { |i| enqueue_store_result_job("job_#{i}", :default) } + + wait_for_jobs_to_finish_for(2.seconds) + + assert_equal 12, JobResult.count + 6.times { |i| assert_completed_job_results("job_#{i}", :background) } + 6.times { |i| assert_completed_job_results("job_#{i}", :default) } + + terminate_process(@pid) + assert_clean_termination + end + + test "kill supervisor while there are jobs in-flight" do + no_pause = enqueue_store_result_job("no pause") + pause = enqueue_store_result_job("pause", pause: 1.second) + + signal_process(@pid, :KILL, wait: 0.2.seconds) + wait_for_jobs_to_finish_for(2.seconds) + wait_for_registered_processes(1, timeout: 3.second) + + assert_not process_exists?(@pid) + + assert_completed_job_results("no pause") + assert_job_status(no_pause, :finished) + + # In async mode, killing the supervisor kills all threads too, + # so we can't complete in-flight jobs + assert_registered_supervisor + assert_registered_workers_for(:background, :default, 
supervisor_pid: @pid) + assert_started_job_result("pause") + assert_claimed_jobs + end + + test "term supervisor multiple times" do + 5.times do + signal_process(@pid, :TERM, wait: 0.1.second) + end + + sleep(1.second) + assert_clean_termination + end + + test "quit supervisor while there are jobs in-flight" do + no_pause = enqueue_store_result_job("no pause") + pause = enqueue_store_result_job("pause", pause: 1.second) + + wait_while_with_timeout(1.second) { SolidQueue::ReadyExecution.count > 0 } + + signal_process(@pid, :QUIT, wait: 0.4.second) + wait_for_jobs_to_finish_for(2.seconds, except: pause) + + wait_while_with_timeout(2.seconds) { process_exists?(@pid) } + assert_not process_exists?(@pid) + + # In async mode, QUIT calls exit! which terminates immediately without cleanup. + # The in-flight job remains claimed and the process/workers remain registered. + # A future supervisor will need to prune and fail these orphaned executions. + assert_completed_job_results("no pause") + assert_job_status(no_pause, :finished) + assert_started_job_result("pause") + assert_job_status(pause, :claimed) + + assert_registered_supervisor + assert_registered_workers_for(:background, :default, supervisor_pid: @pid) + assert_claimed_jobs + end + + test "term supervisor while there are jobs in-flight" do + no_pause = enqueue_store_result_job("no pause") + pause = enqueue_store_result_job("pause", pause: 0.2.seconds) + + signal_process(@pid, :TERM, wait: 0.3.second) + wait_for_jobs_to_finish_for(3.seconds) + + assert_completed_job_results("no pause") + assert_completed_job_results("pause") + + assert_job_status(no_pause, :finished) + assert_job_status(pause, :finished) + + wait_for_process_termination_with_timeout(@pid, timeout: 1.second) + assert_clean_termination + end + + test "int supervisor while there are jobs in-flight" do + no_pause = enqueue_store_result_job("no pause") + pause = enqueue_store_result_job("pause", pause: 0.2.seconds) + + signal_process(@pid, :INT, wait: 
0.3.second) + wait_for_jobs_to_finish_for(2.second) + + assert_completed_job_results("no pause") + assert_completed_job_results("pause") + + assert_job_status(no_pause, :finished) + assert_job_status(pause, :finished) + + wait_for_process_termination_with_timeout(@pid, timeout: 1.second) + assert_clean_termination + end + + test "term supervisor exceeding timeout while there are jobs in-flight" do + no_pause = enqueue_store_result_job("no pause") + pause = enqueue_store_result_job("pause", pause: SolidQueue.shutdown_timeout + 10.second) + + wait_while_with_timeout(1.second) { SolidQueue::ReadyExecution.count > 1 } + + signal_process(@pid, :TERM, wait: 0.5.second) + wait_for_jobs_to_finish_for(2.seconds, except: pause) + + # exit! exits with status 1 by default + wait_for_process_termination_with_timeout(@pid, timeout: SolidQueue.shutdown_timeout + 5.seconds, exitstatus: 1) + assert_not process_exists?(@pid) + + assert_completed_job_results("no pause") + assert_job_status(no_pause, :finished) + + # When timeout is exceeded, exit! is called without cleanup. + # The in-flight job stays claimed and processes stay registered. + # A future supervisor will need to prune and fail these orphaned executions. + assert_started_job_result("pause") + assert_job_status(pause, :claimed) + + assert_registered_supervisor + assert find_processes_registered_as("Worker").any? 
{ |w| w.metadata["queues"].include?("background") } + assert_claimed_jobs + end + + test "process some jobs that raise errors" do + 2.times { enqueue_store_result_job("no error", :background) } + 2.times { enqueue_store_result_job("no error", :default) } + error1 = enqueue_store_result_job("error", :background, exception: ExpectedTestError) + enqueue_store_result_job("no error", :background, pause: 0.03) + error2 = enqueue_store_result_job("error", :background, exception: ExpectedTestError, pause: 0.05) + 2.times { enqueue_store_result_job("no error", :default, pause: 0.01) } + error3 = enqueue_store_result_job("error", :default, exception: ExpectedTestError) + + wait_for_jobs_to_finish_for(2.second, except: [ error1, error2, error3 ]) + + assert_completed_job_results("no error", :background, 3) + assert_completed_job_results("no error", :default, 4) + + wait_while_with_timeout(1.second) { SolidQueue::FailedExecution.count < 3 } + [ error1, error2, error3 ].each do |job| + assert_job_status(job, :failed) + end + + terminate_process(@pid) + assert_clean_termination + end + + + private + def assert_clean_termination + wait_for_registered_processes 0, timeout: 0.2.second + assert_no_registered_processes + assert_no_claimed_jobs + assert_not process_exists?(@pid) + end + + def assert_registered_workers_for(*queues, supervisor_pid: nil) + workers = find_processes_registered_as("Worker") + registered_queues = workers.map { |process| process.metadata["queues"] }.compact + assert_equal queues.map(&:to_s).sort, registered_queues.sort + if supervisor_pid + assert_equal [ supervisor_pid ], workers.map { |process| process.supervisor.pid }.uniq + end + end + + def assert_registered_supervisor + processes = find_processes_registered_as("Supervisor(async)") + assert_equal 1, processes.count + assert_equal @pid, processes.first.pid + end + + def assert_no_registered_workers + assert_empty find_processes_registered_as("Worker").to_a + end + + def enqueue_store_result_job(value, 
queue_name = :background, **options) + StoreResultJob.set(queue: queue_name).perform_later(value, **options) + end + + def assert_completed_job_results(value, queue_name = :background, count = 1) + skip_active_record_query_cache do + assert_equal count, JobResult.where(queue_name: queue_name, status: "completed", value: value).count + end + end + + def assert_started_job_result(value, queue_name = :background, count = 1) + skip_active_record_query_cache do + assert_equal count, JobResult.where(queue_name: queue_name, status: "started", value: value).count + end + end + + def assert_job_status(active_job, status) + skip_active_record_query_cache do + job = SolidQueue::Job.find_by(active_job_id: active_job.job_id) + assert job.public_send("#{status}?") + end + end +end diff --git a/test/integration/processes_lifecycle_test.rb b/test/integration/forked_processes_lifecycle_test.rb similarity index 98% rename from test/integration/processes_lifecycle_test.rb rename to test/integration/forked_processes_lifecycle_test.rb index 47d56b4d..561166c5 100644 --- a/test/integration/processes_lifecycle_test.rb +++ b/test/integration/forked_processes_lifecycle_test.rb @@ -2,7 +2,7 @@ require "test_helper" -class ProcessesLifecycleTest < ActiveSupport::TestCase +class ForkedProcessesLifecycleTest < ActiveSupport::TestCase self.use_transactional_tests = false setup do @@ -283,7 +283,7 @@ def assert_registered_workers_for(*queues, supervisor_pid: nil) end def assert_registered_supervisor_with(pid) - processes = find_processes_registered_as("Supervisor") + processes = find_processes_registered_as("Supervisor(fork)") assert_equal 1, processes.count assert_equal pid, processes.first.pid end diff --git a/test/integration/instrumentation_test.rb b/test/integration/instrumentation_test.rb index 4440fe61..fcdf448d 100644 --- a/test/integration/instrumentation_test.rb +++ b/test/integration/instrumentation_test.rb @@ -166,7 +166,7 @@ class InstrumentationTest < ActiveSupport::TestCase 
SolidQueue::Process.any_instance.expects(:destroy!).raises(error).at_least_once events = subscribed("deregister_process.solid_queue") do - assert_raises RuntimeError do + assert_raises ExpectedTestError do worker = SolidQueue::Worker.new.tap(&:start) wait_for_registered_processes(1, timeout: 1.second) diff --git a/test/integration/lifecycle_hooks_test.rb b/test/integration/lifecycle_hooks_test.rb index da7feedc..7cd04a82 100644 --- a/test/integration/lifecycle_hooks_test.rb +++ b/test/integration/lifecycle_hooks_test.rb @@ -87,7 +87,7 @@ class LifecycleHooksTest < ActiveSupport::TestCase end assert_equal %w[ - supervisor_start supervisor_stop supervisor_exit + forksupervisor_start forksupervisor_stop forksupervisor_exit worker_first_queue_start worker_first_queue_stop worker_first_queue_exit worker_second_queue_start worker_second_queue_stop worker_second_queue_exit dispatcher_100_start dispatcher_100_stop dispatcher_100_exit diff --git a/test/integration/puma/plugin_async_test.rb b/test/integration/puma/plugin_async_test.rb new file mode 100644 index 00000000..551ebd63 --- /dev/null +++ b/test/integration/puma/plugin_async_test.rb @@ -0,0 +1,13 @@ +# frozen_string_literal: true + +require "test_helper" +require_relative "plugin_testing" + +class PluginAsyncTest < ActiveSupport::TestCase + include PluginTesting + + private + def solid_queue_mode + :async + end +end diff --git a/test/integration/puma/plugin_fork_test.rb b/test/integration/puma/plugin_fork_test.rb new file mode 100644 index 00000000..40f1fd18 --- /dev/null +++ b/test/integration/puma/plugin_fork_test.rb @@ -0,0 +1,30 @@ +# frozen_string_literal: true + +require "test_helper" +require_relative "plugin_testing" + +class PluginForkTest < ActiveSupport::TestCase + include PluginTesting + + test "stop puma when solid queue's supervisor dies" do + supervisor = find_processes_registered_as("Supervisor(fork)").first + + signal_process(supervisor.pid, :KILL) + wait_for_process_termination_with_timeout(@pid) + 
+ assert_not process_exists?(@pid) + + # When the supervisor is KILLed, the forked processes become orphans. + # Clean them up manually. + SolidQueue::Process.all.each do |process| + signal_process(process.pid, :KILL) if process_exists?(process.pid) + end + + wait_for_registered_processes 0, timeout: 3.second + end + + private + def solid_queue_mode + :fork + end +end diff --git a/test/integration/puma/plugin_test.rb b/test/integration/puma/plugin_test.rb deleted file mode 100644 index bac98a2b..00000000 --- a/test/integration/puma/plugin_test.rb +++ /dev/null @@ -1,63 +0,0 @@ -# frozen_string_literal: true - -require "test_helper" - -class PluginTest < ActiveSupport::TestCase - self.use_transactional_tests = false - - setup do - FileUtils.mkdir_p Rails.root.join("tmp", "pids") - Dir.chdir("test/dummy") do - cmd = %W[ - bundle exec puma - -b tcp://127.0.0.1:9222 - -C config/puma.rb - -s - config.ru - ] - @pid = fork do - exec(*cmd) - end - end - wait_for_registered_processes 5, timeout: 3.second - end - - teardown do - terminate_process(@pid, signal: :INT) if process_exists?(@pid) - wait_for_registered_processes 0, timeout: 2.seconds - end - - test "perform jobs inside puma's process" do - StoreResultJob.perform_later(:puma_plugin) - - wait_for_jobs_to_finish_for(2.seconds) - assert_equal 1, JobResult.where(queue_name: :background, status: "completed", value: :puma_plugin).count - end - - test "stop the queue on puma's restart" do - signal_process(@pid, :SIGUSR2) - # Ensure the restart finishes before we try to continue with the test - wait_for_registered_processes(0, timeout: 3.second) - wait_for_registered_processes(5, timeout: 3.second) - - StoreResultJob.perform_later(:puma_plugin) - wait_for_jobs_to_finish_for(2.seconds) - assert_equal 1, JobResult.where(queue_name: :background, status: "completed", value: :puma_plugin).count - end - - test "stop puma when solid queue's supervisor dies" do - supervisor = find_processes_registered_as("Supervisor").first - - 
signal_process(supervisor.pid, :KILL) - wait_for_process_termination_with_timeout(@pid) - - assert_not process_exists?(@pid) - - # Make sure all supervised processes are also terminated - SolidQueue::Process.all.each do |process| - signal_process(process.pid, :KILL) if process_exists?(process.pid) - end - - wait_for_registered_processes 0, timeout: 3.second - end -end diff --git a/test/integration/puma/plugin_testing.rb b/test/integration/puma/plugin_testing.rb new file mode 100644 index 00000000..ec2198f8 --- /dev/null +++ b/test/integration/puma/plugin_testing.rb @@ -0,0 +1,61 @@ +# frozen_string_literal: true + +require "test_helper" + +module PluginTesting + extend ActiveSupport::Concern + extend ActiveSupport::Testing::Declarative + + included do + self.use_transactional_tests = false + + setup do + FileUtils.mkdir_p Rails.root.join("tmp", "pids") + + Dir.chdir("test/dummy") do + cmd = %W[ + bundle exec puma + -b tcp://127.0.0.1:9222 + -C config/puma_#{solid_queue_mode}.rb + -s + config.ru + ] + + @pid = fork do + exec(*cmd) + end + end + + wait_for_registered_processes(5, timeout: 3.second) + end + + teardown do + terminate_process(@pid, signal: :INT) if process_exists?(@pid) + + wait_for_registered_processes 0, timeout: 2.seconds + end + end + + test "perform jobs inside puma's process" do + StoreResultJob.perform_later(:puma_plugin) + + wait_for_jobs_to_finish_for(2.seconds) + assert_equal 1, JobResult.where(queue_name: :background, status: "completed", value: :puma_plugin).count + end + + test "stop the queue on puma's restart" do + signal_process(@pid, :SIGUSR2) + # Ensure the restart finishes before we try to continue with the test + wait_for_registered_processes(0, timeout: 3.second) + wait_for_registered_processes(5, timeout: 3.second) + + StoreResultJob.perform_later(:puma_plugin) + wait_for_jobs_to_finish_for(2.seconds) + assert_equal 1, JobResult.where(queue_name: :background, status: "completed", value: :puma_plugin).count + end + + private + def 
solid_queue_mode + raise NotImplementedError + end +end diff --git a/test/models/solid_queue/process_test.rb b/test/models/solid_queue/process_test.rb index b69a67de..cd2430ca 100644 --- a/test/models/solid_queue/process_test.rb +++ b/test/models/solid_queue/process_test.rb @@ -34,7 +34,7 @@ class SolidQueue::ProcessTest < ActiveSupport::TestCase end test "prune processes including their supervisor with expired heartbeats and fail claimed executions" do - supervisor = SolidQueue::Process.register(kind: "Supervisor", pid: 42, name: "supervisor-42") + supervisor = SolidQueue::Process.register(kind: "Supervisor(fork)", pid: 42, name: "supervisor-42") process = SolidQueue::Process.register(kind: "Worker", pid: 43, name: "worker-43", supervisor_id: supervisor.id) 3.times { |i| StoreResultJob.set(queue: :new_queue).perform_later(i) } jobs = SolidQueue::Job.last(3) diff --git a/test/test_helper.rb b/test/test_helper.rb index 30caca0e..60bab7a3 100644 --- a/test/test_helper.rb +++ b/test/test_helper.rb @@ -38,6 +38,8 @@ def destroy_records SolidQueue::Process.destroy_all SolidQueue::Semaphore.delete_all SolidQueue::RecurringTask.delete_all + SolidQueue::ScheduledExecution.delete_all + SolidQueue::ReadyExecution.delete_all JobResult.delete_all end diff --git a/test/unit/async_supervisor_test.rb b/test/unit/async_supervisor_test.rb new file mode 100644 index 00000000..46948417 --- /dev/null +++ b/test/unit/async_supervisor_test.rb @@ -0,0 +1,107 @@ +require "test_helper" + +class AsyncSupervisorTest < ActiveSupport::TestCase + self.use_transactional_tests = false + + test "start as non-standalone" do + supervisor = run_supervisor_as_thread + wait_for_registered_processes(4) + + assert_registered_processes(kind: "Supervisor(async)") + assert_registered_processes(kind: "Worker", supervisor_id: supervisor.process_id, count: 2) + assert_registered_processes(kind: "Dispatcher", supervisor_id: supervisor.process_id) + + supervisor.stop + + assert_no_registered_processes + end + + 
test "start standalone" do + pid = run_supervisor_as_fork(mode: :async) + wait_for_registered_processes(4) + + assert_registered_processes(kind: "Supervisor(async)") + assert_registered_processes(kind: "Worker", supervisor_pid: pid, count: 2) + assert_registered_processes(kind: "Dispatcher", supervisor_pid: pid) + + terminate_process(pid) + assert_no_registered_processes + end + + test "start as non-standalone with provided configuration" do + supervisor = run_supervisor_as_thread(workers: [], dispatchers: [ { batch_size: 100 } ]) + wait_for_registered_processes(2) # supervisor + dispatcher + + assert_registered_processes(kind: "Supervisor(async)") + assert_registered_processes(kind: "Worker", count: 0) + assert_registered_processes(kind: "Dispatcher", supervisor_id: supervisor.process_id) + + supervisor.stop + + assert_no_registered_processes + end + + test "failed orphaned executions as non-standalone" do + simulate_orphaned_executions 3 + + config = { + workers: [ { queues: "background", polling_interval: 10 } ], + dispatchers: [] + } + + supervisor = run_supervisor_as_thread(**config) + wait_for_registered_processes(2) # supervisor + 1 worker + assert_registered_processes(kind: "Supervisor(async)") + + wait_while_with_timeout(1.second) { SolidQueue::ClaimedExecution.count > 0 } + + supervisor.stop + + skip_active_record_query_cache do + assert_equal 0, SolidQueue::ClaimedExecution.count + assert_equal 3, SolidQueue::FailedExecution.count + end + end + + test "failed orphaned executions as standalone" do + simulate_orphaned_executions 3 + + config = { + workers: [ { queues: "background", polling_interval: 10 } ], + dispatchers: [] + } + + pid = run_supervisor_as_fork(mode: :async, **config) + wait_for_registered_processes(2) # supervisor + 1 worker + assert_registered_processes(kind: "Supervisor(async)") + + wait_while_with_timeout(1.second) { SolidQueue::ClaimedExecution.count > 0 } + + terminate_process(pid) + + skip_active_record_query_cache do + assert_equal 
0, SolidQueue::ClaimedExecution.count + assert_equal 3, SolidQueue::FailedExecution.count + end + end + + private + def run_supervisor_as_thread(**options) + SolidQueue::Supervisor.start(mode: :async, standalone: false, **options) + end + + def simulate_orphaned_executions(count) + count.times { |i| StoreResultJob.set(queue: :new_queue).perform_later(i) } + process = SolidQueue::Process.register(kind: "Worker", pid: 42, name: "worker-123") + + SolidQueue::ReadyExecution.claim("*", count + 1, process.id) + + assert_equal count, SolidQueue::ClaimedExecution.count + assert_equal 0, SolidQueue::ReadyExecution.count + + assert_equal [ process.id ], SolidQueue::ClaimedExecution.last(3).pluck(:process_id).uniq + + # Simulate orphaned executions by just wiping the claiming process + process.delete + end +end diff --git a/test/unit/dispatcher_test.rb b/test/unit/dispatcher_test.rb index 359bb504..7df0591f 100644 --- a/test/unit/dispatcher_test.rb +++ b/test/unit/dispatcher_test.rb @@ -89,8 +89,10 @@ class DispatcherTest < ActiveSupport::TestCase wait_while_with_timeout(1.second) { SolidQueue::ScheduledExecution.any? } - assert_equal 0, SolidQueue::ScheduledExecution.count - assert_equal 15, SolidQueue::ReadyExecution.count + skip_active_record_query_cache do + assert_equal 0, SolidQueue::ScheduledExecution.count + assert_equal 15, SolidQueue::ReadyExecution.count + end ensure another_dispatcher&.stop end @@ -108,8 +110,10 @@ class DispatcherTest < ActiveSupport::TestCase dispatcher.start wait_while_with_timeout(1.second) { SolidQueue::ScheduledExecution.any? 
} - assert_equal 0, SolidQueue::ScheduledExecution.count - assert_equal 3, SolidQueue::ReadyExecution.count + skip_active_record_query_cache do + assert_equal 0, SolidQueue::ScheduledExecution.count + assert_equal 3, SolidQueue::ReadyExecution.count + end ensure dispatcher.stop end diff --git a/test/unit/supervisor_test.rb b/test/unit/fork_supervisor_test.rb similarity index 91% rename from test/unit/supervisor_test.rb rename to test/unit/fork_supervisor_test.rb index 7a531ad2..9ec81b51 100644 --- a/test/unit/supervisor_test.rb +++ b/test/unit/fork_supervisor_test.rb @@ -1,6 +1,6 @@ require "test_helper" -class SupervisorTest < ActiveSupport::TestCase +class ForkSupervisorTest < ActiveSupport::TestCase self.use_transactional_tests = false setup do @@ -186,8 +186,8 @@ class SupervisorTest < ActiveSupport::TestCase end # Regression test for supervisor failing to handle claimed jobs when its own - # process record has been pruned (NoMethodError in #handle_claimed_jobs_by). - test "handle_claimed_jobs_by fails claimed executions even if supervisor record is missing" do + # process record has been pruned (NoMethodError in #release_claimed_jobs_by). + test "release_claimed_jobs_by fails claimed executions even if supervisor record is missing" do worker_name = "worker-test-#{SecureRandom.hex(4)}" worker_process = SolidQueue::Process.register(kind: "Worker", pid: 999_999, name: worker_name) @@ -196,20 +196,14 @@ class SupervisorTest < ActiveSupport::TestCase claimed_execution = SolidQueue::ReadyExecution.claim("*", 1, worker_process.id).first terminated_fork = Struct.new(:name).new(worker_name) + supervisor = SolidQueue::ForkSupervisor.allocate + error = RuntimeError.new - DummyStatus = Struct.new(:pid, :exitstatus) do - def signaled? 
= false - def termsig = nil - end - status = DummyStatus.new(worker_process.pid, 1) - - supervisor = SolidQueue::Supervisor.allocate - - supervisor.send(:handle_claimed_jobs_by, terminated_fork, status) + supervisor.send(:release_claimed_jobs_by, terminated_fork, with_error: error) failed = SolidQueue::FailedExecution.find_by(job_id: claimed_execution.job_id) assert failed.present? - assert_equal "SolidQueue::Processes::ProcessExitError", failed.exception_class + assert_equal "RuntimeError", failed.exception_class end private @@ -223,7 +217,7 @@ def assert_registered_dispatcher(supervisor_pid: nil) def assert_registered_supervisor(pid) skip_active_record_query_cache do - processes = find_processes_registered_as("Supervisor") + processes = find_processes_registered_as("Supervisor(fork)") assert_equal 1, processes.count assert_nil processes.first.supervisor assert_equal pid, processes.first.pid diff --git a/test/unit/process_recovery_test.rb b/test/unit/process_recovery_test.rb index 620fbd51..e3eccdcf 100644 --- a/test/unit/process_recovery_test.rb +++ b/test/unit/process_recovery_test.rb @@ -20,7 +20,7 @@ class ProcessRecoveryTest < ActiveSupport::TestCase @pid = run_supervisor_as_fork(workers: [ { queues: "*", polling_interval: 0.1, processes: 1 } ]) wait_for_registered_processes(2, timeout: 1.second) # Supervisor + 1 worker - supervisor_process = SolidQueue::Process.find_by(kind: "Supervisor", pid: @pid) + supervisor_process = SolidQueue::Process.find_by(kind: "Supervisor(fork)", pid: @pid) assert supervisor_process worker_process = SolidQueue::Process.find_by(kind: "Worker") diff --git a/test/unit/worker_test.rb b/test/unit/worker_test.rb index 8db67912..3d692404 100644 --- a/test/unit/worker_test.rb +++ b/test/unit/worker_test.rb @@ -35,6 +35,7 @@ class WorkerTest < ActiveSupport::TestCase worker = SolidQueue::Worker.new(queues: "background", threads: 3, polling_interval: 0.2).tap(&:start) sleep(1) + # stop calls join internally when not supervised, which 
re-raises the error assert_raises ExpectedTestError do worker.stop end