chore: include tn_vacuum extension#1175
Conversation
- Introduced the `leaderwatch` extension to manage leader callbacks during block processing, including functionality for acquiring and losing leadership. - Implemented the `tn_vacuum` extension with various triggers (block interval, cron, manual, and digest coupled) to manage vacuum operations based on configurable parameters. - Added configuration loading and initialization for both extensions, ensuring they integrate with the existing hooks and service infrastructure. - Included comprehensive tests for the `leaderwatch` and `tn_vacuum` functionalities to ensure reliability and correctness. This update enhances the system's ability to manage leadership transitions and perform vacuum operations efficiently, contributing to overall performance improvements.
WalkthroughAdds a leaderwatch extension and rewires tn_digest to use it; introduces a new tn_vacuum extension (pg_repack mechanism, metrics, PG-backed run-state, runner/worker lifecycle); builds/installs pg_repack in deployment images; updates many Docker image references to ghcr.io/trufnetwork/kwil-postgres; adjusts ERC20 test injection semantics and test utilities; documentation and go.mod updates. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App as Node App
participant Reg as extensions/register.go
participant LW as leaderwatch
participant TD as tn_digest
participant TV as tn_vacuum
Note over Reg: init()
Reg->>LW: InitializeExtension()
Reg->>TD: (existing) InitializeExtension()
Reg->>TV: InitializeExtension()
App->>LW: engine_ready
LW-->>App: capture service/logger
loop each end_block
App->>LW: end_block(block)
LW->>LW: determineLeader(app, block)
alt leader change
LW->>TD: OnAcquire/OnLose(block)
LW->>TV: OnAcquire/OnLose(block)
end
LW->>TD: OnEndBlock(block)
LW->>TV: OnEndBlock(block)
end
sequenceDiagram
autonumber
participant App as App
participant TV as tn_vacuum
participant CFG as Config
participant ST as StateStore (PG)
participant MEC as Mechanism (pg_repack)
participant RUN as Runner
participant MET as Metrics
App->>TV: engine_ready
TV->>CFG: LoadConfig(service)
TV->>TV: configure(service, cfg)
TV->>ST: Ensure()/Load()
alt cfg.Enabled
TV->>TV: start worker
else not enabled
Note over TV: idle
end
loop end_block
App->>TV: end_block(block)
alt interval reached
TV->>RUN: Execute({Mechanism, DB, PgRepackJobs, Metrics})
RUN->>MET: RecordVacuumStart
RUN->>MEC: Run({Reason, DB, Jobs})
MEC->>MEC: ensure pg_repack extension
MEC->>MEC: exec pg_repack
MEC-->>RUN: RunReport/err
alt success
RUN->>MET: RecordVacuumComplete
TV->>ST: Save(last_run_height/at)
else error
RUN->>MET: RecordVacuumError
end
else skip
TV->>MET: RecordVacuumSkipped
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (17)
✅ Files skipped from review due to trivial changes (2)
🧰 Additional context used🧠 Learnings (3)📚 Learning: 2025-09-23T18:50:13.274ZApplied to files:
📚 Learning: 2025-09-19T18:59:51.942ZApplied to files:
📚 Learning: 2025-09-19T18:14:01.428ZApplied to files:
🪛 markdownlint-cli2 (0.18.1)docs/node-operator-guide.md496-496: Emphasis used instead of a heading (MD036, no-emphasis-as-heading) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Removed the `leader.go` file and integrated leader acquisition, loss, and end block handling into the `leaderwatch` extension. - Updated tests in `leader_reload_test.go` to utilize the new leader handling functions. - Ensured compatibility with existing hooks while enhancing the management of leadership transitions. This refactor improves code organization and maintains the functionality of leadership management within the system.
- Added a new `pgRepackMechanism` to the `tn_vacuum` extension, which utilizes the `pg_repack` binary for vacuum operations. - Updated the Dockerfile to include the `pg_repack` binary installation alongside the PostgreSQL client tools. - Modified the `compose.yaml` to allow dynamic image tagging for the `tn-db` service. - Enhanced tests to verify the availability of the `pg_repack` binary and its integration within the vacuum mechanism. This addition improves the vacuuming capabilities of the system, ensuring more efficient database maintenance.
- Introduced a new `dbConnFromService` function to extract database connection details from the service configuration. - Updated the `pgRepackMechanism` to utilize the database connection information during preparation and execution. - Modified the `RunnerArgs` structure to include `DBConnConfig`, ensuring that the database connection is passed through the execution flow. - Enhanced tests to validate the integration of database connection handling within the `pg_repack` mechanism. These changes improve the robustness of the vacuum operations by ensuring proper database connectivity and configuration management.
- Refactored the `Config` structure in `config.go` to streamline configuration loading by removing the `TriggerConfig` and related fields. - Updated the `LoadConfig` function to directly handle block interval configuration. - Removed unused trigger implementations (`trigger_block_interval.go`, `trigger_cron.go`, `trigger_digest.go`, `trigger_manual.go`, and `trigger.go`) to clean up the codebase. - Adjusted the `Extension` struct and related methods to reflect the removal of triggers and ensure proper configuration handling. - Enhanced tests to validate the new configuration logic and ensure the mechanism operates correctly without the removed triggers. These changes improve the maintainability of the `tn_vacuum` extension by simplifying its configuration and reducing unnecessary complexity.
This commit modifies the `TestERC20BridgeEndToEnd` function in the ERC20 bridge end-to-end test to allow for a nil recipient parameter when calling the bridge action. This change ensures that the test accurately reflects the intended behavior of bridging tokens to the sender's address, improving the reliability of the test case.
This commit introduces the `tn_vacuum` extension, which automates database maintenance through periodic vacuuming operations using `pg_repack`. Key changes include: - Added configuration options for enabling the vacuum extension and setting the block interval. - Implemented metrics recording for vacuum operations, including counters for starts, completions, errors, and skipped operations. - Enhanced the `pgRepackMechanism` to provide detailed run reports, including duration and tables processed. - Updated installation instructions in the documentation to include `pg_repack` requirements for various platforms. - Added tests to validate the new functionality and ensure metrics are recorded correctly. These improvements enhance database performance and observability, making maintenance more efficient.
This commit introduces state persistence to the `tn_vacuum` extension, allowing it to track the last successful vacuum run's block height and timestamp. Key changes include: - Added a new `runState` struct to encapsulate the last run information. - Implemented a `stateStore` interface and a `pgStateStore` struct for managing state persistence in PostgreSQL. - Enhanced the `Extension` struct to include state management methods for initializing, loading, and saving the run state. - Updated the `README.md` to document the new state persistence feature. - Added tests to validate the state persistence functionality and ensure correct behavior during vacuum operations. These enhancements improve the reliability of the vacuum process by allowing the extension to resume its operations seamlessly after restarts.
This commit modifies the Dockerfile to enhance the build process by introducing a new stage for building `pg_repack` against PostgreSQL 16 client libraries. Key changes include: - Replaced the busybox base image with a PostgreSQL 16-alpine image for building `pg_repack`. - Added necessary build dependencies and commands to compile and install `pg_repack`. - Updated the final image to copy the `pg_repack` binary from the build stage, ensuring it is available for use. These changes streamline the Docker image setup and ensure compatibility with the latest PostgreSQL version.
This commit updates the `go.mod` and `go.sum` files to reference the latest versions of the `kwil-db` and `kwil-db/core` packages, ensuring the project benefits from recent improvements and fixes. Additionally, it introduces a new configuration option `pg_repack_jobs` to the `tn_vacuum` extension, allowing users to limit the concurrency of `pg_repack` operations. The README has been updated to reflect this new configuration option, and tests have been added to validate the new functionality.
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (12)
tests/extensions/erc20/erc20_bridge_injection_test.go (1)
37-38: Update the outdated comment.The comment on line 37 states "from user to escrow (lock/credit path)", but the code now injects a deposit from user to user (both parameters are
user). The comment should be updated to reflect the Deposit event semantics.Apply this diff to update the comment:
- // Inject a transfer: from user to escrow (lock/credit path) + // Inject a deposit: credits the user's balancenode/exts/erc20-bridge/erc20/meta_extension_deposit_test.go (1)
52-65: Consider adding constants or helpers for log construction.The deposit log construction uses:
- A hardcoded event topic hash (line 62)
- Manual ABI encoding for the log data (lines 55-57)
While the encoding logic is correct (left-padding address and amount to 32 bytes each), this approach has maintenance risks:
- Magic hash: The topic
0xe1fffcc4...should matchkeccak256("Deposit(address,uint256)")but there's no explicit verification- Fragility: If the event signature changes, this test will silently break
- Duplication risk: Other tests constructing similar logs must repeat this logic
Consider creating a test helper:
// In a test_helpers.go file const depositEventSignature = "Deposit(address,uint256)" var depositEventTopic = ethcommon.HexToHash("0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c") func makeDepositLog(escrowAddr, recipient ethcommon.Address, amount *big.Int) ethtypes.Log { var data [64]byte copy(data[32-len(recipient.Bytes()):32], recipient.Bytes()) copy(data[64-len(amount.Bytes()):], amount.Bytes()) return ethtypes.Log{ Address: escrowAddr, Topics: []ethcommon.Hash{depositEventTopic}, Data: data[:], } }Then verify the topic in a test:
func TestDepositEventSignature(t *testing.T) { hash := crypto.Keccak256Hash([]byte(depositEventSignature)) require.Equal(t, depositEventTopic, hash) }extensions/tn_vacuum/constants.go (1)
20-25: LGTM! Consider SSL for production.The database connection defaults are appropriate for local development. Ensure that production deployments override
DefaultSSLModeto enable SSL for secure database connections.extensions/tn_vacuum/runner.go (1)
26-28: Consider logging when mechanism is nil.Returning nil silently when
args.Mechanism == nilcould mask configuration issues. Consider at least logging a debug or info message before returning, especially since a logger is often available.Example:
func (r *Runner) Execute(ctx context.Context, args RunnerArgs) error { if args.Mechanism == nil { + logger := r.logger + if logger == nil { + logger = args.Logger + } + if logger != nil { + logger.Debug("vacuum runner: no mechanism configured; skipping execution") + } return nil }extensions/tn_digest/tn_digest.go (1)
147-204: LGTM!The periodic config reload logic is well-structured with appropriate nil-checks, block-height-based throttling, and conditional scheduler restart. The nested conditions at lines 190-199 handle the scheduler recreation correctly, though the logic could be slightly clearer.
Optional: Consider simplifying scheduler restart logic.
Lines 190-199 handle scheduler creation and restart in a nested way. For readability, you might extract the scheduler (re)start into a helper method.
Example:
} else if ext.IsLeader() { // ... service resolution code ... ext.restartSchedulerWithService(ctx, service) }Where
restartSchedulerWithServiceencapsulates the ensure/stop/start sequence.extensions/tn_vacuum/metrics/metrics.go (1)
30-50: Consider alternative OTEL availability check to avoid metric pollution.The test counter created at line 35 is never used or cleaned up, leaving
tn_vacuum.testregistered in the metrics namespace. This pollutes the metric space with an unused counter.Consider alternative approaches:
- Attempt to create the first real metric directly in
NewOTELMetricsand catch the error- Check if the meter provider is a no-op implementation before attempting metric creation
- If you must test, use a more specific test metric name like
tn_vacuum._init_probeto signal it's internalApply this diff to move the check into NewOTELMetrics:
func NewMetricsRecorder(logger log.Logger) MetricsRecorder { - // Try to get the global meter provider meter := otel.GetMeterProvider().Meter("github.com/trufnetwork/kwil-db/extensions/tn_vacuum") - - // Try to create a test metric to verify OTEL is functional - _, err := meter.Int64Counter("tn_vacuum.test") - if err != nil { - logger.Debug("OpenTelemetry not available, metrics disabled") - return NewNoOpMetrics() - } - - // OTEL is available, create real metrics recorder otelMetrics, err := NewOTELMetrics(meter, logger) if err != nil { - logger.Warn("failed to initialize OTEL metrics, falling back to no-op", "error", err) + logger.Debug("OpenTelemetry not available, metrics disabled", "error", err) return NewNoOpMetrics() } - logger.Info("OpenTelemetry metrics initialized successfully") return otelMetrics }This eliminates the test counter and relies on the actual metric creation to detect OTEL availability.
extensions/leaderwatch/leaderwatch.go (1)
164-170: Consider resetting the singleton instance for test isolation.The current implementation only clears
watchersandorder, but leaves the singleton instance andsync.Onceintact. This differs from the pattern used intn_vacuum'sResetForTest(extensions/tn_vacuum/extension.go lines 65-71), which resets theOnceand sets the instance tonil.If tests depend on a fully fresh extension instance (e.g., to test
InitializeExtensionregistration again), the current approach may not provide sufficient isolation.If full reset is desired, apply this diff:
func ResetForTest() { ext := getExtension() ext.mu.Lock() ext.watchers = make(map[string]*watcher) ext.order = nil ext.mu.Unlock() + extOnce = sync.Once{} + extInst = nil }If the current behavior is intentional (preserving the instance across test resets), consider adding a comment explaining why.
extensions/tn_vacuum/mechanism.go (2)
37-43: Consider secrets management for database passwords.DBConnConfig stores the password as a plain string, which means it cannot be zeroed from memory after use and may be visible in memory dumps or swap. While this pattern is common in Go database libraries, consider using a secrets management system (e.g., HashiCorp Vault, AWS Secrets Manager) or environment-based injection with short-lived credentials for production deployments.
22-27: RunRequest mixes generic and mechanism-specific fields.The
PgRepackJobsandPgRepackNoOrderfields are pg_repack-specific but sit alongside generic fields likeReasonandDB. While this works for the current single-mechanism implementation, consider a more extensible design if additional mechanisms are planned (e.g., aMechanismParams map[string]interface{}field or mechanism-specific request types).extensions/tn_vacuum/mechanism_repack.go (3)
110-111: Brittle output parsing: Counting log messages.The code counts occurrences of
"INFO: repacking table"in the output to determine tables processed. This parsing is fragile and will break if pg_repack changes its log format or localization. Consider:
- Parsing pg_repack's exit code or structured output if available
- Documenting the pg_repack version dependency
- Adding a fallback when the count is zero (currently treated as failure, but could be valid for empty databases)
113-118: Zero tables processed could be a false positive.The code treats zero tables processed as a failure (Line 115), which makes sense for detecting silent failures. However, this could be a false positive for:
- Empty databases
- Databases where all tables are excluded by pg_repack's internal logic
- Test environments
Consider whether this should be a warning instead of a failure, or add a configuration option to control this behavior.
169-191: SSL mode disabled: Insecure for production.Line 182 hardcodes
sslmode=disablein the connection string, which transmits credentials and data in cleartext over the network. This is acceptable for local development but poses a security risk in production.Consider:
- Making SSL mode configurable via
DBConnConfig- Defaulting to
sslmode=requireorsslmode=verify-fullfor production- Documenting the security implications in configuration examples
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (38)
compose.yaml(2 hunks)deployments/Dockerfile(2 hunks)docs/node-operator-guide.md(3 hunks)extensions/leaderwatch/constants.go(1 hunks)extensions/leaderwatch/leaderwatch.go(1 hunks)extensions/leaderwatch/leaderwatch_test.go(1 hunks)extensions/register.go(1 hunks)extensions/tn_digest/README.md(1 hunks)extensions/tn_digest/leader.go(0 hunks)extensions/tn_digest/leader_reload_test.go(6 hunks)extensions/tn_digest/tn_digest.go(3 hunks)extensions/tn_vacuum/README.md(1 hunks)extensions/tn_vacuum/config.go(1 hunks)extensions/tn_vacuum/constants.go(1 hunks)extensions/tn_vacuum/extension.go(1 hunks)extensions/tn_vacuum/mechanism.go(1 hunks)extensions/tn_vacuum/mechanism_repack.go(1 hunks)extensions/tn_vacuum/mechanism_repack_test.go(1 hunks)extensions/tn_vacuum/metrics/metrics.go(1 hunks)extensions/tn_vacuum/metrics/metrics_test.go(1 hunks)extensions/tn_vacuum/metrics/noop.go(1 hunks)extensions/tn_vacuum/metrics/otel.go(1 hunks)extensions/tn_vacuum/runner.go(1 hunks)extensions/tn_vacuum/state.go(1 hunks)extensions/tn_vacuum/tn_vacuum.go(1 hunks)extensions/tn_vacuum/vacuum_test.go(1 hunks)go.mod(1 hunks)node/exts/erc20-bridge/erc20/meta_extension_deposit_test.go(1 hunks)tests/extensions/erc20/erc20_bridge_admin_authz_test.go(1 hunks)tests/extensions/erc20/erc20_bridge_end_to_end_test.go(2 hunks)tests/extensions/erc20/erc20_bridge_epoch_test.go(1 hunks)tests/extensions/erc20/erc20_bridge_injection_test.go(2 hunks)tests/extensions/erc20/erc20_bridge_multi_instance_test.go(1 hunks)tests/extensions/erc20/erc20_bridge_transfer_actions_test.go(3 hunks)tests/extensions/erc20/erc20_bridge_transfer_test.go(1 hunks)tests/streams/utils/erc20/helper.go(1 hunks)tests/streams/utils/erc20/inject.go(2 hunks)tests/streams/utils/utils.go(1 hunks)
💤 Files with no reviewable changes (1)
- extensions/tn_digest/leader.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-20T12:22:14.250Z
Learnt from: outerlook
PR: trufnetwork/node#1113
File: internal/migrations/020-digest-init.sql:3-11
Timestamp: 2025-08-20T12:22:14.250Z
Learning: In the trufnetwork/node repository, for the digest_config table in internal/migrations/020-digest-init.sql, the maintainer prefers to leave the table without initial seed rows, allowing the extension to handle the empty state case at runtime rather than seeding default configuration values in the migration.
Applied to files:
extensions/tn_digest/README.md
📚 Learning: 2025-09-19T18:14:01.428Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: .github/workflows/publish-node-image.yaml:19-19
Timestamp: 2025-09-19T18:14:01.428Z
Learning: The ghcr.io/trufnetwork/tn-db image was never released, so no backwards compatibility or transitional measures are needed when renaming to ghcr.io/trufnetwork/node.
Applied to files:
compose.yaml
📚 Learning: 2025-09-19T18:59:51.942Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: tests/extensions/tn_digest/test_tn_digest.sh:15-15
Timestamp: 2025-09-19T18:59:51.942Z
Learning: In Docker Compose files, service names (like "tn-db:") can remain unchanged even when the underlying image is updated (e.g., from tn-db:local to ghcr.io/trufnetwork/node:local). Service names are internal network identifiers and don't need to match the actual image names.
Applied to files:
compose.yaml
🧬 Code graph analysis (24)
extensions/tn_digest/leader_reload_test.go (1)
extensions/tn_digest/internal/engine_ops.go (1)
NewEngineOperations(31-33)
extensions/tn_digest/tn_digest.go (3)
extensions/tn_digest/constants.go (2)
ExtensionName(4-4)DefaultDigestSchedule(5-5)extensions/leaderwatch/leaderwatch.go (2)
Register(149-162)Callbacks(13-17)extensions/tn_digest/extension.go (1)
GetExtension(51-62)
tests/extensions/erc20/erc20_bridge_transfer_test.go (2)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)tests/extensions/erc20/common_test.go (5)
TestChain(18-18)TestEscrowA(20-20)TestERC20(22-22)TestUserA(23-23)TestAmount2(26-26)
tests/streams/utils/erc20/helper.go (1)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)
extensions/tn_vacuum/metrics/noop.go (1)
extensions/tn_vacuum/metrics/metrics.go (1)
MetricsRecorder(16-25)
extensions/tn_vacuum/metrics/metrics.go (2)
extensions/tn_vacuum/metrics/noop.go (1)
NewNoOpMetrics(13-15)extensions/tn_vacuum/metrics/otel.go (1)
NewOTELMetrics(31-103)
extensions/tn_vacuum/metrics/metrics_test.go (2)
extensions/tn_vacuum/metrics/noop.go (1)
NewNoOpMetrics(13-15)extensions/tn_vacuum/metrics/metrics.go (2)
ClassifyError(53-77)NewMetricsRecorder(30-50)
tests/extensions/erc20/erc20_bridge_injection_test.go (3)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)tests/extensions/erc20/common_test.go (1)
TestExtensionAlias(19-19)tests/streams/utils/erc20/helper.go (1)
GetUserBalance(49-74)
tests/extensions/erc20/erc20_bridge_transfer_actions_test.go (1)
tests/extensions/erc20/common_test.go (4)
TestChain(18-18)TestERC20(22-22)TestUserA(23-23)TestAmount2(26-26)
extensions/tn_vacuum/tn_vacuum.go (5)
extensions/leaderwatch/leaderwatch.go (1)
InitializeExtension(47-54)extensions/tn_vacuum/constants.go (1)
ExtensionName(5-5)extensions/leaderwatch/constants.go (1)
ExtensionName(3-3)extensions/tn_vacuum/extension.go (1)
GetExtension(47-57)extensions/tn_vacuum/config.go (1)
LoadConfig(17-63)
extensions/tn_vacuum/mechanism_repack.go (2)
extensions/tn_vacuum/mechanism.go (5)
DBConnConfig(37-43)Mechanism(10-15)MechanismDeps(17-20)RunRequest(22-27)RunReport(29-35)extensions/tn_vacuum/constants.go (5)
StatusOK(29-29)StatusFailed(30-30)DefaultPostgresHost(22-22)DefaultPostgresPort(23-23)DefaultSSLMode(24-24)
extensions/tn_vacuum/config.go (1)
extensions/tn_vacuum/constants.go (4)
ExtensionName(5-5)ConfigKeyEnabled(15-15)ConfigKeyBlockInterval(16-16)ConfigKeyPgRepackJobs(17-17)
tests/extensions/erc20/erc20_bridge_multi_instance_test.go (2)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)tests/extensions/erc20/common_test.go (4)
TestChain(18-18)TestERC20(22-22)TestUserA(23-23)TestAmount1(25-25)
extensions/register.go (3)
extensions/tn_vacuum/tn_vacuum.go (1)
InitializeExtension(13-23)extensions/tn_digest/tn_digest.go (1)
InitializeExtension(17-39)extensions/leaderwatch/leaderwatch.go (1)
InitializeExtension(47-54)
tests/extensions/erc20/erc20_bridge_epoch_test.go (1)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)
extensions/leaderwatch/leaderwatch.go (3)
extensions/leaderwatch/constants.go (1)
ExtensionName(3-3)extensions/tn_vacuum/tn_vacuum.go (1)
InitializeExtension(13-23)extensions/tn_vacuum/extension.go (1)
ResetForTest(66-72)
extensions/tn_vacuum/vacuum_test.go (5)
extensions/tn_vacuum/mechanism.go (4)
RunRequest(22-27)MechanismDeps(17-20)RunReport(29-35)Mechanism(10-15)extensions/tn_vacuum/constants.go (5)
StatusOK(29-29)ExtensionName(5-5)ConfigKeyPgRepackJobs(17-17)ConfigKeyEnabled(15-15)ConfigKeyBlockInterval(16-16)extensions/tn_vacuum/extension.go (2)
ResetForTest(66-72)GetExtension(47-57)extensions/tn_vacuum/config.go (1)
Config(11-15)extensions/tn_vacuum/runner.go (2)
Runner(11-13)RunnerArgs(15-23)
extensions/leaderwatch/leaderwatch_test.go (1)
extensions/leaderwatch/leaderwatch.go (3)
ResetForTest(164-170)Register(149-162)Callbacks(13-17)
tests/extensions/erc20/erc20_bridge_admin_authz_test.go (2)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)tests/extensions/erc20/common_test.go (5)
TestChain(18-18)TestEscrowA(20-20)TestERC20(22-22)TestUserA(23-23)TestAmount2(26-26)
tests/extensions/erc20/erc20_bridge_end_to_end_test.go (2)
tests/streams/utils/erc20/inject.go (1)
InjectERC20Transfer(24-85)tests/extensions/erc20/common_test.go (5)
TestChain(18-18)TestEscrowA(20-20)TestERC20(22-22)TestUserA(23-23)TestAmount1(25-25)
extensions/tn_vacuum/state.go (1)
extensions/tn_vacuum/mechanism.go (1)
DBConnConfig(37-43)
extensions/tn_vacuum/mechanism.go (1)
extensions/tn_vacuum/mechanism_repack.go (1)
NewPgRepackMechanism(25-27)
extensions/tn_vacuum/runner.go (2)
extensions/tn_vacuum/mechanism.go (3)
Mechanism(10-15)DBConnConfig(37-43)RunRequest(22-27)extensions/tn_vacuum/metrics/metrics.go (2)
MetricsRecorder(16-25)ClassifyError(53-77)
extensions/tn_vacuum/extension.go (4)
extensions/tn_vacuum/config.go (1)
Config(11-15)extensions/tn_vacuum/mechanism.go (3)
Mechanism(10-15)DBConnConfig(37-43)MechanismDeps(17-20)extensions/tn_vacuum/runner.go (2)
Runner(11-13)RunnerArgs(15-23)extensions/tn_vacuum/metrics/metrics.go (2)
MetricsRecorder(16-25)NewMetricsRecorder(30-50)
🪛 markdownlint-cli2 (0.18.1)
docs/node-operator-guide.md
494-494: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: acceptance-test
- GitHub Check: lint
🔇 Additional comments (116)
extensions/tn_digest/README.md (1)
32-32: LGTM! More appropriate default schedule for production.The change from a 10-minute interval to daily at 09:00 is more suitable for a "first-time setup" example. The cron expression is valid and aligns with typical digest/maintenance job patterns.
go.mod (1)
22-23: Approve kwil-db pseudo-version bumpCommit 372c7bcfcb2c exists in trufnetwork/kwil-db (adds Postgres repack & publish workflow for tn_vacuum) and no open breaking-change issues were found, so the pseudo-version bump is valid.
tests/streams/utils/utils.go (1)
1-2: LGTM! Build constraint added correctly.The
//go:build kwiltestconstraint properly gates these test utilities, consistent with the PR's test organization strategy.tests/extensions/erc20/erc20_bridge_admin_authz_test.go (1)
35-36: LGTM! Deposit injection updated to credit user directly.The change from
TestEscrowAtoTestUserAas the sender aligns with the test's goal of giving the user an initial balance. The synthetic Deposit log now represents a user depositing tokens for themselves, which is semantically correct for this authorization boundary test.tests/extensions/erc20/erc20_bridge_multi_instance_test.go (1)
43-44: LGTM! Instance isolation test updated correctly.The parameter change maintains the test's isolation goals by using the
escrowAparameter to target the correct instance while simulating a user self-deposit. The verification logic correctly confirms that onlyaliasAreceives the balance.tests/extensions/erc20/erc20_bridge_epoch_test.go (1)
34-34: LGTM! Epoch flow test updated to reflect user self-deposit.The parameter change correctly simulates an inbound deposit where the user deposits tokens for themselves, aligning with the comment on Line 33 and the subsequent reward verification logic.
tests/streams/utils/erc20/helper.go (1)
40-46: LGTM! Helper function updated to simulate user deposits correctly.The comment and code changes align well: the function now simulates a user depositing tokens for themselves via the RewardDistributor contract. The parameter change from
escrowAddrtouserAddr(Line 45) correctly models the deposit originating from the user rather than the escrow, which matches the Deposit event semantics in the injection logic.tests/extensions/erc20/erc20_bridge_transfer_test.go (1)
34-34: LGTM! Aligns with Deposit event semantics.The change correctly shifts from Transfer to Deposit event simulation, where the recipient (toHex) parameter determines who receives the credit. Having both fromHex and toHex as TestUserA is semantically valid for a deposit scenario where the user deposits and receives credit.
tests/extensions/erc20/erc20_bridge_end_to_end_test.go (2)
55-55: LGTM! Correctly implements Deposit event semantics.The injection now uses Deposit event structure where the recipient (TestUserA) receives the credit directly. The test flow correctly validates the balance after deposit.
137-137: LGTM! Deposit correctly credits the user before bridge operation.The change aligns with Deposit semantics. The test correctly deposits to TestUserA first, then validates custom recipient functionality in the subsequent bridge operation.
tests/extensions/erc20/erc20_bridge_transfer_actions_test.go (3)
34-34: LGTM! Properly uses configured escrow and Deposit semantics.The test correctly retrieves the configured escrow address and injects a deposit that credits TestUserA. The subsequent transfer validation logic remains sound.
125-125: LGTM! Correctly sets up partial balance for validation test.The deposit correctly credits TestUserA with a small amount to validate the insufficient balance error case. Test logic is sound.
174-174: LGTM! Correctly seeds initial balance for multi-transfer test.The deposit correctly credits userA with the initial 10 token balance needed for the subsequent transfer chain validation. Test flow is correct.
tests/extensions/erc20/erc20_bridge_injection_test.go (1)
70-97: Excellent test coverage for Deposit recipient semantics!This new test explicitly validates the key behavior of Deposit events: the recipient (toHex parameter) receives the credit, not the depositor (fromHex parameter). The test correctly asserts:
- Recipient balance equals the deposited value
- Depositor balance remains zero
This provides important regression protection for the Deposit semantics shift.
tests/streams/utils/erc20/inject.go (4)
23-23: LGTM! Improved documentation and input validation.The updated comment correctly describes the Deposit log semantics, and the added address validation provides good defensive programming to catch malformed addresses early in tests.
Also applies to: 35-40
52-55: LGTM! Correct ABI encoding for Deposit event data.The data encoding correctly implements Solidity ABI encoding for non-indexed parameters:
- address: 32 bytes, right-aligned (left-padded with zeros) ✓
- uint256: 32 bytes ✓
The
copy(recipientWord[32-len(to.Bytes()):], to.Bytes())correctly right-aligns the address.
67-67: LGTM! Metadata tag updated to reflect Deposit event.The metadata change from "e20trsnfr" to "rcpdepst" appropriately reflects the shift from Transfer to Deposit event semantics. This aids in test debugging and log identification.
47-50: Unable to verify Deposit event signature hash automatically. Automated tooling failed due to missing cryptographic backends—please manually confirm that keccak256("Deposit(address,uint256)") equals 0xe1fffcc4923d04b559f4d29a8bfc6cda04eb5b0d3c460751c2402c5c5cc9109c.node/exts/erc20-bridge/erc20/meta_extension_deposit_test.go (3)
1-2: LGTM: Build tag appropriately gates integration test.The
kwiltestbuild tag ensures this test only runs when explicitly enabled, which is appropriate for integration tests that require DB setup.
5-17: LGTM: Imports are appropriate and organized.All imports are necessary for the test implementation and properly organized.
20-50: LGTM: Test setup and initialization are well-structured.The test properly:
- Manages resources with deferred cleanup (DB close, transaction rollback)
- Uses Sepolia testnet chain ID (11155111) for realistic test context
- Employs test-friendly addresses with clear patterns (0x...aa, 0x...bb)
- Configures standard ERC20 decimals (18)
Test isolation is ensured via transaction rollback.
compose.yaml (2)
31-31: LGTM: Environment-based image tag substitution.The change from a hardcoded
:localtag to${TN_IMAGE:-local}enables flexible image selection across environments (dev, CI, production) while preserving the existing default behavior for local development.
40-42: LGTM: Formatting improvements.The comment and whitespace adjustments improve readability without changing functionality.
deployments/Dockerfile (3)
33-33: LGTM: Alpine 3.21 base image.The update to Alpine 3.21 is appropriate and aligns with current stable releases.
42-46: LGTM: PostgreSQL client tools and pg_repack binary installation.The installation of
postgresql16-clientand the copy of thepg_repackbinary from the builder stage correctly establishes the runtime environment needed for the tn_vacuum extension's pg_repack mechanism.
20-32: Verify pg_repack version and security posture
- 1.5.2 (released Dec 16, 2024) is the latest upstream release.
- No public CVEs target the upstream tarball; monitor vendor‐packaging advisories (e.g. CVE-2024-0985, CVE-2025-1094).
extensions/leaderwatch/constants.go (1)
1-3: LGTM: Extension name constant.The constant definition is clean and follows the naming conventions used by other extensions in the codebase.
extensions/register.go (2)
5-5: LGTM: New extension imports.The imports for
leaderwatchandtn_vacuumare correctly added.Also applies to: 8-8
12-15: No tn_digest dependency in tn_vacuum initialization Verified thatextensions/tn_vacuum/tn_vacuum.go’sInitializeExtensiononly registers precompile initializers and engine-ready hooks, with no imports or references totn_digestor its symbols; initialization order is safe.extensions/tn_vacuum/mechanism_repack_test.go (1)
5-39: Soft failure detection limited to version mismatches The function only returns an error when stderr contains “does not match database library,” so missing‐extension errors fall through and return nil—test case 2 is correct.extensions/tn_vacuum/README.md (1)
1-120: Documentation looks comprehensive and well-structured.The README provides clear guidance on:
- Configuration options with sensible defaults
- Platform-specific installation instructions
- Metrics for observability
- Common troubleshooting scenarios
The documentation aligns well with the implementation described in the PR context.
extensions/tn_digest/leader_reload_test.go (6)
144-155: LGTM: Proper service initialization added.The addition of
ext.SetService(app.Service)at line 150 ensures the extension has access to the service before testing leadership transitions. This follows the new pattern required by the leaderwatch integration.
157-170: LGTM: Consistent leadership event handling.The test correctly uses
digestLeaderAcquireto simulate leadership acquisition and verifies scheduler initialization. The service is properly set before the leadership event.
172-190: LGTM: Leadership loss scenario properly tested.The test correctly simulates:
- Acquiring leadership via
digestLeaderAcquire- Losing leadership via
digestLeaderLosewith a different leader identity- Proper cleanup with idempotent stop
192-215: LGTM: Configuration reload on leadership events tested.The test properly verifies that configuration reloads trigger scheduler start/stop based on enabled state changes. The use of fake DB to control reload behavior is appropriate.
217-238: LGTM: Disable scenario properly tested.The test correctly verifies that a configuration reload returning disabled state stops the scheduler. Service initialization and hook calls follow the established pattern.
240-253: LGTM: Network parameters-based leader detection verified.The test confirms that leadership is determined using
NetworkParameters.Leaderfrom the block context, which is the correct approach for the leaderwatch integration.docs/node-operator-guide.md (5)
57-57: LGTM: pg_repack dependency added to installation.The addition of
postgresql-16-repackto the Linux installation instructions is necessary for the tn_vacuum extension.
65-65: LGTM: macOS pg_repack installation documented.The Homebrew installation command correctly includes
pg_repackfor macOS users.
488-507: LGTM: Vacuum extension documentation is clear and concise.The new section:
- Clearly explains the purpose of the extension
- Provides quick-start configuration
- Notes that pg_repack is pre-installed in official images
- Links to detailed documentation
The configuration example shows sensible defaults (50000 block interval).
598-598: LGTM: Troubleshooting section updated for pg_repack.The installation instructions in the troubleshooting section are consistent with the prerequisites section.
604-610: LGTM: Cross-platform pg_repack installation covered.All major platforms (Ubuntu/Debian, CentOS/RHEL, macOS) have appropriate pg_repack installation instructions in the troubleshooting section.
extensions/leaderwatch/leaderwatch_test.go (4)
14-38: LGTM: Clean test helpers.The
makeBlockhelper andfakePubKeymock provide a lightweight way to construct block contexts for testing without pulling in heavy dependencies. The fake implementation correctly satisfies the required interfaces.
40-92: LGTM: Comprehensive leadership transition test.This test effectively verifies:
- Callback ordering (first watcher's callbacks fire before second watcher's)
- OnAcquire and OnEndBlock fire when leadership is acquired
- Only OnEndBlock fires when leadership is maintained
- OnLose and OnEndBlock fire when leadership is lost
The event tracking pattern clearly demonstrates the expected behavior.
94-99: LGTM: Duplicate registration properly prevented.The test confirms that attempting to register the same watcher name twice returns an error, preventing configuration mistakes.
101-106: LGTM: ResetForTest enables test isolation.The test verifies that
ResetForTest()clears internal state, allowing the same watcher name to be registered again. This is essential for test suite reliability.extensions/tn_vacuum/tn_vacuum.go (4)
13-23: LGTM: Standard extension initialization pattern.The
InitializeExtensionfunction follows the established pattern seen in the leaderwatch extension:
- Registers precompile initializer
- Registers engine_ready and end_block hooks
- Panics on registration failure (appropriate for initialization-time errors)
25-32: LGTM: Precompile initializer with safe service handling.The initializer correctly:
- Retrieves the singleton extension instance
- Safely checks for nil service before setting logger/service
- Returns an empty precompile (appropriate for extensions that don't provide SQL precompiles)
34-65: LGTM: Robust engine ready hook implementation.The hook demonstrates good practices:
- Defensive nil checks for app and service (lines 36-38, 41-43)
- Configuration loading with graceful degradation on error (lines 50-54)
- Configuration validation via
configure(lines 56-58)- Proper locking before starting worker (lines 60-62)
- Worker only started when extension is enabled (line 59)
67-77: LGTM: Clean end block hook implementation.The hook:
- Guards against nil block (lines 68-70)
- Updates service reference if available (lines 72-74)
- Delegates to
maybeRunfor vacuum scheduling logic (line 75)This separation of concerns keeps the hook lightweight and delegates business logic to the extension.
extensions/tn_vacuum/metrics/metrics_test.go (3)
13-23: LGTM!The test correctly verifies that all no-op metrics methods execute without panicking, which is the expected behavior for a no-op implementation.
25-84: LGTM!Comprehensive table-driven test with excellent coverage of all error classification branches. The test cases align well with the ClassifyError implementation patterns.
86-94: LGTM!The test appropriately validates that NewMetricsRecorder returns a functional metrics recorder, treating the OTEL-vs-NoOp fallback logic as an implementation detail. The smoke test via RecordVacuumStart ensures basic functionality.
extensions/tn_vacuum/state.go (7)
14-41: LGTM!The state persistence types and interface are well-designed. The
ext_schema prefix correctly ensures node-local state that won't affect consensus hashing. The interface provides a clean contract for pluggable persistence backends.
43-55: LGTM!The constructor correctly initializes a dedicated connection pool for state persistence with appropriate error wrapping. The separation of concerns (dedicated pool for state store) is a good design choice.
57-72: LGTM!The schema and table creation logic is correct and idempotent. Using
TIMESTAMPTZfor timestamps is a best practice, and the error wrapping provides clear diagnostics.
74-90: LGTM!The Load method correctly distinguishes between "not found" and "error" cases using the boolean return flag. The UTC normalization on Line 88 is a best practice for consistent time handling across time zones.
92-106: LGTM!The upsert logic correctly handles both insert and update cases for the singleton state row. Using parameterized queries protects against SQL injection, and updating
updated_aton both paths provides accurate audit tracking.
108-112: LGTM!The Close method correctly cleans up the connection pool with an appropriate nil check.
114-122: LGTM!The no-op state store correctly implements the interface with safe fallback behavior. Load returning
false(not found) is semantically correct for a disabled persistence backend.extensions/tn_vacuum/constants.go (4)
3-6: LGTM!The extension name constant is correctly defined with clear documentation of its usage.
8-11: LGTM!The block interval defaults are reasonable. A default of 50,000 blocks provides a sensible balance between vacuum frequency and overhead, while the minimum of 1 allows for testing and manual triggers.
13-18: LGTM!Configuration keys follow a consistent naming convention and are appropriately exported for use across the extension.
27-31: LGTM!The status constants are clear and appropriate for reporting vacuum operation results.
extensions/tn_vacuum/runner.go (2)
11-23: LGTM!Clean struct definitions with appropriate separation between the executor (Runner) and execution parameters (RunnerArgs).
34-78: LGTM!The execution flow is well-structured with appropriate nil-safety checks, optional metrics recording, and conditional report logging. Error classification via
metrics.ClassifyErrorensures consistent error categorization.extensions/tn_vacuum/config.go (4)
11-15: LGTM!Clear configuration structure for the vacuum extension.
17-27: LGTM!Safe default handling ensures the extension works with sensible values when configuration is absent.
29-60: LGTM!Field parsing is robust with appropriate error handling and bounds enforcement. The clamping logic for
PgRepackJobsand the min/default enforcement forBlockIntervalensure safe configuration.
65-75: LGTM!The boolean parsing logic is clear and handles empty strings as false, which is a sensible default for configuration parsing.
extensions/tn_digest/tn_digest.go (4)
12-12: LGTM!Clean integration with the new leaderwatch extension. The callback registration follows the same error-handling pattern as the existing hook registrations.
Also applies to: 28-38
101-103: LGTM!The no-op end-block hook maintains backward compatibility while delegating actual logic to the leaderwatch callbacks.
105-133: LGTM!The leader acquisition logic correctly sets the leader flag, validates prerequisites, and starts the scheduler only when all dependencies are available. The deferral path with debug logging is helpful for troubleshooting.
135-145: LGTM!Clean leader loss handling that properly clears state and stops the scheduler.
extensions/tn_vacuum/vacuum_test.go (15)
15-41: LGTM!The
stubMechanismimplementation correctly tracks preparation, runs, and closure calls. The deterministicRunReportdata is appropriate for testing.
43-74: LGTM!The test mechanisms (
failingMechanism,nilReportMechanism,errorRunMechanism) correctly model error conditions and edge cases needed for comprehensive test coverage.
76-109: LGTM!The
stubStateStoreeffectively simulates state persistence with configurable error injection and tracks all operations for verification.
111-123: LGTM!Test correctly verifies that disabling the extension skips mechanism preparation.
125-160: LGTM!Test validates the full lifecycle: engine readiness triggers mechanism preparation, and block-end hooks execute runs at the configured interval while skipping intermediate blocks.
162-177: LGTM!Test confirms that configuration failures properly leave the mechanism in a nil state, preventing invalid runs.
179-217: LGTM!Test validates that the enhanced
RunReportincludes mechanism name, status, duration, and tables processed.
219-255: LGTM!Test verifies that vacuum operations are correctly skipped when the block interval is not met, and that runs execute when the interval is satisfied.
257-307: LGTM!Test confirms that persisted state is loaded on startup and that subsequent runs respect the loaded last run height when evaluating the block interval.
309-357: LGTM!Test validates state persistence after successful runs and ensures duplicate heights don't trigger redundant saves.
359-376: LGTM!Test confirms the runner correctly handles nil reports, recording start and complete metrics with zero values.
378-408: LGTM!Test verifies that errors during run execution are recorded exactly once through metrics.
410-423: LGTM!Test validates that enqueuing a run when the worker is busy returns false.
425-464: LGTM!The
stubMetricsRecordercorrectly implements theMetricsRecorderinterface and tracks all metric recording calls for test verification.
466-485: LGTM!Helper functions provide clean synchronization primitives for testing asynchronous operations.
extensions/tn_vacuum/metrics/noop.go (1)
1-22: LGTM!The
NoOpMetricsimplementation correctly provides a zero-overhead fallback for theMetricsRecorderinterface when OpenTelemetry is unavailable.extensions/tn_vacuum/extension.go (9)
15-36: LGTM!The
Extensionstruct is well-designed with appropriate synchronization primitives and clear separation of concerns.
47-72: LGTM!Singleton management functions are correctly implemented with proper initialization, defensive nil checks, and test reset capability.
74-107: LGTM!Test helpers and accessors provide proper synchronization and allow dependency injection for testability.
109-139: LGTM!The worker lifecycle is correctly implemented with proper context cancellation, wait group coordination, and idempotency.
212-275: LGTM!The
processRunmethod correctly handles the execution lifecycle: guards against invalid state, executes the run, persists state, and updates metrics. Error handling is appropriate, and therunInProgressflag is properly managed.
277-298: LGTM!The
enqueueRunmethod correctly prevents duplicate runs by checking bothrunInProgressand queue length, with proper worker initialization.
300-324: LGTM!The
configuremethod correctly handles mechanism lifecycle: closes existing mechanism, stores new config, and prepares mechanism only when enabled.
326-377: LGTM!The
maybeRunmethod implements correct interval-based triggering logic with appropriate metrics recording for skipped runs.
379-420: LGTM!The
Closemethod correctly orchestrates shutdown: cancels worker context, closes resources, and waits for worker completion. Resource cleanup is properly ordered to prevent leaks.extensions/tn_vacuum/metrics/metrics.go (2)
14-25: LGTM: Clean interface design.The
MetricsRecorderinterface is well-structured with clear separation between vacuum operation metrics and resource metrics. Method signatures are appropriate and consistent.
52-77: Error classification approach is pragmatic for metric labels.The string-based error matching is fragile compared to typed error checks (e.g.,
errors.Is/errors.As), but it's a reasonable trade-off for generating low-cardinality metric labels from potentially wrapped or external process errors (e.g., frompg_repackbinary execution).If error classification becomes unreliable, consider defining sentinel errors for the most important cases and checking those first before falling back to string matching.
extensions/leaderwatch/leaderwatch.go (6)
13-17: LGTM: Callbacks struct is well-designed.The optional callback fields with consistent signatures provide flexibility for watchers to subscribe only to the events they care about.
37-45: LGTM: Singleton pattern correctly implemented.The
sync.Onceinitialization with fallback logger is appropriate. The logger is upgraded whenengineReadyHookprovidesapp.Service.Logger.
56-65: LGTM: Engine ready hook is safely implemented.The nil guards and mutex protection ensure thread-safe initialization of the extension's service and logger references.
67-135: LGTM: End-block hook correctly implements two-phase callback dispatch.The implementation properly:
- Holds the lock while updating watcher state
- Releases the lock before invoking callbacks to avoid deadlocks
- Preserves watcher ordering via the
orderslice- Always dispatches
OnEndBlockfor all watchersThis is a solid pattern for hook-based extensions.
137-147: LGTM: Leadership determination is correctly guarded.The comprehensive nil checks and identity comparison logic are appropriate. The defensive approach ensures the function returns
falsesafely in all edge cases.
149-162: LGTM: Watcher registration is correctly implemented.The function validates inputs, prevents duplicates, and maintains registration order under mutex protection. Clear error messages aid debugging.
extensions/tn_vacuum/metrics/otel.go (3)
12-28: LGTM: OTELMetrics struct uses appropriate metric types.The choice of counters for events, histograms for distributions, and a gauge for point-in-time state aligns with OpenTelemetry best practices. Clear organization with inline comments.
30-103: LGTM: Metric definitions follow OpenTelemetry conventions.The constructor properly initializes all metrics with:
- Consistent naming (tn_vacuum prefix)
- Clear descriptions
- Appropriate units following OTEL semantic conventions
- Early error returns for robustness
105-142: LGTM: Recording methods correctly implement the MetricsRecorder interface.All methods:
- Apply appropriate attributes for dimensionality
- Reuse attribute sets efficiently (e.g., line 114-120 in
RecordVacuumComplete)- Pass context through for potential tracing correlation
extensions/tn_vacuum/mechanism.go (1)
10-15: LGTM! Clean interface design.The Mechanism interface follows Go best practices with context-aware methods, clear separation of lifecycle phases (Prepare/Run/Close), and proper error handling.
extensions/tn_vacuum/mechanism_repack.go (5)
19-29: LGTM! Clean constructor and type definition.The
pgRepackMechanismtype properly encapsulates dependencies and the constructor returns the interface type for good abstraction.
124-132: LGTM! Error detection is extensible.The
detectPgRepackSoftFailurefunction uses a switch statement for easy addition of more error patterns in the future. The current version mismatch detection looks correct.
134-140: LGTM! Simple and effective error summarization.The function extracts the last line of stderr, which typically contains the most relevant error message.
142-147: LGTM! Simple cleanup.The Close method properly logs shutdown. No resources require explicit cleanup.
149-167: Verify no sensitive details leak via errors
There’s no direct logging ofconnStrhere, but confirm that the errors frompgx.Connectandconn.Execdon’t include the full DSN (especially passwords), or sanitize/redact sensitive fields inbuildConnString.
Time Submission Status
|
This commit enhances the `tn_vacuum` extension by refining error handling and logging mechanisms. Key changes include: - Updated the `initializeState` method to return errors instead of logging warnings, providing better error propagation. - Improved logging messages in the `pgRepackMechanism` to use error level for critical issues, ensuring visibility of failures. - Refactored the `RunnerArgs` and `RunRequest` structures for consistency in field ordering. - Added checks for database configuration in the `configure` method to enforce required settings. These improvements enhance the robustness and maintainability of the extension, ensuring clearer error reporting and better handling of critical conditions.
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
extensions/tn_vacuum/extension.go(1 hunks)extensions/tn_vacuum/mechanism.go(1 hunks)extensions/tn_vacuum/mechanism_repack.go(1 hunks)extensions/tn_vacuum/runner.go(1 hunks)extensions/tn_vacuum/tn_vacuum.go(1 hunks)extensions/tn_vacuum/vacuum_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- extensions/tn_vacuum/extension.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-30T17:45:46.448Z
Learnt from: outerlook
PR: trufnetwork/node#1175
File: extensions/tn_vacuum/mechanism_repack.go:65-74
Timestamp: 2025-09-30T17:45:46.448Z
Learning: In the tn_vacuum extension, DBConnConfig parameters (Database, Host, Port, User) are validated by kwil's configuration layer before reaching the pg_repack mechanism code, so additional validation in mechanism_repack.go is not required.
Applied to files:
extensions/tn_vacuum/mechanism_repack.go
🧬 Code graph analysis (5)
extensions/tn_vacuum/tn_vacuum.go (4)
extensions/leaderwatch/leaderwatch.go (1)
InitializeExtension(47-54)extensions/tn_vacuum/constants.go (1)
ExtensionName(5-5)extensions/tn_vacuum/extension.go (1)
GetExtension(46-56)extensions/tn_vacuum/config.go (1)
LoadConfig(17-63)
extensions/tn_vacuum/vacuum_test.go (5)
extensions/tn_vacuum/mechanism.go (4)
RunRequest(23-27)MechanismDeps(18-21)RunReport(29-35)Mechanism(11-16)extensions/tn_vacuum/constants.go (5)
StatusOK(29-29)ExtensionName(5-5)ConfigKeyPgRepackJobs(17-17)ConfigKeyEnabled(15-15)ConfigKeyBlockInterval(16-16)extensions/tn_vacuum/extension.go (2)
ResetForTest(65-71)GetExtension(46-56)extensions/tn_vacuum/config.go (1)
Config(11-15)extensions/tn_vacuum/runner.go (2)
Runner(11-13)RunnerArgs(15-22)
extensions/tn_vacuum/mechanism.go (1)
extensions/tn_vacuum/mechanism_repack.go (1)
NewPgRepackMechanism(25-27)
extensions/tn_vacuum/runner.go (2)
extensions/tn_vacuum/mechanism.go (3)
Mechanism(11-16)DBConnConfig(37-43)RunRequest(23-27)extensions/tn_vacuum/metrics/metrics.go (2)
MetricsRecorder(16-25)ClassifyError(53-77)
extensions/tn_vacuum/mechanism_repack.go (2)
extensions/tn_vacuum/mechanism.go (5)
DBConnConfig(37-43)Mechanism(11-16)MechanismDeps(18-21)RunRequest(23-27)RunReport(29-35)extensions/tn_vacuum/constants.go (5)
StatusOK(29-29)StatusFailed(30-30)DefaultPostgresHost(22-22)DefaultPostgresPort(23-23)DefaultSSLMode(24-24)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: acceptance-test
🔇 Additional comments (1)
extensions/tn_vacuum/mechanism.go (1)
45-66: Thread-safe factory override looks solid.The added
sync.RWMutexguard cleanly resolves the race onmechanismFactorywhile keeping the helper API unchanged. Nice tidy fix.
This commit improves the thread safety of the `tn_vacuum` extension by introducing `sync.RWMutex` locks in various structures. Key changes include: - Added mutex locks to the `stubMechanism` and `stubStateStore` types to protect shared state during concurrent access. - Refactored methods to use the new locking mechanisms, ensuring safe read and write operations. - Updated test assertions to utilize snapshot methods for verifying state without direct access to the underlying data. These enhancements ensure that the extension operates reliably in concurrent environments, reducing the risk of data races and improving overall stability.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
extensions/tn_vacuum/vacuum_test.go (3)
16-21: Resolved: stubMechanism races eliminated; accessors are concurrency-safe.Mutex + snapshot/getters address prior data races on runs/prepared/closeCnt. Looks good.
#!/bin/bash # Run tn_vacuum tests with the race detector go test -race ./extensions/tn_vacuum -run TestAlso applies to: 32-42, 51-78
113-123: Resolved: stubStateStore read/write races fixed.Locking in Ensure/Load/Save and read-only getters remove unsynchronized access flagged earlier. LGTM.
Also applies to: 125-155, 159-181
532-545: Resolved: stubMetricsRecorder synchronized.All mutating recorders are locked; assigning ext.metrics under ext.mu also avoids races. Solid.
Also applies to: 547-583
🧹 Nitpick comments (6)
extensions/tn_vacuum/vacuum_test.go (6)
584-608: Broaden metrics snapshot to include skipped and error type fields.Several recorder fields aren’t exposed via snapshot, limiting assertions (especially for “skipped” cases).
func (s *stubMetricsRecorder) snapshot() stubMetricsSnapshot { s.mu.RLock() defer s.mu.RUnlock() return stubMetricsSnapshot{ startCount: s.startCount, completeCount: s.completeCount, errorCount: s.errorCount, + skippedCount: s.skippedCount, lastDuration: s.lastDuration, lastTables: s.lastTables, lastMechanism: s.lastMechanism, lastErrorMechanism: s.lastErrorMechanism, + lastErrorType: s.lastErrorType, + lastSkipReason: s.lastSkipReason, lastHeight: s.lastHeight, } } type stubMetricsSnapshot struct { startCount int completeCount int errorCount int + skippedCount int lastDuration time.Duration lastTables int lastMechanism string lastErrorMechanism string + lastErrorType string + lastSkipReason string lastHeight int64 }
211-214: Use config key constants for consistency.These literals duplicate existing constants; prefer ConfigKeyEnabled/ConfigKeyBlockInterval for drift-proof tests.
- "enabled": "true", - "block_interval": "3", + ConfigKeyEnabled: "true", + ConfigKeyBlockInterval: "3", ConfigKeyPgRepackJobs: "2",
235-237: Avoid time.Sleep in negative assertions; use a polling helper to reduce flakiness.Replace fixed sleeps with a helper that verifies “no new runs occurred within duration”.
+// add once near other helpers +func requireNoNewRunsFor(t *testing.T, stub *stubMechanism, baseline int, d time.Duration) { + t.Helper() + deadline := time.Now().Add(d) + for time.Now().Before(deadline) { + if stub.runCount() > baseline { + t.Fatalf("expected no new runs for %s", d) + } + time.Sleep(10 * time.Millisecond) + } +} @@ - time.Sleep(50 * time.Millisecond) - require.Len(t, stub.runsSnapshot(), 1) + requireNoNewRunsFor(t, stub, 1, 50*time.Millisecond) @@ - time.Sleep(50 * time.Millisecond) - require.Len(t, stub.runsSnapshot(), 1, "should not run - interval not met") + requireNoNewRunsFor(t, stub, 1, 50*time.Millisecond) @@ - time.Sleep(50 * time.Millisecond) - require.Len(t, stub.runsSnapshot(), 0, "should not run before interval is met") + requireNoNewRunsFor(t, stub, 0, 50*time.Millisecond) @@ - time.Sleep(50 * time.Millisecond) - require.Equal(t, 1, store.saveCountValue(), "duplicate height should not persist again") + requireNoNewRunsFor(t, stub, 1, 50*time.Millisecond) // and keep the saveCount assertionAlso applies to: 340-342, 395-397, 455-457
292-303: Exercise the runner path instead of calling the mechanism directly.This test currently validates the stub’s return, not integration. Prefer runner.Execute and assert metrics reflect Duration/Tables/Mechanism.
metricsStub := &stubMetricsRecorder{} require.NoError(t, ext.runner.Execute(ctx, RunnerArgs{ Mechanism: stub, Logger: log.New(), Reason: "test", Metrics: metricsStub, })) snap := metricsStub.snapshot() require.Equal(t, "stub", snap.lastMechanism) require.Equal(t, 100*time.Millisecond, snap.lastDuration) require.Equal(t, 5, snap.lastTables)
305-347: Align test name with assertions or assert skipped metrics.“TestVacuumSkippedMetrics” doesn’t currently check metrics. Either rename to reflect interval behavior or assert skippedCount/skipReason via the metrics stub.
610-629: Consider testify’s Eventually to simplify polling helpers.You can replace waitForCondition with require.Eventually/assert.Eventually for concise polling.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
extensions/tn_vacuum/vacuum_test.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
extensions/tn_vacuum/vacuum_test.go (5)
extensions/tn_vacuum/mechanism.go (4)
RunRequest(23-27)MechanismDeps(18-21)RunReport(29-35)Mechanism(11-16)extensions/tn_vacuum/constants.go (5)
StatusOK(29-29)ExtensionName(5-5)ConfigKeyPgRepackJobs(17-17)ConfigKeyEnabled(15-15)ConfigKeyBlockInterval(16-16)extensions/tn_vacuum/extension.go (2)
ResetForTest(65-71)GetExtension(46-56)extensions/tn_vacuum/config.go (1)
Config(11-15)extensions/tn_vacuum/runner.go (2)
Runner(11-13)RunnerArgs(15-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: acceptance-test
…-postgres This commit updates all instances of the PostgreSQL image from `kwildb/postgres` to `ghcr.io/trufnetwork/kwil-postgres` across various configuration files and documentation. Key changes include: - Updated `compose.yaml`, `Taskfile.yml`, and multiple `docker-compose` files to use the new image. - Modified documentation to reflect the new image source and its configuration. - Adjusted scripts to ensure compatibility with the updated image. These changes ensure consistency in using the new PostgreSQL image for the TRUF.NETWORK project.
Description
leaderwatchextension to manage leader callbacks during block processing, including functionality for acquiring and losing leadership.tn_vacuumextension with various triggers (block interval, cron, manual, and digest coupled) to manage vacuum operations based on configurable parameters.leaderwatchandtn_vacuumfunctionalities to ensure reliability and correctness.Related Problem
How Has This Been Tested?
Summary by CodeRabbit
New Features
Improvements
Deployment
Documentation
Tests