cherrypick from main to REL_2_STABLE #1547
Conversation
At first, we used relfilenode to construct the directory table's directory path. However, relfilenode may differ between QD and QE, which would make the directory table's file paths differ across nodes and cause confusion in use. This commit therefore uses relid, which is the same on QD and QE, to construct the directory table's file path. Authored-by: Zhang Wenchao zhangwenchao@apache.org
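A minimal illustration of the idea, with a hypothetical helper name and simplified path layout (not the actual Cloudberry code):
```c
#include <stdio.h>

typedef unsigned int Oid;

/*
 * Hypothetical sketch: build the directory table's path from the
 * relation OID (identical on QD and QEs) instead of the relfilenode,
 * which can diverge between nodes after rewrites.
 */
static void
directory_table_path(char *path, size_t len, Oid spcOid, Oid relid)
{
    /* relid is dispatched unchanged to every segment, so QD and QEs
     * always agree on this path; a relfilenode-based name would not. */
    snprintf(path, len, "%u/%u", spcOid, relid);
}
```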
Change `BUILD_GBENCH` to OFF by default; we should not build Google Benchmark in release mode.
ORCA prunes column references (CColRef) marked as "UNUSED". If the current CTE has not been inlined, the producer's and consumer's output column references are marked as used. We observed that some customer SQL statements used CTEs in which some output columns were not used outside the consumer. This PR therefore prunes output columns that are used by neither the producer nor the consumer.
In previous commits, we addressed the issue of "column references not used by producer or consumer." However, in practical cases, if a producer's output column is used as a GROUP BY key, JOIN key, etc., by downstream operators (within CTEs), the corresponding column cannot be pruned. This commit adopts a new approach to resolve the issue:
1. We mark columns used in the consumer (referenced by upper-level operators) and notify the corresponding producer.
2. The producer no longer marks its own output columns as used. Instead, it finalizes its output columns (i.e., the union of all columns required by consumers).
3. Since `NodeSharedScan` does not perform column projection, we align the output columns in consumers (which are guaranteed to be a subset of the producer's output columns) after the producer finalizes its output.
Note: even if NodeSharedScan supported column projection, it would not improve performance and would complicate certain cases. Thus, column projection was intentionally not implemented in `NodeSharedScan`.
After upgrading to GPDB7, we lost the detailed cdb executor instrumentation for hashagg because the code base changed substantially. This patch re-implements the EXPLAIN ANALYZE related code for hashagg to provide more critical information for troubleshooting.
…he#1159)
In ExecRefreshMatView(), the pointer viewQuery may be invalidated when make_new_heap_with_colname() closes the old heap via table_close(OldHeap, NoLock). This happens because table_close() can release relcache entries and associated rules, including rd_rules, which stores the viewQuery. To avoid accessing a possibly freed memory region, make a deep copy of viewQuery before invoking make_new_heap_with_colname(), and use this copy as dataQuery for further processing. Fix: ERROR: unrecognized node type: 2139062143 (copyfuncs.c:7663) Author: Jianghua Yang
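A minimal sketch of the fix pattern; copyObject() is PostgreSQL's standard deep copy for node trees, and the surrounding refresh logic is elided:
```c
#include "postgres.h"
#include "nodes/parsenodes.h"

/*
 * Sketch, not the verbatim patch: snapshot the rule's query tree into
 * the caller's memory context, so the later table_close() inside
 * make_new_heap_with_colname() cannot free it out from under us.
 */
static Query *
snapshot_view_query(const Query *viewQuery)
{
    /* copyObject() performs a full deep copy of the node tree; the
     * copy is independent of the relcache entry that owned rd_rules. */
    return (Query *) copyObject(viewQuery);
}
```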
* Fix segfault when recovering 2pc transaction
When promoting a mirror segment due to failover, we have seen a stack trace like this:
```
"FATAL","58P01","requested WAL segment pg_xlog/00000023000071D50000001F has already been removed",,,,,,,0,,"xlogutils.c",580,"Stack trace:
1  0x557bdb9f09b6 postgres errstart + 0x236
2  0x557bdb5fc6cf postgres <symbol not found> + 0xdb5fc6cf
3  0x557bdb5fd021 postgres read_local_xlog_page + 0x191
4  0x557bdb5fb922 postgres <symbol not found> + 0xdb5fb922
5  0x557bdb5fba11 postgres XLogReadRecord + 0xa1
6  0x557bdb5e7767 postgres RecoverPreparedTransactions + 0xd7
7  0x557bdb5f608b postgres StartupXLOG + 0x2a3b
8  0x557bdb870a89 postgres StartupProcessMain + 0x139
9  0x557bdb62f489 postgres AuxiliaryProcessMain + 0x549
10 0x557bdb86d275 postgres <symbol not found> + 0xdb86d275
11 0x557bdb8704e3 postgres PostmasterMain + 0x1213
12 0x557bdb56a1f7 postgres main + 0x497
13 0x7fded4a61c87 libc.so.6 __libc_start_main + 0xe7
```
Note: the stack trace is from one of our production GP clusters and might differ slightly from what we would see in Cloudberry, but the failure is present here as well; the test case proves it. PG13 and PG14 have a fix for this bug, but it doesn't include a test case, and it looks like we didn't cherry-pick that far. The discussion can be found here: https://www.postgresql.org/message-id/flat/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel%40cybertec.at In a few words, StartupXLOG() renames the last WAL segment to .partial but later tries to read it by the old name in RecoverPreparedTransactions(). The fix is mostly borrowed from PG14 postgres/postgres@f663b00 with some Cloudberry-related exceptions. Also added a regression test which segfaults without this fix on any version of GP, PG<=12, or Cloudberry.
* Add stable ordering to testcase
---------
Co-authored-by: Jianghua.yjh <yjhjstz@gmail.com>
The mutable checks are redundant at this stage since we already perform this validation when inserting the gp_matview_aux entry. Authored-by: Zhang Mingli avamingli@gmail.com
As the comments say, items in this file should be ordered. Authored-by: Zhang Mingli avamingli@gmail.com
Replace the hardcoded `/usr/bin/bash` path with `/usr/bin/env bash` in gpAux/gpdemo/Makefile to improve portability across systems. This change ensures the demo cluster can be created regardless of where bash is installed. Testing in an Ubuntu 18.04 container previously returned the following error:
```
gpadmin@cdw:~/cloudberry$ make create-demo-cluster -C ~/cloudberry
make: Entering directory '/home/gpadmin/cloudberry'
make -C gpAux/gpdemo create-demo-cluster
make[1]: Entering directory '/home/gpadmin/cloudberry/gpAux/gpdemo'
make[1]: /usr/bin/bash: Command not found
```
This change makes the demo environment more robust across different Linux distributions, container environments, and custom installations.
This commit enhances the AQUMV system by enabling it to compute queries
directly from materialized views that already contain a GROUP BY clause.
This improvement allows us to bypass additional GROUP BY operations
during query execution, resulting in faster and more efficient
performance.
For example, with a materialized view defined as follows:
```sql
CREATE MATERIALIZED VIEW mv_group_1 AS
SELECT c, b, COUNT(b) AS count_b FROM t0 WHERE a > 3 GROUP BY c, b;
```
An original query like:
```sql
SELECT COUNT(b), b, c FROM t0 WHERE a > 3 GROUP BY b, c;
```
is rewritten to:
```sql
SELECT count_b, b, c FROM mv_group_1;
```
The plan looks like:
```sql
explain(costs off, verbose)
select count(b), b, c from t0 where a > 3 group by b, c;
                          QUERY PLAN
---------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: count, b, c
   ->  Seq Scan on aqumv.mv_group_1
         Output: count, b, c
 Settings: enable_answer_query_using_materialized_views = 'on',
 optimizer = 'off'
 Optimizer: Postgres query optimizer
(6 rows)
```
The two SQL queries yield equivalent results, even though the selected
columns are in a different order. Since mv_group_1 already contains the
aggregated results and all of its rows satisfy a > 3, there is no need
for additional filtering or GROUP BY operations.
This enhancement eliminates redundant computations, leading to
significant time savings. Fetching results directly from these views
reduces overall execution time, improving responsiveness for complex
queries. This is particularly beneficial for large datasets, allowing
efficient data analysis without performance degradation.
The feature also applies to Dynamic Tables and Incremental Materialized
Views.
Authored-by: Zhang Mingli avamingli@gmail.com
This commit addresses scenarios where certain WHERE conditions
are subsets of others. For example, in WHERE A AND B, if condition
A is a superset of B, we can directly prune condition A.
For most queries, eliminating redundant WHERE conditions avoids
unnecessary inner joins when subqueries exist in the WHERE clause.
Due to the complexity of SQL, we can only target some simple cases
as supersets, and only when subqueries exist does deduplication
achieve an actual performance improvement.
In this commit, the selection for the superset is: subqueries without
`GROUP BY, WHERE, ORDER BY, LIMIT` as the r-value. The selection
for the subset is: WHERE conditions containing the same l-value as
the superset (or the superset's l-value as a join key with swapped
values) and a subquery with `GROUP BY, WHERE, ORDER BY, LIMIT` as
the l-value. Of course, if identical supersets exist, they will
also be pruned.
Example:
```
the schema: t1(v1, v2), t2(v3, v4)
The query:
select * from t1,t2
where
t1.v1 in (select v3 from t2)
and
t1.v1 in (select v3 from t2 where v3 < 100);
equivalent to:
select * from t1,t2
where
t1.v1 in (select v3 from t2 where v3 < 100);
```
Previously, XLogCompressBackupBlock() called elog(ERROR) on internal compression failures (e.g., out of memory or a ZSTD error). Since this function is invoked inside a critical section during WAL record assembly, any ERROR is promoted to PANIC, leading to an immediate backend crash. This patch replaces elog(ERROR) with elog(LOG) and a boolean return value. Callers now check the return status and fall back to storing the uncompressed full-page image if compression fails. This preserves robustness and aligns with the behavior introduced in PostgreSQL 18.
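The resulting calling convention looks roughly like this; the compressor helper name is assumed, not taken from the patch:
```c
#include "postgres.h"

/* hypothetical wrapper around the ZSTD call, not a real API */
extern int do_zstd_compress(const char *src, char *dst, int *dstlen);

/*
 * Sketch: report failure through the return value instead of
 * elog(ERROR), which inside a critical section is promoted to PANIC.
 */
static bool
compress_backup_block(const char *page, char *dst, int *dstlen)
{
    if (do_zstd_compress(page, dst, dstlen) != 0)
    {
        elog(LOG, "WAL compression failed, using uncompressed image");
        return false;           /* caller stores the full-page image as-is */
    }
    return true;
}
```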
This issue caused `AggRef`s in the `targetlist` to no longer share the same `aggno` and `aggtransno`, which means an AggRef would be calculated repeatedly. For example:
```
select sum(a)        <- aggref1
from any_table
having sum(a) < 100; <- won't reuse the result of aggref1 for the filter
```
This problem was introduced when Cloudberry cherry-picked PG14. But after Cloudberry was open-sourced, the history of those changes had been merged, and this problem now only exists in ORCA. In PG14, shared AggRefs are no longer computed in `ExecInitAgg`; the logic moved into the planner. ORCA, however, only uses `idx++` to generate aggno and aggtransno, which I consider a workaround. This commit fixes the behavior by adopting the same logic as the planner. It is worth noting that in a multi-stage agg, the final agg often gets an outer-reference `Var` in its AggRef, so the final agg still cannot get the same aggno (this is consistent with Greenplum; in TPC-DS testing, even generating the same aggno for the final agg brought no performance improvement). If necessary, we could consider moving the logic from `dxltoplstmt` to the preprocessor, which could also reduce some output columns.
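A hypothetical sketch of the planner-style sharing logic the commit refers to; names are illustrative, but equal() is PostgreSQL's real node-tree comparison:
```c
#include "postgres.h"
#include "nodes/nodes.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

/*
 * Sketch: before assigning a new aggno, look for an existing Aggref
 * that is equal() and reuse its slot, so sum(a) in the target list
 * and in HAVING share one transition state.
 */
static int
lookup_shared_aggno(List *seen, Aggref *aggref)
{
    int         aggno = 0;
    ListCell   *lc;

    foreach(lc, seen)
    {
        if (equal(lfirst(lc), aggref))
            return aggno;       /* shared: computed only once */
        aggno++;
    }
    return -1;                  /* not found: caller appends a new entry */
}
```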
Due to Greenplum's MPP architecture, the postgres process needs to connect to databases, so some symbols from libpq.so are compiled into the postgres binary as well. However, some symbols are not safe to export from both the postgres binary and the libpq.so library: e.g., some functions use palloc()/pfree() to allocate/free memory in backend code but malloc()/free() in frontend code. See issue #16219. This patch resolves the issue by hiding frontend symbols in the postgres binary, so that frontend symbols referenced in extensions resolve correctly. (cherry picked from commit b3ad725c12092ec8eec6017bc0c896db52845c17)
In ORCA, both Dynamic Scan and Append operators can serve as scan methods for partitioned tables. Historically, ORCA supported the Append operator first before switching to Dynamic Any Scan, with several iterations of changes in between. In current CBDB versions, Dynamic Scan is the default partitioned-table scan operator. However, the vectorized executor in CBDB (closed-source) cannot support Dynamic Scan due to its multi-threaded architecture, which prohibits metadata reading during scanning. This commit reintroduces the Append operator as an alternative. The GUC `optimizer_disable_dynamic_table_scan` (default: off) now allows choosing between the Dynamic Scan and Append operators. Important considerations when using the Append operator:
1. **Row-level security unsupported on the root partition table**: Append cannot be used with row-level security on partitioned tables. ORCA applies row-level security at the final stage, making it impossible to determine the root partition after generating Append plans. Dynamic Scan preserves root partition information, enabling runtime application of row-level security to child partitions.
2. **No index-only scan support**: this is a CBDB executor limitation; only Dynamic Scan supports index-only scans for partitioned tables.
The compiler will complain or report errors for: 1. comparisons between signed and unsigned integers; 2. assigning structs via brace lists in C++.
…is specified
When building with --enable-pax, the build now checks for the presence of protobuf (version >= 3.5.0) using pkg-config. If protobuf is not found, configure will fail with an appropriate error message. This ensures that missing dependencies are caught early in the build process.
This routine was originally introduced in commit ed64982 but was later removed for unknown or unintended reasons. This change brings back that portion of the code to ensure correct behavior in the relevant path-generation logic.
…upport
When USE_ZSTD is not defined, XLogCompressBackupBlock() always returns false, indicating that WAL page compression did not succeed. The existing elog(LOG, "WAL compression failed, using uncompressed image") message may be misleading in such builds, as compression is not even attempted. Remove the log message to avoid confusion in builds without ZSTD support.
Upstream Postgres does not support parallel DISTINCT processing, since
uniqueness cannot be guaranteed across multiple workers. In MPP databases,
however, we can utilize Motion to redistribute tuples across multiple
workers within a parallel query.
For a DISTINCT query like:
select distinct a from t_distinct_0;
we can create a parallel plan based on the underlying node's Parallel
Scan on the table. The tuples are distributed randomly after the
Parallel Scan, even when the distribution key matches the target
expression.
The pre-distinct node uses Streaming HashAggregate or HashAggregate to
deduplicate some tuples in parallel, which are then redistributed
according to the DISTINCT expressions. Finally, a second-stage process
handles the DISTINCT operation.
```sql
                         QUERY PLAN
------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)
   ->  HashAggregate
         Group Key: a
         ->  Redistribute Motion 6:6  (slice2; segments: 6)
               Hash Key: a
               Hash Module: 3
               ->  Streaming HashAggregate
                     Group Key: a
                     ->  Parallel Seq Scan on t_distinct_0
 Optimizer: Postgres query optimizer
(10 rows)
```
Parallel Group Aggregation is also supported:
```sql
explain(costs off)
select distinct a, b from t_distinct_0;
                        QUERY PLAN
-----------------------------------------------------------
 GroupAggregate
   Group Key: a, b
   ->  Gather Motion 6:1  (slice1; segments: 6)
         Merge Key: a, b
         ->  GroupAggregate
               Group Key: a, b
               ->  Sort
                     Sort Key: a, b
                     ->  Parallel Seq Scan on t_distinct_0
 Optimizer: Postgres query optimizer
(10 rows)
```
Authored-by: Zhang Mingli avamingli@gmail.com
* Check MergeAppend node in share input mutator
Do a recursive call in the plan walker for the MergeAppend plan type. When the planner decides to merge two sorted sub-plans, one of which has a Share Input Scan node, the executor fails to execute the plan because of wrongly aligned internal structures. It turns out the proper recursive call was missing in the shareinput tree walker.
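The shape of the fix, as a sketch with assumed walker and context types, mirroring how such walkers already handle Append:
```c
#include "postgres.h"
#include "nodes/plannodes.h"

/*
 * Sketch with assumed names: visit every sorted child of a MergeAppend
 * so a Share Input Scan below it is not skipped by the walker.
 */
static void
walk_merge_append(MergeAppend *ma,
                  void (*walker) (Plan *plan, void *ctxt),
                  void *ctxt)
{
    ListCell   *lc;

    /* mergeplans is the list of sorted child plans to be merged */
    foreach(lc, ma->mergeplans)
        walker((Plan *) lfirst(lc), ctxt);
}
```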
Skip current partial path if locus does not match. Authored-by: Zhang Mingli avamingli@gmail.com
During TPC-DS testing, we observed that compiling postgres against libpostgres.so introduces PLT (procedure linkage table) function-call overhead for some functions. By linking the object files (*.o) directly instead, we achieved a 5-8% performance improvement in the TPC-DS 1TB benchmark. This commit adds an option, enable_link_postgres_with_shared, to link against libpostgres.so when compiling postgres; the default is false, matching Greenplum, which statically links all object files when compiling postgres. Additionally, this update fixes a minor bug: the pax extension depends on libpostgres.so, so when enabling the pax extension we now check that enable_shared_postgres_backend is set to 'yes' to ensure proper functionality. The ic-cbdb-parallel test has also been migrated to the release build instead of the debug build. This change was made because running the test on the debug build caused disk space issues: when both libpostgres.so and postgres are compiled in debug mode, disk usage increases by several hundred megabytes compared to the release build, and the test failed due to insufficient disk space. Switching to the release build resolves this, and the test runs faster as well.
Previously, passing an empty string to the -U or --username option (e.g., `initdb -U ''`) would cause confusing errors during bootstrap, as initdb attempted to create a role with an empty name. This patch adds an explicit check for empty usernames and exits immediately with a clear error message. A test case is added to verify that initdb fails when -U is given an empty string.
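Conceptually, the added guard is a few lines in initdb's option handling; the exact message wording here is assumed:
```c
#include "common/logging.h"
#include <stdlib.h>

/* Sketch: reject an empty bootstrap superuser name up front rather
 * than failing obscurely while creating a role with an empty name. */
static void
check_superuser_name(const char *username)
{
    if (username == NULL || username[0] == '\0')
    {
        pg_log_error("superuser name must not be empty");
        exit(1);
    }
}
```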
- Extend runtime filter pushdown to DynamicSeqscan
- Update related executor nodes and headers
- Update regression tests
Move some variable declarations to the beginning of the function to conform to C90 and the long-standing coding practice of PostgreSQL. Also, be tidy and release the memory allocated by GetDatabasePath. This is not a real memory leak, as standby promotion happens only once (until the next restart). Per coverity report 529246.
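The tidy-up amounts to the usual palloc/pfree discipline; a sketch with the surrounding promotion code elided:
```c
#include "postgres.h"
#include "common/relpath.h"

/* Sketch: GetDatabasePath() palloc's its result, so free it after
 * use, even though promotion runs at most once per server start. */
static void
use_database_path(Oid dbOid, Oid spcOid)
{
    char   *path = GetDatabasePath(dbOid, spcOid);

    /* ... act on path (elided) ... */

    pfree(path);
}
```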
This commit fixes the disk space issue in tests by:
- Adding Docker volume mounts to expose host directories to containers, and removing some pre-installed tools from the host, including:
  - GitHub Actions tool cache
  - Android SDK, .NET SDK, Haskell (GHC + GHCup), Swift
  - PowerShell, Chromium, Miniconda, Azure CLI, and Scala Build Tool
- Cleaning up RPM artifacts and source tarballs after extraction in the test job to reclaim additional space
This approach frees ~30G of disk space per job, ensuring sufficient space for build and test operations.
Also fix pgstattuple expected output.
The gpcheckresgroupv2impl script failed on segments when running `gpconfig -c gp_resource_manager -v "group-v2"`. Root cause: the validation script tried to connect to localhost:5432 on each host to retrieve gp_resource_group_cgroup_parent. However, segment hosts don't run the master database, causing "Connection refused" errors. Fix:
- Retrieve gp_resource_group_cgroup_parent from the master database in gpresgroup.py before dispatching validation commands;
- Pass the cgroup_parent value to gpcheckresgroupv2impl via a command-line argument (--cgroup-parent);
- Remove the database connection logic from gpcheckresgroupv2impl.
Add shebang lines to failover scripts to fix RPM build warnings. During RPM package build, rpmbuild reports warnings about executable files without shebang lines, which causes the build process to remove the executable bit from these scripts. The following scripts are fixed:
- master_check_back.sh
- standby_check_back.sh
- segment_check_back.sh
- segment_all_down.sh
- docker_master_check_back.sh
- docker_standby_check_back.sh
- docker_segment_check_back.sh
Use "#!/usr/bin/env bash" for better cross-platform compatibility. See: apache#1445
1. Remove the explicit -Werror=pessimizing-move flag from CMakeLists.txt. This flag was added in commit e7e07c2 to catch pessimizing-move warnings on higher GCC versions, but it breaks compilation on GCC 8.x, where this warning option does not exist. The fix is safe because GCC 9+ enables -Wpessimizing-move by default and the existing -Werror flag already converts all warnings to errors.
2. Fix fast_io.cc compatibility issues:
- Add the missing <unistd.h> include for pread()
- Define a uring_likely macro fallback for older liburing versions
See: Issue#1441 <apache#1441>
Install rpm-build and rpmdevtools in the Rocky 8 build image, which can be used to build RPM package in the future Rocky 8 scheduled CI workflow.
We added liburing to the Docker build file in 98320cb. But now pax.so also has a runtime dependency on liburing.so.2:
```
xifos@localhost:~$ ldd /usr/cloudberry-db/lib/postgresql/pax.so | grep liburi
liburing.so.2 => /lib/x86_64-linux-gnu/liburing.so.2 (0x000072fc7908a000)
```
so we need to install it with the cloudberry package. Co-authored-by: Leonid Borchuk <xifos@qavm-f9b691f5.qemu>
Fix execution-time error when operators have oprcanhash=true but lack actual hash functions. Previously, ORCA would generate HashAgg plans that failed at runtime with "could not find hash function for hash operator" errors. This occurred when operators were marked with the 'hashes' option but only registered in btree operator families, not hash operator families. The fix adds validation using get_op_hash_functions() to ensure hash functions exist before allowing HashAgg plans, moving error detection from execution to planning time.
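A minimal sketch of how the validation can be phrased, using the real catalog lookup get_op_hash_functions() from lsyscache:
```c
#include "postgres.h"
#include "utils/lsyscache.h"

/*
 * Sketch: oprcanhash alone is not enough; confirm a hash operator
 * family actually supplies hash functions before ORCA is allowed
 * to produce a HashAgg plan for this grouping operator.
 */
static bool
grouping_op_truly_hashable(Oid opno)
{
    RegProcedure lhs_hashfn;
    RegProcedure rhs_hashfn;

    /* returns false when no hash opfamily covers this operator */
    return get_op_hash_functions(opno, &lhs_hashfn, &rhs_hashfn);
}
```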
Similar to the free disk space actions for workflows in Rocky Linux, this commit adds the same feature to the workflows in Ubuntu. You can see apache#1511 for details.
Update the copyright year range from 2024-2025 to 2024-2026 in:
* NOTICE file
* src/bin/psql/help.c (psql copyright display)
This helps maintain compliance with ASF requirements. The changes are minimal and focused only on updating the year in copyright statements to reflect the current year.
* Add check for current year in NOTICE file copyright statement
  - Ensure the copyright year is up to date (e.g., 2024-$currentyear)
* Add binary file detection in the source tree
  - This check is inspired by Apache MADlib's rat-check script.
  - Check for common binary extensions (class, jar, tar, tgz, zip, exe, dll, so)
* Improve workflow output and reporting
  - Add structured console output for both checks
  - Include check results in the GitHub Actions job summary
* Use the smgr-interface-provided create_ao function.
Extensible SMGR is one of the useful Cloudberry features; it allows an extension to hijack the storage execution flow. smgrcreate_ao, however, does not follow this convention. Fix that.
* Add assertion and comment
Add branch protection rules for the REL_2_STABLE release branch to ensure all modifications must go through pull requests. Configuration added: * Require at least 2 approving reviews before merging * Require conversation threads to be resolved before merging This protects the release branch from direct pushes and enforces code review workflow for all changes.
* Extend workflow triggers to include `REL_2_STABLE` branch
* Modified 4 workflow files:
apache-rat-audit.yml
build-cloudberry.yml
build-dbg-cloudberry.yml
build-deb-cloudberry.yml
* Aligns test/build workflows between main and REL_2_STABLE
Please note that maintainers must sync all the necessary commits from main to
REL_2_STABLE to keep the workflows running successfully. This commit just
adds REL_2_STABLE as a target branch.
* ORCA: Fix memory leak in CWindowOids by adding destructor
CWindowOids class was leaking three CMDIdGPDB objects (m_MDIdRowNumber, m_MDIdRank, m_MDDenseRank) that were allocated in the constructor but never released. Fixes ORCA unit test failures:
- gporca_test_CXformTest
- gporca_test_CConstExprEvaluatorDefaultTest
Also, while at it, beautify the code to conform to PG-style coding. Per coverity report 544476.
Perl at higher versions (e.g., 5.38.0) produces different error messages compared with v5.34.0. We should handle these different versions.
The default value of join_collapse_limit was 20. With this value set, when a query contains about 20 joins (see the added test), the Postgres query optimizer cannot build a plan for hours and consumes a lot of memory, because the planner checks a huge number of possible ways to join the tables. When join_collapse_limit is 8, the query plan is built in reasonable time.
Check IsSorted before sorting, reducing the O(n log n) sort to n - 1 comparisons for pre-sorted IN lists and improving ORCA optimization time.
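The idea in miniature, using a hypothetical integer-array stand-in for ORCA's expression array:
```c
#include <stdbool.h>

/* One O(n) pass: n - 1 comparisons decide whether the O(n log n)
 * sort of the IN list can be skipped entirely. */
static bool
is_sorted(const int *vals, int n)
{
    for (int i = 1; i < n; i++)
        if (vals[i - 1] > vals[i])
            return false;       /* out of order: run the full sort */
    return true;                /* already sorted: no sort required */
}
```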
`hot_standby/query_conflict` case
It's the next minor release, 2.1.0, now, since the core code baseline for 2.0.0 was frozen around early June.
Done
Fixed.
leborchuk left a comment:
LGTM
Now, we cannot… Let me do some study. If anyone can help resolve this, you're welcome to leave your thoughts.
GitHub cannot replay the commits in this PR due to some changed contexts, I guess. Some commits in this PR already exist in REL_2_STABLE, so we can try to delete them from this PR so that we can create a safe context for GitHub. @hw118118 PTAL. Thanks! Love to have more thoughts from other reviewers.
cherrypick from commit: 35595db to commit: 103da7a
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel