floware init script #295
- for creating db on startup
📝 Walkthrough
Changes: Floware Container Init Script
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@wavefront/server/scripts/floware-init.sh`:
- Around line 11-16: The psycopg2.connect() call is missing the port and
connect_timeout parameters, which will cause failures on non-default PostgreSQL
deployments and potential indefinite hangs on network issues. Add the port
parameter by reading from the DB_PORT environment variable (similar to how
DB_HOST, DB_USERNAME, DB_PASSWORD, and DB_NAME are retrieved) and pass it to the
psycopg2.connect() function. Additionally, add a connect_timeout parameter with
a reasonable bounded value (such as 10 seconds) to prevent the connection from
hanging indefinitely if there are network connectivity problems.
- Around line 19-23: The database creation logic at lines 19-23 has a TOCTOU
vulnerability where concurrent startups can both see the database as missing,
causing one to fail when executing CREATE DATABASE. Replace the
check-then-create pattern (the cur.fetchone() conditional) with a try-except
block that directly attempts to create the database and catches
psycopg2.errors.DuplicateDatabase when another process creates it concurrently.
Additionally, replace the hand-rolled quoting (string concatenation plus
dbname.replace()) with psycopg2.sql.Identifier(), which safely quotes the
database identifier and removes the need for manual escaping.
📒 Files selected for processing (2)
wavefront/server/docker/floware.Dockerfile
wavefront/server/scripts/floware-init.sh
host = os.environ['DB_HOST']
user = os.environ['DB_USERNAME']
password = os.environ['DB_PASSWORD']
dbname = os.environ['DB_NAME']

conn = psycopg2.connect(host=host, user=user, password=password, dbname='postgres')
🧩 Analysis chain
🏁 Script executed:
find . -name "floware-init.sh" -type f
Repository: rootflo/wavefront
Length of output: 104
🏁 Script executed:
cat -n ./wavefront/server/scripts/floware-init.sh | head -30
Repository: rootflo/wavefront
Length of output: 1070
🏁 Script executed:
rg "DB_PORT" --type-list | head -20
Repository: rootflo/wavefront
Length of output: 451
🏁 Script executed:
rg "DB_PORT" -A 2 -B 2
Repository: rootflo/wavefront
Length of output: 4821
🏁 Script executed:
rg "connect_timeout" -A 2 -B 2
Repository: rootflo/wavefront
Length of output: 1829
🏁 Script executed:
cat -n ./wavefront/server/packages/flo_cloud/flo_cloud/postgres/postgres.py | grep -A 5 -B 5 "connect_timeout"
Repository: rootflo/wavefront
Length of output: 448
Use DB_PORT and a bounded connect timeout in the admin connection.
The psycopg2.connect() call on line 16 is missing both port and connect_timeout parameters. Since DB_PORT is consumed by the runtime configuration across the codebase (alembic, celery workers, config.ini files), the init script will fail on non-5432 deployments. Additionally, without a bounded connect_timeout, startup can hang indefinitely if there are network issues.
Suggested patch
import psycopg2, os
host = os.environ['DB_HOST']
+port = os.environ.get('DB_PORT', '5432')
user = os.environ['DB_USERNAME']
password = os.environ['DB_PASSWORD']
dbname = os.environ['DB_NAME']
+connect_timeout = int(os.environ.get('DB_CONNECT_TIMEOUT', '5'))
-conn = psycopg2.connect(host=host, user=user, password=password, dbname='postgres')
+conn = psycopg2.connect(
+    host=host,
+    port=port,
+    user=user,
+    password=password,
+    dbname='postgres',
+    connect_timeout=connect_timeout,
+)
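The connection settings discussed in this comment can be sketched as a small helper that gathers the keyword arguments in one place. This is only an illustration under assumptions: the function name `admin_connect_kwargs` is hypothetical, `DB_CONNECT_TIMEOUT` and its default of 5 seconds come from the suggested patch rather than from existing configuration, and converting the port to `int` is a choice made here so that a malformed value fails early.

```python
import os
from typing import Mapping, Optional


def admin_connect_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for the admin connection (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "host": env["DB_HOST"],
        # Fall back to the PostgreSQL default port when DB_PORT is unset.
        "port": int(env.get("DB_PORT", "5432")),
        "user": env["DB_USERNAME"],
        "password": env["DB_PASSWORD"],
        # Connect to the maintenance database, not the one being created.
        "dbname": "postgres",
        # Bounded timeout (seconds) so startup cannot hang indefinitely.
        "connect_timeout": int(env.get("DB_CONNECT_TIMEOUT", "5")),
    }


# Example values only; none of these come from the repository.
example_env = {
    "DB_HOST": "db.internal",
    "DB_PORT": "6432",
    "DB_USERNAME": "floware",
    "DB_PASSWORD": "secret",
}
kwargs = admin_connect_kwargs(example_env)
print(kwargs["port"], kwargs["connect_timeout"])  # prints: 6432 5
```

In the init script the result would presumably be passed as `psycopg2.connect(**kwargs)`; all six keys are standard libpq connection parameters.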
cur.execute(\"SELECT 1 FROM pg_database WHERE datname = %s\", (dbname,))

if not cur.fetchone():
    cur.execute('CREATE DATABASE \"' + dbname.replace('\"', '\"\"') + '\"')
    print('Database created')
🧩 Analysis chain
🏁 Script executed:
find . -name "floware-init.sh" -type f
Repository: rootflo/wavefront
Length of output: 104
🏁 Script executed:
cat -n ./wavefront/server/scripts/floware-init.sh | head -40
Repository: rootflo/wavefront
Length of output: 1143
🏁 Script executed:
python3 << 'PY'
import psycopg2
from psycopg2 import errors
# Check if DuplicateDatabase exists in psycopg2.errors
print("DuplicateDatabase available:", hasattr(errors, 'DuplicateDatabase'))
# Also check sql module availability
from psycopg2 import sql
print("sql.Identifier available:", hasattr(sql, 'Identifier'))
print("sql.SQL available:", hasattr(sql, 'SQL'))
# Verify the exception inheritance
if hasattr(errors, 'DuplicateDatabase'):
    print("DuplicateDatabase base classes:", errors.DuplicateDatabase.__bases__)
PY
Repository: rootflo/wavefront
Length of output: 182
🌐 Web query:
psycopg2 DuplicateDatabase exception psycopg2.errors
💡 Result:
The psycopg2.errors.DuplicateDatabase exception is a specific exception class provided by the psycopg2 library to handle the PostgreSQL SQLSTATE error 42P04 [1][2]. This error occurs when you attempt to execute a CREATE DATABASE statement for a database that already exists [1]. Because PostgreSQL does not support a native IF NOT EXISTS clause for CREATE DATABASE, this exception is commonly used to gracefully handle such attempts in application code [1][3].
Key details:
- Module: psycopg2.errors [2][4]
- PostgreSQL SQLSTATE: 42P04 [1][5]
- Exception hierarchy: It is a subclass of ProgrammingError [2][6]. Because it inherits from standard DB-API exception classes, catching the broader ProgrammingError will also catch DuplicateDatabase, ensuring backward compatibility with legacy code [7][4].
- Usage example:
    import psycopg2
    from psycopg2 import errors
    # ... (connection setup)
    try:
        with conn.cursor() as cur:
            cur.execute("CREATE DATABASE my_database")
    except errors.DuplicateDatabase:
        # Handle the case where the database already exists
        pass
The exception is non-retriable, meaning the statement is immediately aborted, though the database connection remains open and usable [1].
Citations:
- 1: https://pulse.support/kb/postgresql-duplicate-database-42p04
- 2: https://www.psycopg.org/docs/errors.html
- 3: https://stackoverflow.com/questions/44511958/python-postgresql-create-database-if-not-exists-is-error
- 4: https://docs.pgedge.com/psycopg2/v2-9-12/errors/
- 5: https://github.com/psycopg/psycopg2/blob/dcb30249/lib/errorcodes.py
- 6: https://cgi.cse.unsw.edu.au/~cs3311/24T1/exams/23T3/psycopg2/html/errors.html
- 7: https://access.crunchydata.com/documentation/psycopg2/latest/errors.html
🌐 Web query:
psycopg2 sql.Identifier sql.SQL module documentation
💡 Result:
The psycopg2.sql module provides utilities for generating dynamic SQL queries in a safe and convenient manner, specifically for parts of queries, such as table or column names, that cannot be parameterized using standard execute arguments [1][2].
sql.SQL
The sql.SQL class represents a literal snippet of an SQL query. It is primarily used to create a template with {} placeholders into which variable parts (such as Identifiers or Literals) can be merged using the format method [1][3].
Key characteristics:
- It acts as a Composable object [2].
- The string passed to SQL is not escaped, meaning it should contain only constant SQL code or template structures, never raw user-provided data [1][3].
- It supports join and format methods to build complex queries dynamically [1][3].
Example:
    query = sql.SQL("SELECT * FROM {}").format(sql.Identifier('my_table'))
sql.Identifier
The sql.Identifier class represents a PostgreSQL identifier, such as a table name or a column name. It ensures that the identifier is correctly escaped (e.g., by double-quoting) to prevent SQL injection and handle reserved keywords or special characters properly [1][4].
Key characteristics:
- It can accept one or more strings to represent a qualified name (a dot-separated sequence of identifiers), such as schema.table [1][4][5].
- Since version 2.8, it supports passing multiple strings directly to the constructor [5][2].
Example:
    # Represents "schema"."table"
    table = sql.Identifier('schema', 'table')
Together, these objects allow you to maintain query structure while safely injecting dynamic identifiers, keeping them separate from query parameters (which use %s placeholders) [1][5].
Citations:
- 1: https://www.psycopg.org/docs/sql.html
- 2: https://docs.pgedge.com/psycopg2/development/sql/
- 3: https://github.com/psycopg/psycopg2/blob/master/lib/sql.py
- 4: https://access.crunchydata.com/documentation/psycopg2/latest/sql.html
- 5: https://github.com/psycopg/psycopg2/blob/master/doc/src/sql.rst
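For intuition about what identifier quoting involves, here is a minimal pure-Python sketch of PostgreSQL's rule for quoted identifiers: wrap the name in double quotes and double any embedded double quote. The names `quote_ident_sketch` and `create_database_statement` are hypothetical, and this is not how psycopg2 does it internally; in real code `sql.Identifier` remains the better choice. It is the same rule the script's manual `dbname.replace()` escaping applies.

```python
def quote_ident_sketch(name: str) -> str:
    """Quote a PostgreSQL identifier (illustration only, not psycopg2's code)."""
    if "\x00" in name:
        # PostgreSQL identifiers cannot contain NUL bytes.
        raise ValueError("identifiers cannot contain NUL bytes")
    # Wrap in double quotes and double every embedded double quote.
    return '"' + name.replace('"', '""') + '"'


def create_database_statement(dbname: str) -> str:
    """Build a CREATE DATABASE statement with a safely quoted name."""
    return "CREATE DATABASE " + quote_ident_sketch(dbname)


print(create_database_statement("floware"))  # CREATE DATABASE "floware"
print(create_database_statement('we"ird'))   # CREATE DATABASE "we""ird"
```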
🌐 Web query:
PostgreSQL CREATE DATABASE IF NOT EXISTS clause support
💡 Result:
PostgreSQL does not natively support an IF NOT EXISTS clause for the CREATE DATABASE statement [1][2][3]. As of June 2026, the official PostgreSQL documentation for the CREATE DATABASE command does not include this functionality [4][5][6]. Because CREATE DATABASE cannot be executed within a transaction block, you cannot simply wrap it in a standard PL/pgSQL DO block with exception handling to achieve this behavior directly [2]. However, you can simulate this functionality using one of the following approaches:
1. Using psql with \gexec: You can use the psql meta-command \gexec to conditionally execute the CREATE DATABASE command by checking the pg_database system catalog [2][3]:
    SELECT 'CREATE DATABASE my_database' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'my_database')\gexec
2. Using an external shell script: You can check for the database's existence via shell commands before attempting to create it [3]:
    psql -U postgres -tc "SELECT 1 FROM pg_database WHERE datname = 'my_database'" | grep -q 1 || psql -U postgres -c "CREATE DATABASE my_database"
3. Using dblink (for internal procedural logic): If you need to perform this within a stored procedure, you can use the dblink extension to execute the command outside of the current transaction [2]:
    CREATE EXTENSION IF NOT EXISTS dblink;
    DO $$ BEGIN PERFORM dblink_exec('', 'CREATE DATABASE my_database'); EXCEPTION WHEN duplicate_database THEN RAISE NOTICE 'Database already exists, skipping.'; END $$;
Citations:
- 1: https://www.dbvis.com/thetable/create-database-in-postgresql-a-complete-guide/
- 2: https://stackoverflow.com/questions/18389124/simulate-create-database-if-not-exists-for-postgresql
- 3: https://zaiste.net/databases/postgresql/howtos/create-database-if-not-exists/
- 4: https://www.postgresql.org/docs/current/sql-createdatabase.html
- 5: https://www.postgresql.org/docs/18/sql-createdatabase.html
- 6: https://www.postgresql.org/docs/17/sql-createdatabase.html
Address the database creation race condition.
Lines 19-23 contain a TOCTOU (time-of-check to time-of-use) race. Concurrent container startups can both see the database as missing; one will then fail on CREATE DATABASE, which is an avoidable startup failure.
Use try-except with psycopg2.errors.DuplicateDatabase instead of the check-then-create pattern. This approach also improves security by replacing string concatenation with sql.Identifier() for safe identifier parameterization.
Suggested patch
-import psycopg2, os
+import psycopg2, os
+from psycopg2 import errors, sql
@@
-cur.execute(\"SELECT 1 FROM pg_database WHERE datname = %s\", (dbname,))
-
-if not cur.fetchone():
-    cur.execute('CREATE DATABASE \"' + dbname.replace('\"', '\"\"') + '\"')
-    print('Database created')
-else:
-    print('Database already exists, skipping')
+try:
+    cur.execute(sql.SQL('CREATE DATABASE {}').format(sql.Identifier(dbname)))
+    print('Database created')
+except errors.DuplicateDatabase:
+    print('Database already exists, skipping')
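The race described in this comment can be reproduced without PostgreSQL. The following is a toy simulation, not the init script: `FakeCatalog` and the local `DuplicateDatabase` class are hypothetical stand-ins for the server's database catalog and for `psycopg2.errors.DuplicateDatabase`, and a `threading.Barrier` is used to force both workers through the existence check before either one creates anything.

```python
import threading


class DuplicateDatabase(Exception):
    """Stand-in for psycopg2.errors.DuplicateDatabase (SQLSTATE 42P04)."""


class FakeCatalog:
    """Tiny in-memory model of a server that rejects duplicate names."""

    def __init__(self):
        self._names = set()
        self._lock = threading.Lock()

    def exists(self, name):
        with self._lock:
            return name in self._names

    def create(self, name):
        with self._lock:
            if name in self._names:
                raise DuplicateDatabase(name)
            self._names.add(name)


def check_then_create(catalog, name, barrier):
    missing = not catalog.exists(name)
    barrier.wait()  # both workers finish the check before either creates
    if missing:
        catalog.create(name)  # the slower worker raises here
        return "created"
    return "exists"


def create_and_catch(catalog, name, barrier):
    barrier.wait()  # start both attempts at the same point
    try:
        catalog.create(name)
        return "created"
    except DuplicateDatabase:
        return "skipped"


def run_two(worker):
    """Run two concurrent workers against a fresh catalog; return sorted outcomes."""
    catalog = FakeCatalog()
    barrier = threading.Barrier(2)
    results = []

    def target():
        try:
            results.append(worker(catalog, "floware", barrier))
        except DuplicateDatabase:
            results.append("crashed")

    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(run_two(check_then_create))  # ['crashed', 'created']
print(run_two(create_and_catch))   # ['created', 'skipped']
```

With check-then-create, the loser of the race surfaces an unhandled error; with create-and-catch, it treats the duplicate as a normal "already exists" outcome, which is the behavior the suggested patch aims for.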
Summary by CodeRabbit
FLOWARE_DB_CREATE=true to automatically create the Floware Postgres database on container start if it doesn't exist.