1 change: 1 addition & 0 deletions .github/workflows/cd-api-dev.yml
@@ -98,6 +98,7 @@ jobs:
GOOGLE_CLIENT_ID_DEV: ${{ secrets.GOOGLE_CLIENT_ID_DEV }}
GOOGLE_CLIENT_SECRET_DEV: ${{ secrets.GOOGLE_CLIENT_SECRET_DEV }}
WIDGET_SECRET_KEY_DEV: ${{ secrets.WIDGET_SECRET_KEY_DEV }}
SLACK_WEBHOOK_URL_DEV: ${{ secrets.SLACK_WEBHOOK_URL_DEV }}
run: |
export HOSTNAME=$(hostname)
cd kokomen-backend/docker/dev
27 changes: 16 additions & 11 deletions .github/workflows/cd-api-prod.yml
@@ -50,13 +50,7 @@ jobs:
runs-on: [ self-hosted, prod, new ]

steps:
- name: Stop existing container
run: sudo docker rm -f kokomen-interview-api-prod

- name: Docker Image prune
run: sudo docker image prune -f -a

- name: pull docker compose yaml files
- name: Pull docker compose and deployment files
working-directory: /home/ubuntu
run: |
[ -d kokomen-backend ] || git clone --filter=blob:none --no-checkout https://github.com/samhap-soft/kokomen-backend.git
@@ -70,8 +64,14 @@ jobs:
- name: Docker Image pull
run: sudo docker pull samhap/kokomen-interview-api:prod

- name: Docker run
working-directory: /home/ubuntu
- name: Docker Image prune
run: sudo docker image prune -f

- name: Make deploy script executable
run: chmod +x /home/ubuntu/kokomen-backend/docker/prod/api/deploy.sh

- name: Run Blue-Green Deployment
working-directory: /home/ubuntu/kokomen-backend/docker/prod/api
env:
SPRING_DATASOURCE_URL_PROD: ${{ secrets.SPRING_DATASOURCE_URL_PROD }}
SPRING_DATASOURCE_USERNAME_PROD: ${{ secrets.SPRING_DATASOURCE_USERNAME_PROD }}
@@ -85,7 +85,12 @@ jobs:
GOOGLE_CLIENT_ID_PROD: ${{ secrets.GOOGLE_CLIENT_ID_PROD }}
GOOGLE_CLIENT_SECRET_PROD: ${{ secrets.GOOGLE_CLIENT_SECRET_PROD }}
WIDGET_SECRET_KEY_PROD: ${{ secrets.WIDGET_SECRET_KEY_PROD }}
SLACK_WEBHOOK_URL_PROD: ${{ secrets.SLACK_WEBHOOK_URL_PROD }}
run: |
export HOSTNAME=$(hostname)
cd kokomen-backend/docker/prod/api
sudo -E docker compose -f docker-compose-prod.yml up -d
sudo -E ./deploy.sh
Comment on lines 89 to +91

⚠️ Potential issue | 🟡 Minor

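Separate the assignment from `export` so the command's return value is not masked.

ShellCheck SC2155: `export HOSTNAME=$(hostname)` masks the return value of the `hostname` command. Even if `hostname` fails, `$?` is 0 because `export` itself succeeds.

🔧 Suggested fix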
        run: |
-          export HOSTNAME=$(hostname)
+          HOSTNAME=$(hostname)
+          export HOSTNAME
          sudo -E ./deploy.sh
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run: |
export HOSTNAME=$(hostname)
cd kokomen-backend/docker/prod/api
sudo -E docker compose -f docker-compose-prod.yml up -d
sudo -E ./deploy.sh
run: |
HOSTNAME=$(hostname)
export HOSTNAME
sudo -E ./deploy.sh
🧰 Tools
🪛 actionlint (1.7.12)

[error] 89-89: shellcheck reported issue in this script: SC2155:warning:1:8: Declare and assign separately to avoid masking return values

(shellcheck)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cd-api-prod.yml around lines 89 - 91, Split the assignment
and export so the exit status of the hostname command isn't masked: first run
the hostname command and capture its result into HOSTNAME (and ensure you
propagate failure by checking the exit status or using set -e), then export the
HOSTNAME variable before running ./deploy.sh; update the workflow lines that
currently use export HOSTNAME=$(hostname) to perform assignment, error-checking,
and then export HOSTNAME so a failed hostname does not return a successful
status.
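
To make the masking concrete, here is a minimal shell sketch. `failing_cmd` is a hypothetical stand-in for a `hostname` call that fails, and `DEMO_A`, `DEMO_B`, `masked`, and `unmasked` are illustrative names that do not appear in the workflow.

```shell
# Hypothetical stand-in for a `hostname` call that fails with status 3.
failing_cmd() { return 3; }

# Combined form: $? reflects `export`, so the failure is hidden.
masked() {
  export DEMO_A=$(failing_cmd)
  echo "$?"
}

# Split form: the assignment carries the status of the command substitution,
# so the failure can be captured before exporting.
unmasked() {
  rc=0
  DEMO_B=$(failing_cmd) || rc=$?
  export DEMO_B
  echo "$rc"
}

masked     # prints 0
unmasked   # prints 3
```

In a script that runs under `set -e`, a bare `DEMO_B=$(failing_cmd)` without the `|| rc=$?` guard would abort the script on failure, which is the behavior the review comment is asking for.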


- name: Verify deployment
run: |
sleep 5
curl -sf http://localhost:80/actuator/health
1 change: 1 addition & 0 deletions build.gradle
@@ -54,6 +54,7 @@ dependencies {
implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'ch.qos.logback.contrib:logback-json-classic:0.1.5'
implementation 'ch.qos.logback.contrib:logback-jackson:0.1.5'
implementation 'com.github.maricn:logback-slack-appender:1.6.1'

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is com.github.maricn:logback-slack-appender:1.6.1 compatible with Logback 1.5.x and Spring Boot 3.5.x?

💡 Result:

No, com.github.maricn:logback-slack-appender:1.6.1 is likely not fully compatible with Logback 1.5.x (used by Spring Boot 3.5.x). The original repository is archived since Feb 2024 with no updates since 2021, and the maintainer recommends the maintained fork https://github.com/cyfrania/logback-slack-appender, which explicitly supports Logback 1.3.x/1.4.x but notes Java 11+ for 1.4.x (Spring Boot 3.5.x requires Java 21+). The original appender depends on Logback via ${logback.version} (unspecified concrete version), but lacks confirmation of testing with Logback 1.5.x changes (e.g., supplier patterns). Potential configuration or API incompatibilities may arise, as seen in other Logback appenders with Spring Boot 3.x. Use the cyfrania fork (v1.2, 2024) for better compatibility.

Citations:


🏁 Script executed:

cat build.gradle

Repository: samhap-soft/kokomen-backend

Length of output: 1905


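The Logback appender library should be replaced because it is likely incompatible with Spring Boot 3.5.3

`com.github.maricn:logback-slack-appender:1.6.1` on line 57 is likely not compatible with the Logback 1.5.x used by Spring Boot 3.5.3. In addition, the original repository has been archived since February 2024 and has not been updated since 2021. Because logging sits directly on the application's initialization path, this can cause runtime errors at deployment.

Use the maintained fork `com.cyfrania:logback-slack-appender:1.2` instead. This version was updated more recently and should be a better fit for the current stack.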

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build.gradle` at line 57, Replace the incompatible Logback Slack appender
dependency 'com.github.maricn:logback-slack-appender:1.6.1' with the maintained
fork 'com.cyfrania:logback-slack-appender:1.2' in the Gradle dependencies so the
app uses a Logback 1.5.x–compatible, actively maintained artifact; update the
dependency declaration where the current implementation line contains
'com.github.maricn:logback-slack-appender:1.6.1' to use
'com.cyfrania:logback-slack-appender:1.2' instead.


// PDF 텍스트 추출
implementation 'org.apache.pdfbox:pdfbox:3.0.3'
1 change: 1 addition & 0 deletions docker/dev/docker-compose-dev.yml
@@ -24,6 +24,7 @@ services:
GOOGLE_CLIENT_ID_DEV: ${GOOGLE_CLIENT_ID_DEV}
GOOGLE_CLIENT_SECRET_DEV: ${GOOGLE_CLIENT_SECRET_DEV}
WIDGET_SECRET_KEY_DEV: ${WIDGET_SECRET_KEY_DEV}
SLACK_WEBHOOK_URL_DEV: ${SLACK_WEBHOOK_URL_DEV}
networks:
- dev-kokomen-net

109 changes: 109 additions & 0 deletions docker/prod/api/deploy.sh
@@ -0,0 +1,109 @@
#!/bin/bash
set -e

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
COMPOSE_FILE="$SCRIPT_DIR/docker-compose-prod.yml"
HEALTH_TIMEOUT=120
HEALTH_INTERVAL=5
GRACEFUL_SHUTDOWN_WAIT=65

log_info() { echo "[INFO] $1"; }
log_warn() { echo "[WARN] $1"; }
log_error() { echo "[ERROR] $1"; }

get_active() {
if docker ps -q -f name=kokomen-api-blue | grep -q .; then
echo "blue"
elif docker ps -q -f name=kokomen-api-green | grep -q .; then
echo "green"
else
echo "none"
fi
}
Comment on lines +14 to +22

⚠️ Potential issue | 🟠 Major

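Mixed use of `sudo` will make the deployment fail midway on servers where Docker requires elevated privileges.

`docker compose` is started with `sudo -E`, but the status-query, health-check, and cleanup steps use plain `docker`. With this setup, `get_active`, `wait_healthy`, and the old-container cleanup step all fail on a server where the user is not in the Docker group.

🔧 Suggested fix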
+DOCKER="sudo -E docker"
...
-    if docker ps -q -f name=kokomen-api-blue | grep -q .; then
+    if $DOCKER ps -q -f name=kokomen-api-blue | grep -q .; then
...
-        status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
+        status=$($DOCKER inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
...
-    if ! docker ps -q -f name=traefik | grep -q .; then
+    if ! $DOCKER ps -q -f name=traefik | grep -q .; then
...
-        sudo -E docker compose -f $COMPOSE_FILE up -d traefik
+        $DOCKER compose -f "$COMPOSE_FILE" up -d traefik
...
-    sudo -E docker compose -f $COMPOSE_FILE --profile $TARGET up -d "kokomen-api-$TARGET"
+    $DOCKER compose -f "$COMPOSE_FILE" --profile "$TARGET" up -d "kokomen-api-$TARGET"
...
-        docker rm -f "kokomen-api-$TARGET" 2>/dev/null || true
+        $DOCKER rm -f "kokomen-api-$TARGET" 2>/dev/null || true
...
-        docker stop -t $GRACEFUL_SHUTDOWN_WAIT "$OLD" || true
-        docker rm -f "$OLD" 2>/dev/null || true
+        $DOCKER stop -t "$GRACEFUL_SHUTDOWN_WAIT" "$OLD" || true
+        $DOCKER rm -f "$OLD" 2>/dev/null || true

Also applies to: 23-44, 67-75, 79-82, 93-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/prod/api/deploy.sh` around lines 13 - 21, The script mixes plain
docker calls with sudo -E docker compose causing failures on hosts where docker
requires sudo; define a single command variable (e.g., DOCKER_CMD) that detects
whether sudo is needed (try running `docker ps` and fall back to `sudo -E
docker`) and use that variable everywhere instead of raw `docker` or `sudo -E
docker compose`, and update all affected helpers (get_active, wait_healthy, the
old-container cleanup logic and any docker compose invocations) to invoke
$DOCKER_CMD (and $DOCKER_CMD compose) so all docker operations run under the
same privilege model.
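
The single-command-variable idea described above could look like the following sketch. `detect_docker_cmd` and `DOCKER_CMD` are hypothetical names, not part of the PR, and the probe assumes that a failing `docker ps` means elevated privileges are needed (it cannot tell that apart from a stopped daemon or a missing binary).

```shell
# Decide once how docker must be invoked on this host (hypothetical helper).
detect_docker_cmd() {
  if docker ps >/dev/null 2>&1; then
    DOCKER_CMD="docker"
  else
    DOCKER_CMD="sudo -E docker"
  fi
}

detect_docker_cmd
echo "docker commands will run as: $DOCKER_CMD"

# Every later call then goes through the same variable, left unquoted on
# purpose so that "sudo -E docker" splits into separate words, for example:
#   $DOCKER_CMD ps -q -f name=kokomen-api-blue
#   $DOCKER_CMD compose -f "$COMPOSE_FILE" up -d traefik
```

Routing all calls through one variable keeps `get_active`, `wait_healthy`, and the cleanup steps under the same privilege model as the `docker compose` invocations.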


wait_healthy() {
local container=$1
local elapsed=0

log_info "Waiting for health check: $container (timeout: ${HEALTH_TIMEOUT}s)"

while [ $elapsed -lt $HEALTH_TIMEOUT ]; do
status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
if [ "$status" = "healthy" ]; then
echo ""
log_info "Health check passed! (took ${elapsed}s)"
return 0
fi
sleep $HEALTH_INTERVAL
elapsed=$((elapsed + HEALTH_INTERVAL))
echo -n "."
done

echo ""
log_error "Health check failed: timed out (${HEALTH_TIMEOUT}s)"
return 1
}

main() {
log_info "========== Starting Blue-Green deployment =========="

CURRENT=$(get_active)
log_info "Currently active environment: $CURRENT"

if [ "$CURRENT" = "blue" ]; then
TARGET="green"
OLD="kokomen-api-blue"
elif [ "$CURRENT" = "green" ]; then
TARGET="blue"
OLD="kokomen-api-green"
else
TARGET="blue"
OLD=""
log_info "First deployment: starting with the blue environment"
fi

log_info "Target environment: $TARGET"

# Step 0: Ensure Traefik is running
if ! docker ps -q -f name=traefik | grep -q .; then
log_info "Step 0: Starting Traefik"
sudo -E docker compose -f $COMPOSE_FILE up -d traefik

medium

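Using `sudo` inside the script makes it depend on the privilege configuration of the execution environment, which hurts portability. The generally recommended setup is to add the deployment user to the `docker` group so that commands can run without `sudo`.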

Suggested change
sudo -E docker compose -f $COMPOSE_FILE up -d traefik
docker compose -f $COMPOSE_FILE up -d traefik

sleep 3
fi

# Step 1: Start the new container
log_info "Step 1: Starting $TARGET container"
sudo -E docker compose -f $COMPOSE_FILE --profile $TARGET up -d "kokomen-api-$TARGET"

# Step 2: Wait for the health check
log_info "Step 2: Running health check"
if ! wait_healthy "kokomen-api-$TARGET"; then
log_error "Deployment failed: new container failed its health check"
log_warn "Rollback: removing the new container"
docker rm -f "kokomen-api-$TARGET" 2>/dev/null || true
exit 1
fi

# Step 3: Wait for Traefik routing to stabilize
log_info "Step 3: Waiting for Traefik routing to stabilize"
sleep 5

# Step 4: Gracefully stop the old container
if [ -n "$OLD" ]; then
log_info "Step 4: Stopping old container ($OLD, waiting up to ${GRACEFUL_SHUTDOWN_WAIT}s)"
docker stop -t $GRACEFUL_SHUTDOWN_WAIT "$OLD" || true
docker rm -f "$OLD" 2>/dev/null || true
log_info "Old container stopped"
fi

# Step 5: Confirm completion
log_info "Step 5: Verifying deployment"
if curl -sf "http://localhost:80/actuator/health" > /dev/null 2>&1; then
log_info "========== Deployment succeeded! =========="
log_info "Active environment: $TARGET"
else
log_warn "Warning: external health check failed (check Traefik routing)"
fi
}

main "$@"
91 changes: 78 additions & 13 deletions docker/prod/api/docker-compose-prod.yml
@@ -1,11 +1,25 @@
services:
kokomen-interview-api-prod:
traefik:
image: traefik:v3.0
container_name: traefik
restart: unless-stopped
ports:
- "80:80"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./traefik/traefik.yml:/etc/traefik/traefik.yml:ro
environment:
TZ: Asia/Seoul
networks:
- api-net

kokomen-api-blue:
image: samhap/kokomen-interview-api:prod
container_name: kokomen-interview-api-prod
container_name: kokomen-api-blue
restart: on-failure:3
ports:
- "8080:8080"
- "8081:8081"
expose:
- "8080"
- "8081"
volumes:
- ./app/logs:/logs
environment:
@@ -25,22 +39,73 @@ services:
GOOGLE_CLIENT_ID_PROD: ${GOOGLE_CLIENT_ID_PROD}
GOOGLE_CLIENT_SECRET_PROD: ${GOOGLE_CLIENT_SECRET_PROD}
WIDGET_SECRET_KEY_PROD: ${WIDGET_SECRET_KEY_PROD}
SLACK_WEBHOOK_URL_PROD: ${SLACK_WEBHOOK_URL_PROD}
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8081/actuator/health"]
interval: 10s
timeout: 5s
retries: 3
start_period: 40s
labels:
- "traefik.enable=true"
- "traefik.http.routers.api.rule=Host(`api.kokomen.kr`)"
- "traefik.http.routers.api.entrypoints=web"
- "traefik.http.services.api.loadbalancer.server.port=8080"
- "traefik.http.services.api.loadbalancer.healthcheck.path=/actuator/health"
- "traefik.http.services.api.loadbalancer.healthcheck.port=8081"
- "traefik.http.services.api.loadbalancer.healthcheck.interval=5s"
- "traefik.http.services.api.loadbalancer.healthcheck.timeout=3s"
Comment on lines +49 to +57

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "docker-compose-prod.yml" -o -name "deploy.sh" -o -name "rollback.sh" | head -20

Repository: samhap-soft/kokomen-backend

Length of output: 1905


🏁 Script executed:

# Get the full structure of the docker directory
fd -t f "docker" | grep -E "(compose|deploy|rollback)" | head -30

Repository: samhap-soft/kokomen-backend

Length of output: 1905


🏁 Script executed:

# Look for the specific file mentioned in the review
cat docker/prod/api/docker-compose-prod.yml 2>/dev/null || find . -name "docker-compose-prod.yml" -exec cat {} \;

Repository: samhap-soft/kokomen-backend

Length of output: 1905


🌐 Web query:

Traefik identical labels multiple containers blue-green deployment behavior

💡 Result:

Traefik's behavior with multiple Docker containers having identical labels in blue-green deployments depends on how the labels are configured, particularly the router and service names. By default, Traefik creates one router and one service per container based on container-specific identifiers (like container name or ID). Identical router rules (e.g., same Host(example.com)) across multiple containers lead to configuration conflicts, as Traefik detects multiple routers with the same rule but different services, resulting in errors like "Router defined multiple times with different configurations" and failure to route properly. To enable load balancing across multiple containers with identical routing needs (useful for scaling or canary/blue-green with weights), all containers must explicitly share the same service name via labels like traefik.http.services.shared-service.loadbalancer.server.port=80, and reference it in routers with traefik.http.routers.myrouter.service=shared-service. Traefik then aggregates matching containers into a single load-balanced service with servers for each container instance. For strict blue-green deployments (zero-downtime version switching without concurrent load balancing): - Use separate containers/services for blue/green with distinct router names or priorities (e.g., traefik.http.routers.app-blue.rule=Host(app.localhost); traefik.http.routers.app-blue.priority=10 vs green with higher priority). - Enable/disable via traefik.enable=true/false or weights (e.g., traefik.http.services.app.loadbalancer.weight=100 for active, 0 for inactive). - Switch by updating labels (recreating containers), health checks, or dynamic file provider configs pointing to @docker services. This avoids conflicts while allowing atomic traffic shifts. Official docs confirm Traefik creates services with servers per container instance when configurations match, supporting load balancing natively when service names align. 
Conflicts arise only from mismatched router/service declarations across identical-rule containers.

Citations:


🌐 Web query:

Traefik http.routers http.services same labels multiple backends load balancing

💡 Result:

In Traefik, to configure an HTTP router (e.g., http.routers.http) to load balance across multiple backends using the same service labels, use the Docker provider with identical service names across multiple containers. Traefik automatically aggregates containers with the same service name into a single loadBalancer service, adding each container's instance as a server. Key steps: 1. Define the same service name (e.g., my-service) on multiple containers via labels like traefik.http.services.my-service.loadbalancer.server.port=8080. 2. Point one or more routers to that service name using traefik.http.routers.http.service=my-service. 3. Traefik detects matching service configurations and load balances traffic across all containers exposing that service (using strategies like wrr by default). Example Docker Compose for two whoami containers load balanced by router "http": version: '3' services: whoami-1: image: traefik/whoami labels: - traefik.enable=true - traefik.http.routers.http.rule=Host(example.com) - traefik.http.routers.http.service=my-service - traefik.http.services.my-service.loadbalancer.server.port=80 whoami-2: image: traefik/whoami labels: - traefik.enable=true - traefik.http.routers.http.service=my-service # Router points to shared service - traefik.http.services.my-service.loadbalancer.server.port=80 # Same service name deploy: replicas: 1 # Scale as needed Traefik creates a loadBalancer for my-service with servers from both containers. Without explicit router labels on the second container, Traefik auto-creates routers if rules are defined, but explicit shared service ensures aggregation. For static/file provider, explicitly list servers: http: services: my-service: loadBalancer: servers: - url: http://backend1:80 - url: http://backend2:80 Load balancing strategies (wrr, p2c, etc.) configurable via traefik.http.services.my-service.loadbalancer.strategy=wrr. This works in Traefik v2+ and v3 (current as of 2026), per official docs.

Citations:


🌐 Web query:

Traefik blue-green deployment routing cutover strategy

💡 Result:

Traefik supports blue-green deployment routing through its dynamic configuration and weighted services, enabling zero-downtime cutovers by switching traffic between blue (current) and green (new) environments. Core Strategy: Maintain two identical environments: blue (live) and green (staging new version). Deploy and test the new version in green while blue handles all production traffic. Once validated, perform cutover by updating Traefik routing to direct 100% traffic to green. Rollback by switching back to blue. Cutover Methods by Environment: 1. Docker/Docker Compose (using labels): - Enable/disable services via labels like traefik.enable=true/false. - Update docker-compose.yml with env vars: BLUE_ENABLED=true, GREEN_ENABLED=false initially. - Deploy: docker compose up -d - Test green, then cutover: BLUE_ENABLED=false GREEN_ENABLED=true; docker compose up -d - Traefik auto-detects changes and switches instantly. - Example labels: traefik.http.services.app-blue.loadbalancer.server.port=3000 traefik.enable=${BLUE_ENABLED:-true} 2. File Provider (dynamic config): - Define separate services for blue/green. - Router points to active service: service: green@file - Edit config file (watched by Traefik), change service reference, save for instant switch. 3. Kubernetes (CRDs recommended): - Deploy blue/green deployments + services (e.g., app-blue-svc, app-green-svc). - Use IngressRoute with TraefikService for weighted routing: apiVersion: traefik.io/v1alpha1 kind: TraefikService metadata: name: bg-service spec: weighted: services: - name: app-blue-svc port: 80 weight: 100 # Initially 100% blue - name: app-green-svc port: 80 weight: 0 - IngressRoute references bg-service. - Cutover: Update weights (100->0 blue, 0->100 green), apply. Supports gradual shift (90/10 -> 50/50 -> 0/100). - Health checks ensure only healthy services receive traffic. Additional Best Practices: - Use health checks (Traefik auto-removes unhealthy servers). 
- Weighted Round Robin (WRR) for gradual cutover or canary. - Traffic mirroring for green testing (duplicate requests to green while blue responds). - Tools like Flagger automate with metrics-based promotion. - Monitor via Traefik dashboard. This leverages Traefik's service discovery and hot-reload for zero-downtime.

Citations:


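The current Traefik configuration distributes traffic to both backends at once. This is not a true blue-green switch.

When identical `traefik.http.routers.api.*` and `traefik.http.services.api.*` labels are attached to both the blue and green containers, Traefik merges the two containers into a single load-balancing pool and distributes traffic to both simultaneously. A true zero-downtime switch in a blue-green deployment needs one of the following:

  • Active/inactive control using `traefik.enable=true/false`
  • A separate service name for each color
  • Router priorities, with the active router changed at switch time

Unless `deploy.sh`/`rollback.sh` implement such a switching mechanism, the actual behavior is closer to a rolling/canary deployment in which both versions receive traffic at the same time.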

Also applies to: 96-104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/prod/api/docker-compose-prod.yml` around lines 49 - 57, The Traefik
labels currently use identical router/service names (traefik.http.routers.api.*
and traefik.http.services.api.*) so blue and green containers are merged into
one LB pool; change the strategy to perform real blue-green switches by either
(a) assigning distinct service/router names per color (e.g.,
traefik.http.services.api-blue.* and traefik.http.services.api-green.* with
matching traefik.http.routers.api-blue.rule / api-green.rule), or (b) toggling
traefik.enable=true/false on the inactive container from your deployment scripts
(deploy.sh / rollback.sh), or (c) use router priority and switch the active
router during deploy; update the docker-compose labels and the deploy/rollback
scripts to implement one of these approaches so only the active color receives
traffic.
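
As one illustration of option (a) combined with (c), the sketch below prints per-color label sets in which each color has its own router and service name and the intended active color gets the higher router priority. The helper name `traefik_labels`, the `api-<color>` naming, and the priority values 10 and 20 are assumptions made for this example, not part of the PR. Labels are fixed when a container is created, so changing them would still mean recreating the container from the deploy script.

```shell
# Hypothetical helper: print Traefik labels for one color.
# $1 = color (blue or green), $2 = router priority (higher wins on equal rules).
traefik_labels() {
  color=$1
  priority=$2
  printf '%s\n' \
    "traefik.enable=true" \
    "traefik.http.routers.api-${color}.rule=Host(\`api.kokomen.kr\`)" \
    "traefik.http.routers.api-${color}.entrypoints=web" \
    "traefik.http.routers.api-${color}.priority=${priority}" \
    "traefik.http.routers.api-${color}.service=api-${color}" \
    "traefik.http.services.api-${color}.loadbalancer.server.port=8080"
}

# Example: green is the intended active color, blue is kept as standby.
traefik_labels blue 10
traefik_labels green 20
```

With distinct router and service names, the two containers are no longer merged into one pool; which one receives traffic is then decided by the router priority rather than by load balancing.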

networks:
- api-net
profiles:
- blue

nginx:
image: nginx:1.28.0
container_name: nginx
ports:
- "80:80"
kokomen-api-green:

medium

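The configuration of the `kokomen-api-green` service is almost identical to that of `kokomen-api-blue`. Defining the shared settings with a YAML anchor (`&`) and alias (`*`) would remove the duplication and improve maintainability.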

image: samhap/kokomen-interview-api:prod
container_name: kokomen-api-green
restart: on-failure:3
expose:
- "8080"
- "8081"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
- ./nginx/logs:/var/log/nginx
restart: unless-stopped
- ./app/logs:/logs
environment:
TZ: Asia/Seoul
JAVA_TOOL_OPTIONS: -Duser.timezone=Asia/Seoul
HOSTNAME: ${HOSTNAME}
SPRING_PROFILES_ACTIVE: prod
SPRING_DATASOURCE_URL_PROD: ${SPRING_DATASOURCE_URL_PROD}
SPRING_DATASOURCE_USERNAME_PROD: ${SPRING_DATASOURCE_USERNAME_PROD}
SPRING_DATASOURCE_PASSWORD_PROD: ${SPRING_DATASOURCE_PASSWORD_PROD}
REDIS_PRIMARY_HOST_PROD: ${REDIS_PRIMARY_HOST_PROD}
OPEN_AI_API_KEY: ${OPEN_AI_API_KEY}
KAKAO_CLIENT_ID_PROD: ${KAKAO_CLIENT_ID_PROD}
KAKAO_CLIENT_SECRET_PROD: ${KAKAO_CLIENT_SECRET_PROD}
KAKAO_ADMIN_KEY_PROD: ${KAKAO_ADMIN_KEY_PROD}
SUPERTONE_API_TOKEN: ${SUPERTONE_API_TOKEN}
GOOGLE_CLIENT_ID_PROD: ${GOOGLE_CLIENT_ID_PROD}
GOOGLE_CLIENT_SECRET_PROD: ${GOOGLE_CLIENT_SECRET_PROD}
WIDGET_SECRET_KEY_PROD: ${WIDGET_SECRET_KEY_PROD}
SLACK_WEBHOOK_URL_PROD: ${SLACK_WEBHOOK_URL_PROD}
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8081/actuator/health"]
interval: 10s
timeout: 5s
retries: 3
start_period: 40s
labels:
- "traefik.enable=true"
- "traefik.http.routers.api.rule=Host(`api.kokomen.kr`)"
- "traefik.http.routers.api.entrypoints=web"
- "traefik.http.services.api.loadbalancer.server.port=8080"
- "traefik.http.services.api.loadbalancer.healthcheck.path=/actuator/health"
- "traefik.http.services.api.loadbalancer.healthcheck.port=8081"
- "traefik.http.services.api.loadbalancer.healthcheck.interval=5s"
- "traefik.http.services.api.loadbalancer.healthcheck.timeout=3s"
networks:
- api-net
profiles:
- green

node:
image: prom/node-exporter
101 changes: 101 additions & 0 deletions docker/prod/api/rollback.sh
@@ -0,0 +1,101 @@
#!/bin/bash
set -e

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
COMPOSE_FILE="$SCRIPT_DIR/docker-compose-prod.yml"
HEALTH_TIMEOUT=120
HEALTH_INTERVAL=5

log_info() { echo "[INFO] $1"; }
log_warn() { echo "[WARN] $1"; }
log_error() { echo "[ERROR] $1"; }

get_active() {
if docker ps -q -f name=kokomen-api-blue | grep -q .; then
echo "blue"
elif docker ps -q -f name=kokomen-api-green | grep -q .; then
echo "green"
else
echo "none"
fi
}

check_container_exists() {
local container=$1
if docker ps -a -q -f name="$container" | grep -q .; then
return 0
else
return 1
fi
}
Comment on lines +13 to +30

⚠️ Potential issue | 🟠 Major

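Mixed use of `sudo` will make the intermediate steps fail immediately on servers where Docker requires elevated privileges.

`docker compose` is run with `sudo -E`, while `docker ps/inspect/start/stop/rm` run with normal privileges. If this server actually requires `sudo`, active-color detection, health checks, and container cleanup will all fail.

🔧 Suggested fix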
+DOCKER="sudo -E docker"
...
-    if docker ps -q -f name=kokomen-api-blue | grep -q .; then
+    if $DOCKER ps -q -f name=kokomen-api-blue | grep -q .; then
...
-        status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
+        status=$($DOCKER inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
...
-        docker start "kokomen-api-$ROLLBACK_TARGET" || true
+        $DOCKER start "kokomen-api-$ROLLBACK_TARGET" || true
...
-        sudo -E docker compose -f $COMPOSE_FILE --profile $ROLLBACK_TARGET up -d "kokomen-api-$ROLLBACK_TARGET"
+        $DOCKER compose -f "$COMPOSE_FILE" --profile "$ROLLBACK_TARGET" up -d "kokomen-api-$ROLLBACK_TARGET"
...
-    docker stop -t 65 "kokomen-api-$CURRENT" || true
-    docker rm -f "kokomen-api-$CURRENT" 2>/dev/null || true
+    $DOCKER stop -t 65 "kokomen-api-$CURRENT" || true
+    $DOCKER rm -f "kokomen-api-$CURRENT" 2>/dev/null || true

Also applies to: 31-52, 72-80, 91-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/prod/api/rollback.sh` around lines 12 - 29, Mixed use of sudo causes
failures when Docker requires elevated privileges; make Docker invocations
consistent by introducing a single command variable (e.g., DOCKER_CMD) that
conditionally includes "sudo -E" and use that variable everywhere (replace
direct calls in get_active, check_container_exists and all other docker
ps/inspect/start/stop/rm/docker compose usages across the script) so active
color detection, health checks and container cleanup all run with the same
privilege context.


wait_healthy() {
local container=$1
local elapsed=0

log_info "Waiting for health check: $container"

while [ $elapsed -lt $HEALTH_TIMEOUT ]; do
status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null || echo "starting")
if [ "$status" = "healthy" ]; then
echo ""
log_info "Health check passed!"
return 0
fi
sleep $HEALTH_INTERVAL
elapsed=$((elapsed + HEALTH_INTERVAL))
echo -n "."
done

echo ""
log_error "Health check failed"
return 1
}

main() {
log_info "========== Starting rollback =========="

CURRENT=$(get_active)
log_info "Currently active environment: $CURRENT"

if [ "$CURRENT" = "blue" ]; then
ROLLBACK_TARGET="green"
elif [ "$CURRENT" = "green" ]; then
ROLLBACK_TARGET="blue"
else
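log_error "Could not determine the currently active environment"
exit 1
fi

log_info "Rollback target: $ROLLBACK_TARGET"

# Check for the rollback target container
if check_container_exists "kokomen-api-$ROLLBACK_TARGET"; then
# The container exists, so start it
log_info "Starting existing container"
docker start "kokomen-api-$ROLLBACK_TARGET" || true
else
# The container does not exist, so create it
log_info "Creating rollback target container"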
sudo -E docker compose -f $COMPOSE_FILE --profile $ROLLBACK_TARGET up -d "kokomen-api-$ROLLBACK_TARGET"
fi

# Health check
if ! wait_healthy "kokomen-api-$ROLLBACK_TARGET"; then
log_error "Rollback failed: health check failed"
exit 1
fi

# Let Traefik routing stabilize
sleep 5

# Stop the currently active container
log_info "Stopping current container: kokomen-api-$CURRENT"
docker stop -t 65 "kokomen-api-$CURRENT" || true

medium

The wait time (65) used when stopping the existing container is hardcoded. For consistency, define it as a variable at the top of the script, as `deploy.sh` does.

docker rm -f "kokomen-api-$CURRENT" 2>/dev/null || true

log_info "========== Rollback complete =========="
log_info "Active environment: $ROLLBACK_TARGET"
}

main "$@"