Skip to content

모듈 책임 분리 및 PR 전용 CI 도입#6

Merged
JAEIL1999 merged 9 commits into
mainfrom
refactor/module-structure-split
Apr 9, 2026
Merged

모듈 책임 분리 및 PR 전용 CI 도입#6
JAEIL1999 merged 9 commits into
mainfrom
refactor/module-structure-split

Conversation

@JAEIL1999
Copy link
Copy Markdown
Owner

변경 내용

이번 PR에서는 backend, autoscaler, load_balancer, frontend 전반의 파일 구조를 책임 기준으로 분리하고,
PR 단계에서 기본 안정성을 검증할 수 있도록 GitHub Actions 기반 CI를 추가했습니다.

1. Backend 구조 정리

  • backend/src 구조를 제거하고 backend 루트 기준으로 평탄화
  • 실행 진입점, Flask 앱 구성, 설정, 부하 생성 로직 분리
  • server.py는 실행 전용 엔트리포인트로 단순화
  • 부하 생성 관련 상태와 프로세스 제어를 stress.py로 분리

2. Autoscaler 구조 정리

  • 기존 단일 파일/혼합 책임 구조를 다음 역할로 분리
    • 실행 진입점
    • 설정 로딩
    • 스케일링 정책
    • Docker 제어
    • Prometheus 조회
    • Load Balancer 알림
    • target 파일 관리
    • 종료 정리
  • 로그/print 메시지의 깨진 한글 문자열을 영어로 정리

3. Load Balancer 구조 정리

  • FastAPI 앱 조립, backend discovery, health check, proxy, metrics 생성 책임 분리
  • server.py는 uvicorn 실행만 담당하도록 단순화
  • init.py__init__.py로 정리
  • health check/log 메시지의 깨진 문자열 정리

4. Frontend 정리

  • Dashboard에서 하드코딩된 API URL과 Grafana URL 분리
  • config, services 디렉터리 추가
  • 미사용 Vite 예제 파일 제거 (App.tsx, App.css, react.svg)
  • TypeScript 설정 오류 정리
    • incremental 추가
    • 지원되지 않는 compiler option 제거
    • .tsx 확장자 import 제거
  • 프론트엔드 빌드/린트 가능 상태로 정리

5. CI 도입

  • .github/workflows/ci.yml 추가
  • PR 생성 시 아래 검증 수행
    • Python 모듈 문법 체크
    • Frontend lint/build
    • Docker Compose 설정 검증

6. 로컬 산출물 정리

  • 루트 .gitignore 추가
  • .idea, __pycache__, *.py[cod] 무시

검증 내용

로컬에서 확인한 항목:

  • Python 모듈 문법 검증 통과
  • fe 의존성 설치 후 npm run lint 통과
  • fe 의존성 설치 후 npm run build 통과

참고:

  • 로컬 환경에서는 docker compose 명령 자체가 없어 Compose 검증은 GitHub Actions에서 확인 예정

기대 효과

  • 파일별 책임이 명확해져 이후 로드밸런서 개선, 오토스케일러 정책 변경, 프론트엔드 대시보드 개편이 쉬워짐
  • PR 단위에서 기본적인 문법/빌드 오류를 자동 차단할 수 있음
  • 자기소개서/면접 관점에서도 구조 개선과 안정성 확보 과정을 설명하기 쉬운 형태로 정리됨

후속 작업 예정

  • Load Balancer 선택 정책 개선
  • Autoscaler 정책 고도화
  • Frontend 모니터링 화면 재구성
  • PR 기반 CI 동작 확인 후 필요 시 항목 보강

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the project structure across backend/autoscaler/load_balancer/frontend to better separate responsibilities, and adds a GitHub Actions CI workflow to validate basic correctness at PR time.

Changes:

  • Load balancer FastAPI code is decomposed into app, proxy, metrics, health_check, and discovery modules; server.py becomes a thin uvicorn entrypoint.
  • Backend Flask app is split into app, config, and stress modules; container build updated to match the new layout.
  • Frontend removes Vite sample artifacts, extracts endpoint/Grafana URLs into config/services, and adds PR CI for Python compile, frontend lint/build, and docker compose config.

Reviewed changes

Copilot reviewed 35 out of 40 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
load_balancer/server.py Simplifies entrypoint to run uvicorn with pre-built app
load_balancer/proxy.py Extracts proxying logic for /load and /cpu/toggle
load_balancer/metrics.py Extracts Prometheus-style metrics response builder
load_balancer/health_check.py Refactors health-check loop and refresh trigger wiring
load_balancer/discovery.py Extracts Docker-based backend discovery
load_balancer/config.py Centralizes load balancer constants
load_balancer/app.py Centralizes FastAPI app creation, routes, and lifespan
load_balancer/init.py Marks package
fe/tsconfig.app.json Adjusts TS compiler options for build/lint compatibility
fe/src/types/assets.d.ts Adds TS module declarations for asset imports
fe/src/services/loadBalancer.ts Adds service wrapper functions for LB calls
fe/src/pages/Dashboard.tsx Uses services/config instead of hard-coded URLs; copy cleanup
fe/src/main.tsx Removes .tsx extension import
fe/src/config/grafana.ts Centralizes Grafana panel definitions and URL builder
fe/src/config/endpoints.ts Centralizes backend/LB base URLs and endpoint builders
fe/src/components/Sidebar.tsx Uses shared BalancerMode type; copy cleanup
fe/src/components/Overview.tsx Maps Grafana panels from config instead of hard-coded iframes
fe/src/assets/react.svg Removes unused Vite sample asset
fe/src/App.tsx Removes unused Vite sample app
fe/src/App.css Removes unused Vite sample CSS
backend/stress.py Extracts stress generation/control into dedicated module
backend/src/server.py Removes old backend entrypoint under src/
backend/server.py Adds thin entrypoint that runs the Flask app
backend/requirements.txt Adjusts requirements file formatting/location consistency
backend/Dockerfile Updates Docker build to copy new backend layout
backend/config.py Centralizes backend load/stress constants
backend/app.py Refactors Flask app creation and endpoints
backend/init.py Marks package
autoscaler/targets.py Extracts Prometheus file_sd target management
autoscaler/scaler.py Extracts autoscaling loop logic into class
autoscaler/prometheus_client.py Extracts Prometheus query client
autoscaler/notifier.py Extracts load balancer refresh notifier
autoscaler/docker_manager.py Extracts Docker control utilities and target updates
autoscaler/config.py Adds settings loader via env vars
autoscaler/cleanup.py Adds shutdown cleanup and signal handler registration
autoscaler/autoscaler.py Simplifies autoscaler entrypoint to compose modules
autoscaler/metrics.py Removes legacy mixed-responsibility module
.gitignore Adds root ignore rules for local artifacts
.github/workflows/ci.yml Adds PR CI: python compile, frontend lint/build, compose config validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread fe/tsconfig.app.json
Comment on lines 19 to 24
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true

},
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsconfig.app.json has a trailing comma after the last compilerOption (noFallthroughCasesInSwitch), which makes the file invalid JSON and can break TypeScript/tsc parsing. Remove the trailing comma (or add the next property) so the config is valid.

Copilot uses AI. Check for mistakes.
Comment thread autoscaler/cleanup.py Outdated
Comment on lines +20 to +25
def clear_local_target_file():
flask_json_path = "/app/prometheus/targets/flask.json"
if os.path.exists(flask_json_path):
with open(flask_json_path, "w") as file:
json.dump([], file)
print("flask.json cleared")
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The autoscaler writes Prometheus targets to /etc/prometheus/targets/flask.json (via targets.py and docker-compose volume mount), but shutdown cleanup clears /app/prometheus/targets/flask.json. This path mismatch means the autoscaled target file may not actually be cleared on shutdown; align this to the same path used by targets.FLASK_TARGET_PATH.

Copilot uses AI. Check for mistakes.
Comment thread autoscaler/cleanup.py
Comment on lines +9 to +13
def cleanup_autoscaled_containers():
client = docker.from_env()
for container in client.containers.list(all=True):
if container.name.startswith("autoscale_service-"):
print(f"Removing container {container.name}")
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup removes containers only when the name starts with the hard-coded prefix autoscale_service-, but DockerManager names containers using the configurable label (and the label can differ via env). Consider basing this on the configured label or on container labels (e.g., autoscale_service) to avoid leaving autoscaled containers behind when AUTOSCALE_LABEL changes.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +12
def get_metric(self, query: str) -> float:
url = f"{self.base_url}/api/v1/query"
response = requests.get(url, params={"query": query})
response.raise_for_status()
data = response.json()
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requests.get(...) is called without a timeout, so the autoscaler can hang indefinitely if Prometheus is slow/unreachable. Pass a reasonable timeout (and consider handling request exceptions) to keep the scaling loop responsive.

Copilot uses AI. Check for mistakes.
Comment thread autoscaler/scaler.py
Comment on lines +10 to +24
class AutoScaler:
def __init__(
self,
prom_url: str,
docker_image: str,
label: str = "autoscale_service",
cpu_threshold: float = 0.7,
min_instances: int = 1,
max_instances: int = 10,
check_interval: int = 10,
load_balancer_url: str = "http://host.docker.internal:8000",
):
self.prom = PrometheusClient(prom_url)
self.dock = DockerManager()
self.notifier = LoadBalancerNotifier(load_balancer_url)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoScaler.__init__ instantiates PrometheusClient but it is never used anywhere in the scaler logic. Either use it for scaling decisions or remove it to avoid dead code and unnecessary dependencies.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +11
async def discover_containers(network_name="pnu_cloud_computing_mynet") -> List[Dict]:
loop = asyncio.get_event_loop()
client = await loop.run_in_executor(None, docker.from_env)

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asyncio.get_event_loop() is deprecated in modern asyncio and can behave unexpectedly under some runtimes; inside a coroutine it’s safer to use asyncio.get_running_loop(). Consider switching to get_running_loop() here (and reusing a Docker client if possible) to avoid deprecation warnings and reduce per-interval overhead.

Copilot uses AI. Check for mistakes.
Comment thread load_balancer/app.py
Comment on lines +30 to +34
health_task = start_health_check(interval=DEFAULT_HEALTH_CHECK_INTERVAL)
yield
await app.state.http_client.aclose()
health_task.cancel()
logger.info("Load balancer shutting down...")
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On shutdown the health-check task is cancelled but never awaited. This can leave warnings about pending tasks and may skip cleanup in the health loop; consider awaiting the task with try/except asyncio.CancelledError (or contextlib.suppress) after calling cancel().

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +35
def run_container(self, image: str, label: str):
project = os.getenv("COMPOSE_PROJECT_NAME", "pnu_cloud_computing")
compose_labels = {
"com.docker.compose.project": project,
"com.docker.compose.service": label,
"com.docker.compose.oneoff": "False",
"autoscale_service": label,
}
labels = {"autoscale_service": label}
existing = self.list_containers(label)
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label is being used as both the Docker filter (in list_containers(filters={"label": label})) and as the value of the autoscale_service label (set to label). With the current defaults (AUTOSCALE_LABEL=autoscale_service in config.py/docker-compose), autoscaled containers end up labeled autoscale_service=autoscale_service, but the load balancer discovery looks for autoscale_service=backend first—so autoscaled containers won’t be discovered as long as the fixed backend exists. Consider separating “label key” vs “service name”, and ensure newly created containers carry the same label value that the load balancer discovery queries (or change discovery to match by key only).

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +18
containers = await loop.run_in_executor(
None,
lambda: client.containers.list(
filters={"status": "running", "label": "autoscale_service=backend"}
),
)
if not containers:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Container discovery first filters by label: autoscale_service=backend and only falls back to ancestor=backend when no containers match. With the autoscaler’s current default labeling (autoscale_service value set from AUTOSCALE_LABEL, defaulting to autoscale_service), autoscaled backends won’t match this filter—so discovery may see only the fixed backend and never include autoscaled ones. Consider loosening the filter (e.g., match on label key only) or aligning label values between autoscaler and load balancer.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly refactors the autoscaler and load balancer components, modularizing their functionalities into separate files and adopting FastAPI for the load balancer. The autoscaler now uses dedicated modules for cleanup, configuration, Docker management, notifications, and Prometheus client interactions. The load balancer also benefits from a more structured approach with modules for application logic, configuration, service discovery, health checks, metrics, and request proxying. The frontend has been updated to align with these architectural changes. However, several critical issues were identified: an inconsistent flask.json path in the autoscaler's cleanup logic, a logical flaw in the autoscaler's CPU calculation that hinders effective scaling, and an incorrect latency measurement in the load balancer's health check. Additionally, a performance bottleneck due to time.sleep(1) in the Docker manager and an inaccurate uptime metric in the load balancer's metrics endpoint require attention.

Comment thread autoscaler/cleanup.py Outdated


def clear_local_target_file():
flask_json_path = "/app/prometheus/targets/flask.json"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The path for flask.json here (/app/prometheus/targets/flask.json) differs from the one defined in autoscaler/targets.py (/etc/prometheus/targets/flask.json). This inconsistency will likely cause the cleanup process to fail if the file is not found at the expected location. Please use a consistent path, preferably defined in a shared configuration file.

Comment thread autoscaler/scaler.py Outdated
Comment thread load_balancer/health_check.py Outdated
def get_container_cpu(self, container):
stats_stream = container.stats(stream=True, decode=True)
first = next(stats_stream)
time.sleep(1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using time.sleep(1) inside get_container_cpu is highly inefficient, especially since this method is called in a loop for every container in scaler.py. If there are many containers, the scaling loop will be blocked for a significant amount of time (e.g., 10 containers = 10 seconds). Consider fetching stats for all containers in parallel or using the delta between consecutive scaling cycles to calculate CPU usage without explicit sleeping.

Comment thread load_balancer/metrics.py
lb_metrics = [
f"backend_servers_total {len(all_servers)}",
f"backend_servers_healthy {len(healthy_servers)}",
f"load_balancer_uptime {time.time()}",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The metric load_balancer_uptime is being assigned the current Unix timestamp (time.time()), which is not uptime. Uptime should be the duration since the load balancer started. This will cause Prometheus to see a value that constantly increases with the wall clock rather than representing the process health.

JAEIL1999 and others added 3 commits April 9, 2026 17:49
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@JAEIL1999 JAEIL1999 merged commit 5144323 into main Apr 9, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants