모듈 책임 분리 및 PR 전용 CI 도입#6
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the project structure across backend/autoscaler/load_balancer/frontend to better separate responsibilities, and adds a GitHub Actions CI workflow to validate basic correctness at PR time.
Changes:
- Load balancer FastAPI code is decomposed into
app,proxy,metrics,health_check, anddiscoverymodules;server.pybecomes a thin uvicorn entrypoint. - Backend Flask app is split into
app,config, andstressmodules; container build updated to match the new layout. - Frontend removes Vite sample artifacts, extracts endpoint/Grafana URLs into config/services, and adds PR CI for Python compile, frontend lint/build, and
docker compose config.
Reviewed changes
Copilot reviewed 35 out of 40 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| load_balancer/server.py | Simplifies entrypoint to run uvicorn with pre-built app |
| load_balancer/proxy.py | Extracts proxying logic for /load and /cpu/toggle |
| load_balancer/metrics.py | Extracts Prometheus-style metrics response builder |
| load_balancer/health_check.py | Refactors health-check loop and refresh trigger wiring |
| load_balancer/discovery.py | Extracts Docker-based backend discovery |
| load_balancer/config.py | Centralizes load balancer constants |
| load_balancer/app.py | Centralizes FastAPI app creation, routes, and lifespan |
| load_balancer/init.py | Marks package |
| fe/tsconfig.app.json | Adjusts TS compiler options for build/lint compatibility |
| fe/src/types/assets.d.ts | Adds TS module declarations for asset imports |
| fe/src/services/loadBalancer.ts | Adds service wrapper functions for LB calls |
| fe/src/pages/Dashboard.tsx | Uses services/config instead of hard-coded URLs; copy cleanup |
| fe/src/main.tsx | Removes .tsx extension import |
| fe/src/config/grafana.ts | Centralizes Grafana panel definitions and URL builder |
| fe/src/config/endpoints.ts | Centralizes backend/LB base URLs and endpoint builders |
| fe/src/components/Sidebar.tsx | Uses shared BalancerMode type; copy cleanup |
| fe/src/components/Overview.tsx | Maps Grafana panels from config instead of hard-coded iframes |
| fe/src/assets/react.svg | Removes unused Vite sample asset |
| fe/src/App.tsx | Removes unused Vite sample app |
| fe/src/App.css | Removes unused Vite sample CSS |
| backend/stress.py | Extracts stress generation/control into dedicated module |
| backend/src/server.py | Removes old backend entrypoint under src/ |
| backend/server.py | Adds thin entrypoint that runs the Flask app |
| backend/requirements.txt | Adjusts requirements file formatting/location consistency |
| backend/Dockerfile | Updates Docker build to copy new backend layout |
| backend/config.py | Centralizes backend load/stress constants |
| backend/app.py | Refactors Flask app creation and endpoints |
| backend/init.py | Marks package |
| autoscaler/targets.py | Extracts Prometheus file_sd target management |
| autoscaler/scaler.py | Extracts autoscaling loop logic into class |
| autoscaler/prometheus_client.py | Extracts Prometheus query client |
| autoscaler/notifier.py | Extracts load balancer refresh notifier |
| autoscaler/docker_manager.py | Extracts Docker control utilities and target updates |
| autoscaler/config.py | Adds settings loader via env vars |
| autoscaler/cleanup.py | Adds shutdown cleanup and signal handler registration |
| autoscaler/autoscaler.py | Simplifies autoscaler entrypoint to compose modules |
| autoscaler/metrics.py | Removes legacy mixed-responsibility module |
| .gitignore | Adds root ignore rules for local artifacts |
| .github/workflows/ci.yml | Adds PR CI: python compile, frontend lint/build, compose config validation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "strict": true, | ||
| "noUnusedLocals": true, | ||
| "noUnusedParameters": true, | ||
| "erasableSyntaxOnly": true, | ||
| "noFallthroughCasesInSwitch": true, | ||
| "noUncheckedSideEffectImports": true | ||
|
|
||
| }, |
There was a problem hiding this comment.
tsconfig.app.json has a trailing comma after the last compilerOption (noFallthroughCasesInSwitch), which makes the file invalid JSON and can break TypeScript/tsc parsing. Remove the trailing comma (or add the next property) so the config is valid.
| def clear_local_target_file(): | ||
| flask_json_path = "/app/prometheus/targets/flask.json" | ||
| if os.path.exists(flask_json_path): | ||
| with open(flask_json_path, "w") as file: | ||
| json.dump([], file) | ||
| print("flask.json cleared") |
There was a problem hiding this comment.
The autoscaler writes Prometheus targets to /etc/prometheus/targets/flask.json (via targets.py and docker-compose volume mount), but shutdown cleanup clears /app/prometheus/targets/flask.json. This path mismatch means the autoscaled target file may not actually be cleared on shutdown; align this to the same path used by targets.FLASK_TARGET_PATH.
| def cleanup_autoscaled_containers(): | ||
| client = docker.from_env() | ||
| for container in client.containers.list(all=True): | ||
| if container.name.startswith("autoscale_service-"): | ||
| print(f"Removing container {container.name}") |
There was a problem hiding this comment.
Cleanup removes containers only when the name starts with the hard-coded prefix autoscale_service-, but DockerManager names containers using the configurable label (and the label can differ via env). Consider basing this on the configured label or on container labels (e.g., autoscale_service) to avoid leaving autoscaled containers behind when AUTOSCALE_LABEL changes.
| def get_metric(self, query: str) -> float: | ||
| url = f"{self.base_url}/api/v1/query" | ||
| response = requests.get(url, params={"query": query}) | ||
| response.raise_for_status() | ||
| data = response.json() |
There was a problem hiding this comment.
requests.get(...) is called without a timeout, so the autoscaler can hang indefinitely if Prometheus is slow/unreachable. Pass a reasonable timeout (and consider handling request exceptions) to keep the scaling loop responsive.
| class AutoScaler: | ||
| def __init__( | ||
| self, | ||
| prom_url: str, | ||
| docker_image: str, | ||
| label: str = "autoscale_service", | ||
| cpu_threshold: float = 0.7, | ||
| min_instances: int = 1, | ||
| max_instances: int = 10, | ||
| check_interval: int = 10, | ||
| load_balancer_url: str = "http://host.docker.internal:8000", | ||
| ): | ||
| self.prom = PrometheusClient(prom_url) | ||
| self.dock = DockerManager() | ||
| self.notifier = LoadBalancerNotifier(load_balancer_url) |
There was a problem hiding this comment.
AutoScaler.__init__ instantiates PrometheusClient but it is never used anywhere in the scaler logic. Either use it for scaling decisions or remove it to avoid dead code and unnecessary dependencies.
| async def discover_containers(network_name="pnu_cloud_computing_mynet") -> List[Dict]: | ||
| loop = asyncio.get_event_loop() | ||
| client = await loop.run_in_executor(None, docker.from_env) | ||
|
|
There was a problem hiding this comment.
asyncio.get_event_loop() is deprecated in modern asyncio and can behave unexpectedly under some runtimes; inside a coroutine it’s safer to use asyncio.get_running_loop(). Consider switching to get_running_loop() here (and reusing a Docker client if possible) to avoid deprecation warnings and reduce per-interval overhead.
| health_task = start_health_check(interval=DEFAULT_HEALTH_CHECK_INTERVAL) | ||
| yield | ||
| await app.state.http_client.aclose() | ||
| health_task.cancel() | ||
| logger.info("Load balancer shutting down...") |
There was a problem hiding this comment.
On shutdown the health-check task is cancelled but never awaited. This can leave warnings about pending tasks and may skip cleanup in the health loop; consider awaiting the task with try/except asyncio.CancelledError (or contextlib.suppress) after calling cancel().
| def run_container(self, image: str, label: str): | ||
| project = os.getenv("COMPOSE_PROJECT_NAME", "pnu_cloud_computing") | ||
| compose_labels = { | ||
| "com.docker.compose.project": project, | ||
| "com.docker.compose.service": label, | ||
| "com.docker.compose.oneoff": "False", | ||
| "autoscale_service": label, | ||
| } | ||
| labels = {"autoscale_service": label} | ||
| existing = self.list_containers(label) |
There was a problem hiding this comment.
label is being used as both the Docker filter (in list_containers(filters={"label": label})) and as the value of the autoscale_service label (set to label). With the current defaults (AUTOSCALE_LABEL=autoscale_service in config.py/docker-compose), autoscaled containers end up labeled autoscale_service=autoscale_service, but the load balancer discovery looks for autoscale_service=backend first—so autoscaled containers won’t be discovered as long as the fixed backend exists. Consider separating “label key” vs “service name”, and ensure newly created containers carry the same label value that the load balancer discovery queries (or change discovery to match by key only).
| containers = await loop.run_in_executor( | ||
| None, | ||
| lambda: client.containers.list( | ||
| filters={"status": "running", "label": "autoscale_service=backend"} | ||
| ), | ||
| ) | ||
| if not containers: |
There was a problem hiding this comment.
Container discovery first filters by label: autoscale_service=backend and only falls back to ancestor=backend when no containers match. With the autoscaler’s current default labeling (autoscale_service value set from AUTOSCALE_LABEL, defaulting to autoscale_service), autoscaled backends won’t match this filter—so discovery may see only the fixed backend and never include autoscaled ones. Consider loosening the filter (e.g., match on label key only) or aligning label values between autoscaler and load balancer.
There was a problem hiding this comment.
Code Review
This pull request significantly refactors the autoscaler and load balancer components, modularizing their functionalities into separate files and adopting FastAPI for the load balancer. The autoscaler now uses dedicated modules for cleanup, configuration, Docker management, notifications, and Prometheus client interactions. The load balancer also benefits from a more structured approach with modules for application logic, configuration, service discovery, health checks, metrics, and request proxying. The frontend has been updated to align with these architectural changes. However, several critical issues were identified: an inconsistent flask.json path in the autoscaler's cleanup logic, a logical flaw in the autoscaler's CPU calculation that hinders effective scaling, and an incorrect latency measurement in the load balancer's health check. Additionally, a performance bottleneck due to time.sleep(1) in the Docker manager and an inaccurate uptime metric in the load balancer's metrics endpoint require attention.
|
|
||
|
|
||
| def clear_local_target_file(): | ||
| flask_json_path = "/app/prometheus/targets/flask.json" |
There was a problem hiding this comment.
The path for flask.json here (/app/prometheus/targets/flask.json) differs from the one defined in autoscaler/targets.py (/etc/prometheus/targets/flask.json). This inconsistency will likely cause the cleanup process to fail if the file is not found at the expected location. Please use a consistent path, preferably defined in a shared configuration file.
| def get_container_cpu(self, container): | ||
| stats_stream = container.stats(stream=True, decode=True) | ||
| first = next(stats_stream) | ||
| time.sleep(1) |
There was a problem hiding this comment.
Using time.sleep(1) inside get_container_cpu is highly inefficient, especially since this method is called in a loop for every container in scaler.py. If there are many containers, the scaling loop will be blocked for a significant amount of time (e.g., 10 containers = 10 seconds). Consider fetching stats for all containers in parallel or using the delta between consecutive scaling cycles to calculate CPU usage without explicit sleeping.
| lb_metrics = [ | ||
| f"backend_servers_total {len(all_servers)}", | ||
| f"backend_servers_healthy {len(healthy_servers)}", | ||
| f"load_balancer_uptime {time.time()}", |
There was a problem hiding this comment.
The metric load_balancer_uptime is being assigned the current Unix timestamp (time.time()), which is not uptime. Uptime should be the duration since the load balancer started. This will cause Prometheus to see a value that constantly increases with the wall clock rather than representing the process health.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
변경 내용
이번 PR에서는 backend, autoscaler, load_balancer, frontend 전반의 파일 구조를 책임 기준으로 분리하고,
PR 단계에서 기본 안정성을 검증할 수 있도록 GitHub Actions 기반 CI를 추가했습니다.
1. Backend 구조 정리
backend/src구조를 제거하고backend루트 기준으로 평탄화server.py는 실행 전용 엔트리포인트로 단순화stress.py로 분리2. Autoscaler 구조 정리
3. Load Balancer 구조 정리
server.py는 uvicorn 실행만 담당하도록 단순화init.py를__init__.py로 정리4. Frontend 정리
config,services디렉터리 추가App.tsx,App.css,react.svg)incremental추가.tsx확장자 import 제거5. CI 도입
.github/workflows/ci.yml추가6. 로컬 산출물 정리
.gitignore추가.idea,__pycache__,*.py[cod]무시검증 내용
로컬에서 확인한 항목:
fe의존성 설치 후npm run lint통과fe의존성 설치 후npm run build통과참고:
docker compose명령 자체가 없어 Compose 검증은 GitHub Actions에서 확인 예정기대 효과
후속 작업 예정