refactor: introduce nginx blue/green deployment for the prod/stage environments #753
Walkthrough: This PR contains the changes that switch the application's deployment infrastructure to blue/green zero-downtime deployment.
Estimated code review effort: 4 (Complex), ~45 minutes
Pre-merge checks: 5 passed
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 78624f546d
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/dev-cd.yml:
- Around line 153-166: The health-check loop that uses curl and treats any
non-"000" HTTP code as success is too permissive; update both the dev and prod
health-check sites so the curl target verifies a healthy 2xx response (or hits
/actuator/health and asserts an "UP" body if that endpoint is exposed).
Specifically, in the loop that references NEW_PORT and runs curl, change the
curl to request either "http://localhost:${NEW_PORT}/actuator/health" (if
exposed) or keep "/" but capture the HTTP status and require it to start with
"2" (e.g., test that HTTP matches ^2), and only break the loop on that
condition; keep the timeout/stop/remove logic unchanged. Apply this change in
.github/workflows/dev-cd.yml (lines 153-166) and .github/workflows/prod-cd.yml
(lines 166-179).
- Around line 141-142: Both workflows expose GITHUB_TOKEN via an echoed string
into docker login which can be captured in SSH session logs; for each site
replace the echo-pipe pattern with a secure auth approach: in
.github/workflows/dev-cd.yml (lines 141-142) remove echo "${{
secrets.GITHUB_TOKEN }}" | docker login ... and instead either (a) rely on
pre-configured GHCR credentials on the remote host (install docker credential
helper or place credentials in ~/.docker/config.json) and remove remote token
piping, or (b) pass a deploy-only token into the remote environment securely and
invoke docker login on the remote host reading the secret from a protected file
or environment (avoid printing the token or using echo), and in
.github/workflows/prod-cd.yml (lines 154-155) make the analogous change (use
server-side credential configuration or a securely managed deploy token in the
remote environment instead of echoing the secret); do not log or echo secrets in
SSH commands.
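A minimal sketch of option (b) from the comment above, assuming the deploy token has already been written to a protected file on the remote host. The function name `ghcr_login`, the token file path, and the user name are hypothetical; the only real interface relied on is `docker login -u ... --password-stdin`.

```shell
# Log in to GHCR by reading the token from a file instead of echoing it.
# The token never appears on the command line or in the SSH command text.
ghcr_login() {
  token_file="$1"   # hypothetical path, e.g. a root-only file with mode 600
  user="$2"         # hypothetical registry user name

  if [ ! -r "$token_file" ]; then
    echo "token file is missing or unreadable: $token_file" >&2
    return 1
  fi

  # --password-stdin makes docker read the secret from standard input.
  docker login ghcr.io -u "$user" --password-stdin < "$token_file"
}

# Example call (assumed values, not taken from the PR):
#   ghcr_login /etc/deploy/ghcr-token my-deploy-user
```

Reading from a file keeps the secret out of process listings and shell history; whether server-side credentials (option a) are a better fit depends on how the hosts are provisioned.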
In @.github/workflows/prod-cd.yml:
- Around line 62-67: The current assignment injects raw values of
github.ref_name and inputs.tag_name directly into shell (IMAGE_TAG), enabling
command injection; instead, pass those values into the shell via a GitHub
Actions environment variable and reference that env var (keep using IMAGE_TAG)
so the shell does not perform expansion, and additionally validate/sanitize the
value (e.g., allow only expected tag characters like alphanumerics, dots,
hyphens) before setting IMAGE_TAG to reject dangerous input; update the step
that sets IMAGE_TAG to use the env mechanism and perform validation of
github.ref_name and inputs.tag_name before assigning to IMAGE_TAG.
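The validation step suggested above might look like the following sketch. It assumes the raw value reaches the script through an environment variable (here the hypothetical name `RAW_TAG`, which the workflow step would set in its `env:` mapping) rather than through `${{ ... }}` interpolation inside the script body. The function names are illustrative, and the allowed character set follows the reviewer's suggestion of alphanumerics, dots, and hyphens, plus underscores.

```shell
# Succeed only if the tag is non-empty, at most 128 characters, does not
# start with "." or "-", and contains only [A-Za-z0-9._-].
validate_image_tag() {
  tag="$1"
  case "$tag" in
    '' | [.-]* | *[!A-Za-z0-9._-]*) return 1 ;;
  esac
  [ "${#tag}" -le 128 ]
}

# Print the tag if it is safe; otherwise report an error and fail.
resolve_image_tag() {
  if validate_image_tag "$1"; then
    printf '%s\n' "$1"
  else
    echo "refusing unsafe image tag" >&2
    return 1
  fi
}

# Example use inside the step (RAW_TAG is an assumed variable name):
#   IMAGE_TAG=$(resolve_image_tag "$RAW_TAG") || exit 1
```

Because the shell only ever sees the value as the contents of a variable, characters such as `;` or `$(...)` are never parsed as commands, and the validation rejects them before they reach `docker`.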
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 859ccff0-a634-44aa-80b9-e04a26f4d71f
📒 Files selected for processing (4)
- .github/workflows/dev-cd.yml
- .github/workflows/prod-cd.yml
- docker-compose.dev.yml
- docker-compose.prod.yml
whqtker left a comment:
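Health check method: the workflow currently checks only whether the port responds (any HTTP code other than 000 means a response was received). To verify that the application is actually ready, /actuator/health must be exposed (add health to management.endpoints.web.exposure.include). Please share your opinion on whether to expose the endpoint.
I think exposing it would be better.
Hardcoded upstream.conf path: the workflow hardcodes the path /etc/nginx/conf.d/upstream.conf. The infra repo's nginx_setup.sh.tftpl uses the same path, so changing the path requires modifying both repos at the same time. I judged this acceptable for now, but would like to confirm the team's opinion.
I think it is fine to keep the current approach.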
    # 5. Start the new container
    cd "${WORK_DIR}"
    SLOT="${NEW_SLOT}" APP_PORT="${NEW_PORT}" MANAGEMENT_PORT="${MANAGEMENT_PORT}" OWNER_LOWERCASE="${OWNER_LOWERCASE}" IMAGE_TAG="${IMAGE_TAG_ONLY}" \
    docker compose -p "${CONTAINER_BASE}-${NEW_SLOT}" -f docker-compose.prod.yml up -d
dev explicitly names the container with docker compose -p "${CONTAINER_BASE}-${NEW_SLOT}" -f docker-compose.dev.yml up -d solid-connection-dev, but this one does not!
It looks like the script behavior needs to be made consistent!
Done! I also exposed the actuator endpoint!
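With /actuator/health exposed, the stricter health check discussed in this thread could be sketched as follows. This is an illustration under assumptions rather than the workflow's actual code: the function names are hypothetical, and the defaults of 30 attempts at 5-second intervals are chosen only to match the 150-second timeout message quoted in the review.

```shell
# Succeed only for a 2xx HTTP status code; "000" (no response) fails.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a URL until it returns 2xx, or fail after max_attempts tries.
wait_for_healthy() {
  url="$1"
  max_attempts="${2:-30}"
  interval="${3:-5}"
  i=1
  while [ "$i" -le "$max_attempts" ]; do
    # curl prints 000 when no HTTP response was received at all.
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$url" || true)
    if is_healthy_status "$status"; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Example call (port variable assumed to be set by the caller):
#   wait_for_healthy "http://localhost:${NEW_PORT}/actuator/health" || exit 1
```

Requiring a 2xx status means an application that answers with 404 or 503 while still starting up is no longer treated as ready, which is the gap the port-only check left open.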
🧹 Nitpick comments (1)
.github/workflows/dev-cd.yml (1)
155-167: Low value. Recommend docker compose down for the health check rollback
Current state: when the health check fails, docker stop/rm is used, but the cleanup of the old container after a successful switch (line 175) uses docker compose down.
Recommendation: for consistency, use docker compose down for the rollback as well, so that resources created by compose (such as networks or volumes added in the future) are reliably cleaned up.
With the current network_mode: "host" setting this causes no practical problem, but the change guards against future configuration changes.
Suggested change for consistency:
     [ "$i" = "30" ] && {
       echo "Health check timed out after 150s" >&2
    -  docker stop "${CONTAINER_BASE}-${NEW_SLOT}" 2>/dev/null || true
    -  docker rm "${CONTAINER_BASE}-${NEW_SLOT}" 2>/dev/null || true
    +  SLOT="${NEW_SLOT}" APP_PORT="${NEW_PORT}" MANAGEMENT_PORT="${MANAGEMENT_PORT}" OWNER_LOWERCASE="${OWNER_LOWERCASE}" IMAGE_TAG="${IMAGE_TAG_ONLY}" \
    +  docker compose -p "${CONTAINER_BASE}-${NEW_SLOT}" -f docker-compose.dev.yml down 2>/dev/null || true
       exit 1
     }
Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/dev-cd.yml around lines 155 - 167, The health check failure rollback currently uses separate docker stop and docker rm commands to clean up the container identified by CONTAINER_BASE-${NEW_SLOT}, but elsewhere in the workflow (for cleaning up the old slot after successful deployment) docker compose down is used. Replace the individual docker stop and docker rm commands in the health check timeout error handler with a docker compose down call to maintain consistency and ensure all compose-managed resources are properly cleaned up, even when future configuration changes add networks or volumes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f7822a80-3caf-4362-b3d4-f8fac9dc837a
📒 Files selected for processing (5)
- .github/workflows/dev-cd.yml
- .github/workflows/prod-cd.yml
- docker-compose.dev.yml
- docker-compose.prod.yml
- src/main/resources/application.yml
✅ Files skipped from review due to trivial changes (1)
- src/main/resources/application.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- docker-compose.dev.yml
- docker-compose.prod.yml
- .github/workflows/prod-cd.yml
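Related issue
Work done
Building on infra repo PR solid-connection-infra#37, which switched to the nginx upstream approach, this PR changes the dev-cd / prod-cd workflows to blue/green zero-downtime deployment.
Changed files: docker-compose.dev.yml, docker-compose.prod.yml, .github/workflows/dev-cd.yml, .github/workflows/prod-cd.yml
Blue/green deployment flow
- Read /etc/nginx/conf.d/upstream.conf to determine the currently active slot, then start the new container in the inactive slot
- Replace the port in upstream.conf and run nginx -s reload to switch traffic without downtime
docker-compose changes
- container_name: ...-${SLOT:-blue}: separate container names per slot
- SERVER_PORT=${APP_PORT:-8080}: control the Spring Boot port through an environment variable
Improved disk space management on the prod EC2 instance
- Running only docker image prune -f (which removes dangling images only) → tagged images accumulate without bound on every deployment
Notes
- The solid-connection-server / solid-connection-dev containers (the single-container approach) are no longer managed. The existing containers must be stopped manually before the first deployment, or cleaned up manually after it
- Because management.endpoints.web.exposure.include: prometheus leaves /actuator/health unexposed, the health check was replaced with a port response check (whether the HTTP status code is 000)
Review requests (optional)
- Hardcoded upstream.conf path: the workflow hardcodes the path /etc/nginx/conf.d/upstream.conf. The infra repo's nginx_setup.sh.tftpl uses the same path, so changing the path requires modifying both repos at the same time. I judged this acceptable for now, but would like to confirm the team's opinion
- Health check method: the workflow currently checks only whether the port responds (any HTTP code other than 000 means a response was received). To verify that the application is actually ready, /actuator/health must be exposed (add health to management.endpoints.web.exposure.include). Please share your opinion on whether to expose the endpoint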
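The slot detection and traffic switch in the blue/green flow from the PR description can be sketched roughly as below. The PR does not show the contents of upstream.conf or the port assigned to each slot, so this sketch assumes a single `server host:port;` line and a hypothetical mapping of blue to 8080 and green to 8081; all function names are illustrative.

```shell
# Hypothetical slot-to-port mapping; the real ports are not stated in the PR.
BLUE_PORT=8080
GREEN_PORT=8081

# Print the port from the first "server host:port;" line of an upstream file.
active_port() {
  sed -n 's/^[[:space:]]*server[[:space:]][^:]*:\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print "<slot> <port>" for the slot that is currently NOT serving traffic.
# Falls back to blue when no active port can be read from the file.
inactive_slot() {
  if [ "$(active_port "$1")" = "$BLUE_PORT" ]; then
    echo "green $GREEN_PORT"
  else
    echo "blue $BLUE_PORT"
  fi
}

# Rewrite the upstream port in place, keeping a .bak copy of the old file.
# The caller would run `nginx -s reload` afterwards to apply the change.
switch_upstream() {
  conf="$1"
  new_port="$2"
  sed -i.bak "s/\(server[[:space:]][^:]*:\)[0-9][0-9]*/\1${new_port}/" "$conf"
}

# Example use (path taken from the PR, everything else assumed):
#   set -- $(inactive_slot /etc/nginx/conf.d/upstream.conf)
#   NEW_SLOT="$1"; NEW_PORT="$2"
```

Keeping the port rewrite separate from the reload makes it possible to switch the file only after the new container has passed its health check, which is what keeps the cutover free of downtime.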