Why CI/CD Matters
At Asynq.ai, we deployed to production multiple times per day. Our Agentic AI hiring platform was evolving rapidly — new candidate evaluation models, recruiter dashboard features, and Shopify integration updates landing daily. Without a robust CI/CD pipeline, that velocity would be impossible. A manual deployment process that takes 30 minutes and requires SSH access to production servers doesn't scale when you're shipping 5 times a day.
GitHub Actions became our tool of choice for its tight integration with our Git workflow, generous free tier, and first-class Docker support. The same pipeline architecture now powers deployments at Modelia.ai and EduFly.
The Complete Workflow
Here's the production pipeline I've refined across three companies. It runs on every PR and push to main:
```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  NODE_VERSION: '20'
  REGISTRY: 123456789.dkr.ecr.ap-south-1.amazonaws.com
  IMAGE_NAME: modelia-api

jobs:
  # ===== Stage 1: Code Quality =====
  lint-and-typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm run type-check
      - run: npx prisma generate

  # ===== Stage 2: Tests =====
  test:
    runs-on: ubuntu-latest
    needs: lint-and-typecheck
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_DB: test
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    env:
      DATABASE_URL: postgresql://postgres:test@localhost:5432/test
      REDIS_URL: redis://localhost:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npx prisma migrate deploy
      - run: npm test -- --coverage --forceExit
      - uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info

  # ===== Stage 3: Build & Push Docker Image =====
  build:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions-deploy
          aws-region: ap-south-1
      - uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push Docker image
        run: |
          docker build -t $REGISTRY/$IMAGE_NAME:$GITHUB_SHA -t $REGISTRY/$IMAGE_NAME:latest .
          docker push $REGISTRY/$IMAGE_NAME:$GITHUB_SHA
          docker push $REGISTRY/$IMAGE_NAME:latest

  # ===== Stage 4: Security Scan =====
  security-scan:
    runs-on: ubuntu-latest
    needs: build
    permissions:
      id-token: write
      contents: read
    steps:
      # Credentials are needed so Trivy can pull the image from the private ECR registry
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions-deploy
          aws-region: ap-south-1
      - uses: aws-actions/amazon-ecr-login@v2
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          severity: 'HIGH,CRITICAL'
          exit-code: '1'

  # ===== Stage 5: Deploy =====
  deploy:
    runs-on: ubuntu-latest
    needs: [build, security-scan]
    if: github.ref == 'refs/heads/main'
    environment: production
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions-deploy
          aws-region: ap-south-1
      - name: Deploy to ECS
        run: aws ecs update-service --cluster production --service modelia-api --force-new-deployment
      - name: Wait for deployment
        run: aws ecs wait services-stable --cluster production --services modelia-api
```

Testing Strategy
Our testing pyramid ensures confidence at every level:
Unit Tests (70% of tests)
Fast, isolated, testing individual functions and business logic:
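The tests below pin down the pricing behavior. For context, here's a hypothetical sketch of what `calculateSubscription` might look like; the professional price, the plan request limits, and the $0.01 overage rate come from the test assertions, while the starter price and the type names are assumptions:

```typescript
// Hypothetical sketch of the function under test. Values marked
// "assumed" are not stated anywhere in the tests.
type Plan = 'starter' | 'professional';

interface SubscriptionInput {
  plan: Plan;
  productCount: number; // not used in this sketch; reserved for tiering
  aiRequestsPerMonth: number;
}

interface SubscriptionResult {
  monthlyPrice: number;
  aiRequestsIncluded: number;
  overage: number;
  overageCharge: number;
}

const PLANS: Record<Plan, { monthlyPrice: number; aiRequestsIncluded: number }> = {
  starter: { monthlyPrice: 19.99, aiRequestsIncluded: 1000 }, // price assumed
  professional: { monthlyPrice: 49.99, aiRequestsIncluded: 10000 },
};

const OVERAGE_RATE = 0.01; // $0.01 per AI request beyond the plan limit

function calculateSubscription(input: SubscriptionInput): SubscriptionResult {
  const { monthlyPrice, aiRequestsIncluded } = PLANS[input.plan];
  const overage = Math.max(0, input.aiRequestsPerMonth - aiRequestsIncluded);
  return {
    monthlyPrice,
    aiRequestsIncluded,
    overage,
    overageCharge: overage * OVERAGE_RATE,
  };
}
```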
```typescript
// tests/services/pricing.test.ts
describe('PricingService', () => {
  it('calculates Shopify merchant subscription correctly', () => {
    const result = calculateSubscription({
      plan: 'professional',
      productCount: 500,
      aiRequestsPerMonth: 10000,
    });
    expect(result.monthlyPrice).toBe(49.99);
    expect(result.aiRequestsIncluded).toBe(10000);
    expect(result.overage).toBe(0);
  });

  it('applies overage charges when AI requests exceed plan limit', () => {
    const result = calculateSubscription({
      plan: 'starter',
      productCount: 100,
      aiRequestsPerMonth: 5000, // Starter plan includes 1000
    });
    expect(result.overage).toBe(4000);
    expect(result.overageCharge).toBe(40.00); // $0.01 per extra request
  });
});
```

Integration Tests (20% of tests)
Testing database interactions and API endpoints with real services. Note the PostgreSQL and Redis services in the GitHub Actions workflow — we test against real databases, not mocks:
```typescript
// tests/api/candidates.integration.test.ts
describe('POST /api/candidates', () => {
  beforeEach(async () => {
    await prisma.candidate.deleteMany();
  });

  it('creates a candidate and triggers AI evaluation', async () => {
    const response = await request(app)
      .post('/api/candidates')
      .send({
        name: 'Jane Doe',
        email: 'jane@example.com',
        resumeUrl: 'https://s3.amazonaws.com/resumes/jane.pdf',
        jobId: testJob.id,
      })
      .expect(201);

    expect(response.body.id).toBeDefined();
    expect(response.body.stage).toBe('applied');

    // Verify database record
    const candidate = await prisma.candidate.findUnique({
      where: { id: response.body.id },
    });
    expect(candidate).not.toBeNull();
    expect(candidate?.email).toBe('jane@example.com');
  });
});
```

E2E Tests (10% of tests)
Critical user paths tested with Playwright:
```typescript
// e2e/recruiter-flow.spec.ts
test('recruiter can view candidate pipeline and move to interview', async ({ page }) => {
  await page.goto('/dashboard');
  await page.click('[data-testid="candidates-tab"]');
  await expect(page.locator('.candidate-card')).toHaveCount(5);

  await page.click('.candidate-card:first-child');
  await page.click('[data-testid="schedule-interview"]');
  await page.fill('[name="interviewDate"]', '2025-02-15');
  await page.click('[data-testid="confirm-schedule"]');

  await expect(page.locator('.toast-success')).toBeVisible();
});
```

Security in CI/CD
Lessons from working at Bharat Electronics Limited (BEL), where deployment rigour for Air Force projects isn't optional, directly shaped our CI/CD security practices:
1. OIDC Instead of Long-Lived Credentials
Never store AWS access keys as GitHub Secrets. Use OIDC (OpenID Connect) for short-lived, automatically rotated credentials:
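On the AWS side, this requires an IAM role whose trust policy accepts GitHub's OIDC provider. A minimal sketch, where the account ID matches the pipeline above but the repo path is a placeholder you must adapt:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
```

The `sub` condition is what scopes the role: only workflows from that repo and branch can assume it, so a compromised fork or feature branch can't deploy.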
```yaml
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789:role/github-actions-deploy
    aws-region: ap-south-1
    # No access key! GitHub proves its identity to AWS via an OIDC token
```

2. Dependency Scanning
Every PR is automatically checked for vulnerable dependencies:
```yaml
dependency-audit:
  runs-on: ubuntu-latest
  permissions:
    security-events: write
    contents: read
  steps:
    - uses: actions/checkout@v4
    - run: npm audit --audit-level=high
    - uses: github/codeql-action/init@v3
      with:
        languages: javascript-typescript
    - uses: github/codeql-action/analyze@v3
```

3. Branch Protection
At Modelia.ai, direct pushes to main are impossible. Every change requires:
- All CI checks passing
- At least one code review approval
- No unresolved conversations
- Linear commit history (squash merge)
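These rules can be codified instead of clicked through in the UI. A hedged sketch using the GitHub CLI, with the repo path and check names as placeholders:

```bash
# Sketch: enforce branch protection on main via the GitHub REST API.
# Requires `gh auth login` with admin rights on the repo.
gh api -X PUT repos/your-org/your-repo/branches/main/protection \
  --input - <<'EOF'
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["lint-and-typecheck", "test"]
  },
  "enforce_admins": true,
  "required_pull_request_reviews": {
    "required_approving_review_count": 1
  },
  "required_conversation_resolution": true,
  "required_linear_history": true,
  "restrictions": null
}
EOF
```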
4. Docker Image Scanning
Trivy runs on every built image before deployment. If a HIGH or CRITICAL vulnerability is found, the pipeline fails and the image is never deployed:
```yaml
- uses: aquasecurity/trivy-action@master
  with:
    image-ref: modelia-api:latest
    severity: 'HIGH,CRITICAL'
    exit-code: '1'        # Fail the pipeline
    ignore-unfixed: true  # Don't fail on vulnerabilities without patches
```

Deployment Strategies
Rolling Updates (used at Modelia.ai)
New tasks start alongside old tasks. As new tasks pass health checks, traffic shifts to them. Old tasks drain connections and shut down:
- Zero downtime
- Gradual rollout — if the new version has issues, only a fraction of traffic is affected
- Automatic rollback on health check failure
- Takes 3-5 minutes for a full fleet rotation
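In ECS terms, this behavior is governed by the service's deployment configuration. A sketch using the cluster and service names from the pipeline above; the percentage thresholds are assumptions to adapt to your capacity headroom:

```bash
# Let ECS start new tasks up to 200% of the desired count while always
# keeping at least 100% healthy; the circuit breaker rolls back
# automatically if new tasks keep failing health checks.
aws ecs update-service \
  --cluster production \
  --service modelia-api \
  --deployment-configuration \
    "maximumPercent=200,minimumHealthyPercent=100,deploymentCircuitBreaker={enable=true,rollback=true}"
```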
Blue/Green (used for EduFly)
Two identical environments: Blue (current) and Green (new). Deploy to Green, run smoke tests, then switch the load balancer:
- Instant switchover
- Easy rollback — just switch back to Blue
- Higher cost (two environments running during deploy)
- Better for major version upgrades where gradual rollout is risky
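With an Application Load Balancer in front, the switchover itself is a single listener update. A hedged sketch; the ARNs are placeholders:

```bash
# Sketch: after smoke tests pass on Green, point the production
# listener at the Green target group. Rollback is the same command
# with the Blue target group ARN.
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions "Type=forward,TargetGroupArn=$GREEN_TG_ARN"
```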
Build Caching
Docker layer caching in GitHub Actions can dramatically speed up builds. At Modelia.ai, this reduced our build step from 8 minutes to 2 minutes:
```yaml
- uses: docker/setup-buildx-action@v3 # Buildx is required for the gha cache backend
- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

Monitoring Post-Deploy
After every deployment at Modelia.ai, we automatically:
- Run smoke tests against production — hit critical endpoints and verify 200 responses
- Check error rates in CloudWatch — compare the error rate in the 5 minutes after deploy vs. the 5 minutes before
- Verify API response times — if p99 latency increases by more than 50%, trigger a rollback alert
- Send a Slack notification with a deployment summary — who deployed, what commit, link to the diff
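The error-rate comparison can be sketched as a small script over CloudWatch metrics. The namespace, metric name, and dimension here are assumptions for an Application Load Balancer setup; adapt them to whatever emits your error metric:

```bash
# Sketch: compare 5xx counts in the 5 minutes before vs. after a deploy.
# $ALB_NAME is a placeholder for your load balancer's dimension value.
now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
before=$(date -u -d '-10 minutes' +%Y-%m-%dT%H:%M:%SZ)
mid=$(date -u -d '-5 minutes' +%Y-%m-%dT%H:%M:%SZ)

errors() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name HTTPCode_Target_5XX_Count \
    --dimensions Name=LoadBalancer,Value="$ALB_NAME" \
    --start-time "$1" --end-time "$2" \
    --period 300 --statistics Sum \
    --query 'Datapoints[0].Sum' --output text
}

pre=$(errors "$before" "$mid")
post=$(errors "$mid" "$now")
echo "5xx before deploy: $pre, after: $post"
```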
```yaml
post-deploy-verification:
  needs: deploy
  runs-on: ubuntu-latest
  steps:
    - name: Smoke test
      run: |
        for endpoint in /health /api/products /api/recommendations; do
          status=$(curl -s -o /dev/null -w "%{http_code}" "https://api.modelia.ai$endpoint")
          if [ "$status" != "200" ]; then
            echo "Smoke test failed: $endpoint returned $status"
            exit 1
          fi
        done
    - name: Notify Slack
      uses: slackapi/slack-github-action@v1
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
      with:
        payload: |
          {
            "text": "Deployed to production",
            "blocks": [
              {
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "*Deployment successful*\nCommit: ${{ github.sha }}\nAuthor: ${{ github.actor }}"
                }
              }
            ]
          }
```

Key Takeaways
- CI/CD should run on every PR, not just main — catch problems before they're merged
- Test against real databases in CI — mocks hide integration bugs (a lesson from Asynq.ai)
- Use OIDC for AWS credentials — never store long-lived access keys, a security lesson from BEL
- Scan Docker images before deploying — Trivy catches vulnerabilities before they reach production
- Use Docker layer caching — it reduced our build time from 8 to 2 minutes at Modelia.ai
- Always have automated rollback capability — if post-deploy metrics degrade, roll back automatically
- Post-deploy verification is not optional — smoke tests and metric comparison after every deployment
- Branch protection enforces process — required reviews, passing checks, and squash merges keep main clean
- Invest in your pipeline early — at EduFly, setting up CI/CD on day one saved hundreds of hours over the project lifetime
