Why Docker Changed Everything for EduFly
When I built EduFly — an AI-powered School ERP — the deployment story was messy. My local machine ran Node 18, the staging server had Node 16, and the production VM had a random version of npm that broke our postinstall scripts. Teachers couldn't access the attendance system because a production deploy failed silently due to a missing native dependency.
Docker solved all of that. One Dockerfile, one image, one behavior — everywhere. It helped us achieve 99.9% uptime on AWS and I haven't looked back since.
These containerization patterns have followed me to Asynq.ai and Modelia.ai, where reliability is non-negotiable — our Shopify merchants depend on AI features working 24/7 during their peak sales hours.
Multi-Stage Builds
The single most impactful Docker optimization. A naive Dockerfile copies your entire project (including node_modules, dev dependencies, source files, tests) into the image. A multi-stage build separates the build phase from the runtime phase:
# === Stage 1: Build ===
FROM node:20-alpine AS builder
WORKDIR /app
# Copy package files first (layer caching)
COPY package.json package-lock.json ./
RUN npm ci
# Copy source and build
COPY tsconfig.json ./
COPY prisma ./prisma/
COPY src ./src/
RUN npx prisma generate
RUN npm run build
# === Stage 2: Production ===
FROM node:20-alpine AS production
WORKDIR /app
# Create non-root user
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -s /bin/sh -D appuser
# Copy only production artifacts
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/prisma ./prisma
COPY --from=builder /app/package.json ./
# Switch to non-root user
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
This reduced our EduFly production image from 1.2GB to 180MB — an 85% reduction. Smaller images mean faster deploys, faster autoscaling, and lower storage costs.
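To see the effect of the split, you can build each stage separately and compare sizes — a quick sketch, assuming the stages are named `builder` and `production` as above (the `edufly-api` tag is illustrative):

```shell
# Build only the first stage (handy for running tests with dev dependencies present)
docker build --target builder -t edufly-api:build .

# Build the final stage (the default, since it is last in the Dockerfile)
docker build -t edufly-api:latest .

# Compare the two image sizes side by side
docker images edufly-api
```

Building with `--target builder` is also a useful CI trick: run your test suite against the build stage, then ship only the production stage.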
Layer Caching Strategy
Docker builds images layer by layer, and each layer is cached. The key insight: order your Dockerfile commands from least-changing to most-changing:
# GOOD order — package files rarely change, so npm ci is cached
COPY package.json package-lock.json ./
RUN npm ci
COPY src ./src/
RUN npm run build
# BAD order — copying src first invalidates npm ci cache on every code change
COPY . .
RUN npm ci
RUN npm run build
At Modelia.ai, this optimization cut our CI/CD build time from 8 minutes to 2 minutes, because npm ci (the slowest step) stays cached unless dependencies change.
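Even when dependencies do change, you can avoid a fully cold `npm ci` with a BuildKit cache mount, which persists npm's download cache across builds. A sketch (requires BuildKit, the default builder in current Docker releases):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
# /root/.npm survives between builds even when this layer is rebuilt,
# so npm downloads only the packages that actually changed
RUN --mount=type=cache,target=/root/.npm npm ci
```

The cache mount lives on the build host, not in the image, so it adds nothing to the final image size.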
Docker Compose for Development
A well-structured docker-compose.yml makes developer onboarding take minutes instead of hours. When a new engineer joins Modelia.ai, they run one command:
version: '3.8'
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.dev
    ports:
      - "3000:3000"
    volumes:
      - ./src:/app/src   # Hot reload
      - /app/node_modules # Don't override node_modules
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/modelia
      - REDIS_URL=redis://redis:6379
      - SHOPIFY_API_KEY=test_key
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
  db:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: modelia
      POSTGRES_PASSWORD: postgres
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  mailhog:
    image: mailhog/mailhog
    ports:
      - "1025:1025" # SMTP
      - "8025:8025" # Web UI
volumes:
  postgres_data:
Security Best Practices
At Bharat Electronics Limited (BEL), building frontend interfaces for the Indian Air Force taught me deployment discipline that I apply everywhere. In defence, even a frontend bug that exposes wrong data is a serious incident. While commercial systems don't have those stakes, the same rigorous deployment principles produce more reliable software:
1. Never Run as Root
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser
2. Pin Image Versions
# BAD — "latest" today might be different tomorrow
FROM node:latest
# GOOD — pinned to exact digest for reproducibility
FROM node:20.11-alpine3.19
3. Scan Images for Vulnerabilities
We run Trivy in our CI pipeline at Modelia.ai:
- name: Scan Docker image
  run: |
    trivy image --severity HIGH,CRITICAL --exit-code 1 modelia-api:latest
4. Use .dockerignore
node_modules
.git
.env
.env.*
*.md
tests
coverage
.github
This prevents secrets and unnecessary files from being included in the build context (or accidentally ending up in the image).
5. No Secrets in Images
# BAD — secret is baked into the image layer (visible with docker history)
ENV API_KEY=sk-secret-key-123
# GOOD — secrets injected at runtime
CMD ["node", "dist/server.js"]
# Secrets provided via: docker run -e API_KEY=sk-secret-key-123
Health Checks
Every production container at Modelia.ai and previously at Asynq.ai includes a health check. This isn't optional — without health checks, your orchestrator (ECS, Kubernetes) can't tell if your container is alive but non-functional:
// health.ts — a comprehensive health check endpoint
app.get('/health', async (req, res) => {
  const checks = {
    database: false,
    redis: false,
    uptime: process.uptime(),
    memory: process.memoryUsage(),
  };
  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = true;
  } catch (e) { /* database is down */ }
  try {
    await redis.ping();
    checks.redis = true;
  } catch (e) { /* redis is down */ }
  const healthy = checks.database && checks.redis;
  res.status(healthy ? 200 : 503).json(checks);
});
Production Deployment with ECS
At Modelia.ai, we deploy to AWS ECS Fargate. The workflow:
- GitHub Actions builds the Docker image on PR merge
- Trivy scans for vulnerabilities
- ECR push — image pushed to Amazon Elastic Container Registry
- ECS rolling update — new tasks start with the new image, old tasks drain connections and stop
- Health check verification — ECS waits for the health check to pass before routing traffic
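The steps above can be sketched as a single GitHub Actions job. Treat this as an outline, not our exact pipeline — the workflow, cluster, and repository names are hypothetical, and Trivy must be installed on the runner first:

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t modelia-api:${{ github.sha }} .
      - name: Scan for vulnerabilities
        run: trivy image --severity HIGH,CRITICAL --exit-code 1 modelia-api:${{ github.sha }}
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Push to ECR and roll the ECS service
        run: |
          docker tag modelia-api:${{ github.sha }} "$ECR_REPO:${{ github.sha }}"
          docker push "$ECR_REPO:${{ github.sha }}"
          aws ecs update-service --cluster prod --service api --force-new-deployment
```

The `--exit-code 1` flag is what makes the scan a gate: a HIGH or CRITICAL finding fails the job before anything reaches ECR.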
This gives us zero-downtime deployments. A deploy at Modelia.ai takes about 3 minutes from merge to live.
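Connection draining only works if the container actually handles SIGTERM: on a rolling update, ECS sends SIGTERM and, after a stop timeout (30 seconds by default), SIGKILL. This is also why the Dockerfile's CMD uses the exec form — the shell form would leave `/bin/sh` as PID 1 and the signal would never reach Node. A minimal sketch of a graceful shutdown handler (names are illustrative):

```typescript
import http from "node:http";

const server = http.createServer((_req, res) => res.end("ok"));
// server.listen(3000) would go here in real startup code

process.on("SIGTERM", () => {
  // Stop accepting new connections; in-flight requests are allowed to finish
  server.close(() => process.exit(0));
  // Safety net: force-exit before the orchestrator's SIGKILL deadline
  setTimeout(() => process.exit(1), 25_000).unref();
});
```

Without a handler like this, Node exits immediately on SIGTERM and any in-flight requests are dropped mid-response.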
Debugging Containers
When things go wrong in production, you need to be able to investigate without SSH. Useful patterns:
# View logs from a running container
docker logs --tail 100 -f container_name
# Execute a shell in a running container (for debugging only!)
docker exec -it container_name sh
# View resource usage
docker stats container_name
# Inspect container configuration
docker inspect container_name | jq '.[0].Config.Env'
Key Takeaways
- Multi-stage builds are essential — they reduce image size by 80%+ and improve security by excluding build tools from production
- Layer ordering matters — put rarely-changing layers first for better cache hits
- Docker Compose for development, ECS/Kubernetes for production — same Dockerfile, different orchestration
- Security scanning should be automated in CI/CD — Trivy catches vulnerabilities before they reach production
- Always use health checks — they're the foundation of self-healing infrastructure
- Never run as root, never hardcode secrets, always pin versions — lessons from Bharat Electronics Limited (BEL) that apply everywhere
- Use .dockerignore — prevent secrets and node_modules from entering your build context
- Invest in fast builds — layer caching cut our CI time from 8 to 2 minutes at Modelia.ai
