Employment Type: Full-time
Title: DevOps Engineer
Location: Erbil
Available Until: 5/27/2026
Key Responsibilities:
Foundation (first priority): the items below are the immediate build.
- Build CI/CD pipelines from scratch for all Java services (build → test → Docker image → Harbor → staging → production with approval gate).
- Define the Docker build standard and base image for all Java services.
- Set up Harbor (self-hosted container registry) for all Docker images.
- Deploy HashiCorp Vault for secrets management (replacing hardcoded credentials and environment variables).
- Configure Prometheus + Grafana for application metrics and alerting across all services.
- Set up and maintain staging and production server environments on FastPay's on-premises infrastructure.
- Integrate all Java services with the existing Graylog centralized logging system.
Ongoing: the items below run continuously.
- Add a CI/CD pipeline for every new service.
- Write and test the rollback runbook for every production cutover.
- Maintain environment parity — staging mirrors production exactly.
- Monitor MinIO storage capacity, MySQL replication health, and Redis availability.
- Manage on-premises server capacity and plan for growth.
- Participate in the on-call rotation for infrastructure incidents.
Requirements - Technical (must-have):
- 4–6 years of DevOps or infrastructure engineering experience.
- CI/CD pipeline authoring from scratch (not modifying existing templates) in GitHub Actions or GitLab CI YAML: multi-stage pipelines, secrets handling, deployment gates.
- Docker: multi-stage Dockerfiles, layer caching, image security practices.
- Linux server administration: user management, SSH hardening, systemd, disk monitoring, process management — all FastPay infrastructure runs on Linux servers you manage directly.
- Harbor or equivalent self-hosted container registry.
- HashiCorp Vault: installation, Raft HA cluster, policy authoring, AppRole authentication.
- Prometheus + Grafana: scrape configuration, alert rules, dashboard creation.
- Nginx or HAProxy: reverse proxy, SSL termination, upstream load balancing.
- Blue/green or canary deployment patterns.
- Database operations: MySQL backup, restore, replication — no managed service safety net.
Requirements - General (must-have):
- Has operated infrastructure for a system where downtime has financial or regulatory consequences.
- Has executed a production rollback under pressure — not theoretical knowledge, actual experience.