Site Reliability Engineer

Rootz

Ta' Xbiex, Malta Island
Permanent
Full-time

21 days ago

We're looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly, reliably, and at scale. Through smart automation, sharp observability, and a cool head in a crisis, you'll help us balance speed with stability, working alongside DevOps and product teams to drive continuous improvements in performance, security, and resilience.

This role reports directly to our DevOps Lead.

What You'll Do

Define and implement SLIs/SLOs and error budgets for priority services.
Build actionable observability - metrics, logs, traces, dashboards, and alerts - while reducing alert fatigue.
Lead incident management, from on-call triage to blameless postmortems with actionable follow-ups.
Improve deployment safety with robust rollout/rollback strategies and production readiness checks.
Conduct capacity planning, performance tuning, and resilience testing - always with cost efficiency in mind.
Automate away toil - from runbooks to remediation scripts and proactive health checks.
Collaborate with DevOps to embed reliability gates into CI/CD pipelines.
Own and evolve our observability stack.
Maintain high-quality documentation and operational standards.
Ensure compliance with security best practices.
Analyse performance and cost data to continually optimise our systems.

About You

Calm, clear communicator under pressure.
Proactive problem-solver who tackles issues before they escalate.
Passionate about automation, optimisation, and resilience.
A collaborative team player who thrives in a fast-paced environment.

Experience and Qualifications

5+ years in SRE, Systems Administration, or DevOps roles.
Bachelor's degree in Computer Science or equivalent technical experience.
Solid experience with Linux systems.
Strong Terraform skills.
Proficiency with Kubernetes and container orchestration.
Hands-on experience with AWS and Cloudflare.
Deep knowledge of Prometheus, Grafana, and the ELK stack.
Experience with CI/CD pipelines (ideally GitLab).
Bonus points for familiarity with RabbitMQ, Kafka, Redis, Aurora, and RDS.

The Ideal Fit

Organised with exceptional attention to detail.
Comfortable working across distributed teams.
Strong analytical and troubleshooting skills.
Self-driven and curious - always learning, always improving.
Keeps up to date with industry best practices and emerging technologies.

Why Join Us?

At Wildz Group, you'll be part of a collaborative, high-performing team that values ownership, continuous learning, and impact. Your work will directly shape the stability, performance, and reliability of the systems our customers rely on every day.
