
Site Reliability Engineer
- Malta Island
- Permanent
- Full-time
- Being the first point of technical escalation for issues within our infrastructure, both in the cloud and on-prem.
- Participating in stand-ups with the development teams and informing your squad of updates and changes to our platform.
- Automating everything: workflow and tool automation, such as deployments of distributed applications and infrastructure, using various scripting languages so that our 24/7 Incident Engineers can mitigate incidents without escalation.
- Analysing, diagnosing and solving issues in the production environment with a minimal number of escalations to 3rd-level support teams.
- Participating in the Change Management process by reviewing RFCs to ensure the “Definition of Done” is met, as well as executing and supporting software and hardware deployments.
- Developing and documenting ways of working between the LiveOps (NOC) team and the development teams to improve efficiency in diagnostics and impact mitigation.
- Automation and configuration management tools (Octopus, TeamCity, Terraform)
- AWS cloud infrastructure, CDNs, and various other systems running in multiple data centres and environments
- Cloud application load balancers, preferably with experience of AWS ALB
- Cloud DNS support such as AWS Route 53, GCP Cloud DNS, or Azure DNS
- Serverless computing such as AWS Lambda
- Cloud firewalls such as AWS WAF
- Server virtualisation such as VMware, and IaaS and PaaS cloud platforms such as AWS and Azure
- Open-source monitoring and alerting tools (Prometheus, Loki, Grafana and Jaeger)
- Scripting in Python, Bash, PowerShell or other languages
- Microsoft SQL databases: using stored procedures, locking/unlocking tables and running SELECT statements to assess impact and diagnose problems
- Ideally you will have a Bachelor's degree or equivalent experience; a technical degree is beneficial.
- AWS Cloud Practitioner certification or equivalent would be beneficial.