Our client is looking for an experienced DevOps engineer with OpenStack certification and experience to help support and administer their telco cloud environment.
This role falls inside IR35 and has the potential to be extended.
- Certified OpenStack Administrator.
- Hands-on experience configuring and managing OpenStack in enterprise high-availability (HA) environments.
- Strong background automating the configuration and management of large-scale platforms: Linux, Ansible, Python, Git.
- Understanding of modern monitoring and logging systems, e.g. Zabbix, ELK and Grafana.
- At least 3 years’ experience working on large-scale Enterprise environments.
- Experience with containerisation: Docker, Kubernetes.
- Understanding of software-defined networking (SDN).
- Experience with Nokia network equipment and F5 load balancers would be advantageous.
- Be responsible for engineering the lab and production telco cloud environments, including patches, upgrades, and reliability and performance improvements.
- Manage day-to-day incident resolution, working with other teams within Sky and with suppliers.
- Be part of an on-call rota that provides escalation support for production incidents. This is a key requirement: the successful candidate will not be on call from day one but must be able to join the rota.
  - The rota runs 1:6, so each engineer is on call one week in every six. Because team members cover each other's holidays, this can occasionally be more or less frequent.
  - Each on-call shift runs from 9 am Monday to 9 am the following Monday, covering evenings, nights and weekends.
  - If there are no issues, there may be no call-outs during the shift.
  - When called out, first establish whether there is a service outage:
    - If there is no service outage, the issue can be resolved on the next working day.
    - If there is a service outage, focus on restoring service rather than on root-cause analysis (RCA); the long-term fix can be done on the next working day.
- Be able to use Python scripting to automate manual and repetitive tasks.
- Collaborate with software development and SRE teams during application onboarding and when troubleshooting performance and infrastructure issues.
- Use and adapt modern tools to monitor and track faults and performance over time.
- Have excellent problem-solving skills and be able to communicate technical feedback to other members of your team, application owners, and developers.