This SRE will help upskill the existing operations team on AWS, Aura and the Mapt notification engine platform. The role involves working two days per week in the London office and taking part in the on-call rota.
What you’ll do:
- Develop a telco-grade PaaS capability.
- Design, document and implement a PaaS solution to onboard vendor-provided or vendor-requested applications and integrate them with our telecommunications infrastructure.
- Take part in an on-call rota to address issues before they become outages.
- As a senior SRE, take responsibility for the engineering and support of production environments, including automating patches and upgrades and driving reliability and performance improvements.
- Own the lab facilities used for PaaS development and test activities.
- Develop assurance, monitoring and management capabilities for the PaaS infrastructure using Zabbix, Prometheus, Grafana and the ELK stack.
- Act as the technical escalation point for colleagues within the team.
- Act as the day-to-day technical point of contact for engineers in other teams.
- Lead the creation of automated reports for services and the PaaS infrastructure.
- Manage the operational playbook for the PaaS infrastructure and the services running within it.
- Automate dashboards and reporting for the platform against SLOs, SLAs and KPIs.
- Provide input to managers on resourcing as needed.
- Monitor and manage Linux VMs, containers and applications.
- Provide support and lifecycle management for applications and services, including patching, upgrades, updates and troubleshooting.
- Plan and lead proactive disaster recovery testing.
- Work with suppliers to onboard their VNFs and CNFs.
What you’ll bring:
- Experience working with public cloud, OpenStack, VMs and Linux servers.
- A strong background in automating the configuration and management of large-scale platforms using Linux, Git and a language such as Python, Go or Bash.
- Experience deploying and managing SQL and NoSQL databases, e.g. PostgreSQL and Couchbase.
- Linux system administration and configuration management, primarily on CentOS or Ubuntu.
- Experience building and maintaining CI/CD pipelines.
- Experience with automation and orchestration tools such as Ansible and Terraform.
- Knowledge of web servers such as nginx or Apache.
Required behaviours:
- Act as a role model who sets acceptable working standards, ethics and practices.
- Mentor and develop colleagues in the team.
- Lead by example in minimising toil and maximising automation.