We are looking for an SRE Lead Engineer to join a hardworking team of Agile SRE and test automation engineers responsible for the end-to-end (E2E) infrastructure, support, and deployment of code into lab and production environments. The team is process-driven and strongly focused on uptime SLAs.
“The SRE team is expanding along with its knowledge pool and converging into a single entity to support mobile/broadband across different geographies.” - Manager - Software Engineering
What you'll do: -
- Develop a telco-grade PaaS capability for the business.
- Design, document, and implement a PaaS solution to onboard and integrate vendor-provided or requested applications with customers' telecommunications infrastructure.
- Be responsible for the engineering and support of production environments, including automation of patches and upgrades and improvements to reliability and performance.
- Develop assurance, monitoring, and management capabilities for the PaaS infrastructure using Zabbix, Prometheus, Grafana, and the ELK stack.
- Lead the creation of automated reports for various services and the PaaS infrastructure.
- Own the operational playbook for the PaaS infrastructure and the services running within it.
- Monitor and manage Linux VMs, containers, and applications.
- Be on call 1 in 5 weeks, moving to 1 in 8 weeks once a full team is in place.
What you'll bring: -
- Linux system administration and configuration management, primarily with CentOS and Ubuntu.
- Experience with automation and orchestration using tools such as Ansible and Terraform.
- Experience building and maintaining CI/CD pipelines.
- Experience working with Git and performing code reviews.
Good to have: -
- Experience working with Java apps, Rancher, Kubernetes, and Helm.
- Experience deploying and maintaining Hadoop, Airflow, Geode, and related components.
- Experience building and managing Kafka, ZooKeeper, Couchbase, PostgreSQL, and Consul clusters.