Monitoring Specialist / DevOps Engineer

  • Permanent
  • £50,000 - £60,000 (GBP)
  • London, England, United Kingdom
    and remote
  • ASAP

Description

The Monitoring Specialist / DevOps Engineer is responsible for developing and configuring effective monitoring and event management solutions to support the CRM platform. The ideal candidate will have a DevOps or similar background with a specialism in monitoring, and will also be code-aware in mainstream enterprise languages such as Java. The role sits within the CRM Support Team, which is charged with driving improvements to infrastructure, application and service monitoring within IT.


What you'll do:

  • Evaluate our existing commercial and open-source production monitoring systems, then identify, design and deploy improvements, or redevelop them where needed.
  • Be an integral part of our reliability and CDN engineering team, designing monitoring & alarming coverage for new content delivery systems, content libraries and storage.
  • Integrate monitoring systems with Incident Management and alerting solutions, so that 24/7 teams have the detail they need to manage large live, VoD and internet events.
  • Assist with in-depth analysis of platform issues by linking multiple data sources to provide the insight needed for resolution.
  • Work with Reliability Engineering to provide platform logging that processes hundreds of gigabytes of logs daily for fault analysis.


What you'll bring:

  • Experience of production-scale deployment and operation of open-source monitoring and TSDB systems, for example: Syslog, Grafana, ELK & Timelion, Telegraf, collectd, ClickHouse, Nagios, Zabbix, RRDtool, Cacti, InfluxDB, Prometheus, ServiceNow.
  • A Linux sysadmin specialism, ideally with RHEL/CentOS distributions, and experience of performance tuning for high-TPS loads.
  • Production familiarity with operating virtualisation/container technologies, e.g. Terraform, Docker, LXC, Xen, LVM, VMware or OpenStack, and their cloud equivalents on Google Cloud, AWS or Azure.
  • Practical understanding of TCP/IP, including IPv4, IPv6, DNS, DHCP and HTTP.
  • A flair for producing clear documentation and diagrams, and the ability to manage configuration, shell scripts and Markdown using Git.

Skills

IT Infrastructure Expertise
DevOps
Linux Engineering
Monitoring Tools & Management
Syslog
IT Infrastructure Products
Cacti
ELK Stack
Grafana
Linux
Nagios
IT Network Expertise
Support

Industry Experience

Media & Broadcasting - TV, Music, Movies, Radio, Entertainment
Telecommunications - Service Provider, ISP, Mobile