Operations and System Reliability Engineer
REPORTING TO: Head of Product Delivery
LOCATION: NI / Hybrid
About the role:
This role will involve building, testing and maintaining appropriate infrastructure and tools to provide our customers with an effective, efficient and reliable service. You will work with development, delivery, support and account management teams across all our cloud deployments.
Responsibilities:
- Work with the Head of Product Delivery to define Operational requirements and framework
- Identify priorities based on requirements within overarching framework
- Plan and deliver against those priorities
- Initiate and manage risk assessments
- Identify ‘normal’ operating procedures and optimise them
- Design, build or source systems, tools and processes to meet Operational needs
- Outline and define Operational requirements for Delivery and Support
- Work with agile development processes and take ownership of aspects of Operational requirements by putting appropriate methods and tools in place
- Implement monitoring, log analysis and reporting systems associated with hardware, sites, software, performance, cost, security and user experience
- Troubleshoot configuration, environmental and software issues and help identify solutions
- Automate processes
- Optimise costs and productivity, with a focus on customer experience and performance relative to cost
Essential criteria:
- Degree-level education in a relevant discipline or equivalent experience
- 12 months' experience in an Operational role or a developer role involving significant Operational considerations
- Experience in at least one of the main cloud technologies – AWS, Azure, Red Hat, GCP, IBM Cloud
- Strong working knowledge of Linux
- Strong competence in Python
- Experience of building and implementing automated pipelines including working with repos, build automation tools, build orchestration and environment automation
- Experience in implementing tools for logging, monitoring and alerting
- Experience in creating and automating virtual machines in public and private clouds
- An understanding of, or experience with, high availability, business continuity and disaster recovery solutions in the cloud
- Strong communication skills
Desirable criteria:
- A recognised DevOps or SysOps certification, e.g. from AWS
- Experience developing custom scripts in Python, Bash, PowerShell, Go or a similar language
- Experience implementing cloud infrastructure and networking required to host services, including storage, firewall and network configuration
- Experience in deploying serverless functions, e.g. AWS Lambda
- Experience of Scrum, Lean or Kanban using Jira or similar agile tracking tools