Site Reliability Engineer (SRE)
Posted on 28-Jul-2021
> 1 year
About the position
1.1 Job description
We are looking for a Site Reliability Engineer with strong platform development skills, a
comprehensive understanding of how to secure environments, and a solid grasp of
information security and performance optimization.
The focus of the role will be to build scalable, secure, exceptional infrastructure, automating
wherever possible. You will also enable visibility, maintain existing systems and develop new
systems in support of business applications. You will need to be a problem solver with the
ability to multitask and collaborate, and a desire to learn new skills and improve.
You will be the link between Infrastructure and Development, with a hands-on approach and
excellent coding skills. You will provide a DevOps capability model that enables rapid
continuous integration and deployment of application changes, and you will have oversight and
governance of all changes across the environment.
1.2 Key Responsibilities
The key accountability of an SRE is to support the agile development team in their product
delivery by building and operating the infrastructure that the product requires in order to
operate. It involves the following key functions:
• Proactively monitor and review application performance
• Handle on-call and emergency support
• Ensure software has good logging and diagnostics
• Create and maintain operational runbooks
• Maintain production services through measuring and monitoring availability, latency and
overall system health.
• Scale systems through automation.
• Practice sustainable incident response and blameless postmortems.
• Not be afraid to contribute changes back to the Software engineering team to improve the
• Managing the delivery pipeline into production.
• Troubleshooting issues with web applications
• Understanding of security principles and best practices
• Ensuring that critical data is backed up
• Configuration of monitoring systems including infrastructure monitoring and Application
Performance Monitoring systems such as New Relic.
• Ensuring that web application infrastructure is built
1.3 Skills/Experience
A key skill of a software reliability engineer is deep knowledge of the
application, the code, and how it runs, is configured, and scales. That knowledge is what
makes them so valuable when they also monitor and support it as a site reliability engineer.
1.3.1 System administration, security and networking
The Site Reliability Engineer is expected to have a good understanding of system
administration (Linux or Windows) and networking.
• Essential commands
• Operation of Running Systems
• User and Group Management
• Knowledge of networking concepts (DNS, TCP/IP, and Firewalls)
• Service Configuration
• Storage Management
• Understanding of virtualization technology
• Good grasp of fundamental Security concepts
1.3.2 Automation and deployment technologies
• Good understanding of “infrastructure as code” principles.
• Knowledge of a scripting language such as Bash, PowerShell, DSC or similar.
• Ability to configure infrastructure using a Configuration Management technology such as
Puppet, Chef or Ansible.
• Be able to create a build and deployment pipeline using an automation server such as
Jenkins or Bamboo.
• Proficiency in a high-level programming language such as Python, Ruby, Go or Java.
• Understanding of container technologies such as Docker.
• Some experience with container orchestration technologies such as ECS or Kubernetes
would be beneficial.
• Use Terraform to deploy cloud infrastructure.
1.3.3 Cloud technologies
• Experience designing available, cost-efficient, fault-tolerant, and scalable distributed
systems on AWS
• Hands-on experience using compute, networking, storage, and database AWS services
• Hands-on experience with AWS deployment and management services
• Ability to identify and define technical requirements for an AWS-based application
• Ability to identify which AWS services meet a given technical requirement
• Knowledge of recommended best practices for building secure and reliable applications on
the AWS platform
• An understanding of the basic architectural principles of building on the AWS Cloud
• An understanding of the AWS global infrastructure
• An understanding of network technologies as they relate to AWS
• An understanding of security features and tools that AWS provides and how they relate to
What we offer:
• Competitive salary
• Red-tape-free environment
• Ability to grow at a fast pace