Senior Site Reliability Engineer - Vantage Point Consulting
St. Louis, MO
About the Job
Overview:
The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, scalability, and performance of Client's digital platforms and infrastructure.
As part of a global team of highly skilled engineers, the SRE will work on challenging and impactful projects that directly contribute to the company's core business activities.
Client is committed to fostering a culture of innovation, collaboration, and continuous learning, providing the SRE with an opportunity to grow and develop their skills while making a positive impact on the world.
Main Accountabilities:
o Troubleshoot and resolve infrastructure issues and incidents in a timely manner.
o Design, implement, and maintain reliable and scalable infrastructure solutions to support Client's digital platforms and applications.
o Monitor and analyze system performance, identify potential issues, and take proactive measures to prevent outages and disruptions.
o Collaborate with cross-functional teams, including software engineers, product managers, and operations personnel, to ensure seamless integration of infrastructure and application components.
o Develop and implement automation scripts and tools to streamline infrastructure management tasks and improve operational efficiency.
o Stay up to date with industry best practices and emerging technologies in the field of site reliability engineering.
o Cooperate closely with DevOps and Cloud engineers.
Impact/Dimensions:
o Contributes to the reliability and uptime of Client's digital platforms, which are critical for the company's global operations and customer satisfaction.
o Works on projects that have a direct impact on Client's revenue and profitability.
o The individual in this role will have a significant impact on the efficiency and effectiveness of Client's technology operations and will be responsible for driving continuous improvement initiatives that save the company time and money.
Key Performance Indicators (KPIs):
o Mean Time to Repair (MTTR) for critical systems
o System uptime and availability
o Number of incidents and outages prevented
o Customer satisfaction with infrastructure performance
Major Opportunities and Decisions:
o Identifying and mitigating potential risks to infrastructure stability and performance.
o Making decisions on infrastructure investments and resource allocation to optimize cost-effectiveness and scalability.
o Balancing the need for innovation with the requirement for stability and reliability in infrastructure operations.
Management/Leadership:
o Leads and mentors a team of junior SREs and infrastructure engineers.
o Provides technical guidance to cross-functional teams on infrastructure-related matters.
o Actively participates in shaping the company's infrastructure strategy and roadmap.
Key Relationships, Stakeholders & Interfaces (External & Internal):
o Works closely with software engineering teams to ensure seamless integration of infrastructure and application components.
o Development teams
o Infrastructure teams
o Business stakeholders
o Vendors and partners
Knowledge and Technical Competencies:
o Strong understanding of SRE & DevOps principles and practices.
o Experience with the Azure DevOps CI/CD platform.
o Knowledge of infrastructure management tools such as Ansible, Puppet, or Chef.
o Solid experience with containerization tools such as Docker and orchestration tools such as Kubernetes.
o Solid knowledge of security in cloud and on-premises environments.
o Proficient in scripting languages such as Python or Bash.
o Experience with cloud computing platforms such as AWS, Azure, or GCP (GCP preferred).
o Experience with monitoring software such as Datadog, Zabbix, or Kibana.
o Hands-on experience coding, deploying, and supporting large-scale serverless architectures.
o Infrastructure provisioning with Terraform or CloudFormation (Infrastructure as Code, IaC).
o Experience with Linux and Windows operating systems.
o Strong problem-solving and analytical skills.
o Excellent communication and interpersonal skills.
Education/Experience:
o Bachelor's degree in computer science or a related field.
o 5+ years of experience in DevOps engineering.
o Experience with leading teams and managing projects.
o Very good command of English.
Source: Vantage Point Consulting