SRE (Observability) Engineer - US Tech Solutions, Inc.
New Orleans, LA
About the Job
Location: Remote (CST Hours)
Description
We are seeking a highly skilled SRE (Observability) Engineer with a deep understanding of modern observability practices and tools. The ideal candidate will have hands-on experience with provisioning, configuring, and developing infrastructure solutions, along with a strong focus on automation, scalability, and reliability. This role involves a mix of development, system architecture, and troubleshooting responsibilities, providing opportunities to influence the evolution of our infrastructure.
Responsibilities
- Design, implement, and manage observability solutions using tools like Dynatrace, Prometheus, Thanos, or Grafana.
- Develop metrics, alerts, and silences for comprehensive system monitoring.
- Automate infrastructure tasks using Chef (recipes, cookbooks), Ansible (tasks, playbooks), or Terraform, with close attention to correct syntax and GitLab CI/CD configuration.
- Script solutions using Python, PowerShell, or Bash to enable automation across the infrastructure.
- Propose and implement innovative ideas to reduce manual workload and improve operational efficiency through automation.
- Provision and configure cloud resources via CLI or APIs on Azure, GCP, or AWS.
- Troubleshoot and resolve system issues with an SRE (Site Reliability Engineering) mindset, focusing on root cause analysis and corrective actions.
- Develop and enhance documentation, including application guides, runbooks, and system configurations, ensuring clarity in the "why" and "how" of operations.
- Plan, design, and execute scalable and redundant system architecture to meet organizational goals.
Required Skills
- Observability Tools: Hands-on experience with Dynatrace, Prometheus, Thanos, or Grafana.
- Infrastructure Automation: Proficiency in Chef, Ansible, Terraform, and GitLab CI/CD.
- Scripting Languages: Advanced skills in Python, PowerShell, or Bash.
- Cloud Platforms: Proficient in provisioning and configuring resources on Azure and GCP (AWS experience acceptable).
- SRE Practices: Familiarity with troubleshooting using SRE principles, root cause analysis, and corrective action planning.
- Documentation: Strong ability to write clear, concise, and detailed technical documentation and runbooks.
- System Architecture: Solid understanding of scalability and redundancy principles.
Preferred Skills
- Kubernetes: Basic understanding of container orchestration and CLI.
- Linux Administration: Configuration, package management, and troubleshooting expertise.
- Networking: Knowledge of VPCs, proxies, CDNs, and their integration into scalable systems.
- Storage Systems: Familiarity with block and object storage configuration.
US Tech Solutions is a global staff augmentation firm providing a wide range of talent on-demand and total workforce solutions. To learn more about US Tech Solutions, please visit www.ustechsolutions.com.
US Tech Solutions is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Source: US Tech Solutions, Inc.