Site Reliability Engineer (SRE) - Observability Specialist at Sunrise Group Inc.
Las Vegas, NV
About the Job
We are seeking a skilled Site Reliability Engineer (SRE) with expertise in Observability to design, implement, and maintain monitoring, logging, and tracing solutions. This role focuses on improving system reliability, scalability, and performance through effective observability practices while collaborating with development, operations, and business teams.
Job Details:
- Role: Site Reliability Engineer (SRE) - Observability Specialist
- Experience: 6-9 years
- Location: Las Vegas, NV
- Duration: 6+ month contract
Responsibilities:
- Observability Solutions: Design and integrate tools for monitoring, logging, and tracing (e.g., Prometheus, Grafana, Elasticsearch, Datadog).
- Monitoring & Alerting: Define KPIs, SLOs, and SLIs; implement actionable alerts to ensure reliability.
- System Reliability: Analyze observability metrics to identify risks and collaborate on mitigations.
- Collaboration: Partner with teams to embed observability into the software lifecycle and advocate best practices.
- Automation: Automate observability tasks such as dashboard creation and log parsing.
- Documentation: Maintain documentation for observability tools and processes so that stakeholders can find and use it.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Proven experience with observability platforms (Prometheus, Grafana, Splunk, OpenTelemetry).
- Proficiency in programming/scripting languages (Python, Go, Bash).
- Strong knowledge of distributed systems, cloud platforms (AWS, Azure, GCP), and containerization (Kubernetes, Docker).
- Familiarity with KPIs, SLOs, and SLIs for monitoring and reporting.
- Certifications in observability tools or cloud platforms and experience with Infrastructure as Code (e.g., Ansible, Terraform).
Additional Requirements:
- Must obtain and maintain a valid Nevada Gaming License.