Site Reliability/Observability Engineer - Metasys Technologies
Johnston, RI
About the Job
Senior Specialist - Architecture (Site Reliability/Observability Engineer, SRE)
Johnston, RI (4-5 days onsite per week)
Permanent position
Objectives of this role:
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
Responsibilities:
- Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
- Partner with the development team, data scientists, and MLOps architects/engineers to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, troubleshooting production issues, and capacity planning.
- Create and manage sustainable systems and services through automation and platform uplifts.
- Balance feature development speed and reliability with well-defined service-level objectives.
Required skills and qualifications:
- Ability to program (structured and OOP) using one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript.
- Experience working with cloud services such as Amazon S3, Amazon SageMaker, and Amazon Bedrock.
- Excellent working knowledge of cloud-native infrastructure, such as AWS Lambda and OpenShift.
- Good understanding of API management and the ability to troubleshoot API-related issues.
- Automation mindset for managing cloud infrastructure with AWS CloudFormation or Terraform.
- Impeccable creative and communication skills.
- Ability to solve problems in a fast-paced, high-stakes environment.
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Benefits:
- Comprehensive medical plan covering medical, dental, and vision
- Short-term and long-term disability coverage
- 401(k) plan with company match
- Life Insurance
- Vacation Time, Sick Leave, Paid Holidays
- Paid Paternity and Maternity Leave
Source: Metasys Technologies