Site Reliability Engineer - CareerBuilder Premium Subscription
Boca Raton, FL
About the Job
Job description:
- We are a cutting-edge biomedical startup preparing for our first product release. This is a unique opportunity to be on the ground floor of a rapidly growing biomedical company. We are a tight-knit, agile group with many capable engineering, medical, and business personnel on the team and board alike. We are looking to further expand our team by adding a strong software development arm to the company.
Current Project:
- Client's Humero Tech C1 changes the way shoulder injuries are rehabilitated with our innovative strength-building and sensor-based technology. Our rotator cuff machine tracks patients' efforts as they work through strength-based exercises. At the end of each session, the user gets a set of in-depth metrics to help inform the next steps for recovery.
- Client is at the very beginning of device rollout into the field, and thus Titin is searching for a talented Site Reliability Engineer to ensure customers have a smooth experience while working with our software and their data.
- Additionally, a strong and positive personality is critical because this person will inevitably be communicating directly with our customers.
System Monitoring and Incident Management
- Set up and maintain monitoring tools to track system performance, availability, and reliability.
- Respond to incidents, troubleshoot issues, and ensure fast recovery to minimize downtime.
- Implement alerting mechanisms to proactively identify potential issues before they impact end users.
Automation and Efficiency
- Automate manual operations and repetitive tasks to improve system reliability and speed.
- Write scripts and create tools to streamline deployment, monitoring, and scaling processes.
- Work with Continuous Integration/Continuous Deployment (CI/CD) management tools.
Infrastructure Management
- Manage cloud infrastructure to ensure system reliability and scalability.
- Monitor and maintain these systems to comply with HIPAA and SOC 2 requirements.
Performance Optimization
- Analyze system performance and work on tuning to meet predefined service level objectives (SLOs).
- Optimize resource usage, including compute, memory, and storage, to ensure cost-efficiency without sacrificing performance.
Disaster Recovery and High Availability
- Develop, test, and implement disaster recovery plans.
- Ensure high availability by using redundancy, failover mechanisms, and geographical distribution of systems.
Security and Compliance
- Implement security best practices to safeguard data and systems.
- Ensure compliance with industry regulations and internal security policies.
- Cooperate with and respond to necessary compliance audits.
Collaboration and Communication
- Work closely with development teams to integrate reliability into the software development lifecycle.
- Participate in post-incident reviews to identify root causes and prevent future occurrences.
- Provide technical support to teams and help to build a culture of reliability across the organization.
Documentation
- Document incident response processes, infrastructure architecture, and SRE best practices.
- Maintain clear, accessible records for troubleshooting, deployments, and maintenance tasks.
- Generate work instructions to document tasks and enable smooth team expansion.
Continuous Improvement
- Identify opportunities for process improvements and performance enhancements.
- Keep up to date with the latest technology trends and industry practices, and adopt relevant innovations.
Application Question(s):
- Past Projects Portfolio
Education:
- High school or equivalent (Required)
- Undergraduate degree or equivalent experience (Preferred)
- AWS Certifications (Preferred)
Required Experience:
- Experience with AWS
- Experience with Python
- Experience with SQL / Databases
- Knowledge of managing cloud-based infrastructure, networking, and storage.
- Ability to write automation scripts for deployment, monitoring, and scaling.
Preferred Experience:
- Experience with Linux/Unix systems
- Experience with version control systems like Git.
- Understanding of AWS IAM
- Expertise in system administration tasks, such as patching, user management, and system performance tuning.
- Familiarity with securing infrastructure, including access control, encryption, and vulnerability management.
Source: CareerBuilder Premium Subscription