Senior Software Engineers - Data License Service Reliability - Bloomberg
New York, NY 10261
About the Job
At Bloomberg, we deliver billions of data points to hundreds of thousands of customers every day (and growing), enabling them to make informed business decisions. It is paramount that our customers have reliable access to our services so they receive market-moving data when they need it. As our customers' needs continue to grow and evolve, our mission is to leverage software engineering, collaboration, and automation to handle the demand and exceed expectations. This is where you come in.
Through software engineering and collaboration with other engineering teams, members of the Data License Reliability Engineering team design and build solutions and incorporate industry-standard best practices to ensure Data License services run reliably for our customers. This work includes increasing the observability of Data License services, automating capacity management of production infrastructure, and reducing the time it takes to resolve issues through deployments and incident response automation. Our team builds and maintains a full-stack application, the Synthetic Requests and Notification system (aka SyReN), that monitors Data License services end-to-end, generating service level metrics and triggering alerts when issues are detected.
Additionally, our team routinely assesses and tests Data License services overall through automated game days and proactive chaos testing to ensure the highest level of service reliability is delivered to our customers.
We'll trust you to:
- Take a "solve this with automation" approach to challenges and issues related to service reliability
- Improve the observability of applications, services, and infrastructure systems to help teams understand system performance
- Design and propose improvements to software solutions used to measure and monitor the performance of applications and services
- Collaborate with application development teams to identify gaps that negatively impact service reliability, including improving monitoring, capacity management, and incident response workflows
- Manage a high-quality and robust production platform and promotion pipeline to ensure available capacity and resources for services
- Reduce human toil through automation of manual tasks, steps, and workflows
- Work collaboratively with the team to accomplish goals within an agile software development lifecycle
You'll need to have:
- 4+ years of experience working with an object-oriented programming language (C/C++, Python, Java, etc.)
- A degree in Computer Science, Engineering, Mathematics, or a similar field of study, or equivalent work experience
- A preference for a data-driven approach to decision-making
- Creative problem-solving approaches that account for existing services, environments, and resource constraints
- Demonstrated understanding or experience working in all levels of the technical stack, from applications to underlying computing infrastructure and machine hardware
- Willingness to learn new technologies and adapt to changing priorities
We'd love to see:
- Containerization technologies (Docker, Kubernetes, Mesos)
- Chaos testing or similar experience to validate reliability
- Infrastructure as code and configuration management tools
- Defining and measuring service level indicators and service level objectives for applications, services and infrastructure
Source: Bloomberg