Location: Alpharetta, GA
The WM Performance Engineering team is a centralized team responsible for the performance, scalability and reliability of all applications and platforms in the Brokerage Wealth Management division. This position is for a Resiliency / Chaos Test Lead Engineer who will work horizontally across the existing Performance Engineering team, which is organized around vertical business silos. The Lead Engineer will plan, execute and report on Resiliency / Chaos Engineering for critical platforms and applications, and guide the rest of the team on Best Practices to follow.
Act as a very hands-on lead for conducting Resiliency / Chaos Engineering experiments.
Analyze end-to-end application architecture and determine failure points for Cloud as well as On-Prem hosted applications.
Partner with various stakeholders to formulate Resiliency Scenarios / Hypotheses.
Define success criteria for testing results and outcomes.
Ensure monitoring / alerting solutions are in place prior to test executions.
Identify and remediate Resiliency issues, coordinating with partners in application, architecture, production support and infrastructure teams as needed.
Ensure observability of the platforms and applications and identify gaps where found.
Track, manage and report on the findings.
Educate the rest of the team on the Best Practices and standard procedures to follow.
Evaluate how to scale Resiliency Testing, automating as much as possible to do more with less.
Establish a strong presence as a change agent providing innovative, effective, and efficient Resiliency Testing / Chaos Engineering practices and solutions.
10+ years of experience in Technology with a background in Architecture or Performance Engineering.
Ability to multitask and handle large workloads.
Experience working with Agile/Scrum methodology.
Experience working in Azure Cloud (AKS / ASE / APIM / Redis, etc.).
Must possess a strong knowledge of enterprise application architecture and technologies including web, web services and databases on both cloud and distributed platforms, as well as mainframe and messaging layers. Must also have a strong understanding of the monitoring solutions and KPIs used to determine availability, performance and reliability of these technology stacks.
Ability to understand complex architecture patterns and failure points.
Possess great analytical and troubleshooting skills to find and remediate Resiliency issues.
Knowledge of Load Generation tools like Performance Center and JMeter.
Knowledge of Monitoring Tools such as AppDynamics, Dynatrace and Splunk.
Ability to parse raw Application
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.