Technical Escalation Engineer at VAST Data
Dallas, TX 75215
About the Job
VAST Data is looking for a Technical Escalation Engineer to join our growing team!
This is a great opportunity to be part of one of the fastest-growing infrastructure companies in history, an organization at the center of the hurricane created by the revolution in artificial intelligence.
"VAST's data management vision is the future of the market." - Forbes
VAST Data is the data platform company for the AI era. We are building the enterprise software infrastructure to capture, catalog, refine, enrich, and protect massive datasets and make them available for real-time data analysis and AI training and inference. Designed from the ground up to make AI simple to deploy and manage, VAST takes the cost and complexity out of deploying enterprise and AI infrastructure across data center, edge, and cloud.
Our success has been built through intense innovation, a customer-first mentality and a team of fearless VASTronauts who leverage their skills & experiences to make real market impact. This is an opportunity to be a key contributor at a pivotal time in our company’s growth and at a pivotal point in computing history.
Summary
As a Technical Escalation Engineer, you will be responsible for monitoring and maintaining the health and performance of our fleet of installed clusters. You will work in a 24/7 network operations center-style environment, ensuring the availability, reliability, and security of services. This role involves real-time monitoring, incident detection, incident management, incident resolution, and clear written and verbal communication with other teams and stakeholders.
The Role
- Monitor clusters using internal monitoring tools to detect and troubleshoot issues promptly.
- Respond to alerts and incidents in a timely manner, following standard operating procedures (SOPs) and escalation processes.
- Perform initial investigation and diagnosis of problems, escalating complex issues to Support.
- Document incidents, including their details, troubleshooting steps, and resolutions in the incident tracking system.
- Collaborate with other teams, including Support, R&D, Account teams, and customers to ensure effective incident resolution and communication.
- Conduct routine checks and audits to identify potential problems or vulnerabilities.
- Assist with the implementation of changes and updates to the infrastructure as directed by team leads.
- Assist with writing Root Cause Analysis documentation and delivering it to customers within prescribed timelines.
- Participate in shift-based work schedules, including nights, weekends, and holidays, to provide 24/7 coverage.
- Maintain up-to-date knowledge of VAST Data Platform technologies via prescribed hands-on training modules.
- Adhere to security protocols and ensure the confidentiality, integrity, and availability of network and system data.
- Provide excellent customer service to internal and external stakeholders during incident resolution and communication.
Requirements
- Proven experience as a NOC Operator or in a similar network monitoring role is preferred.
- Superior communication skills, both written and verbal, to interact with technical and non-technical stakeholders.
- Strong understanding of networking concepts, protocols, and technologies (TCP/IP, SNMP, DHCP, DNS, etc.).
- Ability to work independently and collaboratively in a team-based environment.
- Excellent problem-solving and analytical skills, with the ability to multitask effectively.
- Willingness to work in a 24/7 shift-based environment, including nights, weekends, and holidays. Shift options are Wednesday to Saturday, Sunday to Wednesday, or Monday to Friday.
- Detail-oriented and committed to maintaining accurate documentation.
- Demonstrated commitment to continuous learning and self-improvement.