Data Engineer - Clinical Decision Support Solutions | W2 Acceptable - Contract - TalentBurst, Inc.
West Sacramento, CA
About the Job
Title: Data Engineer - Clinical Decision Support Solutions
Duration: 6 months with possible extension
Location: West Sacramento, CA (Hybrid Work)
Pay Rates: W2 Acceptable (no C2C)
Job Description:
Seeking a team member to join our Microbiology R&D Development Science functional team. In this role, you will develop and maintain end-to-end data and machine learning pipelines for clinical and verification studies.
We're looking for associates who thrive in a team-oriented, goal-focused environment.
The Data Engineer is responsible for the development and implementation of end-to-end MLOps pipelines that support ML model deployment throughout the entire ML lifecycle. This position is part of the data science team located in West Sacramento, California, and is a hybrid role. The Data Engineer will be part of the Development Science functional group and report to the Data Science Manager. If you thrive on a cross-functional team and want to help build a world-class biotechnology organization, read on.
Responsibilities
• Collaborate with stakeholders to understand data requirements for ML, Data Science, and Analytics projects.
• Assemble large, complex data sets from disparate sources, writing code, scripts, and queries as appropriate to efficiently extract, QC, clean, harmonize, and visualize big data sets.
• Write pipelines for optimal extraction, transformation, and loading of data from a wide variety of data sources using Python, SQL, Spark, and AWS big-data technologies.
• Design and develop data schemas to support the Data Science team's development needs.
• Identify, design, and implement continuous process improvements, such as automating manual processes and optimizing data delivery.
• Design, develop, and maintain a dedicated ML inference pipeline on the AWS platform (SageMaker, EC2, etc.).
• Deploy inference on a dedicated EC2 instance or on Amazon SageMaker.
• Establish a data pipeline to store and maintain inference output results to track model performance and KPI benchmarks (see the sketch after this list).
• Document data processes, write recommended data management procedures, and create training materials on data management best practices.
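To give a concrete sense of the work described above (inference on SageMaker plus logging outputs for KPI tracking), here is a minimal, hypothetical sketch. The endpoint name, bucket name, and payload schema are illustrative assumptions, not details taken from this role:

    import json
    from datetime import datetime, timezone

    import boto3

    # Illustrative placeholders -- not actual names from this role.
    ENDPOINT_NAME = "example-clinical-model-endpoint"
    RESULTS_BUCKET = "example-inference-results"

    runtime = boto3.client("sagemaker-runtime")
    s3 = boto3.client("s3")

    def run_batch_inference(records):
        """Invoke a deployed SageMaker endpoint on a batch of records,
        then persist inputs and outputs to S3 so model performance and
        KPI benchmarks can be tracked over time."""
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps({"instances": records}),
        )
        predictions = json.loads(response["Body"].read())

        # Timestamped key so results partition naturally by date.
        key = f"inference/{datetime.now(timezone.utc):%Y/%m/%d/%H%M%S}.json"
        s3.put_object(
            Bucket=RESULTS_BUCKET,
            Key=key,
            Body=json.dumps({"inputs": records, "predictions": predictions}),
        )
        return predictions

The same pattern extends to serving on a dedicated EC2 instance: only the invoke call changes, while the S3 result log stays the same.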
Required Qualifications
• BS or MS in Computer Science, Computer Engineering, or equivalent experience.
• 5-7 years of data engineering and MLOps experience developing and deploying data and ML pipelines.
• 5 years of experience deploying ML models via AWS SageMaker and AWS Bedrock.
• 5 years of programming and scripting experience using Python, SQL, and Spark.
• Deep knowledge of AWS core services such as RDS, S3, API Gateway, EC2/ECS, and Lambda.
• Hands-on experience with model monitoring, drift detection, and automated retraining processes (see the sketch after this list).
• Hands-on experience with CI/CD pipeline implementation using tools such as GitHub Actions workflows, Docker, Kubernetes, Jenkins, and Blue Ocean.
• Experience working in an Agile/Scrum-based software development structure.
• 5 years of experience with data visualization and/or API development for data science users.
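For the monitoring and drift-detection qualification above, one widely used signal is the Population Stability Index (PSI) between training-time and live score distributions. A minimal sketch follows; the bin count and the ~0.2 threshold are conventional rules of thumb, not requirements from this posting:

    import numpy as np

    def population_stability_index(expected, actual, bins=10):
        """PSI between a baseline sample (e.g., training scores) and a
        live window of scores. Values above ~0.2 are a common
        rule-of-thumb signal of meaningful drift."""
        edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range live values
        e = np.histogram(expected, edges)[0] / len(expected)
        a = np.histogram(actual, edges)[0] / len(actual)
        # Clip to avoid log(0) on empty bins.
        e = np.clip(e, 1e-6, None)
        a = np.clip(a, 1e-6, None)
        return float(np.sum((a - e) * np.log(a / e)))

Running this on the logged inference outputs from the earlier sketch yields a simple scheduled drift check that can gate automated retraining.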
Source: TalentBurst, Inc.