Data Engineer - Chenoa Information Services
New York, NY 10004
About the Job
Job Title: Data Engineer
Office Location: Bethlehem, PA
Work Location: This is a hybrid role; the selected candidate will be required to work onsite in the Bethlehem, PA office a minimum of three days per week. Local candidates only.
We are seeking an experienced Data Engineer to join our Enterprise Data and Analytics organization. You will play a key role in building and delivering best-in-class data and analytics solutions that create value and impact for the organization and our customers. As a member of the data engineering team, you will help develop and deliver data products backed by best-in-class engineering. You will collaborate with analytics, business, and IT partners to enable these solutions.
You will:
• Architect, build, and maintain scalable, reliable data pipelines with robust data quality checks built in, for consumption by the analytics and BI layers.
• Design, develop, and implement low-latency, high-availability, performant data applications, and recommend and implement innovative engineering solutions.
• Design, develop, test, and debug code in Python, SQL, PySpark, and Bash per Client standards.
• Design and implement a data quality framework and apply it to critical data pipelines to make the data layer robust and trustworthy for downstream consumers.
• Design and develop an orchestration layer for data pipelines written in SQL, Python, and PySpark.
• Apply and provide guidance on software engineering techniques such as design patterns, code refactoring, framework design, code reusability, code versioning, performance optimization, and continuous integration and deployment (CI/CD) to make the data analytics team robust and efficient.
• Perform all job functions consistent with Client policies and procedures, including those governing the handling of PHI and PII.
• Work closely with various IT and business teams to understand system opportunities and constraints so the Client Enterprise Data Infrastructure is used to its fullest.
• Develop relationships with business team members by being proactive, demonstrating a growing understanding of their business processes, and recommending innovative solutions.
• Communicate project output in terms of customer value, business objectives, and product opportunity.
You have:
• 5+ years of experience and a bachelor's or master's degree in Computer Science, Engineering, Applied Mathematics, or a related field.
• Extensive hands-on development experience in Python, SQL, and Bash.
• Extensive experience in performance optimization of data pipelines.
• Extensive hands-on experience with cloud data warehouse and data lake platforms such as Databricks, Redshift, or Snowflake.
• Familiarity with building and deploying scalable data pipelines and data solutions using Python, SQL, and PySpark.
• Extensive experience in all stages of software development and expertise in applying software engineering best practices.
• Extensive experience developing an end-to-end orchestration layer for data pipelines using frameworks such as Apache Airflow, Prefect, or Databricks Workflows.
• Familiarity with:
RESTful web services (REST APIs) for integrating with other services.
API gateways such as Apigee for securing web service endpoints.
Data pipelines, concurrency, and parallelism.
• Experience creating and configuring continuous integration/continuous deployment (CI/CD) pipelines to build and deploy applications across environments, and applying DevOps best practices to promote code to production.
• Ability to investigate and repair application defects in any component (front-end, business logic, middleware, or database), improving code quality and consistency, reducing delays, and identifying bottlenecks or gaps in the implementation.
• Ability to write unit tests in Python using a testing framework such as pytest.
Additional Qualifications (nice to have):
• Experience using and implementing data observability platforms such as Monte Carlo, Metaplane, Soda, Bigeye, or similar products.
• Expertise in debugging issues in cloud environments by monitoring logs on the VM or using AWS services such as CloudWatch.
• Experience with DevOps tools such as Jenkins and Terraform.
• Experience with software observability concepts and tools such as Splunk, Zenoss, Datadog, or similar.
• Experience developing and implementing a data quality framework, either homegrown or built on open-source frameworks such as Great Expectations, Soda, or Deequ.
• Ability to learn and adapt to new concepts and frameworks and to create proofs of concept using newer technologies.
• Ability to use agile methodology throughout the development lifecycle, providing regular updates and escalating issues or delays in a timely manner.
Source : Chenoa Information Services