Lead Data Engineer at TRINITY INFOTECH INC
About the Job
We are looking for a lead big data developer for an ETL project in our enterprise Transparency Services group. The big data developer will design, ingest, store, validate, transform, and disseminate data in a consumable format so that business intelligence teams and data analysts can draw deeper business insight from it.
Job Responsibilities
• Understand complex business requirements
• Design and develop ETL pipelines for collecting, validating, and transforming data according to specification
• Develop automated unit tests, functional tests, and performance tests
• Maintain optimal data pipeline architecture
• Design ETL jobs for optimal execution in the AWS cloud environment
• Reduce the processing time and cost of ETL workloads
• Lead peer reviews and design/code review meetings
• Support the production operations team
• Implement data quality checks
• Identify areas where machine learning can be used to detect data anomalies
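As a concrete illustration of the data quality checks this role involves, here is a minimal sketch in Java. The record fields (`id`, `amount`) and validation rules are invented for the example, not part of the actual project:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a row-level data quality check of the kind an
// ETL pipeline applies before loading. Field names and rules are hypothetical.
public class QualityCheck {
    // A record passes the check if its id is non-blank and its amount is non-negative.
    public static boolean isValid(String id, double amount) {
        return id != null && !id.isBlank() && amount >= 0;
    }

    // Collect the ids of failing rows so they can be quarantined for review
    // rather than silently dropped. Each row is [id, amount].
    public static List<String> rejects(List<String[]> rows) {
        List<String> bad = new ArrayList<>();
        for (String[] row : rows) {
            String id = row[0];
            double amount = Double.parseDouble(row[1]);
            if (!isValid(id, amount)) {
                bad.add(id == null ? "<null>" : id);
            }
        }
        return bad;
    }
}
```

Routing rejected rows to a quarantine table, instead of discarding them, is a common design choice because it keeps the pipeline auditable.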
Experience & Qualifications
• 7+ years of programming experience in Java or Scala
• 7+ years of experience in ETL projects
• 5+ years of experience in big data projects
• 3+ years of experience with API development (REST APIs)
• Believes in Scrum/Agile, with deep experience delivering software on teams that use the methodology
• Strong and creative analytical and problem-solving skills
Required Technical Skills & Knowledge
• Strong experience in Java or Scala
• Strong experience in big data technologies like AWS EMR, AWS EKS, Apache Spark
• Strong experience with serverless technologies like AWS DynamoDB, AWS Lambda
• Strong experience processing JSON and CSV files
• Must be able to write complex SQL queries
• Experience in performance tuning and optimization
• Familiar with columnar storage formats (ORC, Parquet) and various compression techniques
• Experience in writing Unix shell scripts
• Unit testing using JUnit or ScalaTest
• Git/Maven/Gradle
• Code Reviews
• Experience with CI/CD pipelines
• Agile
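To illustrate the kind of CSV processing listed above, here is a minimal Java sketch that validates and aggregates CSV lines, a typical transform step in an ETL job. The column layout (`region,amount`) is an assumed example:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch: aggregate CSV lines into totals per key, dropping
// malformed rows along the way. The (region, amount) layout is hypothetical.
public class CsvAggregate {
    public static Map<String, Double> totalsByRegion(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .filter(cols -> cols.length == 2)  // skip rows with the wrong column count
                .collect(Collectors.groupingBy(
                        cols -> cols[0].trim(),
                        Collectors.summingDouble(cols -> Double.parseDouble(cols[1]))));
    }
}
```

In the actual role this kind of aggregation would typically run in Apache Spark on AWS EMR rather than in plain Java streams, but the validate-then-aggregate shape is the same.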
The following skills are a plus:
• AWS Cloud
• BPM/ AWS Step Functions
• Python scripting
• Performance testing tools like Gatling or JMeter
Nice-to-have skills:
• AWS Aurora
• Data testing