GCP Data Engineer - Global Commerce & Information, Inc.
St. Louis, MO 63134
About the Job
Your Success is Our Success. Global CI is an award-winning IT services company with 30 years of experience, founded on the principle of providing high-quality, value-added technology consulting services. Our vision is to create a better future by improving the lives of the people we serve through emerging technologies. Join us and together we will advance the future of technology services.
Global CI offers competitive compensation and non-salary benefits to all eligible employees.
Job Description
The GCP Data Engineer will be responsible for designing and building large-scale cloud data processing systems on Google Cloud Platform (GCP). This role involves curating a comprehensive data set of users, groups, and their permissions to various data sets. The engineer will redesign and implement a scalable data pipeline to ensure timely updates and transparency in data access.
Key Responsibilities:
" Design, develop, and implement scalable, high-performance data solutions on GCP.
" Curate and manage a comprehensive data set detailing user permissions and group memberships.
" Redesign the existing data pipeline to improve scalability and reduce processing time.
" Ensure that changes to data access permissions are reflected in the Tableau dashboard within 24 hours.
" Collaborate with technical and business users to share and manage data sets across multiple projects.
" Utilize GCP tools and technologies to optimize data processing and storage.
" Re-architect the data pipeline that builds the BigQuery dataset used for GCP IAM dashboards to make it more scalable.
" Run and customize DLP scans.
" Build bidirectional integrations between GCP and Collibra.
" Explore and potentially implement Dataplex and custom format-preserving encryption for de-identifying data for developers in lower environments.
Required Skills:
" Bachelor's degree in Computer Engineering or a related field.
" 5+ years of experience in an engineering role using Python, Java, Spark, and SQL.
" 5+ years of experience working as a Data Engineer in GCP.
" Proficiency with Google's Identity and Access Management (IAM) API.
" Strong Linux/Unix background and hands-on knowledge.
" Experience with big data technologies such as HDFS, Spark, Impala, and Hive.
" Experience with Shell scripting and bash.
" Experience with version control platforms like GitHub.
" Experience with unit testing code.
" Experience with development ecosystems including Jenkins, Artifactory, CI/CD, and Terraform.
" Demonstrated proficiency with Airflow.
" Excellent written and verbal communication skills.
" Ability to understand and analyze complex data sets.
" Ability to exercise independent judgment on moderately complex issues.
" Ability to make recommendations to management on new processes, tools, and techniques.
" Ability to work under minimal supervision and use independent judgment requiring analysis of variable factors.
" Ability to collaborate with senior professionals in the development of methods, techniques, and analytical approaches.
" Ability to advise management on approaches to optimize for data platform success.
" Ability to effectively communicate highly technical information to various audiences, including management, the user community, and less-experienced staff.
" Proficiency in multiple programming languages, frameworks, domains, and tools.
" Coding skills in Scala.
" Experience with GCP platform development tools such as Pub/Sub, Cloud Storage, Bigtable, BigQuery, Dataflow, Dataproc, and Composer.
" Knowledge in Hadoop and cloud platforms and surrounding ecosystems.
" Experience with web services and APIs (RESTful and SOAP).
" Ability to document designs and concepts.
" API Orchestration and Choreography for consumer apps.
" Well-rounded technical expertise in Apache packages and hybrid cloud architectures.
" Pipeline creation and automation for data acquisition.
" Metadata extraction pipeline design and creation between raw and transformed datasets.
" Quality control metrics data collection on data acquisition pipelines.
" Ability to collaborate with scrum teams including scrum master, product owner, data analysts, Quality Assurance, business owners, and data architecture to produce the best possible end products.
" Experience contributing to and leveraging Jira and Confluence.
" Strong experience working with real-time streaming applications and batch-style large-scale distributed computing applications using tools like Spark, Kafka, Flume, Pub/Sub, and Airflow.
" Ability to work with different file formats like Avro, Parquet, and JSON.
" Managing and scheduling batch jobs.
" Hands-on experience in Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
Benefits include:
- Comprehensive medical, dental, vision, life, and short & long-term disability insurance + health savings account
- Matching 401(k) retirement plan + IRAs and Roth IRAs
- Generous paid time off and paid holidays
- Employee recruitment/referral bonus
- Paid community service hours
- Tuition reimbursement
- Employee discounts
At Global Commerce & Information, Inc. we celebrate, support, and are committed to creating a diverse and inclusive environment. We're proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, or any other legally protected characteristics.
Global Commerce & Information, Inc. maintains a drug-free workplace.