Data Engineer - Ampcus Incorporated
Mountain View, CA
About the Job
Bravens Inc., a wholly owned subsidiary of Ampcus Inc., is an information technology consulting and services company. Bravens is a leader in providing tailored staffing solutions across both IT and non-IT industries. We are in search of a highly motivated candidate to join our talented team and contribute to our ongoing success.
Job Title: Data Engineer
Location(s): Mountain View, CA
Must have:
· Strong hands-on experience with Apache Spark for data processing and analytics.
· Proficiency in writing advanced SQL queries, including complex joins, aggregations, and window functions.
· Familiarity with Spark components such as Spark SQL, Spark Streaming, and PySpark.
· Understanding of distributed computing concepts and Spark architecture (e.g., RDDs, DAGs, partitions).
· Experience working with large datasets, data lakes, and data warehouses.
· Knowledge of file formats like Parquet, Avro, and ORC.
· Proven ability to optimize Spark jobs and SQL queries for efficiency and scalability.
· Strong problem-solving skills with attention to detail.
· Ability to collaborate effectively with cross-functional teams.
· Excellent communication skills for sharing insights and progress with stakeholders.
· Knowledge of Python, Scala, or Java for Spark application development.
Good to have:
· Experience with big data ecosystems like Hadoop, Hive, or HBase.
· Familiarity with workflow orchestration tools such as Apache Airflow or Luigi.
· Knowledge of NoSQL databases like MongoDB, Cassandra, or Elasticsearch.
· Experience deploying Spark jobs on cloud platforms (e.g., AWS EMR, Azure Synapse, or Google Dataproc).
· Familiarity with cloud data platforms like Snowflake, BigQuery, or Redshift.
· Scripting experience for automating repetitive tasks.
· Familiarity with monitoring tools like Prometheus, Grafana, or Spark’s built-in UI.
· Hands-on experience with debugging tools for Spark and SQL processes.
· Relevant certifications in big data (e.g., Databricks Certified Associate, Cloudera Certified Developer).
· Understanding of industry-specific data needs, such as finance, healthcare, or retail analytics.
What You'll Do
· Build and maintain distributed data processing pipelines using Apache Spark.
· Write efficient SQL queries to extract, transform, and analyze large datasets.
· Perform data cleansing, validation, and enrichment to ensure high-quality datasets.
· Optimize Spark jobs for performance, including tuning Spark configurations and improving query efficiency.
· Implement partitioning, caching, and indexing strategies for large-scale data processing.
· Develop and manage ETL workflows to process data from various sources into data lakes or warehouses.
· Collaborate with other data engineers to integrate data from structured and unstructured sources.
· Monitor Spark jobs and cluster performance, addressing bottlenecks and failures.
· Troubleshoot SQL queries and Spark processes to resolve performance and accuracy issues.
· Work closely with fellow data engineers, analysts, and stakeholders to understand data requirements.
· Present findings and insights derived from large datasets to business teams.
· Document workflows, best practices, and troubleshooting guides for Spark and SQL usage.
Bravens is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, protected veteran status, or disability.