Key Responsibilities
Data Engineering & Pipeline Development
Design, develop, and optimize ETL/ELT pipelines using Azure Databricks (PySpark).
Build scalable data ingestion workflows from various structured and unstructured sources.
Implement transformation logic, data cleansing, enrichment, and validation frameworks.
Work with Delta Lake to build medallion architecture (Bronze/Silver/Gold layers).
Develop reusable Databricks notebooks and jobs for production data workflows.
Azure Cloud & Integration
Build and orchestrate pipelines using Azure Data Factory (ADF).
Integrate Databricks with other Azure services: ADLS, Azure SQL, Event Hubs, Key Vault, and Synapse.
Optimize compute environments (clusters, pools, autoscaling).
Implement DevOps processes using Git, CI/CD, and Azure DevOps.
Performance, Quality & Governance
Optimize PySpark jobs for performance and cost efficiency.
Implement best practices for data governance, security, and access control.
Troubleshoot production issues and perform root-cause analysis.
Conduct code reviews to enforce coding standards and data quality.
Collaboration & Documentation
Work with Data Architects to define architecture and design patterns.
Prepare technical documents, solution diagrams, and runbooks.
Collaborate with business stakeholders to understand requirements and translate them into technical solutions.
Technical Skills
Azure Databricks – notebooks, jobs, workflows, Delta Lake.
PySpark – DataFrames, Spark SQL, optimization & debugging.
Azure Data Factory (ADF) – triggers, pipelines, integration runtime.
Data Lake Storage (ADLS Gen2) – folder structures, partitioning, security.
CI/CD – Git (branching strategies), Azure DevOps pipelines.
SQL – strong proficiency in writing optimized queries.