Design, develop, and maintain end-to-end data pipelines in Databricks using Spark and Delta Lake
Build and optimize ELT/ETL processes for structured and unstructured data ingestion into the Data Lakehouse
Implement scalable ingestion patterns (batch and event-driven) from internal systems, third-party APIs, and cloud sources
Develop data models (bronze, silver, gold layers) to support enterprise reporting, analytics, and downstream consumption
Data Platform & Integration
Integrate the Data Lakehouse with enterprise tools such as Tableau, Alteryx, and machine learning platforms
Design and implement data access controls, identity management, and secure data sharing mechanisms
Support API-based integrations and downstream data consumption patterns
Data Quality, Governance & Controls
Implement data quality checks, reconciliation processes, and monitoring within Databricks pipelines
Ensure adherence to enterprise data governance standards, including lineage, metadata, and audit requirements
Support regulatory and compliance requirements (e.g., data integrity, privacy, and security controls)
Cloud & Automation
Develop and manage workflows using orchestration tools (e.g., Airflow, Control-M)
Automate data pipelines, deployments, and operational processes through CI/CD pipelines
Leverage cloud-native services (AWS/Azure) for data processing, storage, and event-driven architectures
Operations & Support
Monitor, troubleshoot, and optimize data pipelines and Spark workloads for performance and reliability
Support production data platforms, including incident resolution and root cause analysis
Ensure high availability, data integrity, and SLA adherence across enterprise data systems
Collaboration
Partner with data architects, data scientists, BI teams, and business stakeholders to deliver data solutions
Participate in Agile ceremonies and contribute to iterative delivery of data products
Translate business requirements into scalable technical data solutions
Required Qualifications
3+ years of experience in data engineering, data platforms, or related roles
Hands-on experience with Databricks, Apache Spark (PySpark), and Delta Lake
Strong SQL and data modeling skills (relational and dimensional)
Experience building and supporting data pipelines in a cloud environment (AWS or Azure)
Experience with ELT/ETL tools (e.g., Fivetran, custom ingestion frameworks)
Familiarity with data orchestration tools (Airflow, Control-M)
Experience working in Agile development environments
Experience in financial services or regulated environments (e.g., banking, risk, regulatory reporting)
Knowledge of data governance frameworks and tools (e.g., Collibra)
Experience with real-time or streaming data pipelines
Exposure to machine learning pipelines and feature engineering in Databricks