Required Skills: Python, AWS, Cloud Computing, PySpark
Job Description
• Extensive experience with AWS services: EMR, S3, Redshift, Glue, Lambda, Step Functions, DynamoDB, RDS, Athena, EventBridge, API Gateway, and SNS.
• Expertise in ETL concepts, with a strong background in AWS Glue and data pipeline orchestration.
• Strong experience with PySpark and Kafka for building streaming and batch processing systems.
• In-depth knowledge of data partitioning and Parquet files for efficient data storage and querying.
• Strong experience with SQL, including writing complex queries and working with databases such as Redshift and Snowflake.
• Proficiency in DevOps concepts, with hands-on experience in CI/CD pipelines, Docker, and Terraform.
• Excellent understanding of data lake, data warehouse, and data lakehouse concepts.
• Proven experience leading teams, mentoring engineers, and managing end-to-end technical implementations.
• Experience working with Redshift Spectrum and Athena for querying large-scale data.
• Understanding of security best practices for cloud data solutions, including IAM roles and policies.
• Familiarity with data governance, compliance, and data quality frameworks.
Agile Way of Working, Python, Amazon Web Services (AWS) Cloud Computing, PySpark