Required Skills: Python, Django, Flask, Spark, PySpark, PyTorch, Google Cloud Platform (GCP), Google Kubernetes Engine (GKE), BigQuery
Job Description
Please note: this position is only open to W2 candidates.
If you have an established LinkedIn profile (for example, one created before 2019), I will make sure you get an interview.
If you're interested, please send me a copy of your resume and the following details as soon as possible:
- 1st Reference:
Name -
Title -
Company -
Email (official) -
Contact -
- 2nd Reference:
Name -
Title -
Company -
Email (official) -
Contact -
- Last 4 digits of SSN -
- Date of birth (DOB), MM/DD -
Overview: At a high level, the client has migrated from Hadoop to GCP for data processing and now runs a GCP data environment, predominantly for big-data applications in the cloud. They are seeking 3-5 senior-level data engineers with strong Python skills to support ongoing data migration and ingestion efforts.
- Source systems: data arrives from multiple external channels (providers, healthcare groups, hospitals, etc.); the platform processes it and delivers it to operational systems.
- The end state is not a data warehouse for analytics; the data directly feeds applications.
- Much of the data ingestion is currently done manually, and they are looking to automate it.
- Data pipelines are written in PySpark and Scala/Spark and run on Dataproc for larger volumes.
- Python scripts are executed via Google Cloud Functions, alongside Google Kubernetes Engine (GKE).
- Candidates should have experience working with denormalized data, both structured and unstructured.
- They use Cloud SQL as their relational cloud database, but experience with others (e.g., Oracle, Postgres) is acceptable.
- They are also building AI use cases, with 5-6 currently on their plate, including AI for data pipeline builds. Candidates with some background in developing AI applications would be the ideal profile; a strong ML or AI candidate with Python programming skills could potentially be placed as well.
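To illustrate the "denormalized data" requirement above, here is a minimal plain-Python sketch of flattening a nested source record into flat key/value columns before loading. The record shape, field names, and helper name are hypothetical, not from the posting:

```python
def flatten_record(record, parent_key="", sep="."):
    """Recursively flatten a nested dict into dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested sub-records
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Hypothetical nested provider record, as a source system might send it
nested = {
    "provider_id": "P123",
    "contact": {"email": "a@example.com", "phone": "555-0100"},
}
print(flatten_record(nested))
# → {'provider_id': 'P123', 'contact.email': 'a@example.com', 'contact.phone': '555-0100'}
```

In a real pipeline this kind of transformation would typically run inside a PySpark job on Dataproc rather than in plain Python.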
Must Have:
- Strong hands-on Python programming
- Spark/PySpark
- GCP (BigQuery, Dataproc, Google Cloud Functions, GKE, Cloud SQL; not all are must-haves, general awareness of the GCP ecosystem and its data services is sufficient)
- Experience working with various data types and structures
Nice to Have:
- AI experience: building AI systems or models, or building inference pipelines and processing data "for AI"