Required Skills: Data Engineering, CDP Streaming, CDF Streaming, SQL, Kubernetes, Snowflake ingestion patterns, Debezium, CI/CD data pipelines, Kerberos, Ranger, Atlas
Job Description
Healthcare Domain Experience: Mandatory
Years of Experience: 12+ years (must)
Title: Lead Cloudera Consultant
Location: Chicago, IL (Onsite)
You must have hands-on, production-grade experience with ALL of the following:
Cloudera CDP / CDF
- CDP Public Cloud or Private Cloud Base
- Cloudera Flow Management (NiFi + NiFi Registry)
- Cloudera Streams Messaging (Kafka, SMM)
- Cloudera Stream Processing (Flink, SSB)
- Kudu / Impala ecosystem
Apache NiFi (Advanced)
- Building complex flows (not just admin/ops)
- QueryDatabaseTable / GenerateTableFetch / MergeRecord
- Record-based processors & schema registry
- JDBC / DBCP controller services
- Stateful processors & incremental ingestion
- NiFi → Snowflake integration
- NiFi → Kudu ingestion patterns
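To illustrate the depth expected for "stateful processors & incremental ingestion": this is a minimal Python sketch (not NiFi itself) of the stateful maximum-value-column pattern that processors like QueryDatabaseTable rely on. The table data and the `updated_at` column name are hypothetical examples.

```python
# Sketch of incremental ingestion: remember the highest value seen for a
# "maximum-value column" and fetch only rows newer than that watermark,
# the way NiFi's QueryDatabaseTable tracks state between runs.

def incremental_fetch(rows, state, max_value_column="updated_at"):
    """Return rows newer than the stored watermark, plus the updated state."""
    last_seen = state.get(max_value_column)
    new_rows = [r for r in rows
                if last_seen is None or r[max_value_column] > last_seen]
    if new_rows:
        state[max_value_column] = max(r[max_value_column] for r in new_rows)
    return new_rows, state

# First run: empty state, so everything is "new".
batch1 = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
state = {}
fetched, state = incremental_fetch(batch1, state)

# Second run: only rows past the stored watermark are picked up.
batch2 = [{"id": 2, "updated_at": 20}, {"id": 3, "updated_at": 30}]
fetched2, state = incremental_fetch(batch2, state)
```

The same idea underlies GenerateTableFetch, which additionally partitions the fetch into pages for parallel pulls.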
Apache Kafka
- Kafka brokers, partitions, retention, replication, consumer groups
- Schema registry (Avro/JSON)
- Designing topics for high-throughput streaming
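As a gauge for "designing topics for high-throughput streaming": a common rule of thumb sizes partitions from target throughput divided by per-partition capacity on both the producer and consumer side. This is a back-of-the-envelope sketch, not a Kafka API; the throughput numbers are assumptions.

```python
# Rough partition sizing for a high-throughput topic: the partition count
# must satisfy both the producer side and the consumer side, so take the max.

import math

def suggest_partitions(target_mb_per_s,
                       producer_mb_per_s_per_partition,
                       consumer_mb_per_s_per_partition):
    need_for_producers = math.ceil(
        target_mb_per_s / producer_mb_per_s_per_partition)
    need_for_consumers = math.ceil(
        target_mb_per_s / consumer_mb_per_s_per_partition)
    return max(need_for_producers, need_for_consumers)

# Example (assumed numbers): 500 MB/s target, 10 MB/s per partition when
# producing, 25 MB/s per partition when consuming.
parts = suggest_partitions(500, 10, 25)
```

In practice the count is also shaped by key cardinality, ordering requirements per key, and broker/replication overhead.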
Apache Flink
- Flink SQL + DataStream API
- Event-time processing, watermarks, windows
- Checkpointing, savepoints, state backends
- Kafka source/sink connectors
- Exactly-once semantics
- Flink CDC a plus
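To make "event-time processing, watermarks, windows" concrete: this is a plain-Python simulation of event-time tumbling windows closed by a watermark, the concept behind Flink's DataStream windowing. None of these names are Flink APIs; the watermark here is simply max event time minus an allowed lateness.

```python
# Tumbling-window counts driven by a watermark: a window is emitted once the
# watermark passes its end, and events targeting already-closed windows are
# collected as "late" instead of being counted.

from collections import defaultdict

def tumbling_window_counts(events, window_size, allowed_lateness):
    windows = defaultdict(int)   # window start -> event count (still open)
    emitted = {}                 # window start -> final count (closed)
    late = []
    watermark = float("-inf")
    for event_time in events:
        watermark = max(watermark, event_time - allowed_lateness)
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            late.append(event_time)          # window already closed
            continue
        windows[start] += 1
        for s in sorted(windows):            # close every window the
            if s + window_size <= watermark:  # watermark has passed
                emitted[s] = windows.pop(s)
    return emitted, windows, late

# Event times in arbitrary units; the final event (4) arrives after the
# watermark has passed its window's end, so it is late.
emitted, open_windows, late = tumbling_window_counts(
    [1, 2, 11, 3, 25, 4], window_size=10, allowed_lateness=5)
```

In real Flink the same roles are played by `WatermarkStrategy`, window assigners, and allowed lateness on the windowed stream.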
Apache Kudu
- Table design (PKs, partition strategies)
- Upserts, deletes, merge semantics
- Integration with Impala
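For "upserts, deletes, merge semantics": a minimal sketch of how operations against a primary-keyed table resolve, mirroring Kudu's UPSERT (insert-or-replace by PK) and DELETE behavior. This is not the Kudu client API, just the merge semantics on a dict keyed by PK.

```python
# Apply a sequence of upsert/delete operations against a PK-keyed table.
# An upsert with an existing key replaces the whole row; deletes are
# applied idempotently.

def apply_ops(table, ops):
    """table: dict keyed by PK. ops: list of ("upsert", row) or ("delete", pk)."""
    for op, payload in ops:
        if op == "upsert":
            table[payload["id"]] = payload    # insert-or-replace by PK
        elif op == "delete":
            table.pop(payload, None)          # no-op if the PK is absent
    return table

table = apply_ops({}, [
    ("upsert", {"id": 1, "status": "new"}),
    ("upsert", {"id": 1, "status": "shipped"}),   # same PK -> row replaced
    ("upsert", {"id": 2, "status": "new"}),
    ("delete", 2),
])
```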
SQL Stream Builder (SSB)
- Creating jobs, connectors, materialized views
- Deploying and monitoring Flink SQL jobs in CDP
CDC (Change Data Capture)
- CDC via NiFi or Flink CDC or SSB
- Handling late-arriving events
- Handling deletes, updates, schema evolution
- Incremental key tracking
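The CDC requirements above (deletes, updates, late-arriving events) can be sketched as applying a Debezium-style change stream to a keyed target, using a per-key version to discard stale events. The `op` codes (c/u/d) follow Debezium's convention; the `key`/`ts_ms`/`after` field names are illustrative assumptions.

```python
# Apply a CDC change stream to a keyed target. Each change carries an op
# (c=create, u=update, d=delete), a key, a payload ("after" image), and a
# timestamp used as a version so out-of-order / late events are skipped.

def apply_cdc(target, changes):
    for ch in changes:
        key, ver = ch["key"], ch["ts_ms"]
        current = target.get(key)
        if current is not None and current["ts_ms"] >= ver:
            continue                      # stale or late event: skip
        if ch["op"] == "d":
            target[key] = {"ts_ms": ver, "deleted": True, "after": None}
        else:                             # "c" or "u"
            target[key] = {"ts_ms": ver, "deleted": False,
                           "after": ch["after"]}
    return target

target = apply_cdc({}, [
    {"key": 1, "op": "c", "ts_ms": 100, "after": {"name": "a"}},
    {"key": 1, "op": "u", "ts_ms": 300, "after": {"name": "b"}},
    {"key": 1, "op": "u", "ts_ms": 200, "after": {"name": "stale"}},  # late
    {"key": 2, "op": "c", "ts_ms": 100, "after": {"name": "x"}},
    {"key": 2, "op": "d", "ts_ms": 150, "after": None},
])
```

Deletes are kept as tombstones rather than removed outright, so a late pre-delete update cannot resurrect the row.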
General Requirements
- 11+ years in data engineering / streaming
- 3–5+ years specifically with CDP/CDF streaming
- Strong SQL and distributed system fundamentals
- Experience in financial services, healthcare, telecom, or other high-volume industries preferred
Nice to Have
- Kubernetes experience running NiFi/Kafka/Flink operators
- Snowflake ingestion patterns (staging, COPY INTO)
- Experience with Debezium
- CI/CD for data pipelines
- Security (Kerberos, Ranger, Atlas)