09:00 – 09:25: Welcome and goals
– Objectives of the workshop
– Platform walkthrough (console, cloud shell, APIs, budget/billing)
– Datasets and use cases
09:30 – 10:00: Cloud storage
– Google Cloud Storage (GCS)
– Hadoop-compatible access through the Cloud Storage connector and basic usage (see the sketch below)
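A minimal sketch of the kind of bucket access covered in this session, using the google-cloud-storage Python client; the bucket and object names are hypothetical placeholders. On Dataproc, the Cloud Storage connector additionally exposes the same buckets through Hadoop-style gs:// paths.

```python
# Minimal sketch: listing and reading objects in a GCS bucket.
# "workshop-bucket" and the object paths are hypothetical.
from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials
bucket = client.bucket("workshop-bucket")

# List the objects under a prefix
for blob in bucket.list_blobs(prefix="datasets/"):
    print(blob.name, blob.size)

# Read one object as text (e.g. the header row of a CSV file)
text = bucket.blob("datasets/patients.csv").download_as_text()
print(text.splitlines()[0])
```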
10:05 – 10:45: Spark in GCP
– PySpark notebooks
– Data exploration and preprocessing in Spark (see the sketch below)
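A minimal sketch of the exploration and preprocessing steps in a PySpark notebook on Dataproc; the gs:// path and the column names ("age") are hypothetical.

```python
# Minimal sketch: data exploration and preprocessing in PySpark.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("exploration").getOrCreate()

# Hypothetical dataset path; gs:// paths work like HDFS paths on Dataproc
df = spark.read.csv("gs://workshop-bucket/datasets/patients.csv",
                    header=True, inferSchema=True)

df.printSchema()      # column names and inferred types
df.describe().show()  # summary statistics for numeric columns

# Example preprocessing: drop incomplete rows, derive a decade age band
clean = (df.dropna()
           .withColumn("age_band", (F.col("age") / 10).cast("int") * 10))
clean.groupBy("age_band").count().orderBy("age_band").show()
```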
10:45 – 10:55: Break
11:00 – 12:00: Spark in GCP (continued)
– Machine learning pipelines in Spark MLlib
– Feature extractors, transformers, and estimators
– Classification use case (see the pipeline sketch below)
– Prediction use case
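A minimal sketch of an MLlib pipeline of the kind covered here; the dataset path and column names ("diagnosis", "age", "heart_rate", "bmi") are hypothetical.

```python
# Minimal sketch: an MLlib classification pipeline.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-pipeline").getOrCreate()
df = spark.read.csv("gs://workshop-bucket/datasets/patients.csv",
                    header=True, inferSchema=True)

# Estimators and transformers chained into a single pipeline
indexer = StringIndexer(inputCol="diagnosis", outputCol="label")
assembler = VectorAssembler(inputCols=["age", "heart_rate", "bmi"],
                            outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[indexer, assembler, lr])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)           # fitting turns estimators into transformers
predictions = model.transform(test)   # adds prediction/probability columns
predictions.select("label", "prediction", "probability").show(5)
```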
12:00 – 12:30: Recap, Q&A, and formative assignment
– Storage, Dataproc, and Spark APIs (DataFrames, SQL, MLlib)
– Q&A and troubleshooting
12:30 – 13:30: Lunch break
13:30 – 13:45: Kafka recap
– Kafka architecture: brokers, topics, producers, and consumers
– Kafka installation and setup
13:50 – 14:10: Kafka setup and use case
– Use case: real-time patient monitoring
– Discussion of the necessary infrastructure: topics, producers, and consumers (see the topic-creation sketch below)
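A minimal sketch of creating the topic for the monitoring use case with the kafka-python admin client; the topic name, partition count, and broker address are hypothetical.

```python
# Minimal sketch: create the monitoring topic.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="patient-vitals",
                              num_partitions=3,
                              replication_factor=1)])
admin.close()
```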
14:15 – 15:00: Producer and logging
– Producer logic (see the sketch below)
– Schema design for log storage
– BigQuery fundamentals
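A minimal sketch of the producer side, assuming the hypothetical "patient-vitals" topic and a local broker; the simulated reading fields are illustrative, and values are serialised as JSON.

```python
# Minimal sketch: produce simulated vitals as JSON messages.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for _ in range(10):
    reading = {
        "patient_id": random.randint(1, 5),
        "heart_rate": random.randint(55, 120),
        "ts": time.time(),
    }
    producer.send("patient-vitals", reading)
    time.sleep(1)

producer.flush()  # ensure buffered messages reach the broker
```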
15:00 – 15:10: Break
15:15 – 16:00: Consumer and data visualisation
– Consumer logic (see the sketch below)
– Looker Studio fundamentals
– BigQuery structured queries
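A minimal sketch of the consumer side: JSON readings are pulled from Kafka and streamed into a BigQuery table that Looker Studio can then query. The table id is a hypothetical project.dataset.table, and its schema is assumed to match the message fields.

```python
# Minimal sketch: consume readings and stream them into BigQuery.
import json

from kafka import KafkaConsumer
from google.cloud import bigquery

consumer = KafkaConsumer(
    "patient-vitals",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

bq = bigquery.Client()
table_id = "my-project.monitoring.vitals"  # hypothetical table id

for message in consumer:
    errors = bq.insert_rows_json(table_id, [message.value])
    if errors:
        print("BigQuery insert failed:", errors)
```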
16:05 – 16:30: Recap, Q&A, and formative assignment
– Streaming data processing
– Kafka architecture
– Q&A and troubleshooting
16:35 – 17:00: Further training and supporting materials
– Discussion of available resources and other sources of training
– Community of practice and networking