SPA Assignment 1 Guide¶

Warning

All the steps mentioned here are based on how I approached the assignment. I would anyone who is reading this to properly read the instructions given by the faculty and discuss the same in the group to see if my approach is in line with what the professor has given. THis guide was written to primarily help those that have no idea about the Databrick, etc.

Subject & Subject Code¶

Stream Processing and Analytics (S2-22 SSZG556)

Questions¶

Question Summary: http://taxila-aws.bits-pilani.ac.in/mod/forum/discuss.php?d=57989

Part A - 10 M
- Analyzing and evaluating Streaming Framework / Platform
Part B - 25 M
- Using given problem statement and dataset, you need to use databricks platform and run ML experiements using Pyspark ML library and showcase

Video Guide¶

Check this video for a guide on how to solve the the assignment: https://youtu.be/cJ-rLnizaFw

Feel free to comment in the video if you have any doubts. Or contact me in the WhatsApp group. I will try to explain as much as I know.

Links & Important Code Blocks¶

Links¶

Binary Classification Databricks Sample - https://docs.databricks.com/_extras/notebooks/source/binary-classification.html
DataBricks - https://community.cloud.databricks.com/
G7 Demo on Databricks - https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6374943227371516/1151767014322707/8298046428141319/latest.html

Code Blocks¶

Reading Data from DBFS

dataset = spark.read.format("csv").schema(schema).option("header", "true").load("/FileStore/data.csv")

Assignments Post Mid Sem