SPA Assignment 1 Guide

Warning

All the steps mentioned here are based on how I approached the assignment. I would advise anyone reading this to read the instructions given by the faculty carefully and discuss them in their group to check whether my approach is in line with what the professor has asked for. This guide was written primarily to help those who have no prior experience with Databricks and related tools.

Subject & Subject Code

Stream Processing and Analytics (S2-22 SSZG556)

Questions

Question Summary: http://taxila-aws.bits-pilani.ac.in/mod/forum/discuss.php?d=57989

  • Part A - 10 M
    • Analyzing and evaluating Streaming Framework / Platform
  • Part B - 25 M
    • Using the given problem statement and dataset, use the Databricks platform to run ML experiments with the PySpark ML library and showcase

Video Guide

Check this video for a guide on how to solve the assignment: https://youtu.be/cJ-rLnizaFw

Feel free to comment on the video if you have any doubts, or contact me in the WhatsApp group. I will try to explain as much as I know.

Code Blocks

Reading Data from DBFS
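# `schema` must be defined before this line (a StructType or a DDL string describing the CSV columns)
dataset = spark.read.format("csv").schema(schema).option("header", "true").load("/FileStore/data.csv")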
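
The read above relies on a `schema` variable that this guide does not define. Below is a minimal sketch of one way to supply it, assuming you describe the columns as a DDL string, which `DataFrameReader.schema` accepts in place of a `StructType`. The column names and types in `COLUMNS`, and the helpers `build_ddl_schema` and `read_csv_from_dbfs`, are hypothetical placeholders and not part of the assignment; replace them with the columns of the dataset you were given.

```python
# Hypothetical column names and Spark SQL types; replace with your own CSV's columns.
COLUMNS = {
    "id": "INT",
    "feature_1": "DOUBLE",
    "label": "STRING",
}


def build_ddl_schema(columns):
    """Join column names and types into a DDL string such as 'id INT, label STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns.items())


def read_csv_from_dbfs(spark, path, columns):
    """Read a CSV with a header row from DBFS using an explicit schema (no inference)."""
    return (
        spark.read.format("csv")
        .schema(build_ddl_schema(columns))  # DDL string instead of a StructType
        .option("header", "true")
        .load(path)
    )


schema = build_ddl_schema(COLUMNS)
print(schema)

# In a Databricks notebook, where `spark` is already available:
# dataset = read_csv_from_dbfs(spark, "/FileStore/data.csv", COLUMNS)
```

Passing an explicit schema avoids a second pass over the file for type inference and keeps column types predictable for the later ML steps. Check the types against your actual data before relying on them.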