Sparkify Customer Churn Prediction — DSND Capstone Project

Project Definition

The project is part of Udacity’s Data Scientist Nanodegree program.

Udacity’s Sparkify is a virtual company similar to music streaming services such as Spotify or Google Music. Udacity has provided a dataset containing a customer behavior log from October to November 2018. The log holds time-based information (Unix time, in seconds since 1970) for every activity a customer has performed, e.g. registration day, session length, and pages visited (the main source of information on customer behavior).
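Because every event carries a Unix timestamp, a common first step is converting it to a readable date and deriving simple per-customer features. The sketch below uses only the Python standard library; the tuple layout `(user_id, unix_seconds, page)`, the sample rows, and the helper names are hypothetical illustrations, not the actual Sparkify schema. If a timestamp column turns out to be in milliseconds rather than seconds, divide it by 1000 first.

```python
from datetime import datetime, timezone


def to_utc(ts_seconds: int) -> datetime:
    """Convert Unix time (seconds since 1970-01-01 UTC) to a timezone-aware datetime."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc)


def activity_span_days(rows):
    """Return, per user, the number of days between their first and last event."""
    first, last = {}, {}
    for user, ts, _page in rows:
        first[user] = min(first.get(user, ts), ts)
        last[user] = max(last.get(user, ts), ts)
    return {user: (last[user] - first[user]) / 86400 for user in first}


# Hypothetical log rows: (user_id, unix_seconds, page)
log = [
    ("u1", 1538352000, "NextSong"),  # 2018-10-01 00:00 UTC
    ("u1", 1538438400, "Home"),      # 2018-10-02 00:00 UTC
    ("u2", 1541030400, "NextSong"),  # 2018-11-01 00:00 UTC
]

print(to_utc(1538352000))        # 2018-10-01 00:00:00+00:00
print(activity_span_days(log))   # {'u1': 1.0, 'u2': 0.0}
```

On the full 12 GB log the same idea would be expressed as Spark transformations rather than a Python loop; the loop here only shows the logic.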

The dataset is 12 GB, so we need Amazon Web Services (AWS) Elastic MapReduce (EMR) and Spark to process it. The dataset…


ETL, EDA, feature engineering, and analysis of a 12 GB dataset, plus building and tuning three machine learning models, with the whole process run twice at the costs shown above. Most of the spending went to configuring the cluster and installing Python libraries.

Have you ever got stuck trying to diagnose a problem while paid virtual clusters keep running? After reading this blog, you’ll have some tips for keeping those costs down.

I mainly wrote this blog to present the results of the user churn assignment in Udacity’s Data Scientist Nanodegree program. During the process, however, I ran into practical issues, so I’d also like to share some tips for reducing costs when using virtual clusters.

The assignment was to build a model to predict customer churn for Udacity’s virtual company Sparkify, which is like the…


Photo by olia danilevich from Pexels

When you try to work out the average salary for each developer type, you’ll run into this issue: the dataset has 23 different developer categories, but the question is multiple-choice, so respondents can select all that apply. This means, for example, that one respondent can have many developer types listed. There were 8269 different answer types (23 categories + 8264 different combinations of those categories). So how can you tell exactly which salary belongs to which developer type?
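One common way to handle a multi-select column is to split each answer into its individual categories and let a respondent's salary count toward every category they selected. The sketch below assumes the answers are stored as semicolon-separated strings; the sample rows, the separator, and the function name are hypothetical, and this is only one possible attribution rule, not necessarily the approach used in the original analysis.

```python
from collections import defaultdict


def mean_salary_per_type(rows, sep=";"):
    """Average salary per developer type, where each row is (multi_select_answer, salary).

    A respondent who picked several types contributes their full salary to each
    of those types (one possible attribution rule, assumed here).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for dev_types, salary in rows:
        if not dev_types or salary is None:
            continue  # skip unanswered questions or missing salaries
        for dev_type in dev_types.split(sep):
            dev_type = dev_type.strip()
            totals[dev_type] += salary
            counts[dev_type] += 1
    return {t: totals[t] / counts[t] for t in totals}


# Hypothetical survey responses: (DevType answer, yearly salary)
responses = [
    ("Developer, back-end;Developer, front-end", 60000),
    ("Developer, back-end", 80000),
    ("Data scientist", 70000),
    (None, 50000),
]

print(mean_salary_per_type(responses))
```

Counting the full salary once per selected type keeps each category's average on the same scale as an individual salary; splitting the salary evenly across the selected types would be an alternative, but it would understate pay for respondents who picked many roles.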

Stack Overflow conducts an annual Developer Survey of its developer community. In 2020, with nearly 65,000 responses fielded from over 180…

Vesa Jaakola

Data Expert, Innovator & Co-Founder of Digitalents Helsinki, Innovative Leader, PSM
