In this video, I walk through my solution and analysis for one of the most popular beginner competitions on Kaggle: the Titanic survival prediction competition. This video is part one of a two-part series.
Kaggle is a subsidiary of Google and an online community of data scientists and machine learning practitioners. On Kaggle, you can find many published datasets as well as data science and machine learning tutorials, but Kaggle is best known for its competitions.
The Titanic survival prediction competition is a great place to start: it introduces beginners not only to the Kaggle platform but also to the process behind an end-to-end machine learning project, from loading and reading datasets to building a fully functional predictive model.
The aim of this competition is to analyse how passenger features such as age, gender and ticket class correlate with survival, and then to train a machine learning model that predicts survival for passengers whose outcome is unknown. This is an example of a binary classification problem in machine learning: each passenger is classified as either survived or did not survive.
This video covers the exploratory data analysis (EDA) section of my notebook. EDA is the process of exploring our datasets and summarising the key characteristics and trends in our data, such as data types, distributions and correlations between numerical variables.
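As a rough sketch of what those EDA checks can look like in pandas, the snippet below inspects data types, missing values, summary statistics and correlations. The rows are invented purely for illustration and are not real Titanic records. The column names (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`) are assumed to match the competition's training file. It is not necessarily the exact code used in the notebook.

```python
import pandas as pd

# Toy rows invented for illustration only; NOT real Titanic records.
# In practice the data would be read from the competition file instead,
# e.g. pd.read_csv("train.csv") (file name assumed).
toy = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Age": [29.0, 35.0, None, 40.0, 4.0, 52.0],
    "Fare": [80.0, 8.05, 13.0, 7.9, 120.0, 26.0],
})

# Data type of each column
dtypes = toy.dtypes

# Number of missing values per column
missing = toy.isnull().sum()

# Summary statistics (count, mean, std, quartiles) for numerical columns
summary = toy.describe()

# Pairwise correlation between numerical columns only
correlations = toy.select_dtypes(include="number").corr()

print(dtypes, missing, summary, correlations, sep="\n\n")
```

Restricting the correlation to numerical columns avoids errors from text columns such as `Sex`.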
I managed to gather three important insights as a result of the EDA process:
1. Female passengers were more likely to survive than male passengers
2. First-class passengers were more likely to survive than second-class and third-class passengers
3. Younger passengers, especially children, were more likely to survive than other passengers on the Titanic
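Comparisons like the three above usually come down to computing a survival rate per group. Here is a minimal sketch, again using invented toy rows (so the resulting numbers mean nothing) and assumed column names. The age cut-off of 12 for "child" is an arbitrary choice made for this example only.

```python
import pandas as pd

# Toy rows invented for illustration only; NOT real Titanic records.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 0, 1, 0, 0, 1],
    "Pclass": [1, 2, 3, 3, 1, 2, 3, 3],
    "Sex": ["female", "female", "male", "male",
            "female", "male", "male", "female"],
    "Age": [30.0, 25.0, 40.0, 33.0, 8.0, 50.0, 22.0, 5.0],
})

# Survived is 0/1, so the group mean is the share of survivors in that group
rate_by_sex = toy.groupby("Sex")["Survived"].mean()
rate_by_class = toy.groupby("Pclass")["Survived"].mean()

# Bucket ages into two groups; the cut-off of 12 is an arbitrary assumption
toy["AgeGroup"] = pd.cut(toy["Age"], bins=[0, 12, 120],
                         labels=["child", "adult"])
rate_by_age = toy.groupby("AgeGroup", observed=True)["Survived"].mean()

print(rate_by_sex, rate_by_class, rate_by_age, sep="\n\n")
```

On the real training data, the same `groupby(...).mean()` pattern gives the survival rate for each sex, ticket class and age group.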
A lot of time has gone into preparing the solution notebook as well as this video. So, if you enjoyed it or found it helpful in your own learning, it would mean the world to me if you could like the video and subscribe to my channel.
If you have any questions, feel free to reach out to me. Happy learning!
Timestamps
00:00 - Introduction
05:06 - Import libraries
05:48 - Import and read data
08:20 - Data description
09:30 - Data types, missing data and summary statistics
12:35 - Feature analysis introduction
13:49 - Analyse categorical variables
19:26 - Analyse numerical variables
24:33 - Summary and conclusion
26:04 - Outro
Install Anaconda and Jupyter Notebook
[ Link ]
Kaggle Titanic Survival Prediction Competition
[ Link ]
Link to my notebook on GitHub
[ Link ]
Follow me
Facebook - [ Link ]
Instagram - [ Link ]
Twitter - [ Link ]
Medium - [ Link ]
LinkedIn - [ Link ]
#Kaggle #DataScience #MachineLearning