
Create a Program to Implement Titanic Survival Analytics in R Assignment Solution

Write a program to implement Titanic survival analytics in R.

Requirements and Specifications

Description: Project R: Titanic

1. Go to Kaggle.com, a website where data scientists practice their skills on real-world datasets and solve real-world problems. You will need to sign up for an account to access the data and problems on Kaggle.
2. Find the web page for Titanic: Machine Learning from Disaster. First understand what the problem is, then download the "train.csv" file under the "Data" tab. NOTE: this is a live competition for a prize. You are NOT required to participate in the competition. DO NOT submit your code on Kaggle. You only need to analyze the train.csv data in this course.
3. After completing the Udacity lessons on Explore One Variable, perform at least TWO different analyses using one variable on the train.csv dataset.
4. Create an RMD file named Titanic.rmd. Inside this file, include your name (5 points).
5. In the same RMD file, for each one-variable analysis you perform, explain: why you want to perform this analysis, in other words, what question you want to answer with it (20 points/analysis); and what information or conclusion you get from the analysis result or plot (20 points/analysis). The RMD file must include, in R code chunks, the R code that generates the results and plots. If the code is not included in R code chunks, the maximum credit you can receive for this step is 20 points/analysis.
6. Finally, create an HTML report from this RMD file (15 points). Your HTML report must be knitted from the RMD file and must include plots. If plots do not show up in the HTML file, the maximum credit you can receive for this step is 5 points. Submit the HTML report and the RMD file.

(HINT: the tutorials on the Titanic: Machine Learning from Disaster main page provide many interesting examples.)
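Steps 4 to 6 describe an R Markdown workflow: a Titanic.rmd file with your name, one question/plot/conclusion section per analysis, and an HTML report knitted from it. As a minimal sketch, assuming base R plus (optionally) the rmarkdown package and pandoc, the hypothetical helper `write_titanic_rmd()` below writes such a skeleton and knits it with `rmarkdown::render()` only when rmarkdown, pandoc and train.csv are all available. The section headings, chunk labels and placeholder text are illustrative, not part of the assignment. The chunk fences are built with `strrep()` so the example does not break its own surrounding fence.

```r
# Hypothetical helper: write a minimal Titanic.rmd skeleton.
# Assumptions: file name taken from the assignment; headings, chunk labels
# and the "(...)" placeholders are illustrative only.
write_titanic_rmd <- function(path = "Titanic.rmd", author = "Your Name") {
  fence <- strrep("`", 3)  # build the chunk fence at runtime
  lines <- c(
    "---",
    "title: \"Titanic\"",
    paste0("author: \"", author, "\""),
    "output: html_document",
    "---",
    "",
    paste0(fence, "{r setup, message=FALSE}"),
    "library(ggplot2)",
    "train <- read.csv('train.csv', stringsAsFactors = FALSE)",
    fence,
    "",
    "## Analysis 1: Sex",
    "Question: (why perform this analysis?)",
    "",
    paste0(fence, "{r sex-plot}"),
    "ggplot(train, aes(Sex)) + geom_bar()",
    fence,
    "",
    "Conclusion: (what does the plot show?)"
  )
  writeLines(lines, path)
  invisible(path)
}

rmd <- write_titanic_rmd(file.path(tempdir(), "Titanic.rmd"), author = "Adjoua")

# Knit only when rmarkdown and pandoc are installed and train.csv sits next
# to the .rmd file, because render() runs the chunks in that directory.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available() &&
    file.exists(file.path(dirname(rmd), "train.csv"))) {
  rmarkdown::render(rmd, output_format = "html_document")
}
```

In RStudio, the Knit button also calls `rmarkdown::render()`, so either route produces the HTML report that step 6 asks for.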
Source Code
# NAME: Adjoua
# Load packages
library(ggplot2)
# Load the data
train <- read.csv('train.csv', stringsAsFactors = FALSE)
# Summary statistics of training data
summary(train)
# Take a quick look at missing values: count cells that are NA or empty strings
as.list(colSums(is.na(train) | train == ""))
# Age has 177 missing values, Cabin 687 and Embarked 2
# Check the distribution of sexes on the Titanic
ggplot(train, aes(Sex)) + geom_bar(aes(fill = factor(Sex))) + ggtitle("Passenger sexes")
# Most passengers were male
# Chart of how many passengers survived and how many did not (0 = died, 1 = survived)
ggplot(train, aes(factor(Survived))) + geom_bar() + xlab("Survived") + ggtitle("Died vs Survived")
# Unfortunately, most passengers died
# Check the distribution of ages on the Titanic (rows with a missing Age are dropped)
ggplot(train, aes(Age)) + geom_histogram(bins = 30, color = 'black', fill = '#008B8B', na.rm = TRUE) + ggtitle("Age distribution")
# Most passengers with a recorded age were young adults, roughly between 20 and 40 years old
# Box plot of age by passenger class
ggplot(train, aes(factor(Pclass), Age)) + geom_boxplot(aes(fill = factor(Pclass)), alpha = 0.3, na.rm = TRUE) + xlab("Pclass") + ggtitle("Age vs Pclass")
# Median age falls from first to third class: first-class passengers tended to be older than second- and third-class passengers
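The missing-value line and the Survived bar chart can also be backed by plain numeric one-variable summaries. The sketch below assumes only base R: a hypothetical `count_missing()` helper wraps the same `is.na(...) | ... == ""` test used above, and a hypothetical `survival_share()` helper turns the 0/1 Survived column into labelled proportions. The five-row toy data frame is made up for illustration and only mimics the column types of train.csv (numeric Age with NA, character Cabin with "" for blanks); it is not Kaggle data.

```r
# Hypothetical helper: count cells that are NA or empty strings per column.
# NA == "" is NA, but TRUE | NA is TRUE, so NA cells are still counted.
count_missing <- function(df) {
  colSums(is.na(df) | df == "")
}

# Hypothetical helper: share of each outcome, labelled instead of coded 0/1.
survival_share <- function(survived) {
  prop.table(table(factor(survived, levels = c(0, 1),
                          labels = c("Died", "Survived"))))
}

# Toy data (hypothetical), shaped like a few columns of train.csv.
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 0),
  Age      = c(22, NA, 26, 35, NA),
  Cabin    = c("", "C85", "", "C123", ""),
  stringsAsFactors = FALSE
)

count_missing(toy)
survival_share(toy$Survived)
summary(toy$Age)  # quartiles of the recorded ages, with NA's listed separately
```

On the real data, `count_missing(train)` returns the same counts as the `as.list(colSums(...))` line, and `survival_share(train$Survived)` reports the died/survived split as fractions rather than bar heights.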