Netflix Data Analysis:
Netflix has been a data-driven company since its inception. It uses data to extract insights and applies them to improve its services.
Let us do a simple project with pandas to answer some questions about the Netflix data.
Download the data from here:
Import Libraries
import pandas as pd
import seaborn as sns
Import Dataset
netflix = pd.read_csv(r"C:\Users\abhis\Downloads\netflix.csv")
netflix
Sno | show_id | type | title | director | cast | country | date_added | rating | duration | listed_in | description
0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm...
1 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm...
2 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t...
3 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor...
Simple Analysis
Top 5 Records
netflix.head()
Last 5 Records:
netflix.tail()
Shape of Dataset
netflix.shape
(8810, 11)
Size: the total number of elements (values) in the dataset, i.e. rows × columns (8810 × 11 = 96910)
netflix.size
96910
Column names:
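netflix.columns
Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'rating', 'duration', 'listed_in', 'description'],
      dtype='object')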
dtypes:
netflix.dtypes
Out:
show_id object
type object
title object
director object
cast object
country object
date_added object
rating object
duration object
listed_in object
description object
dtype: object
Info
netflix.info()
RangeIndex: 8810 entries, 0 to 8809
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 show_id 8810 non-null object
1 type 8810 non-null object
2 title 8810 non-null object
3 director 6176 non-null object
4 cast 7984 non-null object
5 country 7977 non-null object
6 date_added 8800 non-null object
7 rating 8806 non-null object
8 duration 8807 non-null object
9 listed_in 8810 non-null object
10 description 8810 non-null object
dtypes: object(11)
memory usage: 757.2+ KB
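DataFrame.info() prints its report itself and returns None, which is why wrapping it in print() adds a stray None line. The sketch below, on a hypothetical two-row frame, captures the report in a string buffer through the buf parameter instead; this is handy when the summary needs to be logged or checked in code. df.count() gives the same non-null counts that info() lists.

```python
import io

import pandas as pd

# Hypothetical toy frame standing in for the Netflix data
df = pd.DataFrame({
    "title": ["Blood & Water", "Ganglands"],
    "director": [None, "Julien Leclercq"],
})

buf = io.StringIO()
result = df.info(buf=buf)     # writes the report into buf and returns None
report = buf.getvalue()

print(result is None)         # True
print(report)

# Non-null counts per column, the same numbers info() reports
non_null = df.count().to_dict()
print(non_null)               # {'title': 2, 'director': 1}
```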
Are there any duplicate records? If yes, remove them.
To check Duplicate Records
netflix[netflix.duplicated()]
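To see what duplicated() actually flags, here is a sketch on a hypothetical three-row frame. By default (keep='first') only the second and later copies are marked, so summing the mask gives the number of rows drop_duplicates() would remove; keep=False marks every copy, which makes it easier to inspect the duplicated rows side by side.

```python
import pandas as pd

# Hypothetical toy frame with one exact duplicate row
df = pd.DataFrame({
    "show_id": ["s1", "s1", "s2"],
    "title": ["Dick Johnson Is Dead", "Dick Johnson Is Dead", "Blood & Water"],
})

first_flags = df.duplicated()           # keep='first': only later copies flagged
all_flags = df.duplicated(keep=False)   # every copy of a duplicated row flagged

print(first_flags.tolist())   # [False, True, False]
print(all_flags.tolist())     # [True, True, False]
print(df[all_flags])          # both copies of s1

n_dupes = int(first_flags.sum())   # rows drop_duplicates() would remove
print(n_dupes)                # 1
```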
Remove Duplicate Records
netflix.drop_duplicates(inplace=True)
Is there any null value in our dataset? Show it with a heatmap.
isnull()
netflix.isnull()
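Raw null counts are easier to judge next to the share of missing values. A minimal sketch on a small hypothetical frame: because isnull() returns booleans, .sum() counts the missing values and .mean() gives the fraction missing per column.

```python
import pandas as pd

# Hypothetical toy frame with some missing director/country values
df = pd.DataFrame({
    "director": ["Kirsten Johnson", None, "Julien Leclercq", None],
    "country": ["United States", "South Africa", None, "India"],
    "title": ["a", "b", "c", "d"],
})

missing_count = df.isnull().sum()          # True counts as 1
missing_pct = df.isnull().mean() * 100     # share of rows missing, in percent

summary = (
    pd.DataFrame({"missing": missing_count, "percent": missing_pct})
    .sort_values("missing", ascending=False)
)
print(summary)
```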
Count null values
netflix.isnull().sum()
show_id 0
type 0
title 0
director 2634
cast 825
country 831
date_added 10
rating 4
duration 3
listed_in 0
description 0
dtype: int64
Heatmap
sns.heatmap(netflix.isnull())
For 'Squid Game', what are the show_id and director of the show?
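To check whether 'Squid Game' is in the title column (three ways):
netflix.loc[netflix['title']=='Squid Game']
netflix[netflix['title'].isin(['Squid Game'])]
netflix[netflix['title'].str.contains('Squid Game')]
netflix.loc[netflix['title']=='Squid Game', ['show_id', 'director']]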
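The three lookups are not interchangeable: == and isin() need an exact title match, while str.contains() also matches longer titles that include the text. The sketch below uses hypothetical rows (the show_id and director values are placeholders, not values from the dataset) and selects only show_id and director, which is what the question asks for. regex=False makes contains() treat the search text literally.

```python
import pandas as pd

# Hypothetical rows; show_id and director values are placeholders
df = pd.DataFrame({
    "show_id": ["sA", "sB", "sC"],
    "title": ["Squid Game", "Squid Game: The Challenge", "Blood & Water"],
    "director": ["Director A", None, None],
})

cols = ["show_id", "director"]
exact = df.loc[df["title"] == "Squid Game", cols]
via_isin = df.loc[df["title"].isin(["Squid Game"]), cols]
substring = df.loc[df["title"].str.contains("Squid Game", regex=False), cols]

print(exact)        # one row: exact title only
print(substring)    # two rows: also matches the longer title
```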
In which year were the highest number of TV shows and movies released (based on the date_added column)? Show it with a bar graph.
To perform an operation on date_added, we must first check its data type; if it is not datetime, we should convert it.
Convert it into datetime:
netflix['date_added'].dtypes
dtype('O')
# strip stray whitespace before parsing the dates
netflix['date_added'] = pd.to_datetime(netflix['date_added'].str.strip())
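A minimal sketch of the conversion on three hypothetical values, assuming the dates follow the "Month D, YYYY" pattern seen in the data. Leading spaces are stripped first, errors='coerce' turns anything unparseable or missing into NaT, and since NaT becomes NaN under .dt.year, the years are stored as floats; this is why the year counts show values such as 2019.0. value_counts().idxmax() then picks the busiest year.

```python
import pandas as pd

# Hypothetical raw values: one with a leading space, one missing
raw = pd.Series([" September 25, 2021", "September 24, 2021", None])

# Strip whitespace, then parse with an explicit format; bad or missing -> NaT
dates = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

years = dates.dt.year                      # NaT becomes NaN, so values are floats
top_year = years.value_counts().idxmax()   # year with the most titles

print(years.tolist())
print(top_year)
```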
netflix['date_added'].dtypes
netflix['date_added'].dt.year.value_counts()
2019.0 2016
2020.0 1879
2018.0 1649
2021.0 1498
2017.0 1188
2016.0 429
2015.0 82
2014.0 24
2011.0 13
2013.0 11
2012.0 3
2009.0 2
2008.0 2
2010.0 1
Name: date_added, dtype: int64
2019 has the highest count, with 2016 titles.
Bar Graph
netflix['date_added'].dt.year.value_counts().plot(kind='bar')
How many movies and TV shows are in this dataset? Show it with a graph.
groupby
netflix.groupby('type')['type'].count()
type
Movie 6131
TV Show 2676
Name: type, dtype: int64
Bar Graph:
sns.countplot(x='type', data=netflix)
Show all the movies that were released in 2021.
Create a column 'release year' (the year taken from date_added, i.e. when the title was added to Netflix):
netflix['release year'] = netflix['date_added'].dt.year
netflix['release year']
0 2021.0
2 2021.0
3 2021.0
4 2021.0
5 2021.0
...
8805 2019.0
8806 2019.0
8807 2019.0
8808 2020.0
8809 2019.0
Name: release year, Length: 8807, dtype: float64
netflix.loc[(netflix['release year']==2021) & (netflix['type']=='Movie')]
Show only the titles of movies released in India.
netflix.loc[(netflix['type']=='Movie') & (netflix['country']=='India'), 'title']
Show the top 10 directors with the most shows and movies.
netflix['director'].value_counts().head(10)
Rajiv Chilaka 19
Raúl Campos, Jan Suter 18
Marcus Raboy 16
Suhas Kadav 16
Jay Karas 14
Cathy Garcia-Molina 13
Martin Scorsese 12
Youssef Chahine 12
Jay Chapman 12
Steven Spielberg 11
Name: director, dtype: int64
Show all the records of Indian comedy movies.
netflix.loc[(netflix['country']=='India') & (netflix['listed_in'].str.contains('Comedies')) & (netflix['type']=='Movie')]
In how many movies is 'Will Smith' in the cast?
netflixx = netflix.dropna()
netflixx.loc[netflixx['cast'].str.contains('Will Smith')]
netflixx.loc[netflixx['cast'].str.contains('Will Smith')].shape[0]
10
What are the different ratings defined by Netflix?
netflix['rating'].nunique()
17
How many movies got a TV-14 rating in India in 2021?
netflix.loc[(netflix['rating']=='TV-14') & (netflix['country']=='India') & (netflix['release year']==2021) & (netflix['type']=='Movie')]
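An exact match on country == 'India' skips co-productions whose country field lists several countries. The sketch below uses hypothetical rows to compare the exact match with str.contains(..., na=False), where na=False treats a missing country as a non-match; the same na=False pattern works on the cast column without discarding rows through dropna(). Summing the boolean mask answers "how many" directly.

```python
import pandas as pd

# Hypothetical rows using the column names from this analysis
df = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show"],
    "country": ["India", "India, United States", None, "India"],
    "rating": ["TV-14", "TV-14", "TV-14", "TV-14"],
    "release year": [2021.0, 2021.0, 2021.0, 2021.0],
})

base = (
    (df["type"] == "Movie")
    & (df["rating"] == "TV-14")
    & (df["release year"] == 2021)
)

# Exact match: only rows whose country is exactly "India"
exact_count = int((base & (df["country"] == "India")).sum())

# Substring match: any row listing India; missing countries count as False
any_india_count = int((base & df["country"].str.contains("India", na=False)).sum())

print(exact_count, any_india_count)
```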
How many TV shows got the 'R' rating after 2018?
netflix.loc[(netflix['type']=='TV Show') & (netflix['rating']=='R') & (netflix['release year']>2018)]
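The same multi-condition filter can also be written with DataFrame.query(), which some readers find easier to scan than chained boolean masks. A sketch on hypothetical rows; the backticks are needed because the 'release year' column name contains a space. Both forms select the same rows.

```python
import pandas as pd

# Hypothetical rows using the column names from this analysis
df = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "TV Show"],
    "rating": ["R", "TV-MA", "R", "R"],
    "release year": [2019.0, 2020.0, 2021.0, 2017.0],
})

# Boolean-mask version, as in the analysis above
mask = (df["type"] == "TV Show") & (df["rating"] == "R") & (df["release year"] > 2018)
via_mask = df.loc[mask]

# query() version; backticks quote the column name containing a space
via_query = df.query("type == 'TV Show' and rating == 'R' and `release year` > 2018")

print(via_query)
print(len(via_query))   # number of matching TV shows
```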
Conclusion:
Netflix, a giant streaming platform, has made it big using big data analytics. It is one of the most prominent examples of how advances in technology have helped brands grow into famous and successful businesses. Netflix is not the only company making use of big data analytics; Amazon does too.
Learn to analyze data like never before. We have placed many students through our well-structured course and guided learning.
Check here : Best Data Analytics Course with Python - Consoleflare
