Data science helps people make better decisions, or make the same decisions more quickly. Companies invest heavily in data science so that they can get the right information to make the right decisions.
Data science includes processes like data cleansing, preparation, and analysis.
A data scientist collects data from multiple sources, such as surveys and physical data plotting.
Data science is about solving business problems. It helps you make better-informed decisions.
It uses machine learning techniques along with statistics to find insights in large volumes of data. Companies invest a lot in data science to get precise information from their data.
Data science is the practice of collecting data from various sources, storing it in a structured manner, and using the organized data to predict demand and trends, which helps the business achieve its goal (in short, profit).
Applications of data science:
Data Science Programming Languages:
Top data science jobs
Data Science Manager
Machine Learning Engineer
What skills does a data scientist need?
Deep knowledge of Python.
Knowledge of databases and SQL.
Good knowledge of mathematics and statistics.
Understanding of analytical functions.
Knowledge and experience in machine learning.
Why Data Science:
To predict future growth.
It also finds application in other fields such as surveys, product launches, and elections.
This is a snapshot of the amount of data we create.
Reportedly, 90% of the world's data has been created in the last two years alone.
More Data means more information. More Information means more opportunity to utilize the information in different ways.
Making use of this data requires tools, techniques, and methodologies.
This combination of tools, technology, and methodology is what we call data science.
This is why data science is gaining popularity and is in demand today.
How do businesses increase their revenue?
Amazon reportedly sells more than 40% of its products through recommendation systems.
Google Ads uses your browsing history to target ads, and advertising is Google's main source of revenue.
Microsoft keeps some less visible feedback features so that it can collect diagnostic data from you.
(Go to Settings -> Privacy -> Diagnostics & feedback)
Recommendation systems and personalised content are possible with the help of artificial intelligence.
Companies such as Facebook, Instagram, Netflix, and YouTube use recommendation systems.
Most popular data science libraries
And much more
Anaconda is a distribution of Python.
It is one of the most popular Python data science platforms.
This means it includes not only Python but also many of the libraries that we use in the course, as well as its own virtual environment system.
It’s an “all in one” install that is extremely popular in data science and machine learning!
Jupyter is a development environment where we can write code, display images, and write down Markdown notes.
It is one of the most popular environments in data science for exploring and analysing data!
It is also a great learning tool.
If you have Anaconda, install NumPy by going to your terminal or command prompt and typing:
conda install numpy
If you do not have Anaconda, use pip instead:
pip install numpy
What is NumPy?
-NumPy is the core library for scientific computing in Python.
-NumPy is the linear algebra library for Python. It is important for data science with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.
-It provides a high-performance multidimensional array object and tools for working with these arrays.
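As a minimal sketch of the multidimensional array object described above (the variable names and values are made up for illustration), the following builds a small 2-D array and applies a few of the tools that come with it:

```python
import numpy as np

# A 2-D array (matrix) built from nested Python lists.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)   # (rows, columns)
print(matrix.ndim)    # number of dimensions
print(matrix.dtype)   # element type shared by every entry

# Arithmetic is applied element by element, with no explicit loop.
doubled = matrix * 2

# Aggregations can run over the whole array or along one axis.
column_sums = matrix.sum(axis=0)

# Linear algebra: product of the (2, 3) array with its (3, 2) transpose.
gram = matrix @ matrix.T

print(doubled)
print(column_sums)
print(gram)
```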
Why use NumPy instead of a list?
It occupies less memory than a list.
It is fast.
It is very convenient to work with.
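The memory and convenience points can be checked with a rough sketch like the one below. The size n and the variable names are arbitrary, and the list measurement is only an estimate, since CPython shares some small integer objects between lists:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Memory: the list holds pointers to separate int objects, while the
# array holds raw 8-byte integers in one contiguous buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes

# Convenience: one vectorized expression instead of a loop or comprehension.
squares_list = [x * x for x in py_list]
squares_array = np_array * np_array

print("list bytes (estimate):", list_bytes)
print("array bytes:", array_bytes)
```

For the speed claim, timing both versions with the standard timeit module on larger inputs usually shows the array version ahead, but the exact numbers depend on the machine.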
What is Pandas?
Pandas is an open source library built on top of NumPy.
It allows for fast analysis, data cleaning, and preparation.
It excels in performance and productivity.
It also has built-in visualization features.
It can work with data from a wide variety of sources.
You will need to install pandas by going to your command line or terminal and using either:
conda install pandas
pip install pandas
Topics covered in Pandas:
Merging, Joining and Concatenating
Data Input and Output
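A short sketch of the topics listed above, using two made-up tables (customers and orders are hypothetical names, not part of the course material):

```python
import io
import pandas as pd

# Hypothetical example tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Asha", "Ben", "Chen"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3],
                       "amount": [250, 100, 75]})

# Merging: SQL-style join on a shared column.
merged = pd.merge(customers, orders, on="customer_id", how="inner")

# Joining: like merge, but aligned on the index.
totals = orders.groupby("customer_id")["amount"].sum()
joined = customers.set_index("customer_id").join(totals, how="left")

# Concatenating: stack frames with the same columns on top of each other.
more_orders = pd.DataFrame({"customer_id": [2], "amount": [40]})
all_orders = pd.concat([orders, more_orders], ignore_index=True)

# Data input and output: write CSV text, then read it back.
# An in-memory buffer stands in for a file on disk here.
buffer = io.StringIO()
all_orders.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(merged)
print(joined)
print(reloaded)
```

In the left join, the customer with no orders keeps a row with a missing (NaN) amount, while the inner merge drops that customer entirely.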
What is the meaning of thresh?
thresh takes an integer: the minimum number of non-NA values a row (or column) must contain to be kept by dropna().
With thresh=n, only the rows with at least n non-NA values are kept.
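To make the behaviour of thresh concrete, here is a small sketch on a made-up DataFrame (the column names and values are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0],
                   "b": [2.0, 5.0, np.nan, np.nan],
                   "c": [3.0, 6.0, np.nan, 7.0]})

# Non-NA values per row: row 0 has 3, row 1 has 2, row 2 has 0, row 3 has 2.
at_least_two = df.dropna(thresh=2)    # keeps rows 0, 1 and 3
all_three = df.dropna(thresh=3)       # keeps only row 0

# With axis=1 the same rule is applied to columns instead of rows.
# Columns a and b have 2 non-NA values each, column c has 3.
full_columns = df.dropna(axis=1, thresh=3)

print(at_least_two)
print(all_three)
print(full_columns)
```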
The pandas head() method returns the top n (5 by default) rows of a DataFrame or Series.
The pandas corr() method finds the pairwise correlation of all columns in a DataFrame. Any NA values are automatically excluded. Non-numeric columns were silently ignored in older pandas versions; since pandas 2.0 you need to pass numeric_only=True or select the numeric columns yourself, otherwise corr() raises an error on them.
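A brief sketch of head() and corr() on a made-up table follows. Selecting the numeric columns with select_dtypes before calling corr() is one way to handle text columns that should behave the same across pandas versions; it is a choice made for this example, not the only option:

```python
import pandas as pd

# Hypothetical data: study hours, test score, absences and a text grade.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [2, 4, 6, 8, 10, 12],
    "absences": [6, 5, 4, 3, 2, 1],
    "grade": ["F", "D", "C", "B", "A", "A"],
})

top5 = df.head()     # first 5 rows by default
top2 = df.head(2)    # first 2 rows

# Pairwise correlation of the numeric columns only.
corr = df.select_dtypes(include="number").corr()

print(top2)
print(corr)
```

Here score rises in step with hours and absences falls in step with hours, so the corresponding entries of the correlation matrix are close to 1 and -1.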
What is a pivot table?
A pivot table is a data summarization tool used in data processing and statistics. Pivot tables summarize, sort, reorganize, group, count, total, or average data stored in a table or database. They let users transform columns into rows and rows into columns, and group by any data field.
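In pandas, one way to build such a summary is pd.pivot_table. The sketch below uses a made-up sales table; the column names are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sales records, one row per sale.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 50, 300],
})

# Group by region (rows) and product (columns), totalling the revenue.
table = pd.pivot_table(sales, values="revenue", index="region",
                       columns="product", aggfunc="sum")

# Transposing swaps rows and columns.
swapped = table.T

print(table)
print(swapped)
```

Changing aggfunc to "mean" or "count" gives averages or counts over the same grouping.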
Matplotlib is the most popular plotting library for Python.
It gives you control over every aspect of a figure.
It was designed to have a similar feel to MATLAB's graphical plotting.
How to install matplotlib:
conda install matplotlib
pip install matplotlib
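Once Matplotlib is installed, a minimal sketch like the following shows the kind of control it gives over a figure (size, line style, labels, legend, grid). The data and labels are made up, and the Agg backend line is only there so the script also runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# One figure with one set of axes, 6 x 4 inches.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, color="tab:blue", linestyle="--", marker="o", label="y = x^2")
ax.set_title("Squares")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
ax.grid(True)

# Save as PNG into memory; pass a filename instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
```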
What is machine learning?
Machine learning is making the computer learn from studying data and statistics.
Machine learning is a step in the direction of artificial intelligence (AI).
A machine learning program analyses data and learns to predict the outcome.
Introduction to Machine Learning
We will be using An Introduction to Statistical Learning by Gareth James et al. as a companion book.
It's freely available online. Let's see how to get it.
Students who want the mathematical theory should do the reading.
Students who just want light theory and are more interested in Python applications can treat the reading as optional.
Read Chapters 1 & 2 to gain a background understanding before continuing to the Machine Learning Lectures.
What is Machine Learning?
-Machine learning is a method of data analysis that automates analytical model building.
-Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
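To illustrate what "iteratively learn from data" can mean, here is a minimal sketch that fits a straight line with gradient descent using only NumPy. The synthetic data, learning rate, and iteration count are assumptions chosen for this example, and real projects would normally use a library such as scikit-learn instead:

```python
import numpy as np

# Synthetic data: y is roughly 3 * x + 2 with some noise added.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=200)

# Start with no knowledge of the relationship.
w, b = 0.0, 0.0
learning_rate = 0.01

for _ in range(5000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Move both parameters a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("learned slope:", w)
print("learned intercept:", b)
```

The program is never told the slope or intercept; it recovers values close to them by repeatedly reducing its prediction error on the data.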
What is it used for?
-Fraud detection.
-Web search results.
-Real-time ads on web pages
-Credit scoring and next-best offers.
-Prediction of equipment failures.
-New pricing models.
-Network intrusion detection.
-Recommendation Engines
-Customer Segmentation
-Text Sentiment Analysis
-Predicting Customer Churn
-Pattern and image recognition.
-Email spam filtering.
-Financial Modeling