UNDERSTANDING DATA SCIENCE FUNDAMENTALS
WHAT IS DATA SCIENCE?
Data science is an interdisciplinary field used to process, analyse and derive insight from many kinds of data.
It relies on a range of techniques, such as visualization, statistics and machine learning, to make sense of data.
DIFFERENCES BETWEEN DATA SCIENCE AND DATA ANALYTICS
DATA SCIENCE (DS)
1. DS is used to formulate the right questions.
2. In DS, the data for analysis is prepared by processing, massaging, cleansing, visualizing and organizing it.
3. DS uses data from several datasets to solve real-world problems.
DATA ANALYTICS (DA)
1. DA is used to answer questions that come from a business perspective.
2. DA helps to identify patterns and discover correlations in data.
3. DA identifies data quality issues and typically uses a single dataset.
APPLICATION OF DATA SCIENCE AND MACHINE LEARNING
PROGRAMMING IN DATA SCIENCE
PYTHON
Data scientists and programmers like Python because it is a general-purpose and dynamic programming language.
Python also has good packages for natural language processing and machine learning, and it is inherently object-oriented.
R (RSTUDIO)
R is an open-source language and software environment for statistical computing and graphics, and RStudio is a widely used development environment for it. R is often preferred over Python for ad hoc analysis and exploring datasets.
SQL (Structured Query Language)
SQL (Structured Query Language) is a domain-specific language used for managing data in a relational database management system. SQL tables and SQL queries are critical for every data scientist to know and be comfortable with.
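A small sketch of a SQL table and a SQL query, run from Python with the standard-library sqlite3 module. The table name, columns and rows are made up for illustration only.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# Total the amounts per region with GROUP BY.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```

The same SELECT statement would work against most relational databases; only the connection code is specific to SQLite.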
SOME BASIC PYTHON PACKAGES USED IN DATA SCIENCE
NUMERICAL PYTHON (NUMPY)
PANDAS
MATPLOTLIB
DATA PREPROCESSING AND DATA WRANGLING
Data preprocessing (also called data preparation or data cleaning) is the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset.
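A minimal sketch of detecting, correcting and removing bad records, using plain Python. The records and the rules (a name must be present, an age must be a non-negative whole number) are assumptions chosen for illustration.

```python
# Hypothetical raw records; some are corrupt or inaccurate.
raw_records = [
    {"name": "Ada", "age": "36"},
    {"name": "", "age": "41"},        # missing name: removed
    {"name": "Linus", "age": "-5"},   # impossible age: removed
    {"name": "Grace", "age": "n/a"},  # not a number: removed
    {"name": " Alan ", "age": "41"},  # stray whitespace: corrected
]


def clean(records):
    """Return only the valid records, with whitespace trimmed and ages as int."""
    cleaned = []
    for record in records:
        name = record["name"].strip()
        age_text = record["age"].strip()
        # str.isdigit() is False for "", "-5" and "n/a", so those rows are dropped.
        if not name or not age_text.isdigit():
            continue
        cleaned.append({"name": name, "age": int(age_text)})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```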
Data wrangling is the process of converting and mapping data from its raw form into another format, with the purpose of making it more valuable and appropriate for advanced tasks such as data analytics and machine learning.
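A minimal sketch of wrangling: raw comma-separated text is mapped into a structure that is easier to analyse. The column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical raw data as it might arrive in a text file.
raw_text = (
    "date,city,temp_c\n"
    "2024-01-01,Lagos,31.5\n"
    "2024-01-02,Lagos,30.0\n"
    "2024-01-01,Abuja,28.0\n"
)


def wrangle(text):
    """Map raw CSV text to a dict of city -> list of temperatures as floats."""
    by_city = {}
    for row in csv.DictReader(io.StringIO(text)):
        by_city.setdefault(row["city"], []).append(float(row["temp_c"]))
    return by_city


temps_by_city = wrangle(raw_text)
print(temps_by_city)
```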
DATA VISUALIZATION
Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
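A rough sketch of the idea using a text bar chart in plain Python. In practice a plotting library would be used instead; the categories and counts here are made up.

```python
# Hypothetical category counts, invented for this example.
counts = {"A": 3, "B": 7, "C": 1}


def bar_chart(data):
    """Return one text line per category, with one '#' per unit of value."""
    width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        lines.append(f"{label.ljust(width)} | {'#' * value} {value}")
    return "\n".join(lines)


chart = bar_chart(counts)
print(chart)
```

Even in this crude form, the longest bar stands out at a glance, which is the point of a chart.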
MATHEMATICS AND STATISTICS IN DATA SCIENCE
Mathematics and statistics are essential for data science because these disciplines form the foundation of all machine learning algorithms. In fact, mathematics is behind everything around us, from shapes, patterns, and colors to the count of petals in a flower. Mathematics is embedded in every aspect of our lives.
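A short sketch of three basic statistics, computed with Python's standard-library statistics module. The scores are made-up sample values.

```python
import statistics

# Hypothetical sample data, invented for this example.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)       # average value
median = statistics.median(scores)   # middle value of the sorted data
std_dev = statistics.pstdev(scores)  # population standard deviation (spread)

print(mean, median, std_dev)
```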
MACHINE LEARNING
Machine learning is the science (or art) of programming computers so that they can learn from data.
TYPES OF MACHINE LEARNING
There are two main types of machine learning:
1. Supervised Learning
2. Unsupervised Learning
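A toy sketch of the two types in plain Python, on one-dimensional data. The supervised part is given labels and predicts the label of a new point from its nearest example. The unsupervised part gets no labels and splits the values into two groups by itself. The data, function names and labels are all made up, and the grouping routine is a simplified two-cluster version of k-means.

```python
# Supervised: every training example comes with a label.
labelled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]


def predict_nearest(x, examples):
    """Return the label of the training example closest to x."""
    _, nearest_label = min(examples, key=lambda ex: abs(ex[0] - x))
    return nearest_label


# Unsupervised: only the values are given, with no labels.
def two_means(values, iterations=10):
    """Split values into two groups around two moving centres.

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


prediction = predict_nearest(2.0, labelled)
groups = two_means([1.0, 1.5, 8.0, 9.0])
print(prediction, groups)
```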
SOME MACHINE LEARNING MODELS INCLUDE:
Linear Regression
Logistic Regression
Decision Tree
Support Vector Machine (SVM)
Naive Bayes
k-Nearest Neighbours (KNN)
K-Means
Random Forest
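As an illustration of the first model in the list, here is a minimal sketch of simple linear regression with one input, fitted by ordinary least squares in plain Python. The data points are made up so that they lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted y for a new x of 5
print(slope, intercept, prediction)
```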
WHY DO I NEED TO STUDY DATA SCIENCE?
THANK YOU
DATA SCIENCE
By Abdulsamod Azeez