UNDERSTANDING DATA SCIENCE FUNDAMENTALS

 

WHAT IS DATA SCIENCE?

Data science is an interdisciplinary field used to process, analyse and derive insight from different kinds of data.

It relies on a range of techniques, such as visualization, statistics and machine learning, to make sense of data.

DIFFERENCES BETWEEN DATA SCIENCE AND DATA ANALYTICS

DATA SCIENCE (DS)

1. DS is used to formulate the right questions.

2. In DS, the data for analysis is prepared by processing, massaging, cleansing, visualizing and organizing it.

3. DS uses data from several datasets to solve real-world problems.

DATA ANALYTICS (DA)

1. DA is used to answer questions coming from a business perspective.

2. DA helps to identify patterns and discover correlations in data.

3. DA identifies data quality issues and uses a single dataset.

APPLICATION OF DATA SCIENCE AND MACHINE LEARNING

PROGRAMMING IN DATA SCIENCE

PYTHON

Data scientists and programmers like Python because it is a general-purpose, dynamic programming language.
It also has good packages for natural language processing and machine learning, and it is inherently object-oriented.

 

R (RSTUDIO)

R is often preferred over Python for ad hoc analysis and exploring datasets. It is an open-source language and software environment for statistical computing and graphics, and RStudio is a popular development environment for working with it.

 

SQL (Structured Query Language)

SQL (Structured Query Language) is a domain-specific language used for managing data in a relational database management system. Every data scientist should know SQL tables and queries and be comfortable working with them.
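To make the idea of tables and queries concrete, here is a minimal sketch that runs SQL from Python using the standard-library sqlite3 module. The table name (sales), its columns and its rows are hypothetical, invented purely for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, invented for illustration.
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("east", 40.0)],
)

# Aggregate query: total sales per region, largest first.
cur.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
)
totals = cur.fetchall()
conn.close()

print(totals)
```

The same SELECT ... GROUP BY pattern applies to most relational databases, although connection details differ from system to system.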

SOME BASIC PYTHON PACKAGES USED IN DATA SCIENCE

NUMERICAL PYTHON (NUMPY)
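A minimal sketch of what NumPy is typically used for: fast arithmetic on whole arrays at once. The measurements below are hypothetical values chosen for illustration, and the example assumes NumPy is installed.

```python
import numpy as np

# Hypothetical measurements, invented for illustration.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Vectorized arithmetic: the operation applies to every element at once.
heights_m = heights_cm / 100.0

# Built-in reductions over the whole array.
mean_height = heights_m.mean()
tallest = heights_m.max()

# 2-D arrays and matrix multiplication.
matrix = np.array([[1, 2], [3, 4]])
identity = np.eye(2, dtype=int)
product = matrix @ identity  # multiplying by the identity leaves it unchanged

print(heights_m, mean_height, tallest)
print(product)
```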

PANDAS
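A minimal sketch of the kind of tabular work pandas is typically used for: filtering rows and summarizing by group. The city names and sales figures are hypothetical, and the example assumes pandas is installed.

```python
import pandas as pd

# Hypothetical records, invented for illustration.
df = pd.DataFrame(
    {
        "city": ["Lagos", "Abuja", "Lagos", "Kano"],
        "sales": [100, 50, 150, 70],
    }
)

# Select rows with a boolean condition.
large_sales = df[df["sales"] > 60]

# Group rows by city and total the sales in each group.
sales_by_city = df.groupby("city")["sales"].sum()

print(large_sales)
print(sales_by_city)
```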

MATPLOTLIB

DATA PREPROCESSING AND DATA WRANGLING

Data preprocessing (also called data preparation or data cleaning) is the process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset.

Data wrangling is the process of converting and mapping data from its raw form to another format, with the purpose of making it more valuable and appropriate for advanced tasks such as data analytics and machine learning.
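The two steps above can be sketched with pandas. This is only one possible approach under stated assumptions: the records are hypothetical, and filling a missing age with the median is an illustrative choice, not a rule.

```python
import pandas as pd

# Hypothetical raw records with typical problems: a duplicate row,
# a missing value, and numbers and dates stored as text.
raw = pd.DataFrame(
    {
        "name": ["Ada", "Ada", "Bola", "Chidi"],
        "age": ["29", "29", None, "41"],
        "joined": ["2021-01-05", "2021-01-05", "2020-11-20", "2019-07-01"],
    }
)

# Cleaning: drop exact duplicates, convert age to numbers, and
# fill the missing value with the median of the known ages.
clean = raw.drop_duplicates().reset_index(drop=True)
clean["age"] = pd.to_numeric(clean["age"])
clean["age"] = clean["age"].fillna(clean["age"].median())

# Wrangling: map raw text into a more useful form (real dates, plus a
# derived column that later analysis can group by).
clean["joined"] = pd.to_datetime(clean["joined"])
clean["join_year"] = clean["joined"].dt.year

print(clean)
```

Whether to fill, drop or flag missing values depends on the dataset and the question being asked.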

DATA VISUALIZATION

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
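As a small illustration of how a chart exposes an outlier, here is a minimal Matplotlib sketch. The monthly figures are hypothetical, and the image is rendered to an in-memory buffer so the example needs no display; passing a filename to savefig would write a file instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # Render off-screen so no display is needed.
import matplotlib.pyplot as plt

# Hypothetical monthly figures, invented for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [10, 12, 9, 30, 14]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")

# The April spike is easy to miss in a table but obvious in the chart.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```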

MATHEMATICS AND STATISTICS IN DATA SCIENCE

Mathematics and statistics are essential to data science because these disciplines form the foundation of all machine learning algorithms. In fact, mathematics is behind everything around us, from shapes, patterns, and colors to the count of petals in a flower. It is embedded in every aspect of our lives.
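A few of the statistics that come up constantly in data science can be computed with the Python standard library alone. The sketch below uses hypothetical paired observations and writes the Pearson correlation coefficient out from its definition; the helper name pearson is introduced here for illustration.

```python
import math
import statistics

# Hypothetical paired observations, invented for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

mean_score = statistics.mean(exam_score)
stdev_score = statistics.stdev(exam_score)  # sample standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient, written out from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(hours_studied, exam_score)
print(mean_score, round(stdev_score, 2), round(r, 3))
```

A value of r close to 1 indicates a strong positive linear relationship between the two variables; it does not by itself show that one causes the other.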

MACHINE LEARNING

Machine learning is the science (or art) of programming computers so that they can learn from data.

TYPES OF MACHINE LEARNING

 

Broadly, there are two main types of machine learning.

They are:

1. Supervised Learning

2. Unsupervised Learning
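The difference between the two types is whether the training data comes with known answers. The sketch below contrasts them on hypothetical data, assuming NumPy is installed: a least-squares line fit stands in for supervised learning, and a tiny one-dimensional k-means loop stands in for unsupervised learning. Both are illustrative stand-ins, not the only methods in each family.

```python
import numpy as np

# Supervised learning: every input x comes with a known answer y.
# Hypothetical data following y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
prediction = slope * 5.0 + intercept  # predict for the unseen input x = 5

# Unsupervised learning: only inputs, no answers. A tiny 1-D k-means
# with k = 2 looks for structure (two groups) on its own.
points = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])
centers = np.array([points.min(), points.max()])  # simple starting guess
for _ in range(10):
    # Assign each point to its nearest center.
    labels = np.abs(points[:, None] - centers[None, :]).argmin(axis=1)
    # Move each center to the mean of its assigned points.
    centers = np.array([points[labels == k].mean() for k in range(2)])

print(round(float(prediction), 2))
print(labels, centers)
```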

SOME MACHINE LEARNING MODELS INCLUDE:

  • Linear Regression.

  • Logistic Regression.

  • Decision Tree.

  • Support Vector Machine (SVM).

  • Naive Bayes.

  • k-Nearest Neighbours (KNN).

  • K-Means.

  • Random Forest.
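As one concrete instance from the list above, k-Nearest Neighbours can be sketched in a few lines of standard-library Python: a new point is given the label that is most common among its k closest training points. The function name knn_predict, the points and the labels are all hypothetical, introduced for illustration; in practice a library implementation would normally be used.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points in two classes, invented for illustration.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (7, 9)))
```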

WHY DO I NEED TO STUDY DATA SCIENCE?

THANK YOU 

DATA SCIENCE

By Abdulsamod Azeez