
Data Science: Most In-Demand Field of IT






What is Data Science:

Data science is an interdisciplinary field that uses scientific methods and algorithms to extract meaningful insights from structured and unstructured data.

These insights help a business understand its growth and performance, and they support decisions about its further development.

In simple terms, data science means turning raw data into an analyzable form so that knowledge and insights can be extracted and used to make appropriate decisions about a business problem.

Data science is the study of data. It is a process of collecting data, transforming it, analyzing it, building a model, testing the model and finally making predictions.

Importance of Data Science:

Data science combines knowledge of programming languages, mathematics and statistics to extract insights from data and support decision-making.

In healthcare, data science is used to build predictive models, for example to predict whether a patient is diabetic or whether a tumor is benign or malignant.

In e-commerce, data science is used for customer segmentation. Customer data is analyzed to understand buying behavior and recommend suitable products.

In the financial sector, data science is used to predict stock prices, analyze the risk of stocks and so on.

In education, data science is used to analyze students' academic performance and predict their results.

In meteorology, data science is used to forecast the weather.

In transportation, data science helps predict flight and train delays and analyze routes.

In sports, data science is used for exploratory data analysis of events such as the Cricket World Cup, the IPL and the Olympic Games, and to predict which team will win a match or which teams will perform better.

In the entertainment and telecommunication industries, data science is used to predict, for example, whether a movie will be a blockbuster or whether a news item is fake or real.

In social media, data science is used for data analytics.

Main Concepts to be covered in Data Science:

1. Measures of central tendency:

A measure of central tendency summarizes a data set with a single typical value. The common measures are the mean, median and mode.
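As a small illustration, the three measures can be computed with Python's built-in statistics module. The order counts below are made-up sample data; note how the single large value pulls the mean well above the median.

```python
import statistics

# Hypothetical sample: daily order counts for a small shop (one unusually busy day).
orders = [12, 15, 15, 18, 20, 22, 95]

mean_orders = statistics.mean(orders)      # arithmetic average
median_orders = statistics.median(orders)  # middle value of the sorted data
mode_orders = statistics.mode(orders)      # most frequent value

print("mean:", round(mean_orders, 2))
print("median:", median_orders)
print("mode:", mode_orders)
```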

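2. Measures of variability:

Variability is the spread or dispersion of the data. It is used to compare different samples. The common measures are:

Range: The difference between the largest and smallest values in the data set. The quartiles Q1, Q2 and Q3 split the sorted data into four equal parts, and Q2 is the median.

Standard Deviation: The square root of the variance. It is used to compare the spread of populations and, in finance, as an indicator of risk.

Variance: The average of the squared deviations from the mean.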
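The measures of spread can be sketched with the standard library as follows. The scores are made-up data, and the population formulas (pvariance, pstdev) are used here as an assumption; for a sample drawn from a larger population, statistics.variance and statistics.stdev would be the usual choice.

```python
import statistics

# Hypothetical data set.
scores = [4, 8, 15, 16, 23, 42]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Quartiles: three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Population variance: average squared deviation from the mean.
variance = statistics.pvariance(scores)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(scores)

print("range:", data_range)
print("quartiles:", q1, q2, q3)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 2))
```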

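Measures of Shape:

Skewness: Measures the asymmetry of a distribution. A distribution can be left-skewed (negative skew) or right-skewed (positive skew).

Kurtosis: Measures the peakedness of a distribution and the heaviness of its tails.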
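A rough sketch of both shape measures, written directly from their moment definitions. The function names and data are illustrative, and the simple population (biased) formulas are assumed. Kurtosis is reported here as excess kurtosis, so a normal distribution would score about 0.

```python
import statistics

def skewness(data):
    """Average cubed z-score: 0 for symmetric data, > 0 for a long right tail."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    return sum(((x - mean) / std) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Average fourth-power z-score minus 3 (the value for a normal distribution)."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    return sum(((x - mean) / std) ** 4 for x in data) / len(data) - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print("skewness (symmetric):", round(skewness(symmetric), 3))
print("skewness (right-skewed):", round(skewness(right_skewed), 3))
print("excess kurtosis (symmetric):", round(excess_kurtosis(symmetric), 3))
```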

2. Descriptive Statistics:

Data types: nominal, ordinal, interval and ratio.

Classification of data: continuous data and discrete data.

3. Hypothesis Testing:

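It is a statistical method used in making decisions. A hypothesis is a tentative assumption about the data, which is then tested against a sample.

H0 denotes the null hypothesis:

If p-value > 0.05, fail to reject the null hypothesis (commonly described as accepting it).

H1 (also written Ha) denotes the alternative hypothesis:

If p-value < 0.05, reject the null hypothesis in favor of the alternative hypothesis.

Here 0.05 is the commonly used significance level.

P high, null fly: fail to reject the null hypothesis.

P low, null go: reject the null hypothesis.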
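The p-value rule can be sketched with a one-sample z-test, using only the standard library. This is a simplified illustration under stated assumptions: the sample values, the hypothesized mean of 50 and the known population standard deviation of 2 are all made up, and the helper name z_test_p_value is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05  # significance level

# Hypothetical measurements; H0 says the population mean is 50.
sample = [52.1, 49.8, 53.4, 51.0, 54.2, 52.7, 50.9, 53.8]
p_value = z_test_p_value(sample, mu0=50.0, sigma=2.0)

# "P low, null go": a small p-value means the null hypothesis is rejected.
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

When the population standard deviation is not known, a t-test is the usual alternative.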

4. Machine Learning:

Machine learning is a branch of artificial intelligence and computer science that uses algorithms to learn from data and build models that make predictions.

The two main types of machine learning covered here are supervised and unsupervised learning.

1. Supervised Machine Learning:

When the target variable (also called the dependent variable or y variable) is known and algorithms are used to learn how to predict it, this is called supervised machine learning.

Supervised learning algorithms are trained on labeled data to predict outcomes accurately.

Important Algorithms in Supervised Machine Learning:

Supervised machine learning is further divided into two groups.

1. Regression

When the response variable is continuous (numeric), it is a regression problem.

Common regression algorithms include:

Linear Regression:

Predicts the value of the response variable as a linear function of the independent variables.
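As a minimal sketch of the idea, a straight line can be fitted by ordinary least squares with plain Python. The data (hours studied against exam score) and the helper name fit_line are made up for illustration; in practice a library class such as scikit-learn's LinearRegression would normally be used.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6  # prediction for 6 hours of study
print(f"score = {intercept:.1f} + {slope:.1f} * hours; prediction for 6 hours: {predicted:.1f}")
```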

Decision Tree:

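Splits the data set repeatedly on feature values so that each resulting group is more homogeneous, and predicts from the group that a new observation falls into.

Support Vector Regression:

Support vector machines can be used for classification or regression problems. Support vector regression is the regression form: it fits a function that keeps as many data points as possible within a chosen margin of the prediction.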

Lasso Regression: 

It is a regularization technique for regression problems. It adds a penalty on the absolute size of the coefficients, which shrinks some of them to exactly zero and helps reduce overfitting.
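The effect of the penalty can be seen in a small scikit-learn sketch. The data is synthetic: by construction only the first two of five features influence y, and alpha=0.1 is an arbitrary penalty strength chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Synthetic target: depends on features 0 and 1 only, plus a little noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)   # no penalty
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty on the coefficients

print("plain least squares:", np.round(ols.coef_, 3))
print("lasso:              ", np.round(lasso.coef_, 3))
```

With this setup the lasso coefficients of the three irrelevant features should come out as exactly zero, while plain least squares leaves them small but non-zero.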

Random Forest Regressor:

Random forest is an ensemble method for both classification and regression problems. It builds many decision trees during training and combines their predictions: the average for regression and the majority vote for classification.
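A short scikit-learn sketch of the ensemble idea for regression. The data is synthetic (a noisy sine curve), and the settings (50 trees, fixed random seeds) are arbitrary choices for illustration. The last lines check that the forest's prediction is the average of the individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)  # synthetic target

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

X_new = np.array([[2.5], [7.0]])
# One row of predictions per decision tree in the forest.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
forest_pred = forest.predict(X_new)

print("forest prediction:    ", np.round(forest_pred, 3))
print("mean over single trees:", np.round(per_tree.mean(axis=0), 3))
```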

2. Classification

Classification is the process of assigning observations from structured or unstructured data sets to classes.

Support Vector Machine:

Support vector machines are used for both classification and regression problems, but mostly for classification.

Discriminant Analysis:

These models are based on dimensionality reduction and are used in marketing predictive analysis, image recognition, etc.

Naive Bayes:

This algorithm applies Bayes' theorem under the assumption that the features are independent of each other. It is used for binary and multi-class classification problems.

KNN (K-Nearest Neighbors):

It predicts from the k closest training points and can be used for both classification and regression problems.
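To make the nearest-neighbor idea concrete, here is a minimal KNN classifier in plain Python. The two-feature measurements, the labels and the function name knn_predict are hypothetical; a library implementation such as scikit-learn's KNeighborsClassifier would be the usual choice for real data.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: two measurements per tumor sample.
points = [(1.0, 1.2), (1.5, 1.8), (1.1, 0.9), (5.0, 5.2), (5.5, 4.8), (6.0, 5.5)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(points, labels, (1.2, 1.0)))
print(knn_predict(points, labels, (5.4, 5.0)))
```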


2. Unsupervised Machine Learning

Unsupervised machine learning analyzes and clusters unlabeled data. It helps discover hidden patterns and groups data points with similar features into segments or classes. These algorithms find patterns without human intervention.

Clustering

Clustering means forming groups from a given data set according to the characteristics of the data points.

Hierarchical Clustering:

This is a process of grouping unlabeled data points by similarity, one merge (or split) at a time. The resulting hierarchy is represented as a tree structure called a dendrogram.
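A brief sketch using SciPy's hierarchical clustering functions. The five points are made up, and average linkage and the choice of three clusters are assumptions for illustration. The merge table returned by linkage is the data behind a dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D points: two tight pairs and one isolated point.
points = np.array([
    [1.0, 1.0], [1.2, 0.9],
    [5.0, 5.1], [5.2, 4.9],
    [9.0, 9.0],
])

# Each row records one merge: the two clusters joined, their distance,
# and the size of the new cluster.
merges = linkage(points, method="average")

# Cut the hierarchy so that at most three clusters remain.
cluster_labels = fcluster(merges, t=3, criterion="maxclust")
print(cluster_labels)
```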

K-Means Clustering:

This algorithm identifies k centroids and assigns every data point to its nearest centroid. It then recomputes each centroid as the mean of its assigned points and repeats until the assignments stop changing.
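The assign-and-update loop can be sketched in plain Python. This is a simplified version under stated assumptions: the first k points are used as the starting centroids (real implementations usually pick them randomly or with k-means++), a fixed number of iterations is run, and the data and the function name k_means are made up.

```python
from math import dist

def k_means(points, k, n_iter=10):
    """Simplified k-means: returns the final centroids and each point's cluster index."""
    centroids = [points[i] for i in range(k)]  # naive initialization
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels

# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```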

DBSCAN Clustering:

DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It groups points that lie in dense regions, can find clusters of different shapes and sizes in large data sets, and marks points in sparse regions as noise.
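A small scikit-learn sketch of this behavior. The points and the parameter values (eps=0.5, min_samples=3) are made up for illustration. In scikit-learn's output, the label -1 marks a noise point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: two dense groups and one isolated point.
points = np.array([
    [1.0, 1.0], [1.1, 1.0], [0.9, 1.1], [1.0, 0.9],
    [5.0, 5.0], [5.1, 5.1], [4.9, 5.0], [5.0, 4.9],
    [9.0, 1.0],
])

# eps: neighborhood radius; min_samples: points needed to form a dense region.
dbscan_labels = DBSCAN(eps=0.5, min_samples=3).fit(points).labels_
print(dbscan_labels)
```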

Neural Network:

Neural networks are a set of algorithms used to recognize hidden patterns in data sets. They are also used to model complex data sets.

Association Rules:

Association rule learning discovers relations between variables in large data sets. It is a rule-based learning and data mining technique that finds relationships between different variables or features.
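Association rules are usually scored with support, confidence and lift. The sketch below computes these three measures in plain Python for the rule "bread -> milk"; the shopping baskets and function names are made up for illustration.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall (1.0 means no association)."""
    return confidence(antecedent, consequent) / support(consequent)

print("support:", support({"bread", "milk"}))
print("confidence:", confidence({"bread"}, {"milk"}))
print("lift:", lift({"bread"}, {"milk"}))
```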

Recommendation Engine or Recommender Machines:

Based on buyers' behavior, this algorithm recommends products or services to users. It helps users discover products or services that are relevant to them.

Main Libraries In Python Used for Data Science Algorithms:

1. NumPy:

NumPy stands for Numerical Python. It is the fundamental Python package for numerical calculations on arrays.
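A tiny example of the array arithmetic NumPy is used for. The prices and quantities are made-up values.

```python
import numpy as np

# Hypothetical order: unit prices and quantities for three products.
prices = np.array([120.0, 80.0, 45.5])
quantities = np.array([2, 5, 10])

line_totals = prices * quantities  # element-wise multiplication, no explicit loop
order_total = line_totals.sum()

print(line_totals)
print(order_total)
```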

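2. Pandas:

Pandas is a Python data analysis library. It is used to load, clean and transform tabular data, which is one of the most important steps in the data science life cycle.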
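A short sketch of a typical pandas workflow: building a table, filling a missing value and summarizing by group. The student records are made up, and filling a missing score with the subject mean is just one possible cleaning choice.

```python
import pandas as pd

# Hypothetical records; one science score is missing.
records = pd.DataFrame({
    "student": ["Asha", "Ravi", "Meera", "Asha", "Ravi", "Meera"],
    "subject": ["math", "math", "math", "science", "science", "science"],
    "score": [82, 74, 91, 88, None, 79],
})

# Cleaning: replace the missing score with the mean score of its subject.
subject_mean = records.groupby("subject")["score"].transform("mean")
records["score"] = records["score"].fillna(subject_mean)

# Analysis: average score per subject.
subject_means = records.groupby("subject")["score"].mean()
print(subject_means)
```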

3. Matplotlib:

It is a powerful plotting library used for visualization. It can plot many different kinds of graphs.

4. Seaborn:

It is also used for visualization. Seaborn is built on top of Matplotlib and is used to plot statistical graphs.

5. SciPy:

This library is used to solve scientific, mathematical, statistical and engineering problems.

6. Scikit-learn:

Scikit-learn, also known as sklearn, implements most of the common machine learning algorithms.

7. TensorFlow:

It is a library developed by Google for fast numerical computation. It is widely used for machine learning and deep learning.

8. Keras:

Keras is a popular library used mostly for deep learning and neural network algorithms.


Written by,

Prof. Dr. Manisha Mor
