Linear Regression | Supervised Learning | Simple Linear Regression | Machine Learning Algorithm
In this blog, you will get to know about the Linear Regression algorithm.
Points covered in this blog:-
- What is supervised learning
- What is Linear Regression Algorithm? Types of Linear Regression
- Simple Linear Regression
- Linear Regression Line
- Finding the best fit line
- Model Performance
- Working Of linear regression.
- Python code Using Skitlearn library
What is Supervised Learning?
Supervised learning is the types of machine learning in which machines are trained using training data, and on basis of that data, machines predict the output. In supervised learning, the training data provided to the machines work as the teacher/supervisor that teaches the machines to predict the output correctly.
For eg: the student learns under the supervision of the teacher.
Supervised learning can be further divided into two types of problems:
- Regression: Regression algorithms are used if there is a relationship between the input variable and the output variable.
- Classification: Classification algorithms are used when the output variable is categorical, which means there are two classes such as Yes-No, Male-Female, True-false, etc.
What is Linear regression?
Linear Regression is a Machine Learning algorithm in supervised learning. It performs regression task. It is mostly used for finding out the relationship between variable and forecasting. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc. The linear regression model provides a sloped straight line representing the relationship between the variables.
Linear regression performs task to predict a dependent variable value(y) based on a given independent value x.
Mathematically, we can represent a linear regression as:
y= a0+a1x
Y= Dependent Variable (Target Variable)
X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
Types of Linear Regression
- Simple Linear Regression: If you have one independent variable then it is a simple linear regression.
- Multiple Linear regression: If you have more than one independent variable then it is multiple linear regression.
Simple Linear Regression
Simple Linear Regression is a type of Regression algorithms that models the relationship between a dependent variable and a single independent variable. The key point in Simple Linear Regression is that the dependent variable must be a continuous/real value. However, the independent variable can be measured on continuous or categorical values.
Linear Regression Line
A linear line showing the relationship between the dependent and independent variables is called a regression line. A regression line shows two types of relationship:
- Positive Linear Relationship: If the dependent variable increases on the Y-axis and the independent variable increases on X-axis
Equation y = a0+a1x
- Negative Linear Relationship: If the dependent variable decreases on the Y-axis and the independent variable increases on the X-axis
Equation y = -a0+a1x
Finding the best fit line
Our main goal is to find the best fit line that means the error between predicted values and actual values should be minimized. The best fit line will have the least error. The different coefficient of lines (a0, a1) gives a different line of regression, so we need to calculate the best values for a0 and a1 to find the best fit line, so to calculate this we use the cost function
Cost Function
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of squared error that occurred between the predicted values and actual values.
Equation for MSE
Model Performance
To check the goodness of the regression model we use the R-Square method
R-squared method: R-squared is a statistical method that determines the goodness of fit. The high value of R-square determines the less difference between the predicted values and actual values and hence represents a good model.
Now I have attached a question to give you a good understanding of what we have learned till now.
Python Implementation of linear regression model using scikit-learn library.
Ques. Predict the per capita income of the USA using this dataset