Multivariate regression in machine learning

Multivariate regression is a supervised machine learning algorithm. In this section we will take a close look at it.

How does this linear regression model work? How can we train our model to predict an outcome based on a formula?

In this section we will try to understand this important concept and dig a little deeper into the topic.

Before starting the discussion, however, we must recall the previous section, where we discussed linear regression in detail.

In the simplest linear regression model, we have seen that there is one independent variable and one dependent variable.

In multivariate regression, as the term is used here, there are multiple independent variables but one dependent variable. This setup is also commonly called multiple linear regression.

Remember the simplest linear regression algorithm.

The formula is given below.

y = mx + b

# y is dependent variable
# x is independent variable

Based on the independent variable, we train the machine to predict the output.

In the above formula, ‘m’ is the coefficient (the slope), and ‘b’ is the intercept.
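As a quick reminder of how this looks in code, here is a minimal sketch. The data points are made up for illustration and are generated from y = 2x + 1, so the fitted coefficient and intercept should come out close to 2 and 1. The names x, y, and simple_reg are assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated from y = 2x + 1 (not from the BMI data set)
x = np.array([[1.0], [2.0], [3.0], [4.0]])  # one independent variable, as a column
y = 2.0 * x.ravel() + 1.0                   # dependent variable

simple_reg = LinearRegression()
simple_reg.fit(x, y)

m = simple_reg.coef_[0]    # fitted coefficient
b = simple_reg.intercept_  # fitted intercept
print(m, b)
```

Because the points lie exactly on a line, the model recovers the 'm' and 'b' that were used to generate them.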

The same thing happens when there are multiple variables instead of one variable.

Before we look at the data frame, let's take a look at the formula.

# y is dependent variable
# x1, x2 are independent variables

# y = m1x1 + m2x2 + b
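To make the formula concrete, the sketch below evaluates it with NumPy. The coefficients, intercept, and inputs are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical coefficients and intercept, chosen only for illustration
m = np.array([2.0, 3.0])  # m1, m2
b = 5.0                   # intercept

x = np.array([1.0, 4.0])  # x1, x2

# y = m1*x1 + m2*x2 + b, written as a dot product plus the intercept
y = np.dot(m, x) + b
print(y)  # 19.0
```

Writing the sum as a dot product means the same line works for any number of independent variables.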

Let’s import the necessary machine learning libraries and load the data with the Pandas read_csv() function.

import pandas as pd
import numpy as np
from sklearn import datasets, linear_model, model_selection
import matplotlib.pyplot as plt

url = 'https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/basic/bmiindex%20.csv'
data_frame = pd.read_csv(url)
data_frame

Here goes the data table.

   age  height  weight   bmi
0   25     6.0     143  19.4
1   48     5.0     180  35.2
2   65     5.5     210  33.9
3   34     5.2      78  14.1
4   56     6.7     156  17.0

The body mass index, or BMI, indicates whether a person is underweight, normal or overweight. The standard BMI formula uses only height and weight; in this data set we use age, height and weight as the independent variables.

In our case, the variable ‘bmi’ is the dependent variable, and the formula extends to three independent variables.

# y is the dependent variable
# x1, x2, x3 are the independent variables

# y = m1x1 + m2x2 + m3x3 + b

# y = bmi
# x1 = age, x2 = height, x3 = weight # each multiplied by its coefficient
# b = intercept

To implement this formula, we pass the multiple variables to the model and get a prediction.

This is nothing new. We have done the same thing before with simple linear regression.

lin_reg = linear_model.LinearRegression()
lin_reg.fit(data_frame.drop('bmi', axis='columns'), data_frame.bmi)
lin_reg.predict([[39, 5.1, 160]])

# output
array([29.66437152])

The above output is the BMI of a person who is 39 years old, whose height is 5.1 feet and weight is 160 pounds.
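Recent versions of scikit-learn may warn about missing feature names when a model fitted on a DataFrame is given a plain list at prediction time. One way to avoid that is to pass the new input as a DataFrame with the same columns. The sketch below assumes only the five rows shown in the table above, so its fitted values may differ from the output shown here if the full CSV holds more rows. The names sample, model, and new_person are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Only the five rows shown in the table; the full CSV may contain more
sample = pd.DataFrame({
    "age": [25, 48, 65, 34, 56],
    "height": [6.0, 5.0, 5.5, 5.2, 6.7],
    "weight": [143, 180, 210, 78, 156],
    "bmi": [19.4, 35.2, 33.9, 14.1, 17.0],
})

X = sample.drop("bmi", axis="columns")  # independent variables
y = sample["bmi"]                       # dependent variable

model = LinearRegression().fit(X, y)

# Same column names and order as the training data
new_person = pd.DataFrame([[39, 5.1, 160]], columns=X.columns)
prediction = model.predict(new_person)
print(prediction)
```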

Our prediction indicates that the person is overweight.

As a check, we can give the same height and weight to any online BMI calculator. The result should be close to our prediction, though not identical, because a linear model only approximates the actual BMI formula.
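As a rough cross-check, the standard BMI formula for pounds and inches can be computed directly. This sketch assumes that "5.1 feet" means decimal feet, that is, 61.2 inches; the function name bmi_imperial is illustrative.

```python
def bmi_imperial(weight_lb, height_in):
    """Standard BMI from pounds and inches: 703 * weight / height squared."""
    return 703 * weight_lb / height_in ** 2

height_in = 5.1 * 12  # assumes decimal feet, so about 61.2 inches
actual_bmi = bmi_imperial(160, height_in)
print(round(actual_bmi, 2))  # 30.03
```

The formula gives about 30.03, which is in the same range as the model's 29.66. Age plays no part in this formula.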

Besides this prediction, we can also check the same using the formula.

lin_reg.coef_ # output: array([-0.01201971, -7.75434944,  0.17689678])
lin_reg.intercept_ # output: 41.37683796001647

# the formula:
# y = m1x1 + m2x2 + m3x3 + b

39*-0.01201971 + 5.1*-7.75434944 + 160*0.17689678 + 41.37683796001647
# output:
29.66437192601648

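The result matches the prediction to six decimal places, which confirms that predict() applies this formula. The small difference in the later digits comes from using the rounded coefficients.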
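The same check can be written more compactly as a dot product. The sketch below reuses the coefficient and intercept values printed above; the variable names are illustrative.

```python
import numpy as np

# Values copied from lin_reg.coef_ and lin_reg.intercept_ above
coef = np.array([-0.01201971, -7.75434944, 0.17689678])
intercept = 41.37683796001647

person = np.array([39, 5.1, 160])  # age, height, weight

# y = m1*x1 + m2*x2 + m3*x3 + b
manual = np.dot(coef, person) + intercept
print(manual)
```

The result should agree with the predict() output up to the rounding of the printed coefficients.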

For more such Machine Learning primers please visit the respective GitHub Repository.
