Logarithm and Data Analysis

What is the relationship between logarithms and data analysis? In machine learning and data science, we need basic mathematics.

In the previous section, we saw how to use basic exponents and logarithms in Python.

That article was an introduction to machine learning, data science and mathematics.

If you are new to logarithms, please read that article first.

In this section, we will take a quick look at why we need logarithms.


If you are a complete beginner, your journey to learn TensorFlow might start here.

For TensorFlow beginners, we have a dedicated category: TensorFlow for Beginners.
Besides TensorFlow, you may need to learn several other machine learning and data science libraries.

So you may want to check these categories as well: NumPy, Pandas and Matplotlib.

However, you cannot use these libraries without learning Python, because they all use Python as their programming language.

Therefore, please learn Python first and then start learning TensorFlow.

Finally, please check our Mathematics, Discrete Mathematics and Data Structures categories in particular. We have tried to cover these topics from the basic to the intermediate level so that you can pick up the core ideas of TensorFlow.

When we compare values of very different sizes, it becomes difficult to plot them on the same chart.

Why?

Because when one person is 1 year old and another is 99, the bar for the first looks tiny compared to the bar for the second.

For example, let’s take a dataset of names and ages.

We can import the Pandas library and load the data with the read_csv() function.

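import pandas as pd

# Load the name/age dataset from the GitHub repository
df = pd.read_csv('https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/log.csv')
# In a Jupyter notebook, evaluating df on the last line displays the table; in a script, use print(df)
df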

As a result, we get the following output.

     name         age
0    John           2
1    Json          20
2    Emily         57
3    Catty          1
4    Elizabeth      3
5    James         99
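The ages above span two orders of magnitude. As a quick check, here is a minimal sketch that rebuilds the same six rows by hand (an assumption: it mirrors the log.csv output shown above, and it lets the example run without network access) and measures how the shortest bar compares with the tallest one on a linear scale.

```python
import pandas as pd

# The same six rows as the output above, typed in by hand (assumed to match log.csv)
df = pd.DataFrame({
    "name": ["John", "Json", "Emily", "Catty", "Elizabeth", "James"],
    "age": [2, 20, 57, 1, 3, 99],
})

youngest = df["age"].min()
oldest = df["age"].max()

# On a linear axis, bar height is proportional to the value itself,
# so the shortest bar is only about 1% as tall as the tallest one
print(youngest, oldest, f"{youngest / oldest:.1%}")
```

On a linear axis, a ratio of roughly 1 to 99 is exactly what makes the smallest bars hard to see.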

Now we can easily plot this data frame with the plot() function.
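The post does not show the code for this first chart. A minimal sketch, assuming the default linear Y axis and the same data typed in by hand, could look like this. The matplotlib.use("Agg") line and the file name ages_linear.png are our own illustrative additions so the script runs without a display; in a Jupyter notebook you can drop the backend line and the chart appears inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Same data as the table above, built inline so the example is self-contained
df = pd.DataFrame({
    "name": ["John", "Json", "Emily", "Catty", "Elizabeth", "James"],
    "age": [2, 20, 57, 1, 3, 99],
})

# Bar chart with the default linear Y axis: bar height is proportional to age
ax = df.plot(x="name", y="age", kind="bar")
ax.figure.savefig("ages_linear.png")  # hypothetical output file name
```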

However, the output can be disappointing.

Why? Because a few bars are so short that they are barely visible.

As a result, it becomes very difficult to analyze the data.

Let’s take a look.


[Figure: Logarithm and data analytics one, bar chart of age by name with a linear Y axis]

In such cases, logarithms save us.

We have plotted age on the Y axis and name on the X axis, so we can switch the Y axis from a linear scale to a logarithmic (base 10) scale.

As a result, the Y axis is divided into sections by powers of ten, marked 10^0, 10^1 and 10^2.

The axis starts at 1, which is equal to 10 to the power 0.

To switch to the log scale, all we have to do is pass one more parameter, logy=True, to plot().

Let’s look at the code.

# Same bar chart, but with a logarithmic (base 10) Y axis
df.plot(x='name', y='age', kind='bar', logy=True)
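To confirm what logy=True does, here is a self-contained sketch that rebuilds the data inline and checks the scale of the resulting Axes. It also shows an equivalent route with plain Matplotlib, calling set_yscale("log") on the Axes after drawing the normal chart. The Agg backend line is our own addition for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a Jupyter notebook
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Json", "Emily", "Catty", "Elizabeth", "James"],
    "age": [2, 20, 57, 1, 3, 99],
})

# pandas: logy=True puts the Y axis on a logarithmic scale (base 10 by default)
ax = df.plot(x="name", y="age", kind="bar", logy=True)
print(ax.get_yscale())  # log

# Equivalent with Matplotlib: draw the normal chart, then switch the Y scale
ax2 = df.plot(x="name", y="age", kind="bar")
ax2.set_yscale("log")
print(ax2.get_yscale())  # log
```

The pandas parameter is the shorter option; set_yscale is handy when you already have an Axes object from elsewhere.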

This changes the whole picture. We are no longer at a loss.

Instead, we can now compare the ages.


[Figure: Logarithm and data analytics two, the same bar chart with a logarithmic Y axis]
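To see why the ages become comparable, we can compute where each one lands on a base-10 log axis. On such an axis, a value v is placed at log10(v), so 1, 10 and 100 sit at 0, 1 and 2, and the whole range of ages fits into about two units, one per power of ten. This small sketch uses only Python's math module and the ages from the table above.

```python
import math

ages = {"John": 2, "Json": 20, "Emily": 57, "Catty": 1, "Elizabeth": 3, "James": 99}

# Powers of ten are evenly spaced on a log10 axis: 1 -> 0, 10 -> 1, 100 -> 2
for tick in (1, 10, 100):
    print(tick, math.log10(tick))

# Position of each age on that axis, rounded to two decimals
positions = {name: round(math.log10(age), 2) for name, age in ages.items()}
print(positions)
```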

As we move forward, we will see more simple mathematical examples that help us analyze data and train our machine learning models.

For more machine learning primer code like this, please visit the corresponding GitHub repository.

