What are three data structures in pandas?

Pandas has traditionally offered three data structures: Series, DataFrame and Panel. In short, these are one-, two- and multi-dimensional labeled arrays.

A one-dimensional array is simply a single column of data; in Pandas, this is a Series. We will see that in a minute with the Pandas library.

On the other hand, a DataFrame represents rows and columns.

We hardly use the Panel; it was deprecated and has since been removed from Pandas. So in this section, we concentrate on the first two types of data structures in Pandas.
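As a quick illustration of these two structures, here is a minimal sketch that builds a Series and a DataFrame by hand. The names `users` and `table` and all the values are made up for this example; they are not taken from the CSV file.

```python
import pandas as pd

# Made-up values, used only for illustration.
users = pd.Series([10, 20, 30], name="Users")  # one-dimensional: a single column

table = pd.DataFrame({                          # two-dimensional: rows and columns
    "Country": ["A", "B", "C"],
    "Users": [10, 20, 30],
})

print(users.ndim)   # number of dimensions of the Series
print(table.ndim)   # number of dimensions of the DataFrame
print(table.shape)  # (number of rows, number of columns)
```

The `ndim` attribute reports how many dimensions each object has, which is the difference between the two structures in a nutshell.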

Let’s start by importing the Pandas library first.

Next, we will read the CSV file from the GitHub source.

import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/world_internet_user.csv',
    encoding='unicode_escape',
    engine='python',
)
df.head()  # show the first five rows

That is pretty simple and straightforward. 

As a result, we get a DataFrame.

Pandas DataFrame output of first five rows

With reference to our topic, we can check whether this is a Pandas DataFrame or not.


To do that, we can use Python's built-in type() function, which reports the type of the ‘df’ object.

type(df)

# output
pandas.core.frame.DataFrame

As an outcome, we can say that a Pandas DataFrame is essentially a two-dimensional, labeled array.

Moreover, a two-dimensional array represents data as rows and columns.
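The type check and the row-and-column layout can also be inspected in code. The sketch below uses a small hypothetical DataFrame called `sample` as a stand-in for the `df` loaded from the CSV file; its values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the `df` loaded from the CSV file.
sample = pd.DataFrame({
    "Country": ["World", "Albania", "Algeria"],
    "Population": [100, 20, 30],
})

# isinstance() is an alternative to comparing the result of type().
is_frame = isinstance(sample, pd.DataFrame)

# shape is a (rows, columns) tuple; columns holds the column labels.
n_rows, n_cols = sample.shape
column_names = list(sample.columns)

print(is_frame, n_rows, n_cols, column_names)
```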

We can also slice this DataFrame by row position, much as we slice a Python list.

As the image suggests, the first row holds the figures for the whole world, so we can easily omit it.

Hence, we can start the DataFrame from the second row.

df = df[1:]
df

# output
242 rows × 5 columns
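One detail of this kind of slicing is worth a small sketch: the remaining rows keep their original index labels, so the index now starts at 1 rather than 0. The `sample` frame and its values are hypothetical, and the `reset_index` step is an optional extra that the walkthrough above does not perform.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; the first row is the world total.
sample = pd.DataFrame({
    "Country": ["World", "Albania", "Algeria"],
    "Users": [60, 2, 3],
})

without_world = sample[1:]        # positional slice, like a Python list
same_with_iloc = sample.iloc[1:]  # the explicit positional equivalent

# Optional: renumber the rows from 0 again, discarding the old labels.
renumbered = without_world.reset_index(drop=True)

print(list(without_world.index))  # original labels are kept
print(list(renumbered.index))     # labels after renumbering
```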

Now it makes more sense.

On that account, we now know the number of rows and columns.

Besides, we have also seen that a Pandas DataFrame supports list-like operations such as slicing.

Now we can extract a single column of this DataFrame as a Series.

There are two ways to do this.

df['Country']
or
df.Country

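However, the second method works only when the column name is a valid Python identifier. It will not work if the name contains a space or another special character, or if it clashes with an existing DataFrame attribute or method.

It should be a single word, like ‘Country’.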

Either way, we get a Series as the output.

1           Afganistan
2              Albania
3              Algeria
4       American Samoa
5              Andorra
            ...       
238    Wallis & Futuna
239     Western Sahara
240              Yemen
241             Zambia
242           Zimbabwe
Name: Country, Length: 242, dtype: object
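The two access styles described above can be compared in a minimal sketch. The `sample` frame is hypothetical; its "Internet Users" column simply has a space in its name.

```python
import pandas as pd

# Hypothetical two-row frame; one column name contains a space.
sample = pd.DataFrame({
    "Country": ["Albania", "Algeria"],
    "Internet Users": [2, 3],
})

by_bracket = sample["Country"]  # bracket access works for any column name
by_dot = sample.Country         # dot access needs a valid Python identifier

# A name with a space can only be reached with brackets;
# writing sample.Internet Users would be a syntax error.
spaced = sample["Internet Users"]

print(type(by_bracket))
print(by_bracket.equals(by_dot))
```

Because bracket access works for every column name, it is the safer habit when the column names are not under your control.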

For columns of integer or floating-point values, we can also easily calculate the mean and the standard deviation.

df['% of Population'].mean()
df['% of Population'].std()
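To make these two calls concrete, here is a small sketch on a hand-made Series. The name `pct` and its three values are invented for illustration. Note that `std()` in Pandas computes the sample standard deviation by default (`ddof=1`); passing `ddof=0` gives the population version instead.

```python
import pandas as pd

# Made-up percentages, used only for illustration.
pct = pd.Series([10.0, 20.0, 30.0], name="% of Population")

average = pct.mean()
sample_std = pct.std()            # default ddof=1 (sample standard deviation)
population_std = pct.std(ddof=0)  # divide by n instead of n - 1

print(average, sample_std, population_std)
```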

However, to get more summary statistics in a single call, we can use the describe() method.

df.describe()

# output
         Population  Internet Users  % of Population
count  2.420000e+02    2.420000e+02       242.000000
mean   3.272950e+07    2.241356e+07        69.927066
std    1.348221e+08    8.995328e+07        27.483660
min    5.960000e+02    4.140000e+02         0.080000
25%    3.257340e+05    1.930792e+05        52.180000
50%    5.279970e+06    2.846423e+06        77.970000
75%    1.958173e+07    9.840202e+06        91.020000
max    1.448314e+09    1.010740e+09       120.700000
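The result of describe() is itself a DataFrame, so individual statistics can be read back by label. The sketch below assumes a small hypothetical frame named `sample` with invented values; by default, describe() summarizes only the numeric columns, which is why the text column does not appear.

```python
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
sample = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Users": [10, 20, 30],
})

summary = sample.describe()  # numeric columns only, by default

# Rows are statistics ("count", "mean", ...), columns are the numeric columns.
user_count = summary.loc["count", "Users"]
user_mean = summary.loc["mean", "Users"]

print(list(summary.columns))
print(user_count, user_mean)
```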

We will see more Pandas DataFrame operations in the coming section.

For more such machine learning primers, please visit the respective GitHub Repository.
