Pandas describe method

In the previous section we learned how to use the pandas head and tail methods. The pandas describe method is just as important.

Why? Because the describe method generates the descriptive statistics that we need, especially when we study statistical data.

Before moving ahead, let’s take a look at the code first.

import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/world_internet_user.csv',
    encoding='unicode_escape',
    engine='python',
)
df.head()

# output
          Country   Region  Population  Internet Users  % of Population
0          _World      NaN  7920539977      5424080321            68.48
1      Afganistan     Asia    40403518         9237489            22.86
2         Albania   Europe     2872758         2191467            76.28
3         Algeria   Africa    45150879        37836425            83.80
4  American Samoa  Oceania       54995           34800            63.28

As the output shows, the head method gives us the first five rows by default.

# summary statistics we need for data science
df.describe()

# output
         Population  Internet Users  % of Population
count  2.430000e+02    2.430000e+02       243.000000
mean   6.518963e+07    4.464264e+07        69.921111
std    5.235850e+08    3.579556e+08        27.426974
min    5.960000e+02    4.140000e+02         0.080000
25%    3.321120e+05    2.049585e+05        52.190000
50%    5.302778e+06    2.864000e+06        77.940000
75%    2.025171e+07    9.898751e+06        91.010000
max    7.920540e+09    5.424080e+09       120.700000

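So, when we want descriptive statistics rather than a preview of the rows, we use the describe method. Note that the aggregate _World row is part of these figures: the max of Population, 7.920540e+09, is the world total.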



The describe output includes statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
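To see what excluding NaN values means in practice, here is a minimal sketch. The toy DataFrame and its "users" column are made up for illustration and are not part of the internet-user dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value (not the dataset from this post)
toy = pd.DataFrame({"users": [10.0, 20.0, np.nan, 40.0]})

summary = toy.describe()

# count reports only the non-missing values, so it is 3 and not 4
print(summary.loc["count", "users"])

# the mean is computed over the three present values as well
print(summary.loc["mean", "users"])
```

The frame has four rows, but the missing value is left out of every statistic in the summary.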

We can also use the shape attribute to get the number of rows and columns at a glance.

# number of rows and columns
df.shape
    
# output
(243, 5)
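Since shape returns a plain tuple of (rows, columns), we can unpack it into two variables. The sketch below uses a small made-up DataFrame, so the names "country" and "users" are only assumptions for the example.

```python
import pandas as pd

# Hypothetical toy data: three rows and two columns
toy = pd.DataFrame({"country": ["A", "B", "C"], "users": [1, 2, 3]})

# shape is read without parentheses and gives (rows, columns)
n_rows, n_cols = toy.shape

print(n_rows, n_cols)
```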

Looking at the dataset, we can see that the columns do not all share the same data type.



By default, describe summarizes only the numeric columns of a DataFrame. However, the method can analyze both numeric and object series, as well as DataFrame column sets of mixed data types.
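One way to summarize numeric and non-numeric columns together is to pass include="all". The following is a minimal sketch on made-up data; the column names "region" and "users" are assumptions for the example.

```python
import pandas as pd

# Hypothetical toy data with one text column and one numeric column
toy = pd.DataFrame({
    "region": ["Asia", "Europe", "Asia", "Africa"],
    "users": [100, 200, 300, 400],
})

# Without arguments, only the numeric column is summarized
numeric_only = toy.describe()

# include="all" summarizes every column in one table
everything = toy.describe(include="all")

print(list(numeric_only.columns))
print(list(everything.columns))
print(everything)
```

In the combined table, a statistic that does not apply to a column, such as the mean of "region", is shown as NaN.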

To check the data type of each column, we can use the "dtypes" attribute. The output of describe varies depending on which data types are provided.

Let’s see the code.


# data type of each column; some of them are objects
df.dtypes

# output
Country             object
Region              object
Population           int64
Internet Users       int64
% of Population    float64
dtype: object

If we want a summary of only the object columns, we can say so with the include parameter.

df.describe(include=['object'])

# output
       Country  Region
count      243     242
unique     243       6
top     _World  Africa
freq         1      58
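In the output above, Region has a count of 242 while Country has 243, because the _World row has no region. The sketch below shows, on a made-up series, how the four rows of an object summary relate to other pandas methods. The values are hypothetical.

```python
import pandas as pd

# Hypothetical toy series with one missing entry
region = pd.Series(["Asia", "Europe", "Asia", "Africa", None], name="region")

summary = region.describe()

# count: number of non-missing entries
print(summary["count"], region.count())

# unique: number of distinct non-missing values
print(summary["unique"], region.nunique())

# top and freq: the most common value and how often it occurs
counts = region.value_counts()
print(summary["top"], counts.idxmax())
print(summary["freq"], counts.max())
```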

The describe() method gathers these details in one place, so we can observe the data from a data science perspective. The individual statistics are also available through the following related methods.

  • DataFrame.count: Count number of non-NA/null observations.
  • DataFrame.max: Maximum of the values in the object.
  • DataFrame.min: Minimum of the values in the object.
  • DataFrame.mean: Mean of the values.
  • DataFrame.std: Standard deviation of the observations.
  • DataFrame.select_dtypes: Subset of a DataFrame including/excluding columns based on their dtype.
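The methods in the list above can be called on their own. Here is a minimal sketch, again on made-up data with assumed column names, that picks the numeric columns with select_dtypes and compares the individual results with the describe output.

```python
import pandas as pd

# Hypothetical toy data: one text column and two numeric columns
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "population": [100, 200, 300, 400],
    "pct": [10.0, 20.0, 30.0, 40.0],
})

# Keep only the numeric columns
numeric = toy.select_dtypes(include="number")

summary = numeric.describe()

# Each row of the summary matches the corresponding method
print(numeric.count())
print(numeric.mean())
print(numeric.std())   # sample standard deviation, as in describe
print(numeric.min())
print(numeric.max())
print(summary)
```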

For more such code please visit the respective GitHub Repository.
