Pandas head and tail

For beginners in data science and machine learning, the Pandas head and tail functions are two of the most basic tools we use.

We have already discussed a few Pandas topics before.

If you are not yet familiar with how the Pandas Python library works, you may want to read those earlier posts first.

Firstly, the Pandas library in Python is a fast, powerful, and flexible tool. It is also easy to use.

But what do we do with Pandas?

The Pandas library is basically an open-source data analysis and manipulation tool.

Since it is built on the Python programming language, we can use ordinary Python coding techniques with Pandas.




Certainly, it’s an added advantage.

Next, we can analyze and manipulate data in a variety of formats.

For example, we can use tabular data (like Excel spreadsheets), time series data, and more.

Let’s first see how we can use a Pandas DataFrame.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/world_internet_user.csv', encoding='unicode_escape', engine='python')

df.head()

We used Google Colab to run this code. Alternatively, we could have used a Jupyter notebook on a local machine.

As a result, we see the first five rows of the DataFrame, along with all of its columns.

          Country   Region  Population  Internet Users  % of Population
0          _World      NaN  7920539977      5424080321            68.48
1      Afganistan     Asia    40403518         9237489            22.86
2         Albania   Europe     2872758         2191467            76.28
3         Algeria   Africa    45150879        37836425            83.80
4  American Samoa  Oceania       54995           34800            63.28

Before we discuss the details of the Pandas head and tail functions, let’s take a look at the main features of the Pandas library.

Firstly, Pandas provides data structures for storing and manipulating large and complex datasets. As a result, we can use many different functions and attributes to manipulate data.

We can take a look at some code snippets.

df.Country

# output
     
0               _World
1           Afganistan
2              Albania
3              Algeria
4       American Samoa
            ...       
238    Wallis & Futuna
239     Western Sahara
240              Yemen
241             Zambia
242           Zimbabwe
Name: Country, Length: 243, dtype: object

We can also use bracket notation instead of dot notation.

df['Country']

# output
  
0               _World
1           Afganistan
2              Albania
3              Algeria
4       American Samoa
            ...       
238    Wallis & Futuna
239     Western Sahara
240              Yemen
241             Zambia
242           Zimbabwe
Name: Country, Length: 243, dtype: object

The dot notation will not work if the column name contains a space.

# won't work
df.Internet Users

# output
     
  File "<ipython-input-4-14d0402b7c36>", line 2
    df.Internet Users
                ^
SyntaxError: invalid syntax

-----
df['Internet Users'] # will work

# output
     
0      5424080321
1         9237489
2         2191467
3        37836425
4           34800
          ...    
238          6200
239         28000
240       8353377
241       9870427
242       8400000
Name: Internet Users, Length: 243, dtype: int64
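The two notations can be compared side by side. The following is a minimal sketch, assuming a small hypothetical DataFrame named sample whose values are copied from the table above; it is not the full dataset.

```python
import pandas as pd

# Hypothetical sample data mirroring two columns of the dataset above
sample = pd.DataFrame({
    'Country': ['Albania', 'Algeria'],
    'Internet Users': [2191467, 37836425],
})

# Dot notation works only when the column name is a valid Python identifier
countries = sample.Country

# Bracket notation works for any column name, including names with spaces
users = sample['Internet Users']

# A list of names selects several columns at once and returns a DataFrame
subset = sample[['Country', 'Internet Users']]

print(users.tolist())
```

Because bracket notation works for every column name, it is usually the safer habit when column names come from an external file.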

Returning to the main features of Pandas, here is the second one.

Pandas is an important tool for reading and writing data from a variety of sources. For example, we read the data above from a GitHub resource.

These sources include CSV files, Excel files, and SQL databases.
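As a rough illustration of reading and writing, the sketch below writes a DataFrame to CSV text and reads it back. The sample DataFrame is hypothetical (its values are copied from the table above), and io.StringIO stands in for a real file or URL so that the example runs anywhere.

```python
import io
import pandas as pd

# Hypothetical sample data; the values are copied from the table above
sample = pd.DataFrame({
    'Country': ['Albania', 'Algeria'],
    'Population': [2872758, 45150879],
})

# Write the DataFrame to CSV text (index=False leaves out the row labels)
csv_text = sample.to_csv(index=False)

# Read the CSV text back, much as read_csv would with a file path or URL
restored = pd.read_csv(io.StringIO(csv_text))

print(restored)
```

Excel files and SQL databases have their own reader functions (read_excel and read_sql), which may need extra packages installed.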

Thirdly, Pandas has many functions for indexing and slicing data. Moreover, we can clean and reshape data.
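Here is a small sketch of what indexing and slicing can look like. The animals DataFrame is a hypothetical example, and iloc, loc, and boolean masks are only three of the available approaches.

```python
import pandas as pd

# Hypothetical example data
animals = pd.DataFrame({'animals': ['humans', 'pig', 'falcon', 'lion', 'monkey']})

# iloc selects by integer position; the end of the slice is excluded
by_position = animals.iloc[1:3]

# loc selects by index label; the end of the slice is included
by_label = animals.loc[1:3]

# A boolean mask keeps only the rows where the condition is True
short_names = animals[animals['animals'].str.len() <= 4]

print(by_position['animals'].tolist())
print(by_label['animals'].tolist())
print(short_names['animals'].tolist())
```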

In later discussions we will take a close look at these various important functions of Pandas.

Finally, we can merge and join data from different sources, and use visualization tools to explore and understand data quickly.

Let’s see some more examples of the head and tail functions.
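Merging can be sketched as follows, assuming two hypothetical DataFrames that share a Country column. The values are copied from the table above, and the names population and users are made up for this illustration.

```python
import pandas as pd

# Two hypothetical sources that share the 'Country' column
population = pd.DataFrame({
    'Country': ['Albania', 'Algeria'],
    'Population': [2872758, 45150879],
})
users = pd.DataFrame({
    'Country': ['Albania', 'Algeria'],
    'Internet Users': [2191467, 37836425],
})

# Combine the rows that have the same value in 'Country'
merged = population.merge(users, on='Country')

print(merged)
```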

df = pd.DataFrame({'animals': ['humans', 'pig', 'falcon', 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra']})

df.head(5)
# output    
animals
0	humans
1	pig
2	falcon
3	lion
4	monkey

df.head(-6)
# output
     
animals
0	humans
1	pig
2	falcon

df.tail(-3) 

# output
	animals
3	lion
4	monkey
5	parrot
6	shark
7	whale
8	zebra

Both the head and tail functions take an integer parameter, and the output depends on the value we pass. If we pass nothing, the default is 5.

For instance, if we pass 5 to the head function, we get the first five rows.

If we pass a negative number to head, such as -6, it returns every row except the last six. Our DataFrame has nine rows, so head(-6) leaves the first three.

The tail function, by contrast, counts from the bottom.

Otherwise, it works the same way as the head function: tail(3) would return the last three rows, while tail(-3) returns every row except the first three.
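One way to make sense of the negative values is to compare head and tail with plain position-based slicing. The sketch below rebuilds the animals DataFrame and checks that each call selects the same rows as an iloc slice; the checks dictionary is just a helper for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'animals': ['humans', 'pig', 'falcon', 'lion', 'monkey',
                               'parrot', 'shark', 'whale', 'zebra']})

# Compare each call with the iloc slice that should select the same rows
checks = {
    'head(5)': df.head(5).equals(df.iloc[:5]),     # first five rows
    'head(-6)': df.head(-6).equals(df.iloc[:-6]),  # all but the last six
    'tail(3)': df.tail(3).equals(df.iloc[-3:]),    # last three rows
    'tail(-3)': df.tail(-3).equals(df.iloc[3:]),   # all but the first three
}

for call, same in checks.items():
    print(call, same)
```

Thinking of head and tail as shortcuts for these slices may help when the negative arguments feel confusing.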

For more Pandas code snippets, you may visit the accompanying GitHub repository.
