For beginners in data science and machine learning, the Pandas head and tail functions are two of the most basic tools.
We have already discussed a few Pandas topics in earlier posts.
If you are not yet familiar with how the Pandas library works, you may want to read those posts first.
Firstly, the Pandas library for Python is fast, powerful, and flexible. In addition, it is easy to use.
But what do we do with Pandas?
Pandas is an open source data analysis and manipulation library.
Since it is built on the Python programming language, we can use all the usual Python techniques with Pandas.
Certainly, it’s an added advantage.
Next, we can analyze and manipulate data in a variety of formats.
For example, we can use tabular data (like Excel spreadsheets), time series data, and more.
Let’s first see how we can use a Pandas DataFrame.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/world_internet_user.csv', encoding='unicode_escape', engine='python')
df.head()
We have used Google Colab for coding. We could also have used a Jupyter notebook on a local machine.
As a result we see the first five rows of the DataFrame, with all of its columns.
          Country   Region  Population  Internet Users  % of Population
0          _World      NaN  7920539977      5424080321            68.48
1      Afganistan     Asia    40403518         9237489            22.86
2         Albania   Europe     2872758         2191467            76.28
3         Algeria   Africa    45150879        37836425            83.80
4  American Samoa  Oceania       54995           34800            63.28
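The same call can be tried without a network connection. The following is only a sketch: it builds a small DataFrame by hand from the Country and Internet Users values printed in this post (the variable names `sample`, `first_five` and `first_three` are our own), and shows that head() returns five rows when we pass nothing to it.

```python
import pandas as pd

# Hand-made sample: ten rows copied from the outputs printed in this post
sample = pd.DataFrame({
    "Country": [
        "_World", "Afganistan", "Albania", "Algeria", "American Samoa",
        "Wallis & Futuna", "Western Sahara", "Yemen", "Zambia", "Zimbabwe",
    ],
    "Internet Users": [
        5424080321, 9237489, 2191467, 37836425, 34800,
        6200, 28000, 8353377, 9870427, 8400000,
    ],
})

# With no argument, head() uses its default of five rows
first_five = sample.head()

# Passing a number changes how many rows come back
first_three = sample.head(3)

print(first_five)
print(first_three)
```

Because the sample has ten rows, the difference between head() and head(3) is easy to see in the printed output.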
Before we discuss the speciality of the Pandas head and tail functions, let’s take a look at the main features of Pandas library.
Firstly, Pandas provides data structures for storing and manipulating large and complex datasets. As a result, we can use many different functions and attributes to work with the data.
We can take a look at some code snippets.
df.Country
# output
0 _World
1 Afganistan
2 Albania
3 Algeria
4 American Samoa
...
238 Wallis & Futuna
239 Western Sahara
240 Yemen
241 Zambia
242 Zimbabwe
Name: Country, Length: 243, dtype: object
We can use the bracket notation instead of dot notation.
df['Country']
# output
0 _World
1 Afganistan
2 Albania
3 Algeria
4 American Samoa
...
238 Wallis & Futuna
239 Western Sahara
240 Yemen
241 Zambia
242 Zimbabwe
Name: Country, Length: 243, dtype: object
Dot notation will not work if the column name contains a space.
# won't work
df.Internet Users
# output
File "<ipython-input-4-14d0402b7c36>", line 2
df.Internet Users
^
SyntaxError: invalid syntax
-----
df['Internet Users'] # will work
# output
0 5424080321
1 9237489
2 2191467
3 37836425
4 34800
...
238 6200
239 28000
240 8353377
241 9870427
242 8400000
Name: Internet Users, Length: 243, dtype: int64
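If we still prefer dot notation, one possible workaround is to rename the columns so that they no longer contain spaces. The sketch below assumes a tiny hand-made DataFrame (the names `users`, `renamed`, `by_bracket` and `by_dot` are our own) and replaces each space with an underscore. It is one option among several, not the only way to do it.

```python
import pandas as pd

# Two rows copied from the output above
users = pd.DataFrame({
    "Country": ["Albania", "Algeria"],
    "Internet Users": [2191467, 37836425],
})

# Bracket notation works for any column name, spaces included
by_bracket = users["Internet Users"]

# rename() accepts a function that is applied to every column name;
# here it swaps spaces for underscores and returns a new DataFrame
renamed = users.rename(columns=lambda name: name.replace(" ", "_"))

# Now the column name is a valid Python identifier, so dot notation works
by_dot = renamed.Internet_Users

print(by_bracket.tolist())
print(by_dot.tolist())
```

Bracket notation remains the safer habit, since it works with every column name and never clashes with DataFrame attributes.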
With reference to the main features of Pandas, let’s move on to the second one.
Pandas is an important tool for reading and writing data from a variety of sources. For example, we read the data above from a CSV file hosted on GitHub.
The supported formats include CSV, Excel, and SQL databases.
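As a small illustration of reading and writing, the sketch below writes a DataFrame to CSV text and reads it back. To keep it self-contained we assume an in-memory buffer (`io.StringIO`) instead of a real file, and the names `original`, `buffer` and `restored` are our own.

```python
import io

import pandas as pd

# Two rows copied from the table shown earlier
original = pd.DataFrame({
    "Country": ["Albania", "Algeria"],
    "Region": ["Europe", "Africa"],
    "Population": [2872758, 45150879],
})

# Write the DataFrame as CSV text into an in-memory buffer
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Passing index=False keeps the row labels out of the CSV, so the restored DataFrame has the same three columns as the original.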
Thirdly, Pandas has many functions for indexing and slicing data. Moreover, we can clean and reshape data.
In later discussions we will take a close look at these important functions of Pandas.
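As a brief preview, here is a minimal sketch of slicing, filtering and cleaning. It assumes a hand-made DataFrame with four rows copied from the table shown earlier, and the names `df_small`, `middle`, `large` and `cleaned` are our own.

```python
import pandas as pd

# Four rows copied from the table above; _World has no Region
df_small = pd.DataFrame({
    "Country": ["_World", "Afganistan", "Albania", "Algeria"],
    "Region": [None, "Asia", "Europe", "Africa"],
    "Population": [7920539977, 40403518, 2872758, 45150879],
})

# Slicing by position: rows at positions 1 and 2 (the end is excluded)
middle = df_small.iloc[1:3]

# Indexing with a condition: countries with more than 40 million people
large = df_small.loc[df_small["Population"] > 40_000_000, "Country"]

# Cleaning: drop the rows where Region is missing
cleaned = df_small.dropna(subset=["Region"])

print(middle)
print(large.tolist())
print(cleaned)
```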
Finally, we can merge and join data from different sources and use visualization tools to quickly explore and understand data.
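Merging can be sketched in a few lines. The example assumes two small hand-made DataFrames that share a Country column (the names `regions`, `users` and `merged` are our own, and the values are copied from the outputs above). It uses `pd.merge` with an inner join, which is one of several join types Pandas offers.

```python
import pandas as pd

# First source: country and region
regions = pd.DataFrame({
    "Country": ["Albania", "Algeria"],
    "Region": ["Europe", "Africa"],
})

# Second source: country and internet users, in a different row order
users = pd.DataFrame({
    "Country": ["Algeria", "Albania"],
    "Internet Users": [37836425, 2191467],
})

# Match rows on the shared Country column
merged = pd.merge(regions, users, on="Country", how="inner")

print(merged)
```

The rows are matched by the value in Country, so the different row order in the two sources does not matter.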
Let’s see some more examples of the head and tail functions.
df = pd.DataFrame({'animals': ['humans', 'pig', 'falcon', 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
df.head(5)
# output
animals
0 humans
1 pig
2 falcon
3 lion
4 monkey
df.head(-6)
# output
animals
0 humans
1 pig
2 falcon
df.tail(-3)
# output
animals
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
Each of the head and tail functions takes a numerical parameter, and the output depends on the value we pass.
For instance, if we pass 5 to the head function, we get the first five rows.
If we pass a negative number such as -6 to head, it first counts the total number of rows. After that, it leaves out the last six rows of the Pandas DataFrame and returns the rest, which here is the first three.
The tail function, on the contrary, counts from the bottom. For instance, tail(5) would return the last five rows.
Otherwise it works the same way as the head function. With a negative number such as -3, tail leaves out the first three rows and returns the rest.
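One way to check this behaviour is to compare head and tail with ordinary position-based slicing. The sketch below rebuilds the animals DataFrame from above under the name `animals` (our own choice) and uses `iloc` slices that we would expect to give the same rows. The variable names holding the comparisons are also our own.

```python
import pandas as pd

animals = pd.DataFrame({'animals': ['humans', 'pig', 'falcon', 'lion', 'monkey',
                                    'parrot', 'shark', 'whale', 'zebra']})

# head(5): the first five rows
head_pos = animals.head(5).equals(animals.iloc[:5])

# head(-6): everything except the last six rows
head_neg = animals.head(-6).equals(animals.iloc[:-6])

# tail(3): the last three rows
tail_pos = animals.tail(3).equals(animals.iloc[-3:])

# tail(-3): everything except the first three rows
tail_neg = animals.tail(-3).equals(animals.iloc[3:])

print(head_pos, head_neg, tail_pos, tail_neg)
```

Each comparison should print True, which suggests that head and tail can be read as convenient shortcuts for these slices.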
For more Pandas code snippets, you may visit the respective GitHub Repository.