Sorting a Pandas DataFrame by column name is a basic step in data analysis. For beginners especially, a short walkthrough can give an idea of how it works.
Think about the data structure as a spreadsheet where we have multiple rows and columns. Right?
Now we can use Pandas to handle a large amount of data because this library offers highly performant data manipulation capabilities.
Let us start by importing the Pandas package first. After that we will read a CSV file.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/world_internet_user.csv', encoding = 'unicode_escape', engine ='python')
df.head()
As a result, we can see the first five rows of the DataFrame. With these rows in view, we can try the sort methods.
# sorting the 'Country' Series in ascending order (it will always return a Series)
df.Country.sort_values().head()
# output
1 Afganistan
2 Albania
3 Algeria
4 American Samoa
5 Andorra
Name: Country, dtype: object
There are several ways we can sort the values and analyze the data.
We can sort a pandas DataFrame by the values of one or more columns. The result depends on the parameters we pass, such as the column names and the sort order.
By default, Pandas sorts in ascending order because the ascending parameter is True. But we can change it. Right?
# we can sort in descending order instead
df.Country.sort_values(ascending=False).head()
# output
0 _World
242 Zimbabwe
241 Zambia
240 Yemen
239 Western Sahara
Name: Country, dtype: object
We can also sort a DataFrame in place by setting the “inplace” argument to True.
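To make the difference concrete, here is a minimal sketch. It assumes a small made-up DataFrame named `demo`, which is not part of the CSV file above and exists only for illustration.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
demo = pd.DataFrame({
    "Country": ["Chile", "Albania", "Brazil"],
    "Internet Users": [300, 100, 200],
})

# Default behaviour: a new, sorted DataFrame comes back.
sorted_copy = demo.sort_values("Country")
original_order = demo["Country"].tolist()   # the original frame keeps its order
print(original_order)
print(sorted_copy["Country"].tolist())

# With inplace=True, the rows of `demo` itself are reordered.
demo.sort_values("Country", inplace=True)
print(demo["Country"].tolist())
```

Assigning the sorted result to a new name, as in the first call, is often easier to follow than sorting in place, because the unsorted data stays available.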
Since we have already read the values we can change the parameters and see the output.
# sort the entire DataFrame by the 'Internet Users' Series (it always returns a DataFrame)
df.sort_values('Internet Users').head()
As a result, the rows are arranged from the lowest number of internet users to the highest.
By the way, a DataFrame represents a data structure with labeled axes for both rows and columns.
Because of that, we can sort a DataFrame by the values in a row or column with sort_values(), as well as by the row or column index with sort_index().
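The following sketch shows sorting by index next to sorting by value. It again assumes a small made-up DataFrame named `demo` with a shuffled row index, used only for illustration.

```python
import pandas as pd

# Illustrative frame with a shuffled row index and unsorted column names.
demo = pd.DataFrame(
    {"Users": [300, 100, 200], "Country": ["Chile", "Albania", "Brazil"]},
    index=[2, 0, 1],
)

by_row_index = demo.sort_index()          # rows ordered by their index labels
by_col_index = demo.sort_index(axis=1)    # columns ordered by their names
by_value = demo.sort_values("Users")      # rows ordered by the values in one column

print(by_row_index)
print(by_col_index)
print(by_value)
```

Here `axis=1` tells sort_index() to work on the column labels instead of the row labels.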
# sorting the DataFrame first by 'Internet Users', then by 'Country'; both columns are sorted in ascending order
df.sort_values(['Internet Users', 'Country']).head()
In the above example we have used two columns; however, we can use a single column also.
Sorting our DataFrame on a single column is simpler than sorting on more than one column.
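When we sort on more than one column, each column can have its own direction, because the ascending parameter also accepts a list. The sketch below assumes a made-up DataFrame named `demo` with tied user counts. The data is hypothetical and only shows how ties are broken.

```python
import pandas as pd

# Illustrative data with ties in 'Internet Users'.
demo = pd.DataFrame({
    "Country": ["Chile", "Albania", "Brazil", "Denmark"],
    "Internet Users": [200, 100, 200, 100],
})

# Highest user counts first; ties are broken alphabetically by country.
ranked = demo.sort_values(
    by=["Internet Users", "Country"],
    ascending=[False, True],
)
print(ranked)
```

The list passed to ascending must have the same length as the list passed to by, one entry per column.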
We use the sort_values() method.
As we said, by default, this will return a new DataFrame sorted in ascending order.
Unless we set inplace to True, it does not modify the original DataFrame.
df.sort_values(by=['Country'])
We can also use more than one column in the same way.
df.sort_values(by=['Internet Users', '% of Population'])
#output
We can take a look at the table.
![Pandas sort](https://i0.wp.com/sanjibsinha.com/wp-content/uploads/2023/01/Pandas-sort.webp?ssl=1)
For the full code please visit the respective branches of the GitHub Repository.