In the previous section we learned how to use the pandas head and tail methods. The pandas describe method is just as important.
Why? Because the describe method generates the descriptive statistics that we need when we study statistical data.
Before moving ahead, let’s take a look at the code first.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/sanjibsinha/Machine-Learning-Primer/main/world_internet_user.csv', encoding = 'unicode_escape', engine ='python')
df.head()
# output
          Country   Region  Population  Internet Users  % of Population
0          _World      NaN  7920539977      5424080321            68.48
1      Afganistan     Asia    40403518         9237489            22.86
2         Albania   Europe     2872758         2191467            76.28
3         Algeria   Africa    45150879        37836425            83.80
4  American Samoa  Oceania       54995           34800            63.28
As the output shows, the head method returns the first five rows by default.
# summary statistics we need for data science
df.describe()
# output
         Population  Internet Users  % of Population
count  2.430000e+02    2.430000e+02       243.000000
mean   6.518963e+07    4.464264e+07        69.921111
std    5.235850e+08    3.579556e+08        27.426974
min    5.960000e+02    4.140000e+02         0.080000
25%    3.321120e+05    2.049585e+05        52.190000
50%    5.302778e+06    2.864000e+06        77.940000
75%    2.025171e+07    9.898751e+06        91.010000
max    7.920540e+09    5.424080e+09       120.700000
The head method only previews rows; when we want descriptive statistics, we use the describe method.
The describe output includes statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
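To make the NaN exclusion concrete, here is a minimal sketch. It uses a small made-up DataFrame (the `scores` frame and its `score` column are hypothetical, not part of the article's CSV) with one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the article's CSV.
scores = pd.DataFrame({"score": [10.0, 20.0, np.nan, 30.0]})

summary = scores.describe()

# count reports only the non-missing values: 3 of the 4 rows
print(summary.loc["count", "score"])  # 3.0

# the mean is computed over the 3 present values: (10 + 20 + 30) / 3
print(summary.loc["mean", "score"])   # 20.0
```

The frame has four rows, yet count is 3.0, which suggests that the missing value is simply skipped rather than treated as zero.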
We can also use the shape attribute to get the number of rows and columns at a glance. Because shape is an attribute and not a method, we write it without parentheses.
# number of rows and columns
df.shape
# output
(243, 5)
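Since shape is a plain tuple, we can unpack it into two variables. The sketch below assumes a small made-up frame (the `toy` name and its values are hypothetical) in place of the real dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for the article's dataset.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Population": [100, 200, 300],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = toy.shape
print(n_rows, n_cols)  # 3 2
```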
As we take a look at the dataset, we can see that the data types are not the same in every column.
The pandas describe method can analyze both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided: by default, a DataFrame of mixed types returns a summary of its numeric columns only.
To see the data type of each column, we can use the “dtypes” attribute.
Let’s see the code.
# data type of each column; some of them are objects
df.dtypes
# output
Country              object
Region               object
Population            int64
Internet Users        int64
% of Population     float64
dtype: object
If we want a summary of only the object columns, we can say so with the include parameter.
df.describe(include=['object'])
# output
        Country  Region
count       243     242
unique      243       6
top      _World  Africa
freq          1      58
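The include parameter has a counterpart, exclude, and include also accepts the string "all". The following sketch tries both on a made-up frame (the `toy` name and its values are hypothetical, chosen only for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two text columns and one numeric column.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Region": ["Asia", "Asia", "Europe", None],
    "Population": [100, 200, 300, 400],
})

# Drop the numeric columns, so only the text columns are summarized
text_summary = toy.describe(exclude=[np.number])
print(text_summary)

# Summarize every column; statistics that do not apply show up as NaN
all_summary = toy.describe(include="all")
print(all_summary)
```

In the combined summary, a text column has no mean and a numeric column has no top value, so those cells are filled with NaN.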
The describe() method gathers these details in a single call, so we can observe the data from a data science perspective. The following related pandas methods compute the individual pieces:
- DataFrame.count: Count number of non-NA/null observations.
- DataFrame.max: Maximum of the values in the object.
- DataFrame.min: Minimum of the values in the object.
- DataFrame.mean: Mean of the values.
- DataFrame.std: Standard deviation of the observations.
- DataFrame.select_dtypes: Subset of a DataFrame including/excluding columns based on their dtype.
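The methods listed above can also be called one by one. As a rough sketch, again on a made-up frame (the `toy` name and its values are hypothetical), we can check that they agree with the describe output.

```python
import pandas as pd

# Hypothetical toy frame, not the article's dataset.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population": [100, 200, 300, 400],
})

# select_dtypes keeps only the columns of the requested kind
numeric = toy.select_dtypes(include="number")
print(list(numeric.columns))  # ['Population']

# Compute the same statistics by calling each method ourselves
population = toy["Population"]
by_hand = pd.Series({
    "count": population.count(),
    "mean": population.mean(),
    "std": population.std(),
    "min": population.min(),
    "max": population.max(),
})

# Compare with the matching rows of the describe output
summary = toy.describe()
print(by_hand)
print(summary.loc[["count", "mean", "std", "min", "max"], "Population"])
```

Both printouts should show the same numbers, which is a quick way to see that describe is a convenient bundle of these individual calls.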
For more such code please visit the respective GitHub Repository.