Category: Data Science

  • Pandas sort by column name

    Sorting a Pandas DataFrame by column name is a basic data analysis task. For beginners especially, it is a good way to get an idea of how the library works. Think of the data structure as a spreadsheet with multiple rows and columns. We can use Pandas to handle a large amount of data because this…

  • Pandas describe method

    In the previous section we learned how to use the Pandas head and tail methods. The Pandas describe method is also important. Why? Because it generates descriptive statistics, which we need especially when studying statistical data. Before moving ahead, let’s take a look at the code first. Now as we take a…
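
    As a minimal sketch of what describe produces, the example below builds a small DataFrame and prints its summary. The column names and values are hypothetical, made up only for illustration; they do not come from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration only.
df = pd.DataFrame(
    {
        "height_cm": [150, 160, 170, 180],
        "weight_kg": [50, 60, 70, 80],
    }
)

# describe() returns a new DataFrame of summary statistics for the
# numeric columns: count, mean, std, min, the 25%/50%/75% quantiles, max.
summary = df.describe()
print(summary)
```

    Each statistic is a row label in the result, so a single value can be read with something like summary.loc["mean", "height_cm"].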

  • Pandas head and tail

    For beginners in data science and machine learning, the Pandas head and tail functions are two of the most basic tools that we use. We have already discussed a few Pandas topics before. If you do not yet have an idea of how the Pandas Python library works, you may check the following links. Firstly, Pandas library in Python…

  • NumPy and Pandas for Machine Learning

    Both NumPy and Pandas are essential for scientific computation, which includes machine learning and data science. By now we know that both are libraries and that we need them at almost every step. In this section we will take a close look at the key differences between these two libraries.…

  • Install Jupyter Pandas Matplotlib

    How do we install Jupyter, Pandas and Matplotlib on our machine? Firstly, let’s start with the Jupyter Notebook on Ubuntu, which I use. To do that, you need to open your terminal and type a few commands. And that’s all you need to do. In this section we will take a close look at this topic and…

  • What are three data structures in pandas?

    Three data structures in Pandas are Series, DataFrame and Panel. In short, they are one-, two- and multi-dimensional arrays. A one-dimensional array is simply a column of data. We will see that in a minute with the Pandas library. On the other hand, a DataFrame represents rows and columns. We hardly use the Panel.…
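
    A minimal sketch of the first two structures follows. The names and values are hypothetical, chosen only for illustration. Panel is left out on purpose, because it has been removed from recent pandas releases.

```python
import pandas as pd

# Hypothetical values, invented for this illustration only.
# A Series is one-dimensional: a single labelled column of data.
ages = pd.Series([25, 32, 47], name="age")

# A DataFrame is two-dimensional: rows and columns.
people = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],
        "age": [25, 32, 47],
    }
)

print(ages.ndim)     # number of dimensions of the Series
print(people.ndim)   # number of dimensions of the DataFrame
print(people.shape)  # (rows, columns)

# Selecting one column of a DataFrame gives back a Series.
age_column = people["age"]
print(type(age_column))
```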

  • Logarithm and Data Analysis

    What is the relationship between logarithms and data analysis? In machine learning and data science we need basic mathematics. With reference to this topic, in our previous section we saw how to use basic exponents and logarithms in Python. The previous article was an introduction to machine learning, data science and mathematics.…

  • What is machine learning in simple words?

    When we say TensorFlow is a machine learning or ML library, what does that mean? To know that we need to know what ML is. Firstly, machine learning is a type of artificial intelligence or AI.  What is artificial intelligence or AI? Artificial intelligence allows software applications to become more accurate at predicting outcomes. However,…

  • Pandas DataFrame update column row

    How do we update a column value in a Pandas DataFrame? Moreover, can we apply logic to these changes? As beginners, we face such questions and try to find simple solutions. For that reason, in this section, we will discuss this topic. In addition we will also try to move forward and…

  • How to filter Pandas DataFrame

    We have already seen how versatile the Pandas package is. In this section we’ll find out how to filter a Pandas DataFrame. We have been working with data from the Pandas GitHub repository. The DataFrame is simple. Let’s see the code. There are altogether 244 rows and 7 columns. By the way, we refer to rows…

  • Pandas DataFrame Rows and Columns

    Whenever we think about tabular data in Pandas, we think about rows and columns. We can also think about rows as entries. In the Pandas package, taken together, we call them a DataFrame. That’s why Pandas can deal with data structures better than many other Python packages. As a result, if we want to read,…

  • Pandas DataFrame iloc and loc

    What is the difference between the Pandas iloc and loc indexers? In this section we will take a quick look and try to understand it. Firstly, as the name suggests, “iloc” refers to integer location, whereas “loc” works on labels. Well, as a beginner you may find this subtle difference a…
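
    The difference can be sketched with a small example. The DataFrame below is hypothetical, with string row labels chosen so that positions and labels cannot be confused.

```python
import pandas as pd

# Hypothetical data, invented for this illustration only.
# The index holds string labels in a deliberately unsorted order.
df = pd.DataFrame({"score": [90, 75, 60]}, index=["c", "a", "b"])

# iloc selects by integer position: position 0 is the first row.
first_by_position = df.iloc[0]["score"]

# loc selects by label: the row whose index label is "a".
score_for_label_a = df.loc["a", "score"]

# Slices differ too: iloc excludes the end position,
# while loc includes the end label.
rows_by_position = df.iloc[0:2]   # positions 0 and 1
rows_by_label = df.loc["c":"b"]   # labels "c" through "b", inclusive

print(first_by_position, score_for_label_a)
print(list(rows_by_position.index), list(rows_by_label.index))
```

    The slicing behaviour is the part that most often surprises beginners, so it is worth checking which indexer is in use before relying on the end point.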

  • Pandas DataFrame operations introduction

    In this section we will have an introduction to the Pandas package and DataFrame operations. We’ve learned what the Pandas package means. In addition, we have seen how a DataFrame works. As Data Science beginners we need data analysis. To learn that, we need the Pandas package and we must have a clear vision…

  • Pandas DataFrame and Python Dictionary

    Pandas is a Python package. It provides fast and flexible data structures. However, is there any similarity between a DataFrame and a Python dictionary? In this section we will try to answer this question. To do that, we need to know how the Pandas package works. Firstly, the Pandas data structures can work with either “relational” or “labelled”…

  • How to install Jupyter notebook and work locally

    In this section we will learn how to install Jupyter notebook and work locally. We can say it’s a local version of Google Colab. Or, while working in Google Colab, you may think of it the other way round. Firstly, for Debian-based Linux distributions like Ubuntu, or for Mac, the installation mechanism is the same. However, for Windows we…

  • Pandas reading writing Tabular Data

    In this section we will see how reading and writing tabular data gets easier with the Pandas package in Python. In the last section, we saw how we can read tabular data in Pandas. Besides reading, writing tabular data in Pandas is also easy. Let’s see how we can do that. Let’s first read…

  • How do I read and write tabular data?

    In our previous section we learned how to use the Pandas package in Python. In this section we’ll learn to read and write tabular data. Tabular data is nothing but a two-dimensional array, arranged in rows and columns. In any relational database, we get tabular data. Not only that, we can get…

  • An Introduction to Pandas Package in Python

    Why do we need the Pandas package in Python? That’s the first question we need to answer. There are several reasons. However, the main reason is, of course, that Pandas can deal with data structures better than many other Python packages. As a result, if we want to read, change, modify or manipulate data structures, Pandas…

  • How to apply categorical plot in Matplotlib

    We can plot categorical data in Matplotlib in many ways. Certainly, we can use the Pandas library to create the categories. After that, we can plot the graph with the help of the Matplotlib library. We will see how we can do that with Pandas and Matplotlib first. And then we will see how Matplotlib makes…

  • Linear Regression and Machine Learning Algorithm

    Linear Regression fits a line to many data points. It’s one of the core machine learning algorithms, and also one of the easiest. It makes predictions for continuous, real numerical values, such as sales. At the same time, it is a core statistical and data science concept. Why? Because in statistics, or in data science…

  • Can we skew Median in Data Science

    In our previous section we saw that we can trust the Median more than the Mean. But in reality we can skew the Median, and we can make it look much greater than it should be. In data science, as well as in statistics, we can prove that. We can skew the Median and maximise its…