# 5 must known techniques in pandas

In this article I used a dataset on kaggle

__1. Sorting DataFrame__

what is a data frame?

A DataFrame is a data structure with labelled axes for both rows and columns. You can sort a DataFrame by row or column value as well as by row or column index.

__Sorting by Values__

__Sorting by Values__

1st: lets sort by values in a DataFrame along either axis (columns or rows)

`df.sort_values("Poet")`

*1. Sorting by a Column in Ascending Order*

To use .sort_values(), you pass a single argument to the method containing the name of the column you want to sort by

`df.sort_values("Poet")`

*2. Changing the Sort Order to Descending Order*

Another parameter of .sort_values() is ascending. By default .sort_values() has ascending set to True. If you want the DataFrame sorted in **descending order**, then you can pass False to this parameter:

`df.sort_values(by="Poet",ascending=False)`

*3.Choosing a Sorting Algorithm*

It’s good to note that pandas allows you to choose different **sorting algorithms** to use with both .sort_values() and .sort_index(). The available algorithms are quicksort, mergesort, and heapsort.

You can do that with the kind parameter in .sort_values() or .sort_index(), like this

`df.sort_values(by="Poet",ascending=False,kind="mergesort")`

**Note:** In pandas, kind is ignored when you sort on more than one column or label.

*4.Sorting on Multiple Columns*

In data analysis, it’s common to want to sort your data based on the values of multiple columns.To sort by two keys, you can pass a list of column names to by:

`df.sort_values(by=["Poem", "Poet"])`

*5.Sorting Multiple Columns with different Order*

Up to this point, We've sorted only in ascending order on multiple columns. In the next example, W'll sort in descending order based on the Poet,Poem and Title columns.

`df.sort_values(by=["Poet", "Poem", "Title"],ascending=[True, True, False])`

__Sorting Indexes__

__Sorting Indexes__

Before sorting on the index, it’s a good idea to know what an index represents. A DataFrame has an **.index** property, which by default is a numerical representation of its rows’ locations. You can think of the index as the row numbers. It helps in quick row lookup and identification.

Start by creating a new sorted DataFrame using .sort_values():

`assigned_index_df = df.set_index(["Poet", "Poem"])`

*1. Sorting by Index *

Using this method, you replace the default integer-based row index with two axis labels. This is considered a MultiIndex or a **hierarchical index**. Your DataFrame is now indexed by more than one key, which you can sort on with .sort_index():

`assigned_index_df.sort_index()`

*2. Sorting by Index in Descending Order*

You can also use the column labels of a DataFrame as the sorting key for .sort_index(). Setting axis to 1 sorts the columns of your DataFrame based on the column labels:

`df.sort_index(axis=1)`

*3. Sorting the columns in descending order*

`df.sort_index(axis=1, ascending=False)`

__2. Apply Function__

__2. Apply Function__

The apply() function is used to apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1)

If you want to iterate over each row in the data frame you probably guessed it right!! use (lambda)

if we want to apply the function to the rows rather than the columns then we should use (axis=1)

Let's assume that we want to add a text after the poet name.

The text is (he was talented)

we can make it into two steps:

1- function to add the name:

```
def add_r(x):
return(x+' He was' +' Tallented')
```

2- applying the lambda function

`df.apply(lambda row: add_r(row['Poet']),axis=1)`

the result is:

```
0 Michelle Menting He was Tallented
1 Lucia Cherciu He was Tallented
2 Ted Kooser He was Tallented
3 Grace Cavalieri He was Tallented
4 Connie Wanek He was Tallented
...
13 Jody Gladding He was Tallented
14 William Jay Smith He was Tallented
15 William Jay Smith He was Tallented
0 Hailey Leithauser He was Tallented
1 Wendy Videlock He was Tallented
Length: 13854, dtype: object
```

we can also apply a numpy function in the apply function.

i used a numerical dataframe that i created to sqaure root the demands

```
df1['demand'].apply(np.sqrt)
0 17.029386
1 16.124515
2 16.970563
3 17.320508
4 17.606817
5 17.406895
6 18.138357
7 18.439089
8 17.776389
9 18.165902
10 17.549929
11 17.606817
Name: demand, dtype: float64
```

__3. Pandas Cut __

Pandas cut() function is used to separate the array elements into different bins . The cut function is mainly used to perform statistical analysis on scalar data.

let's create a new column containing the range of demands:

`df1['bins'] = pd.cut(x=df1['demand'], bins=[250, 290, 330, 370])`

We can check the frequency of each bin:

`print(df1['bins'].unique())`

```
[(250, 290], (290, 330], (330, 370]]
Categories (3, interval[int64]): [(250, 290] < (290, 330] < (330, 370]]
```

__4. Pandas Query__

Talking about one of my absolute favourites is Pandas Query

there are many methods that enable you to retrieve subsets based on row and column labels or by integer index of the rows and columns. however those tools can be a little cumbersome syntactically.

In many ways, the Pandas .query method solves those problems.

Query enables you to “query” a DataFrame and retrieve subsets based on logical conditions.

Moreover, the syntax is a little more streamlined than Pandas bracket notation.

For example if i want to find the months where the demand exceeded 300

`df1.query('demand > 300').month`

and the result is:

```
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
Name: month, dtype: int64
```

__5. Pandas Where !?__

searching for something a vast dataframe? Where is your savoir.

Pandas **where()** method is used to check a data frame for one or more condition and return the result accordingly. By default, The rows not satisfying the condition are filled with NaN value.

It is surprisingly easy to use and the method is very versatile.

```
filter = df1["demand"]==300
df1.where(filter, inplace = True)
# display
df1
```

you can also make as many filters as you want and add the filters this way:

`df1.where(filter1 & filter2, inplace = True)`

inplace= True means that the change will be applied on the original dataframe.

This is all for this Article. Hope it can enlighten your way :)

## Comments