How can I remove rows or columns containing None from a pandas DataFrame?
Pandas is very useful for handling tabular data.
Tabular data sometimes contains None values.
In that case, we often want to remove the rows that have None in specific columns.
So how can we remove None?
Today I will explain how to remove None from a pandas DataFrame.
How to remove None from a pandas DataFrame
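To remove None values, use the dropna() method.
As its name suggests, dropna() drops missing ("NA") values. pandas treats None as missing, and in numeric columns None is displayed as NaN.
We can use it as shown below.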
import pandas as pd
data_list1 = [
[1,2,None],
[2,None,4],
[None,4,5],
[4,5,6]
]
col_list1 = ["c1","c2","c3"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)
# c1 c2 c3
# 0 1.0 2.0 NaN
# 1 2.0 NaN 4.0
# 2 NaN 4.0 5.0
# 3 4.0 5.0 6.0
df2 = df1.dropna()
print(df2)
# c1 c2 c3
# 3 4.0 5.0 6.0
Using dropna(), we extracted only the rows that do not contain None.
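As a small sketch of two points that are easy to miss: dropna() returns a new DataFrame and leaves the original untouched, and it treats both None and NaN as missing. The sample data and the names df and cleaned below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data that mixes None and NaN
df = pd.DataFrame({"c1": [1, None, 3], "c2": [4.0, 5.0, np.nan]})

# dropna() returns a new DataFrame; df itself is not modified
cleaned = df.dropna()

print(len(df))                 # the original still has all 3 rows
print(cleaned.index.tolist())  # only row 0 has no missing values
```

If you want to keep the result, assign it to a variable as above.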
Then how can we handle more complex cases?
None in specific columns
So far, we removed every row that contains None in any column.
Then how can we check for None only in specific columns?
To restrict dropna() to certain columns, use the subset parameter.
Pass a list of column names to subset as shown below.
df3 = df1.dropna(subset=["c1","c2"])
print(df3)
# c1 c2 c3
# 0 1.0 2.0 NaN
# 3 4.0 5.0 6.0
This removed only the rows that contain None in column c1 or c2. Row 0 is kept even though c3 is None, because c3 is not listed in subset.
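To decide which columns to pass to subset, it can help to count the missing values per column first. The following is a minimal sketch that rebuilds the same sample data; the names missing_per_column and df_c3 are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 2, None], [2, None, 4], [None, 4, 5], [4, 5, 6]],
    columns=["c1", "c2", "c3"],
)

# Count missing values in each column
missing_per_column = df1.isna().sum()
print(missing_per_column)

# Check only c3: rows with None in c1 or c2 are kept
df_c3 = df1.dropna(subset=["c3"])
print(df_c3.index.tolist())  # [1, 2, 3]
```

Here only row 0 is dropped, because it is the only row where c3 is None.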
None in all columns
So how can we remove only the rows that have None in all columns?
In this case, use how="all".
With how="all", dropna() drops a row only when every column in it is None. The default, how="any", drops a row when at least one column is None.
data_list1 = [
[1,2,None],
[2,None,4],
[None,None,None],
[4,5,6]
]
col_list1 = ["c1","c2","c3"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)
# c1 c2 c3
# 0 1.0 2.0 NaN
# 1 2.0 NaN 4.0
# 2 NaN NaN NaN
# 3 4.0 5.0 6.0
df2 = df1.dropna()
print(df2)
# c1 c2 c3
# 3 4.0 5.0 6.0
df4 = df1.dropna(how="all")
print(df4)
# c1 c2 c3
# 0 1.0 2.0 NaN
# 1 2.0 NaN 4.0
# 3 4.0 5.0 6.0
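how="all" can also be combined with subset, so that a row is dropped only when all of the listed columns are None. The sketch below uses slightly different, made-up data so that the difference is visible; df_demo, kept_all and kept_subset are illustrative names.

```python
import pandas as pd

# Hypothetical data: row 2 has a value in c1 but None in c2 and c3
df_demo = pd.DataFrame(
    [[1, 2, None], [2, None, 4], [5, None, None], [4, 5, 6]],
    columns=["c1", "c2", "c3"],
)

# No row is None in every column, so nothing is dropped
kept_all = df_demo.dropna(how="all")
print(kept_all.index.tolist())  # [0, 1, 2, 3]

# Row 2 is None in both c2 and c3, so it is dropped
kept_subset = df_demo.dropna(how="all", subset=["c2", "c3"])
print(kept_subset.index.tolist())  # [0, 1, 3]
```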
Remove columns that have None
Using dropna(), we removed rows that have None.
Then how can we drop columns instead?
To remove columns, use the axis=1 option.
data_list1 = [
[1,2,None],
[2,None,4],
[3,4,5],
[4,5,6]
]
col_list1 = ["c1","c2","c3"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)
# c1 c2 c3
# 0 1 2.0 NaN
# 1 2 NaN 4.0
# 2 3 4.0 5.0
# 3 4 5.0 6.0
df5 = df1.dropna(axis=1)
print(df5)
# c1
# 0 1
# 1 2
# 2 3
# 3 4
Now the columns that contain None (c2 and c3) have been removed, and only c1 remains.
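The axis=1 option can be combined with how="all" as well, which drops a column only when every value in it is None. This is a minimal sketch with made-up data in which c3 is entirely None; df_cols, dropped_any and dropped_all are illustrative names.

```python
import pandas as pd

# Hypothetical data: c2 has one None, c3 is None in every row
df_cols = pd.DataFrame(
    [[1, 2, None], [2, None, None], [3, 4, None], [4, 5, None]],
    columns=["c1", "c2", "c3"],
)

# Default how="any": any column with at least one None is dropped
dropped_any = df_cols.dropna(axis=1)
print(dropped_any.columns.tolist())  # ['c1']

# how="all": only the completely empty column c3 is dropped
dropped_all = df_cols.dropna(axis=1, how="all")
print(dropped_all.columns.tolist())  # ['c1', 'c2']
```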
Conclusion
Today I described how to remove None from a pandas DataFrame.
To remove None, we can use dropna().
We can also use these options.
- Filter by specific columns: subset=["column name"]
- Remove rows that have None in all columns: how="all"
- Remove columns: axis=1
It is useful, so I'd like to remember it.