The pandas DataFrame dropna() function removes rows or columns that contain null (None/NaN/NaT) values. By default, it returns a new DataFrame and leaves the source DataFrame unchanged.
We can create null values using None, pandas.NaT, and numpy.nan.
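As a small illustration of the three null markers, the sketch below checks each one with pd.isna() and then counts the nulls in a one-column DataFrame. The names markers, flags, df_nulls, and null_count are made up for this example.

```python
import numpy as np
import pandas as pd

# Three common ways to represent a missing value.
markers = [None, np.nan, pd.NaT]

# pd.isna() treats each of them as null.
flags = [bool(pd.isna(m)) for m in markers]
print(flags)

# Inside a DataFrame, isna() flags them cell by cell.
df_nulls = pd.DataFrame({"value": [1, None, np.nan, pd.NaT]})
null_count = int(df_nulls["value"].isna().sum())
print(null_count)
```

Because all three markers count as null, dropna() handles them the same way in the examples that follow.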
The dropna() function syntax is:

```python
dropna(self, axis=0, how="any", thresh=None, subset=None, inplace=False)
```
Let's look at some examples of using the dropna() function.
Dropping every row that contains at least one null value is the default behavior of the dropna() function.
```python
import pandas as pd
import numpy as np

d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'],
      'ID': [1, 2, 3, 4],
      'Salary': [100, 200, np.nan, pd.NaT],
      'Role': ['CEO', None, pd.NaT, pd.NaT]}

df = pd.DataFrame(d1)
print(df)

# drop all rows with any NaN and NaT values
df1 = df.dropna()
print(df1)
```
Output:

```
     Name  ID Salary  Role
0  Pankaj   1    100   CEO
1  Meghna   2    200  None
2   David   3    NaN   NaT
3    Lisa   4    NaT   NaT
     Name  ID Salary Role
0  Pankaj   1    100  CEO
```
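As a rough cross-check of the default behavior, the sketch below rebuilds the DataFrame from the example above and compares dropna() with an explicit boolean mask built from notna(). The names dropped and masked are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pankaj", "Meghna", "David", "Lisa"],
    "ID": [1, 2, 3, 4],
    "Salary": [100, 200, np.nan, pd.NaT],
    "Role": ["CEO", None, pd.NaT, pd.NaT],
})

# Default call: axis=0, how="any".
dropped = df.dropna()

# Hand-written equivalent: keep rows where every cell is non-null.
masked = df[df.notna().all(axis=1)]

print(dropped.equals(masked))

# The source DataFrame still has all of its rows.
print(len(df), len(dropped))
```

The mask form can be handy when the same condition is needed for something other than dropping, such as counting the affected rows first.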
We can pass axis=1 to drop the columns that contain missing values.
```python
df1 = df.dropna(axis=1)
print(df1)
```
Output:

```
     Name  ID
0  Pankaj   1
1  Meghna   2
2   David   3
3    Lisa   4
```
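To see why only Name and ID survive, it can help to count the nulls per column before dropping. The sketch below does that on the same data and also shows that axis="columns" is accepted as an alias for axis=1. The names null_counts, by_number, and by_name are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pankaj", "Meghna", "David", "Lisa"],
    "ID": [1, 2, 3, 4],
    "Salary": [100, 200, np.nan, pd.NaT],
    "Role": ["CEO", None, pd.NaT, pd.NaT],
})

# Number of null cells in each column.
null_counts = df.isna().sum()
print(null_counts)

# Both spellings drop every column that has at least one null.
by_number = df.dropna(axis=1)
by_name = df.dropna(axis="columns")
print(list(by_number.columns))
```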
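We can pass how='all' to drop a row or column only when all of its values are null.

```python
import pandas as pd
import numpy as np

d1 = {'Name': ['Pankaj', 'Meghna', 'David', pd.NaT],
      'ID': [1, 2, 3, pd.NaT],
      'Salary': [100, 200, np.nan, pd.NaT],
      'Role': [np.nan, np.nan, pd.NaT, pd.NaT]}

df = pd.DataFrame(d1)
print(df)

# drop rows in which every value is null
df1 = df.dropna(how='all')
print(df1)

# drop columns in which every value is null
df1 = df.dropna(how='all', axis=1)
print(df1)
```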
Output:

```
     Name   ID Salary Role
0  Pankaj    1    100  NaT
1  Meghna    2    200  NaT
2   David    3    NaN  NaT
3     NaT  NaT    NaT  NaT
     Name ID Salary Role
0  Pankaj  1    100  NaT
1  Meghna  2    200  NaT
2   David  3    NaN  NaT
     Name   ID Salary
0  Pankaj    1    100
1  Meghna    2    200
2   David    3    NaN
3     NaT  NaT    NaT
```
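The difference between how="any" and how="all" is easiest to see side by side. The following sketch reuses the data from the example above. The names rows_any, rows_all, and cols_all are chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pankaj", "Meghna", "David", pd.NaT],
    "ID": [1, 2, 3, pd.NaT],
    "Salary": [100, 200, np.nan, pd.NaT],
    "Role": [np.nan, np.nan, pd.NaT, pd.NaT],
})

# how="any": a single null is enough to drop a row.
# Every row has a null Role, so nothing is left.
rows_any = df.dropna(how="any")

# how="all": a row is dropped only if all of its cells are null.
rows_all = df.dropna(how="all")

# The same rule applied to columns: only Role is entirely null.
cols_all = df.dropna(how="all", axis=1)

print(len(rows_any), list(rows_all.index), list(cols_all.columns))
```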
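We can use the thresh parameter to set the minimum number of non-null values that a row or column must have in order to be kept.

```python
import pandas as pd
import numpy as np

d1 = {'Name': ['Pankaj', 'Meghna', 'David', pd.NaT],
      'ID': [1, 2, pd.NaT, pd.NaT],
      'Salary': [100, 200, np.nan, pd.NaT],
      'Role': [np.nan, np.nan, pd.NaT, pd.NaT]}

df = pd.DataFrame(d1)
print(df)

# keep only rows with at least 2 non-null values
df1 = df.dropna(thresh=2)
print(df1)
```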
Output:

```
     Name   ID Salary Role
0  Pankaj    1    100  NaT
1  Meghna    2    200  NaT
2   David  NaT    NaN  NaT
3     NaT  NaT    NaT  NaT
     Name ID Salary Role
0  Pankaj  1    100  NaT
1  Meghna  2    200  NaT
```
With thresh=2, a row is kept only if it has at least 2 non-null values. Rows 2 and 3 have fewer than 2 non-null values, so they are dropped.
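One way to check how thresh behaves is to count the non-null cells in each row and compare the count with the threshold. The sketch below does this for the data above. The names non_null, kept, and manual are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pankaj", "Meghna", "David", pd.NaT],
    "ID": [1, 2, pd.NaT, pd.NaT],
    "Salary": [100, 200, np.nan, pd.NaT],
    "Role": [np.nan, np.nan, pd.NaT, pd.NaT],
})

# Non-null cells per row.
non_null = df.notna().sum(axis=1)
print(non_null.tolist())

# thresh=2 keeps rows with at least 2 non-null cells.
kept = df.dropna(thresh=2)

# Hand-written equivalent of the same rule.
manual = df[non_null >= 2]

# Lowering the threshold to 1 also keeps the 'David' row.
kept_one = df.dropna(thresh=1)

print(list(kept.index), list(kept_one.index))
```

Note that thresh counts non-null values, not nulls, so a higher threshold is stricter.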
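We can use the subset parameter to specify the labels in which to look for null values.

```python
import pandas as pd
import numpy as np

d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'],
      'ID': [1, 2, 3, pd.NaT],
      'Salary': [100, 200, np.nan, pd.NaT],
      'Role': ['CEO', np.nan, pd.NaT, pd.NaT]}

df = pd.DataFrame(d1)
print(df)

# drop rows only if the 'ID' column is null
df1 = df.dropna(subset=['ID'])
print(df1)
```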
Output:

```
     Name   ID Salary Role
0  Pankaj    1    100  CEO
1  Meghna    2    200  NaN
2   David    3    NaN  NaT
3    Lisa  NaT    NaT  NaT
     Name ID Salary Role
0  Pankaj  1    100  CEO
1  Meghna  2    200  NaN
2   David  3    NaN  NaT
```
When dropping columns (axis=1), the subset parameter takes row index labels instead of column names.
```python
df1 = df.dropna(subset=[1, 2], axis=1)
print(df1)
```
Output:

```
     Name   ID
0  Pankaj    1
1  Meghna    2
2   David    3
3    Lisa  NaT
```
The 'ID' column is not dropped because missing values are looked for only in the rows with index 1 and 2.
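The subset parameter can also list several labels and be combined with how. The sketch below, using the same data as above, shows a few such combinations. The names both_required, either_enough, and cols_kept are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pankaj", "Meghna", "David", "Lisa"],
    "ID": [1, 2, 3, pd.NaT],
    "Salary": [100, 200, np.nan, pd.NaT],
    "Role": ["CEO", np.nan, pd.NaT, pd.NaT],
})

# Drop a row if ID or Salary is null (how="any" is the default).
both_required = df.dropna(subset=["ID", "Salary"])

# Drop a row only if ID and Salary are both null.
either_enough = df.dropna(subset=["ID", "Salary"], how="all")

# With axis=1, subset names row labels: look only at rows 1 and 2.
cols_kept = df.dropna(subset=[1, 2], axis=1)

print(list(both_required.index))
print(list(either_enough.index))
print(list(cols_kept.columns))
```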
We can pass inplace=True to change the source DataFrame itself instead of getting a new one back. This is convenient when the DataFrame is huge and we no longer need the original, although pandas may still build the filtered result internally, so memory savings are not guaranteed.
```python
import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna'],
      'ID': [1, 2],
      'Salary': [100, pd.NaT]}

df = pd.DataFrame(d1)
print(df)

# modify df itself instead of returning a new DataFrame
df.dropna(inplace=True)
print(df)
```
Output:

```
     Name  ID  Salary
0  Pankaj   1   100.0
1  Meghna   2     NaN
     Name  ID  Salary
0  Pankaj   1   100.0
```
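For comparison, the sketch below contrasts inplace=True with the more common pattern of keeping the returned DataFrame. It assumes the same two-row data as the example above. The names reassigned and in_place are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Pankaj", "Meghna"],
    "ID": [1, 2],
    "Salary": [100, pd.NaT],
})

# Default: a new DataFrame is returned and df keeps both rows.
reassigned = df.dropna()
print(len(df), len(reassigned))

# inplace=True: the object itself is modified.
in_place = df.copy()
in_place.dropna(inplace=True)

# Both approaches end with the same content.
print(in_place.equals(reassigned))
```

Keeping the returned DataFrame makes it easy to chain further calls, while the in-place form changes the object that other references may still be pointing to.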
Reposted from: http://xfqzd.baihongyu.com/