ValueError: Input contains NaN, infinity or a value too large for dtype('float64') – Code Example


In this article I will provide Python code examples to resolve ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). As the message indicates, the error occurs when the data contains NaN or infinite values. Such values can't be processed because they are not finite numbers.

Code Example

Error Code – Let’s first replicate the error –

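import numpy as np

matrix = np.random.rand(5, 5)  # random values in [0, 1)
matrix[0, :] = np.inf          # fill the first row with +infinity
matrix[2, :] = -np.inf         # fill the third row with -infinity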

print(matrix)

# Output:
array([[       inf,        inf,        inf,        inf,        inf],
       [0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
       [      -inf,       -inf,       -inf,       -inf,       -inf],
       [0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
       [0.90272002, 0.37357483, 0.92952479, 0.072105  , 0.20837798]])

This matrix contains infinite values. If you pass it to a function that validates its input, as most sklearn estimators do, you will get this error –

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Solutions

The obvious solution is to check for NaN and infinity in your matrix and replace those values with something meaningful and workable.

Method 1 – Check NaN & infinity using np.any() & np.all()

np.any(np.isnan(matrix))      # True if at least one value is NaN
np.all(np.isfinite(matrix))   # True only if no value is NaN or infinite
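To see which kind of value is causing the problem, the two checks above can be extended into a small report. The following is a minimal sketch; the helper name summarize_non_finite is made up for this illustration and is not part of NumPy.

```python
import numpy as np

def summarize_non_finite(arr):
    """Count NaN, +inf and -inf entries in a numeric array (hypothetical helper)."""
    arr = np.asarray(arr, dtype=np.float64)
    return {
        "nan": int(np.isnan(arr).sum()),
        "posinf": int(np.isposinf(arr).sum()),
        "neginf": int(np.isneginf(arr).sum()),
    }

# Sample data: one row of +inf, one row of -inf and a single NaN.
matrix = np.random.rand(5, 5)
matrix[0, :] = np.inf
matrix[2, :] = -np.inf
matrix[1, 1] = np.nan

report = summarize_non_finite(matrix)
print(report)
```

If all three counts are zero, the array is safe to pass on as far as this particular error is concerned.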

Method 2 – For dataframes, use this function for cleaning –

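import numpy as np
import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)  # note: this also modifies the caller's dataframe
    # keep only the rows that contain no NaN or infinite values
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)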
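As a usage illustration, here is a sketch of a variant that works on a copy instead of modifying the input. It uses replace() followed by dropna() rather than the isin() mask. The function name clean_dataset_copy and the sample column names are assumptions for this example only.

```python
import numpy as np
import pandas as pd

def clean_dataset_copy(df):
    """Return a float64 copy of df without rows containing NaN or +/-inf (illustrative variant)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df needs to be a pd.DataFrame")
    # Treat infinities as missing, then drop every row with a missing value.
    cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
    return cleaned.astype(np.float64)

# Made-up sample data: row 1 contains +inf, row 2 contains NaN.
sample = pd.DataFrame({
    "a": [1.0, np.inf, 3.0, 4.0],
    "b": [0.5, 0.1, np.nan, 2.0],
})
cleaned = clean_dataset_copy(sample)
print(cleaned)
```

Only the rows with index 0 and 3 survive, and the original sample dataframe keeps all four rows.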

Method 3 – Reset index of dataframe –

df = df.reset_index()

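But note that this method adds the old index to the dataframe as a new column; use df.reset_index(drop=True) to discard it instead. Resetting the index does not remove NaN or infinite values by itself. It helps when the NaN values come from combining data whose index labels do not match.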

Method 4 – Replace NaN & infinite with some value –

df.replace([np.inf, -np.inf], np.nan, inplace=True)

The above code will replace all infinite values with NaN. Next, we will replace NaN with some number –

df.fillna(999, inplace=True)
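A fixed placeholder such as 999 can distort the data if it is far from the real values. As one possible alternative, the sketch below fills the gaps with each column's median. The column names and numbers are invented for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height": [1.0, np.inf, 3.0, 5.0],
    "weight": [10.0, 20.0, np.nan, -np.inf],
})

# Step 1: turn infinities into NaN so that they count as missing values.
df = df.replace([np.inf, -np.inf], np.nan)

# Step 2: fill every NaN with the median of its own column
# (median() skips NaN by default).
df = df.fillna(df.median())
print(df)
```

Whether the median, the mean or a constant is the right choice depends on the data, so treat this as a starting point.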

Method 5 – Using numpy nan_to_num() function –

# NaN becomes 0.0 and +/-inf become very large finite numbers; the result is a NumPy array
df = np.nan_to_num(df)
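By default np.nan_to_num() replaces NaN with 0.0 and the infinities with very large finite numbers, and it returns a NumPy array rather than a dataframe. The sketch below, which assumes NumPy 1.17 or later for the nan, posinf and neginf keyword arguments, picks the replacement values explicitly and wraps the result back into a dataframe. The sample data and the limits of ±1e6 are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, np.inf], "y": [-np.inf, 2.0, 3.0]})

# Choose the replacement values explicitly instead of relying on the defaults.
values = np.nan_to_num(
    df.to_numpy(dtype=np.float64), nan=0.0, posinf=1e6, neginf=-1e6
)

# nan_to_num returns an ndarray, so rebuild the dataframe with the original labels.
df_clean = pd.DataFrame(values, index=df.index, columns=df.columns)
print(df_clean)
```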

Method 6 – Replace NaN & infinite with 0 in X_train (a dataframe) –

X_train = X_train.replace([np.inf, -np.inf, np.nan], 0).reset_index(drop=True)

Method 7 – Detect all NaN and infinite in your data –

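# p is assumed to be a 2-D NumPy array; this checks its first column
for index, value in enumerate(p[:, 0]):
    if not np.isfinite(value):
        print(index, value)

This will print the position and value of every entry in the first column of p that is not finite, i.e. NaN or infinite.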
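The loop above looks at one column only. To locate non-finite values anywhere in a 2-D array, a vectorized version can be sketched with np.argwhere(). The array p here is sample data invented for the example.

```python
import numpy as np

p = np.array([
    [1.0, np.nan, 3.0],
    [np.inf, 5.0, 6.0],
    [7.0, 8.0, -np.inf],
])

# Boolean mask that is True wherever a value is NaN, +inf or -inf.
bad_mask = ~np.isfinite(p)

# argwhere returns one (row, column) pair per True entry, in row-major order.
bad_positions = np.argwhere(bad_mask)
for row, col in bad_positions:
    print(row, col, p[row, col])
```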

Method 8 – Dropping all NaN & infinite

df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
df = df.reset_index(drop=True)  # drop=True avoids adding the old index as a column

Method 9 – Replace positive infinity with max float64

inputArray[inputArray == np.inf] = np.finfo(np.float64).max
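The line above only handles positive infinity. A sketch that also covers negative infinity and NaN could look like this. The choice of 0.0 for NaN is an assumption of the example, not a recommendation.

```python
import numpy as np

input_array = np.array([1.0, np.inf, -np.inf, np.nan, 2.5])

float_max = np.finfo(np.float64).max   # largest finite float64
float_min = np.finfo(np.float64).min   # most negative finite float64

cleaned = input_array.copy()
cleaned[np.isposinf(cleaned)] = float_max
cleaned[np.isneginf(cleaned)] = float_min
cleaned[np.isnan(cleaned)] = 0.0       # assumption: treat NaN as 0.0

print(np.isfinite(cleaned).all())
```

Keep in mind that values this large can overflow to infinity again as soon as they are squared or summed, so a smaller, domain-specific limit is often the safer choice.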