In this article I will provide Python code examples to resolve ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). As the message indicates, the error occurs when the data contains NaN or infinite values, or values too large to be represented as float64. Such values can't be processed because they are not finite numbers.
Code Example
Error Code – Let’s first replicate the error –
import numpy as np

matrix = np.random.rand(5, 5)
matrix[0, :] = np.inf    # first row becomes +inf
matrix[2, :] = -np.inf   # third row becomes -inf
print(matrix)
# Output (the finite values are random and will differ between runs):
# [[       inf        inf        inf        inf        inf]
#  [0.87362809 0.28321499 0.7427659  0.37570528 0.35783064]
#  [      -inf       -inf       -inf       -inf       -inf]
#  [0.72877665 0.06580068 0.95222639 0.00833664 0.68779902]
#  [0.90272002 0.37357483 0.92952479 0.072105   0.20837798]]
This matrix contains infinite values. If you pass it to some operations, for example in sklearn, you will get this error –
ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
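Infinite values do not have to be assigned by hand as in the matrix above; they can also appear when a computation overflows. A minimal sketch, with made-up numbers, of how a finite float64 array can end up non-finite:

```python
import numpy as np

# Illustrative values only: 1e308 is close to the float64 maximum (~1.8e308).
big = np.array([1e308, 1.0])

with np.errstate(over="ignore"):   # silence the overflow RuntimeWarning
    overflowed = big * 10          # 1e309 does not fit in float64

print(overflowed)                  # the first entry is inf
print(np.isfinite(overflowed))     # the first entry is False
```

If the error shows up on data that looked fine at load time, an intermediate step such as a division by zero or an overflow like this one is worth checking.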
Solutions
The obvious solution is to check for NaN and infinity in your matrix and replace those values with something meaningful and workable.
Method 1 – Check for NaN & infinity using np.any() & np.all()
np.any(np.isnan(matrix))      # True if at least one value is NaN
np.all(np.isfinite(matrix))   # True only if no value is NaN or infinite
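These two checks only answer yes or no. As a sketch, a small helper can count how many NaN and infinite entries an array holds. The name `summarize_non_finite` is hypothetical, not a NumPy function, and the sample data is invented:

```python
import numpy as np

def summarize_non_finite(arr):
    """Return counts of NaN, +inf and -inf entries in a float array."""
    arr = np.asarray(arr, dtype=np.float64)
    return {
        "nan": int(np.isnan(arr).sum()),
        "posinf": int(np.isposinf(arr).sum()),
        "neginf": int(np.isneginf(arr).sum()),
    }

# Example data invented for illustration
sample = np.array([[1.0, np.nan], [np.inf, -np.inf], [np.inf, 2.0]])
print(summarize_non_finite(sample))  # {'nan': 1, 'posinf': 2, 'neginf': 1}
```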
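Method 2 – For dataframes, use this function for cleaning –
import numpy as np
import pandas as pd

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)  # drops rows with NaN; this modifies df in place
    # keep only the rows that contain no NaN or infinite values
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)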
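Note that clean_dataset calls dropna(inplace=True), so it also changes the dataframe that is passed in. If that side effect is unwanted, a non-mutating variant could look like the sketch below. The name `clean_dataset_copy` and the sample data are made up for illustration, and like the original it assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

def clean_dataset_copy(df):
    """Return a float64 copy of df without rows containing NaN or +/-inf."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df needs to be a pd.DataFrame")
    as_float = df.astype(np.float64)  # astype returns a new dataframe
    # np.isfinite is False for NaN, +inf and -inf
    rows_to_keep = np.isfinite(as_float).all(axis=1)
    return as_float[rows_to_keep]

raw = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                    "b": [5.0, 6.0, np.inf, 8.0]})
cleaned = clean_dataset_copy(raw)
print(cleaned)    # only the rows labelled 0 and 3 remain
print(len(raw))   # 4, the input is untouched
```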
Method 3 – Reset index of dataframe –
df = df.reset_index()
But this method adds the old index to the dataframe as a new column. Pass drop=True to discard the old index instead.
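To see what reset_index() adds, and how drop=True avoids it, here is a small sketch on an invented one-column dataframe:

```python
import numpy as np
import pandas as pd

# After dropna() the remaining rows keep their old labels: [0, 2]
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]}).dropna()

with_column = df.reset_index()               # old index kept as a column named "index"
without_column = df.reset_index(drop=True)   # old index discarded

print(list(with_column.columns))     # ['index', 'x']
print(list(without_column.columns))  # ['x']
print(list(without_column.index))    # [0, 1]
```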
Method 4 – Replace NaN & infinite with some value –
df.replace([np.inf, -np.inf], np.nan, inplace=True)
The above code will replace all infinite values with NaN. Next, we will replace NaN with some number –
df.fillna(999, inplace=True)
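A fixed sentinel such as 999 can distort statistics or model fits. One alternative, shown here as a sketch on invented data and not as a general recommendation, is to fill each column with its own median:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.inf, 3.0, 5.0],
                   "b": [10.0, 20.0, np.nan, -np.inf]})

# Step 1: turn infinities into NaN so they are treated as missing
df = df.replace([np.inf, -np.inf], np.nan)

# Step 2: fill each column's NaN with that column's median
# (median() skips NaN by default)
df = df.fillna(df.median())

print(df)
```

Whether a median, a mean, or dropping the rows is appropriate depends on the data and the model being trained.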
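Method 5 – Using numpy nan_to_num() function –
df = np.nan_to_num(df)
By default this replaces NaN with 0 and infinite values with very large finite numbers. Note that the result is a NumPy array, not a dataframe.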
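In NumPy 1.17 and later, the replacement values used by np.nan_to_num() can be chosen with the nan, posinf and neginf keywords. A sketch on an invented dataframe, which wraps the result back into a dataframe so the row and column labels are kept; the replacement values are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.inf, -np.inf]})

# Replacement values below are arbitrary choices for illustration
values = np.nan_to_num(df.to_numpy(), nan=0.0, posinf=1e6, neginf=-1e6)

# Rebuild a dataframe with the original labels
df_clean = pd.DataFrame(values, index=df.index, columns=df.columns)
print(df_clean)
```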
Method 6 – For X_train –
X_train = X_train.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)
Method 7 – Detect all NaN and infinite in your data –
index = 0
for i in p[:, 0]:  # scans the first column of the array p
    if not np.isfinite(i):
        print(index, i)
    index += 1
This will print the position and value of every entry in that column that is not finite, including NaN and infinite.
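The loop above only inspects the first column of p. A vectorized sketch that reports every non-finite entry of a 2-D array, using an invented array p:

```python
import numpy as np

p = np.array([[1.0, np.nan, 3.0],
              [np.inf, 5.0, 6.0],
              [7.0, 8.0, -np.inf]])

# (row, column) positions of every NaN or infinite entry
bad_positions = np.argwhere(~np.isfinite(p))

for row, col in bad_positions:
    print(row, col, p[row, col])
```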
Method 8 – Dropping all NaN & infinite –
df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
df = df.reset_index()
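Method 9 – Replace NaN & infinite with max float64 –
inputArray[inputArray == np.inf] = np.finfo(np.float64).max
This line replaces positive infinity only; negative infinity and NaN each need their own replacement.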
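To cover negative infinity and NaN as well, a sketch on an invented array could look like this; mapping NaN to 0 is an arbitrary choice made for illustration:

```python
import numpy as np

input_array = np.array([1.0, np.inf, -np.inf, np.nan])

limits = np.finfo(np.float64)
input_array[input_array == np.inf] = limits.max    # +inf -> largest float64
input_array[input_array == -np.inf] = limits.min   # -inf -> most negative float64
input_array[np.isnan(input_array)] = 0.0           # NaN -> 0 (arbitrary choice)

print(np.all(np.isfinite(input_array)))  # True
```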