ValueError: Trailing data – Code Example

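pandas raises ValueError: Trailing data when pd.read_json() finds extra content after the first JSON value in a file. The usual case is a file with one JSON object per line, where the objects are separated by \n or \r\n: the parser reads the first object and treats the rest as trailing data. In this article we will see code examples to resolve this error.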

Causes for this error

There are two reasons behind this error –

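  1. The file has extra content after the first JSON value. Usually it is a line-delimited JSON file, with one object per line separated by \n (LF) or \r\n (CRLF) line endings, so everything after the first line counts as trailing data.
  2. Wrong file path. Older pandas versions treat a string that is not an existing file as literal JSON text, so a wrong path passed to pd.read_json() could surface as a ValueError instead of a missing-file error.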
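
The first cause can be seen without pandas. The sketch below uses Python's standard json module as a stand-in for the pandas parser (an assumption made for illustration; pandas uses its own parser and words the error differently). It shows that a single trailing newline is tolerated, while a second object after the first one is rejected.

```python
import json

# One JSON document followed by a newline: the stdlib parser accepts this.
single = json.loads('{"a": "ironman", "b": "Tony"}\n')

# Two JSON documents separated by a newline, as in a line-delimited file.
text = '{"a": "ironman", "b": "Tony"}\n{"a": "captain", "b": "Steve"}\n'

try:
    json.loads(text)  # expects exactly one JSON document
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # the stdlib counterpart of "Trailing data"

# Parsing line by line works, because each line is a complete document.
records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(error_message)
print(records)
```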

Code Example

Error Code – Let’s reproduce this error first –

{"a": "ironman", "b": "Tony"}
{"a": "captain", "b": "Steve"}
{"a": "hulk", "b": "Bruce"}
{"a": "spiderman", "b": "Peter"}

This is a JSON file of superheroes, saved as superhero.json, with one object per line. Now we will read it using pd.read_json() –

import pandas as pd

df = pd.read_json("superhero.json")

It will raise ValueError: Trailing data. Output –

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/Users/d/anaconda/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Trailing data

Solution

1. Use lines=True parameter in read_json

import pandas as pd

data = pd.read_json('superhero.json', lines=True)

With lines=True, pandas parses each line as a separate JSON object and returns one row per line.
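
To try this without creating a file, here is a minimal sketch that feeds the same four records to read_json through an in-memory buffer. The StringIO wrapper is only there to make the example self-contained; with a real file you would pass the path as shown above.

```python
from io import StringIO

import pandas as pd

# The same records as superhero.json, one JSON object per line.
text = (
    '{"a": "ironman", "b": "Tony"}\n'
    '{"a": "captain", "b": "Steve"}\n'
    '{"a": "hulk", "b": "Bruce"}\n'
    '{"a": "spiderman", "b": "Peter"}\n'
)

# lines=True tells pandas to treat every line as its own JSON document.
df = pd.read_json(StringIO(text), lines=True)

print(df)
```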

2. Open the file yourself and parse each line –

import json
import pandas as pd

with open('superhero.json', encoding="utf8") as f:
    # convert each non-empty line from a JSON string to a dict
    data = [json.loads(line) for line in f if line.strip()]

df = pd.DataFrame(data)  # build the dataframe from the list of dicts

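3. If stray line endings (\n or \r\n) or blank lines are causing issues, clean the lines before parsing –

from io import StringIO

import pandas as pd

with open('superhero.json', 'r') as f:
    data = f.readlines()

# strip trailing whitespace (\n, \r\n) and drop blank lines
data = [line.rstrip() for line in data if line.strip()]

# read_json expects a path or file-like object, so rejoin the cleaned lines
df = pd.read_json(StringIO('\n'.join(data)), lines=True)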
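
If the file may also contain a genuinely malformed line, it helps to know which one. The following is a minimal sketch using only the standard library; parse_json_lines is a hypothetical helper name, not part of pandas or the json module.

```python
import json


def parse_json_lines(lines):
    """Hypothetical helper: parse an iterable of JSON strings, one per line.

    Blank lines are skipped; a malformed line raises ValueError that
    names the 1-based line number.
    """
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()  # removes \n, \r\n and surrounding spaces
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number}: {exc.msg}") from exc
    return records


# Sample input with a CRLF ending and a blank line in the middle.
sample = [
    '{"a": "hulk", "b": "Bruce"}\r\n',
    '\n',
    '{"a": "spiderman", "b": "Peter"}\n',
]
records = parse_json_lines(sample)
print(records)

# A truncated second line is reported with its line number.
try:
    parse_json_lines(['{"a": "hulk"}\n', '{"a": \n'])
    bad_line_error = None
except ValueError as exc:
    bad_line_error = str(exc)
print(bad_line_error)
```

The returned list of dicts can be passed straight to pd.DataFrame(records).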

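4. If you want to put all JSON objects in a single array, then use this code –

from io import StringIO

import pandas as pd

with open('superhero.json', 'r') as f:
    data = f.readlines()

# strip trailing whitespace (\n, \r\n) and drop blank lines
data = [line.rstrip() for line in data if line.strip()]

# wrap the comma-separated objects in brackets to form one JSON array
data = "[" + ','.join(data) + "]"

df = pd.read_json(StringIO(data))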
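
Joining strings by hand works when every line is valid JSON. As an alternative sketch, the conversion can go through the standard json module, which validates each object and writes a well-formed array. The function name json_lines_to_array is hypothetical.

```python
import json


def json_lines_to_array(text):
    """Hypothetical helper: turn line-delimited JSON text into one JSON array string."""
    # Parse every non-blank line, then serialise the whole list at once.
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(records)


text = (
    '{"a": "ironman", "b": "Tony"}\r\n'
    '{"a": "captain", "b": "Steve"}\r\n'
    '\r\n'
)
array_text = json_lines_to_array(text)
print(array_text)
```

The resulting string is a single JSON document, so it can be written back to a file and read with pd.read_json() without lines=True.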

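5. Check that the path to the JSON file is correct. Sometimes the file is in the parent folder but the path is written relative to the current location of the Python script. Older pandas versions then try to parse the path string itself as JSON and raise a ValueError, while newer versions typically report that the file does not exist.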

import pandas as pd

data = pd.read_json('../superhero.json', lines=True)
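
To rule out a path problem before pandas is involved, the file can be checked explicitly. This is a minimal sketch with the standard library; require_file is a hypothetical helper, and the temporary folder only exists to make the example runnable on its own.

```python
import tempfile
from pathlib import Path


def require_file(path):
    """Hypothetical helper: return the absolute path, or fail with a clear message."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"JSON file not found: {resolved}")
    return resolved


with tempfile.TemporaryDirectory() as folder:
    # Create a small file so that one check succeeds.
    good = Path(folder) / "superhero.json"
    good.write_text('{"a": "hulk", "b": "Bruce"}\n', encoding="utf8")
    found_name = require_file(good).name

    # A path that does not exist fails before any JSON parsing happens.
    try:
        require_file(Path(folder) / "missing.json")
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = str(exc)

print(found_name)
print(missing_error)
```

In a real script, the returned path can be passed on, for example pd.read_json(str(require_file('../superhero.json')), lines=True).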