Question

Is there a way to read sequentially pretty-printed JSON objects in Python?

Suppose you have a JSON file like this:

{
  "a": 0
}
{
  "a": 1
}

It's not JSONL, because each object takes more than one line. But it's not a single valid JSON object either. It's sequentially listed pretty-printed JSON objects.

json.loads in Python gives an error about invalid formatting if you attempt to load this, and the documentation indicates it only loads a single object. But tools like jq can read this kind of data without issue.

Is there some reasonable way to work with data formatted like this using the core json library? I have an issue where I have some complex objects and while just formatting the data as JSONL works, for readability it would be better to store the data like this. I can wrap everything in a list to make it a single JSON object, but that has downsides like requiring reading the whole file in at once.

There's a similar question here, but despite the title the data there isn't JSON at all.


1 Jan 1970

Solution

You can partially decode text as JSON with json.JSONDecoder.raw_decode. This method returns a 2-tuple of the parsed object and the ending index of the object in the string, which you can then use as the starting index to partially decode the text for the next JSON object:

import json

def iter_jsons(jsons, decoder=json.JSONDecoder()):
    index = 0
    while (index := jsons.find('{', index)) != -1:
        data, index = decoder.raw_decode(jsons, index)
        yield data

so that:

jsons = '''\
{
  "a": 0
}
{
  "a": 1
}'''
for j in iter_jsons(jsons):
    print(j)

outputs:

{'a': 0}
{'a': 1}
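The find('{') scan above assumes every top-level value is an object. As a rough sketch of a variant for input that may also contain top-level arrays, strings or numbers, the generator below skips JSON whitespace between values instead. The name iter_json_values is made up for illustration, and it relies on the same start-index argument to raw_decode as the code above:

```python
import json

def iter_json_values(text, decoder=json.JSONDecoder()):
    """Yield every top-level JSON value in text, whatever its type."""
    index = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while index < end and text[index] in " \t\n\r":
            index += 1
        if index == end:
            return
        # Second argument is the start index (same caveat as in the answer).
        value, index = decoder.raw_decode(text, index)
        yield value

values = list(iter_json_values('[1, 2]\n{"a": {"b": 1}}\n"x" 3'))
print(values)  # [[1, 2], {'a': {'b': 1}}, 'x', 3]
```

Anything that is not valid JSON between values raises json.JSONDecodeError instead of being silently skipped, which is a different trade-off from the find('{') version.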


Note that the starting index passed as the second argument to json.JSONDecoder.raw_decode is not part of the documented signature. If you want to stick to the publicly documented API, slice the string from the index before you pass it to raw_decode. This is less efficient because every slice copies the rest of the string, and the returned index is then relative to the slice, which is why the code rebinds jsons:

def iter_jsons(jsons, decoder=json.JSONDecoder()):
    index = 0
    while (index := jsons.find('{', index)) != -1:
        data, index = decoder.raw_decode(jsons := jsons[index:])
        yield data
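The question mentions wanting to avoid reading the whole file at once. One possible way to combine raw_decode with chunked reads is sketched below. The function name iter_json_stream and the chunk_size parameter are hypothetical, and the sketch assumes every top-level value is an object or an array (a bare number split across a chunk boundary could be decoded too early):

```python
import io
import json

def iter_json_stream(fp, chunk_size=65536, decoder=json.JSONDecoder()):
    """Yield top-level JSON objects/arrays from a text file object in chunks."""
    buffer = ""
    while True:
        chunk = fp.read(chunk_size)  # empty string means end of file
        buffer += chunk
        while True:
            # raw_decode does not skip leading whitespace.
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if not chunk:
                    raise  # end of file and the leftover text is not valid JSON
                break  # probably an incomplete value: read more data
            yield obj
            buffer = buffer[end:]
        if not chunk:
            break

source = io.StringIO('{\n  "a": 0\n}\n{\n  "a": 1\n}\n')
for obj in iter_json_stream(source, chunk_size=4):
    print(obj)
```

Only the text of the value currently being parsed is held in memory. A malformed value in the middle of the file is not detected until the end of the file is reached, because the sketch cannot tell "incomplete" from "invalid".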

2024-07-19

blhsing

Solution

First Approach

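One way is to call json.loads() on the whole text and look at the result:

If it succeeds, the remaining text held a single object and we have reached the end of the string.
If it fails because more data follows the first object, error.pos marks where the next object starts, so load the text up to error.pos and repeat with the rest.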

Code:

import json

text = """
{
  "a": 0
}
{
  "a": 1
}
"""

obj_list = []
while True:
    try:
        obj_list.append(json.loads(text))
        # Success means we have reached the end of the string
        break
    except json.decoder.JSONDecodeError as error:
        # error.pos is where the error happens within the text
        valid_text, text = text[:error.pos], text[error.pos:]
        obj_list.append(json.loads(valid_text))

print(obj_list)
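The loop above treats every JSONDecodeError as "another object follows". A slightly more defensive sketch, wrapped in a generator with the made-up name iter_by_error_pos, re-raises anything other than the "Extra data" error. It assumes the message text that CPython's json module uses for trailing data, which is not a documented guarantee:

```python
import json

def iter_by_error_pos(text):
    """Yield concatenated JSON values by splitting at JSONDecodeError.pos."""
    while text.strip():
        try:
            obj = json.loads(text)
            text = ""  # the rest of the text was one complete value
        except json.JSONDecodeError as error:
            # Assumption: CPython reports trailing data with this message.
            if error.msg != "Extra data":
                raise  # genuinely malformed JSON, not a second object
            obj = json.loads(text[:error.pos])
            text = text[error.pos:]
        yield obj

text = '\n{\n  "a": 0\n}\n{\n  "a": 1\n}\n'
print(list(iter_by_error_pos(text)))  # [{'a': 0}, {'a': 1}]
```

Each object is parsed twice with this technique (once to find the error position, once to load it), so the raw_decode approach is likely the better fit for large inputs.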

Second Approach

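The idea here is to turn the text into a single valid JSON array before decoding: insert a comma between adjacent objects and wrap the result in brackets. The regular expression does not understand JSON strings, so this is only safe when no string value contains } followed by whitespace and {. Code: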

import json
import re

text = """
{
  "a": 0
}
{
  "a": 1
}
"""
text = "[" + re.sub(r'}\s+{', '},{', text) + "]"
obj_list = json.loads(text)
print(obj_list)
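One caveat of the regular-expression approach is worth seeing concretely. The small illustration below uses made-up input in which a string value happens to contain } followed by a space and {. The substitution cannot tell that this is inside a string, so the value is silently altered:

```python
import json
import re

# Hypothetical input: the first object's string value contains "} {".
text = '{"s": "} {"}\n{"a": 1}'

wrapped = "[" + re.sub(r'}\s+{', '},{', text) + "]"
obj_list = json.loads(wrapped)

print(obj_list[0]["s"])  # prints },{ instead of the original } {
```

The result still parses, so the corruption is easy to miss. For data that may contain such strings, a real JSON parser (for example raw_decode) is the safer choice.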

2024-07-19

Hai Vu

Solution

One option is to use the built-in Python library pprint.

Here stuff is a dictionary object. If the JSON is stored in a file, you can load it with stuff = json.load(f), where f is an open file object rather than a file path. Note that json.load parses a single JSON document, so on its own it will not read the concatenated objects from the question.

As far as the printing is concerned, pprint will do the job for you.

pprint.pp(stuff)

2024-07-19

Abhimanyu