How to iterate through a nested JSON, search for a specific value and print the key with Python



While I was working on a Python script, I needed to find a way to iterate through a nested JSON (which is actually a dict inside Python), search for a specific field (key) and if the value of this key meets a condition, print the parent (top layer) key.

The JSON data

The JSON data was loaded from an external command and saved as "data"

data = json.loads(output)
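
For readers who want to reproduce this step, here is a minimal sketch of how the output of an external command could be captured and parsed. The article does not say which command was used, so the helper name load_json_from_command and the stand-in command (a Python one-liner that just prints a small JSON document) are assumptions for illustration only.

```python
import json
import subprocess
import sys


def load_json_from_command(command):
    """Run a command and parse its standard output as JSON (hypothetical helper)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in for the real external command (assumption): print a tiny JSON document.
sample = {"Oem": {"Hpe": {"AggregateHealthStatus": {"AggregateServerHealth": "OK"}}}}
demo_command = [sys.executable, "-c", "print({!r})".format(json.dumps(sample))]

data = load_json_from_command(demo_command)
print(type(data))  # a JSON object becomes a Python dict
```

With check=True, a failing command raises an exception instead of handing an empty string to json.loads().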

The full JSON data coming from the "output" variable is quite large. To work with only a certain part of the data, I created another variable holding just that part:

healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']

After defining "healthdetail", this variable now holds the following nested JSON:

{
 "AgentlessManagementService": "Unavailable",
 "AggregateServerHealth": "OK",
 "BiosOrHardwareHealth": {
  "Status": {
   "Health": "OK"
  }
 },
 "FanRedundancy": "Redundant",
 "Fans": {
  "Status": {
   "Health": "OK"
  }
 },
 "Memory": {
  "Status": {
   "Health": "OK"
  }
 },
 "Network": {
  "Status": {
   "Health": "OK"
  }
 },
 "PowerSupplies": {
  "PowerSuppliesMismatch": false,
  "Status": {
   "Health": "OK"
  }
 },
 "PowerSupplyRedundancy": "Redundant",
 "Processors": {
  "Status": {
   "Health": "OK"
  }
 },
 "SmartStorageBattery": {
  "Status": {
   "Health": "OK"
  }
 },
 "Storage": {
  "Status": {
   "Health": "OK"
  }
 },
 "Temperatures": {
  "Status": {
   "Health": "OK"
  }
 }
}

Note: Yes, you may have noticed that this data comes from a Redfish server health API.

As you can see from the output, there are a couple of different components listed, such as "PowerSupplies" or "Storage". Each component has its own Health Status, nested under ComponentName.Status.Health.

Now my goal was to iterate through all components and identify a component where the "Health" is not set to OK.

First but failed approach: A for loop

Python coding is not something I am particularly good at, but I challenged myself to do this in Python - and here we are. In a BASH script, I would have used jq in combination with its search filter (select), but how do I approach this in Python? 

I thought I'd go with a for loop: For each component, print the Status.Health sub-value (if it exists):

      healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']
      print(healthdetail)
      for item in healthdetail:
        if item['Status']['Health']:
          print(item)

But Python quickly told me nope:

ck@mintp ~/Git/python_script $ python3 python_script.py
{'AgentlessManagementService': 'Unavailable', 'AggregateServerHealth': 'OK', 'BiosOrHardwareHealth': {'Status': {'Health': 'OK'}}, 'FanRedundancy': 'Redundant', 'Fans': {'Status': {'Health': 'OK'}}, 'Memory': {'Status': {'Health': 'OK'}}, 'Network': {'Status': {'Health': 'OK'}}, 'PowerSupplies': {'PowerSuppliesMismatch': False, 'Status': {'Health': 'OK'}}, 'PowerSupplyRedundancy': 'Redundant', 'Processors': {'Status': {'Health': 'OK'}}, 'SmartStorageBattery': {'Status': {'Health': 'OK'}}, 'Storage': {'Status': {'Health': 'OK'}}, 'Temperatures': {'Status': {'Health': 'OK'}}}
Traceback (most recent call last):
  File "/home/ck/Git/python_script/python_script.py", line 193, in <module>
    check_computersystem()
  File "/home/ck/Git/python_script/python_script.py", line 169, in check_computersystem
    if item['Status']['Health']:
TypeError: string indices must be integers

The first line (printing the full JSON) worked, but then the for loop failed miserably. The TypeError message doesn't make sense to an occasional Python script writer like me - but research revealed the reason behind the TypeError:

we are trying to access a value using a key within another key and that’s leading to the occurrence of the error. The key is of the type string and not dict.

You can't use a value as a key

In other words: Python is parsing through each component (from healthdetail). The JSON data held in "healthdetail" is a dict, which can be verified:

healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']
print(healthdetail)
print(type(healthdetail))

ck@mintp ~/Git/python_script $ python3 python_script.py
{'AgentlessManagementService': 'Unavailable', 'AggregateServerHealth': 'OK', 'BiosOrHardwareHealth': {'Status': {'Health': 'OK'}}, 'FanRedundancy': 'Redundant', 'Fans': {'Status': {'Health': 'OK'}}, 'Memory': {'Status': {'Health': 'OK'}}, 'Network': {'Status': {'Health': 'OK'}}, 'PowerSupplies': {'PowerSuppliesMismatch': False, 'Status': {'Health': 'OK'}}, 'PowerSupplyRedundancy': 'Redundant', 'Processors': {'Status': {'Health': 'OK'}}, 'SmartStorageBattery': {'Status': {'Health': 'OK'}}, 'Storage': {'Status': {'Health': 'OK'}}, 'Temperatures': {'Status': {'Health': 'OK'}}}
<class 'dict'>

But that is not true for the "item" inside the for loop.

Iterating over a dict with a for loop yields its keys, not its values. So "item" is only the component name (for example "Fans"), which is a plain string, and the loop then tries to look up 'Status' on that string:

healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']
print(healthdetail)
print(type(healthdetail))
for item in healthdetail:
   print(type(item))
   if item['Status']['Health']:
      print(item)

The Python output shows that "item" is now a string (str):

ck@mintp ~/Git/python_script $ python3 python_script.py
{'AgentlessManagementService': 'Unavailable', 'AggregateServerHealth': 'OK', 'BiosOrHardwareHealth': {'Status': {'Health': 'OK'}}, 'FanRedundancy': 'Redundant', 'Fans': {'Status': {'Health': 'OK'}}, 'Memory': {'Status': {'Health': 'OK'}}, 'Network': {'Status': {'Health': 'OK'}}, 'PowerSupplies': {'PowerSuppliesMismatch': False, 'Status': {'Health': 'OK'}}, 'PowerSupplyRedundancy': 'Redundant', 'Processors': {'Status': {'Health': 'OK'}}, 'SmartStorageBattery': {'Status': {'Health': 'OK'}}, 'Storage': {'Status': {'Health': 'OK'}}, 'Temperatures': {'Status': {'Health': 'OK'}}}
<class 'dict'>
<class 'str'>
Traceback (most recent call last):
  File "/home/ck/Git/python_script/python_script.py", line 193, in <module>
    check_computersystem()
  File "/home/ck/Git/python_script/python_script.py", line 169, in check_computersystem
    if item['Status']['Health']:
TypeError: string indices must be integers

To solve this, I need to move away from the for loop using an "item" as key, as this will always be a string and therefore will not work with the nested dictionary.
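
As a side note, for this particular layout (every Status sits exactly one level below the component name) a plain loop can also work, as long as it walks over key/value pairs with .items() and skips values that are not dicts. The following is only a sketch under that assumption. The function name unhealthy_components and the "Warning" value are made up for the demo and do not come from the real server output.

```python
healthdetail = {
    "AgentlessManagementService": "Unavailable",
    "Fans": {"Status": {"Health": "OK"}},
    "Memory": {"Status": {"Health": "Warning"}},  # made-up value for the demo
}

# Iterating over a dict yields its keys, which are strings here.
for item in healthdetail:
    print(type(item).__name__, item)


def unhealthy_components(components):
    """Return the names of components whose Status.Health is not "OK"."""
    found = []
    for name, value in components.items():
        if not isinstance(value, dict):
            continue  # plain strings such as "Redundant" have no Status
        status = value.get("Status")
        if isinstance(status, dict):
            health = status.get("Health")
            if health is not None and health != "OK":
                found.append(name)
    return found


print(unhealthy_components(healthdetail))
```

This only looks one level deep. As soon as Status can appear at varying depths, a recursive lookup is the more general tool.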

Deep/nested iteration lookup function

After spending quite some time searching for a solution, I eventually came across an answer on StackOverflow, which offers a function doing a "deep iteration lookup" into nested dictionaries.

After having adjusted the "parse_json_recursively" function to my needs, the function now looks like this:

def parse_json_recursively(json_object, target_key, *parent_key):
    if type(json_object) is dict and json_object:
        for key in json_object:
            print("Key: {}, Parent Key: {}".format(key, parent_key))
            if key == target_key: # Found the target key ('Status')
                print("Health status of {} is {}".format(parent_key, json_object[key]['Health']))
                print("---------------------------")
            else: # Continue iterating
                parse_json_recursively(json_object[key], target_key, key)

The function is recursive: it loops over the keys of the given JSON input (json_object) and looks for a specific key/field (target_key). If the current key is not the wanted one, the function calls itself with the value of that key. Nested dictionaries are therefore searched level by level, until either the wanted key is found or there is no nested data left to descend into.

I also added an optional parent_key as third input. This is used while the function parses a nested JSON. When it calls itself, the current key is added as third input, which is then used as parent_key.

Executing the following Python code seems to do what I need it to do:

healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']
healthkey = 'Status'
print(healthdetail)
print(type(healthdetail))
parse_json_recursively(healthdetail, healthkey)

ck@mintp ~/Git/python_script $ python3 python_script.py
{'AgentlessManagementService': 'Unavailable', 'AggregateServerHealth': 'OK', 'BiosOrHardwareHealth': {'Status': {'Health': 'OK'}}, 'FanRedundancy': 'Redundant', 'Fans': {'Status': {'Health': 'OK'}}, 'Memory': {'Status': {'Health': 'OK'}}, 'Network': {'Status': {'Health': 'OK'}}, 'PowerSupplies': {'PowerSuppliesMismatch': False, 'Status': {'Health': 'OK'}}, 'PowerSupplyRedundancy': 'Redundant', 'Processors': {'Status': {'Health': 'OK'}}, 'SmartStorageBattery': {'Status': {'Health': 'OK'}}, 'Storage': {'Status': {'Health': 'OK'}}, 'Temperatures': {'Status': {'Health': 'OK'}}}
<class 'dict'>
Key: AgentlessManagementService, Parent Key: ()
Key: AggregateServerHealth, Parent Key: ()
Key: BiosOrHardwareHealth, Parent Key: ()
Key: Status, Parent Key: ('BiosOrHardwareHealth',)
Health status of ('BiosOrHardwareHealth',) is OK
---------------------------
Key: FanRedundancy, Parent Key: ()
Key: Fans, Parent Key: ()
Key: Status, Parent Key: ('Fans',)
Health status of ('Fans',) is OK
---------------------------
Key: Memory, Parent Key: ()
Key: Status, Parent Key: ('Memory',)
Health status of ('Memory',) is OK
---------------------------
Key: Network, Parent Key: ()
Key: Status, Parent Key: ('Network',)
Health status of ('Network',) is OK
---------------------------
Key: PowerSupplies, Parent Key: ()
Key: PowerSuppliesMismatch, Parent Key: ('PowerSupplies',)
Key: Status, Parent Key: ('PowerSupplies',)
Health status of ('PowerSupplies',) is OK
---------------------------
Key: PowerSupplyRedundancy, Parent Key: ()
Key: Processors, Parent Key: ()
Key: Status, Parent Key: ('Processors',)
Health status of ('Processors',) is OK
---------------------------
Key: SmartStorageBattery, Parent Key: ()
Key: Status, Parent Key: ('SmartStorageBattery',)
Health status of ('SmartStorageBattery',) is OK
---------------------------
Key: Storage, Parent Key: ()
Key: Status, Parent Key: ('Storage',)
Health status of ('Storage',) is OK
---------------------------
Key: Temperatures, Parent Key: ()
Key: Status, Parent Key: ('Temperatures',)
Health status of ('Temperatures',) is OK
---------------------------

The output now shows the component name (by using the parent_key input) and the health status of it. This looks promising and almost done, but why is the component name written inside parentheses and single quotes?

Convert tuple to string

It turned out that parent_key is a tuple, which I found out by printing its type:

def parse_json_recursively(json_object, target_key, *parent_key):
    if type(json_object) is dict and json_object:
        for key in json_object:
            print(type(parent_key))
            print("Key: {}, Parent Key: {}".format(key, parent_key))
            if key == target_key: # Found Health key
                print("Health status of {} is {}".format(parent_key, json_object[key]['Health']))
                print("---------------------------")
            else: # Continue iterating
                parse_json_recursively(json_object[key], target_key, key)

healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']
healthkey = 'Status'
parse_json_recursively(healthdetail, healthkey)

ck@mintp ~/Git/python_script $ python3 python_script.py
<class 'tuple'>
Key: AgentlessManagementService, Parent Key: ()
<class 'tuple'>
Key: AggregateServerHealth, Parent Key: ()
<class 'tuple'>
Key: BiosOrHardwareHealth, Parent Key: ()
<class 'tuple'>
Key: Status, Parent Key: ('BiosOrHardwareHealth',)
Health status of ('BiosOrHardwareHealth',) is OK
---------------------------
<class 'tuple'>
Key: FanRedundancy, Parent Key: ()
<class 'tuple'>
Key: Fans, Parent Key: ()
<class 'tuple'>
Key: Status, Parent Key: ('Fans',)
Health status of ('Fans',) is OK
---------------------------
[...]
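
The tuple is a consequence of the asterisk in *parent_key: Python packs all extra positional arguments into a tuple, even when there is only one or none at all. The behaviour can be seen in isolation with a small demo function (show_parent is a made-up name, not part of the original script):

```python
def show_parent(*parent_key):
    # Extra positional arguments are always packed into a tuple.
    return parent_key


print(show_parent())                  # ()
print(show_parent("Fans"))            # ('Fans',)
print("".join(show_parent("Fans")))   # Fans
```

Joining with an empty string works here because the tuple holds at most one element, and that element is a string.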

To get rid of the ('Parent Key',) syntax, I had to add a conversion from tuple to string:

def parse_json_recursively(json_object, target_key, *parent_key):
    if type(json_object) is dict and json_object:
        for key in json_object:
            print(type(parent_key))
            print("Key: {}, Parent Key: {}".format(key, parent_key))
            if key == target_key: # Found the target key ('Status')
                pkey = "".join(parent_key) # parent_key is a tuple, convert it to a str
                print("Health status of {} is {}".format(pkey, json_object[key]['Health']))
                print("---------------------------")
            else: # Continue iterating
                parse_json_recursively(json_object[key], target_key, key)

healthdetail = data['Oem']['Hpe']['AggregateHealthStatus']
healthkey = 'Status'
parse_json_recursively(healthdetail, healthkey)

And the output is now to my liking:

ck@mintp ~/Git/python_script $ python3 python_script.py
<class 'tuple'>
Key: AgentlessManagementService, Parent Key: ()
<class 'tuple'>
Key: AggregateServerHealth, Parent Key: ()
<class 'tuple'>
Key: BiosOrHardwareHealth, Parent Key: ()
<class 'tuple'>
Key: Status, Parent Key: ('BiosOrHardwareHealth',)
Health status of BiosOrHardwareHealth is OK
---------------------------
<class 'tuple'>
Key: FanRedundancy, Parent Key: ()
<class 'tuple'>
Key: Fans, Parent Key: ()
<class 'tuple'>
Key: Status, Parent Key: ('Fans',)
Health status of Fans is OK
---------------------------
<class 'tuple'>
Key: Memory, Parent Key: ()
<class 'tuple'>
Key: Status, Parent Key: ('Memory',)
Health status of Memory is OK
---------------------------
[...]
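
With the debug output removed, the same idea can be turned into a function that only reports components which are not OK, which was the original goal. The following is a sketch and not the script from this article: the name find_unhealthy, the use of a plain default argument instead of *parent_key, and the sample data with "Warning" and "Critical" values are assumptions made for the example.

```python
def find_unhealthy(json_object, target_key="Status", parent_key=None):
    """Return (parent_key, health) pairs for every health value that is not "OK"."""
    results = []
    if isinstance(json_object, dict):
        for key, value in json_object.items():
            if key == target_key and isinstance(value, dict):
                health = value.get("Health")
                if health is not None and health != "OK":
                    results.append((parent_key, health))
            else:
                # Descend into the value; non-dict values return an empty list.
                results.extend(find_unhealthy(value, target_key, key))
    return results


# Made-up sample data, shaped like the healthdetail dict above.
sample = {
    "AggregateServerHealth": "Warning",
    "Fans": {"Status": {"Health": "OK"}},
    "PowerSupplies": {"PowerSuppliesMismatch": False, "Status": {"Health": "Warning"}},
    "Storage": {"Status": {"Health": "Critical"}},
}

for component, health in find_unhealthy(sample):
    print("Health status of {} is {}".format(component, health))
```

Because parent_key is now an ordinary parameter holding a string, no tuple-to-string conversion is needed. Returning a list instead of printing also makes it easy to set an exit code depending on whether anything was found.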


