Convert pandas MultiIndex DataFrame to Nested Dictonary
Background/Problem
Just as yesterday’s post, I started writing this post about a year ago, but never finished it. Let’s change that now.
So, I had the following problem: I wanted to turn the data in a
pandas.DataFrame [1] with a MultiIndex into a list of nested
dictionaries. Wait what? Let me clarify with an example:
The following short script, sets up an example DataFrame:
import pandas as pd
import numpy as np
one = 6 * ["Alice"] + 6 * ["Bob"]
two = 2 * (2 * ["foo"] + 2 * ["bar"] + 2 * ["baz"])
three = 6 * ["one", "two"]
tuples = list(zip(*[one, two, three]))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second", "third"])
np.random.seed(42)
df = pd.DataFrame(
np.random.randint(low=1, high=11, size=(3, 12)),
index=["A", "B", "c"],
columns=index
)
If you print it (print(df)), you get the following output:
first Alice Bob
second foo bar baz foo bar baz
third one two one two one two one two one two one two
A 7 4 8 5 7 10 3 7 8 5 4 8
B 8 3 6 5 2 8 6 2 5 1 10 6
c 9 1 10 3 7 4 9 3 5 3 7 5
At this point, I needed a way to convert every row (A, B, and C) into a
nested dictionary. Let’s say the dictionary for row A is stored in dict_A.
Printing it (print(dict_A)) should lead to the following output (I
reformatted it a little bit for better readability):
{
'Alice': {
'bar': {'one': 8, 'two': 5},
'baz': {'one': 7, 'two': 10},
'foo': {'one': 7, 'two': 4}
},
'Bob': {
'bar': {'one': 8, 'two': 5},
'baz': {'one': 4, 'two': 8},
'foo': {'one': 3, 'two': 7}
}
}
If you think to yourself: Why would you want to do that? Well, I had a
Jinja [2] template and the data in the DataFrame was
supposed to be used in that template. A nested dictionary seemed to be the best
data format for passing the data into the Jinja’s render function.
Ok, let’s look at how I accomplished that.
Solution
First I thought that this should be pretty easy to do. Every row in a
DataFrame is of type <class 'pandas.core.series.Series'>. You can verify
that with:
print(type(df.iloc[0]))
Series objects have a method called
to_dict
[3] that I was hoping would to the job:
print(df.iloc[0].to_dict())
But the output of that command looks like this (again a little reformatted):
{
('Alice', 'foo', 'one'): 7,
('Alice', 'foo', 'two'): 4,
('Alice', 'bar', 'one'): 8,
('Alice', 'bar', 'two'): 5,
('Alice', 'baz', 'one'): 7,
('Alice', 'baz', 'two'): 10,
('Bob', 'foo', 'one'): 3,
('Bob', 'foo', 'two'): 7,
('Bob', 'bar', 'one'): 8,
('Bob', 'bar', 'two'): 5,
('Bob', 'baz', 'one'): 4,
('Bob', 'baz', 'two'): 8
}
Well, not exactly what I was hoping for.
After some research and some tinkering around, I wrote the following function:
def convert_series_to_dict(series):
if (series.index.nlevels == 1):
return series.to_dict()
else:
output_dict = {}
for header in series.index.levels[0]:
if (header in series.index):
output_dict[header] = convert_series_to_dict(series.xs(header))
return output_dict
It’s a recursive function that checks the number of levels in the index of the
passed in Series. It that number is one, it just calls the to_dict method
on it. Otherwise it traverses through the levels and builds up a nested
dictionary. If you call this function on the first row of the DataFrame:
print(convert_series_to_dict(df.iloc[0]))
you get exactly, what I was trying to get (see above in the Background/Problem section).
As a final touch, to create the list of nested dictionaries that I wanted, the following was added to the script:
list_of_dicts = []
for index, row in df.iterrows():
list_of_dicts.append(convert_series_to_dict(row))
I won’t add the output you would get by printing list_of_dicts. I guess you
can imagine what is looks like.
Changelog
2024-10-23:
- Added missing line in first code listing
Take care,
Andreas
References
- NumFOCUS Inc., “pandas.DataFrame.” [Online]. Available at: https://docs.u-boot.org/en/latest/. [Accessed: 07-Feb-2024].
- Pallets, “Jina,” 2007. [Online]. Available at: https://jinja.palletsprojects.com/en/3.1.x/. [Accessed: 07-Feb-2024].
- NumFOCUS Inc., “pandas.DataFrame.” [Online]. Available at: https://pandas.pydata.org/docs/reference/api/pandas.Series.to_dict.html. [Accessed: 26-Jan-2024].
Leave a comment