Question
How to format the dataframe into a 2D table
I have following issue with formatting a pandas dataframe into a 2D format. My data is:
+----+------+-----------+---------+
| | Jobs | Measure | Value |
|----+------+-----------+---------|
| 0 | Job1 | Temp | 43 |
| 1 | Job1 | Humidity | 65 |
| 2 | Job2 | Temp | 48 |
| 3 | Job2 | TempS | 97.4 |
| 4 | Job2 | Humidity | nan |
| 5 | Job3 | Humidity | 55 |
| 6 | Job1 | Temp | 41 |
| 7 | Job1 | Duration | 23 |
| 8 | Job3 | Temp | 39 |
| 9 | Job1 | Temp | nan |
| 10 | Job1 | Humidity | 55 |
| 11 | Job2 | Temp | 48 |
| 12 | Job2 | TempS | 97.4 |
| 13 | Job2 | Humidity | nan |
| 14 | Job3 | Humidity | 55 |
| 15 | Job1 | Temp | nan |
| 16 | Job1 | Duration | 25 |
| 17 | Job3 | Temp | nan |
| 18 | Job2 | Humidity | 61 |
+----+------+-----------+---------+
and my code for now is:
from tabulate import tabulate
import pandas as pd
df = pd.read_csv('logs.csv')
#print(df)
print(tabulate(df, headers='keys', tablefmt='psql'))
grouped = df.groupby(['Jobs','Measure'], dropna=True)
average_temp = grouped.mean()
errors = df.groupby(['Jobs','Measure']).agg(lambda x: x.isna().sum())
frames = [average_temp, errors]
df_merged = pd.concat(frames, axis=1).set_axis(['Avg', 'Error'], axis='columns')
print(df_merged)
and the output of the print is:
Table-1
Avg Error
Jobs Measure
Job1 Duration 24.0 0
Humidity 60.0 0
Temp 42.0 2
Job3 Humidity 55.0 0
Temp 39.0 1
Job2 Humidity 61.0 2
TempS 97.4 0
Temp 48.0 0
How can I format this table into something like this:
Table-2
Jobs Avg.Temp Err.Temp Avg.Humidity Err.Humidity Avg.Duration ...
Job1 42.0 2 60.0 0 24.0
Job2 48.0 0 61.0 0 -
Job3 39.0 1 55.0 1 -
So, what we see is that for example, Avg.Temp for Job1 in Table-2 is the Avg. value of Job1->Temp in Table-1. Another thing is that not all Jobs need to have the same measure fields and can also differ like for Job2 we have 'TempS'.
Update: using the answer from user24714682 the table looks like this.
+--------------+----------------+----------------+--------------+------------+------------------+------------------+----------------+--------------+
| Jobs | Avg.Duration | Avg.Humidity | Avg.S.Temp | Avg.Temp | Error.Duration | Error.Humidity | Error.S.Temp | Error.Temp |
|--------------+----------------+----------------+--------------+------------+------------------+------------------+----------------+--------------|
| Job1 | 24 | 60 | nan | 42 | 0 | 0 | nan | 2 |
| Job3 | nan | 55 | nan | 39 | nan | 0 | nan | 1 |
| Job2 | nan | 61 | 97.4 | 48 | 1 | 2 | 0 | 0 |
+--------------+----------------+----------------+--------------+------------+------------------+------------------+----------------+--------------+
How can I now sort the columns in that way to first show the Measure that has the highest total Error count first and the descending to the rest of total Error counts. example:
+--------------+------------+--------------+----------------+------------------+...
| Jobs | Avg.Temp | Error.Temp | Avg.Humidity | Error.Humidity |
|--------------+------------+--------------|----------------+------------------+...
| Job1 | 42 | 2 | 60 | 0 |
| Job3 | 39 | 1 | 55 | 0 |
| Job2 | 48 | 0 | 61 | 2 |
+--------------+------------+--------------+----------------+------------------+...
In the above table the columns are sorted 1st Avg.Temp bacause it is the sensor with highest total error count of 3 and then it shows Avg.Humidity because it has the 2nd highest total error count and so on.