Question
How can I preserve the previous value to find the row that is greater than it?
This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
    {
        'start': [3, 11, 9, 19, 22],
        'end': [10, 17, 10, 25, 30]
    }
)
The expected output is a new column x:
   start  end     x
0      3   10    10
1     11   17    17
2      9   10   NaN
3     19   25    25
4     22   30   NaN
Logic:
I will explain it row by row. For row 0, x is df.end.iloc[0]. This value of x must be preserved until a later row has a greater value in the start column.
So 10 is saved and the process moves to row 1. Is 11 > 10? Yes, so x for row 1 is 17, and 17 becomes the preserved value. For the next row, is 9 > 17? No, so x is NaN.
The process moves to the next row. Since no value greater than 17 has been found, 17 is still preserved. Is 19 > 17? Yes, so x is set to 25. For the last row, since 22 < 25, x is NaN.
Here are additional examples with different DataFrames and their desired outputs:
df = pd.DataFrame({'start': [3, 20, 11, 19, 22], 'end': [10, 17, 21, 25, 30]})
   start  end     x
0      3   10  10.0
1     20   17  17.0
2     11   21   NaN
3     19   25  25.0
4     22   30   NaN
df = pd.DataFrame({'start': [3, 9, 11, 19, 22], 'end': [10, 17, 21, 25, 30]})
   start  end     x
0      3   10  10.0
1      9   17   NaN
2     11   21  21.0
3     19   25   NaN
4     22   30  30.0
df = pd.DataFrame({'start': [3, 11, 9, 19, 22], 'end': [10, 17, 21, 25, 30]})
   start  end     x
0      3   10  10.0
1     11   17  17.0
2      9   21   NaN
3     19   25  25.0
4     22   30   NaN
The loop below gives me the expected result. Is there a vectorized way to do this?
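import numpy as np

l = []
for ind, row in df.iterrows():
    if ind == 0:
        # The first row always keeps its own end value.
        x = row['end']
        l.append(x)
        continue
    if row['start'] > x:
        # start exceeds the preserved value: this row's end becomes the new x.
        x = row['end']
        l.append(x)
    else:
        l.append(np.nan)
df['x'] = l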
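For comparison, here is a minimal sketch of the same logic as a reusable function that iterates over plain NumPy arrays instead of df.iterrows(). It is not a truly vectorized solution, because each comparison depends on the most recently preserved value; it is only meant as a baseline that any vectorized answer should match. The names preserve_and_compare and threshold are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd


def preserve_and_compare(start, end):
    """Return end[i] where start[i] exceeds the last preserved end value, else NaN.

    Hypothetical helper: mirrors the row-by-row logic described above.
    """
    start = np.asarray(start)
    end = np.asarray(end, dtype=float)
    x = np.full(len(end), np.nan)
    if len(end) == 0:
        return x
    # The first row always keeps its own end value.
    threshold = end[0]
    x[0] = threshold
    for i in range(1, len(end)):
        if start[i] > threshold:
            # A greater start was found: preserve this row's end from now on.
            threshold = end[i]
            x[i] = threshold
    return x


df = pd.DataFrame({'start': [3, 11, 9, 19, 22], 'end': [10, 17, 10, 25, 30]})
df['x'] = preserve_and_compare(df['start'].to_numpy(), df['end'].to_numpy())
print(df)
```

Running the function on the other three example DataFrames should reproduce their desired x columns as well.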