Question

Extract the start and end date of a sick leave in power query (gaps and islands)

I would like to have a max and min date of the different types of absence for every employee for project. I would like to have it in Power BI but I don't know how to do it.

There are 3 columns that this should be grouped: Project ID, Person ID and Time Status.

I thought about grouping also on Month and Year but then if there will be situation that there is 2 sick leaves in the same month it will not work properly. Here is an example of data:

enter image description here

And the outcome should be like that: enter image description here

Thanks.

Here is the sample of the data:

Project ID  Person ID   Time Status     Date
10          1           Sick leave      21.06.2024
10          1           Sick leave      09.07.2024
10          1           Sick leave      10.07.2024
10          1           Sick leave      11.07.2024
10          1           Sick leave      12.07.2024
9           2           Annual leave    12.08.2024
9           2           Annual leave    13.08.2024
9           2           Annual leave    14.08.2024
9           2           Annual leave    15.08.2024
9           2           Annual leave    31.08.2024
 2  91  2
1 Jan 1970

Solution

 3

I added a GetStartAndEndDates function to check dates[i]-dates[i-1] for each projectId, personId and timeStatus.

If dates[i]-dates[i-1] > 1, I assume date[i] is the start of a new "leave".

enter image description here

let
  Source = Excel.CurrentWorkbook(),
  Navigation = Source{[Name = "Table1"]}[Content],
  GetStartAndEndDates = (source as table)=> 
    let 
      sorted = Table.Sort(source, {{"Date", Order.Ascending}}),
      addIndex = Table.AddIndexColumn(sorted, "Index", 0, 1, Int64.Type),
      addDuration = Table.AddColumn(addIndex, "Duration", each if [Index] = 0 then 0 else Duration.Days([Date] - addIndex[Date]{[Index]-1}) ),
      addKey = Table.AddColumn(addDuration, "Key", each if [Index] = 0 then 0 else if [Duration] <= 1 then null else [Index]),
      filled = Table.FillDown(addKey, {"Key"}),
      results = Table.Group(filled, {"Key"}, {{"Start Date", each List.Min([Date]), type nullable date}, {"End Date", each List.Max([Date]), type nullable date}})
    in 
      results,
  grouped = Table.Group(Navigation, {"Project ID", "Person ID", "Time Status"}, {{"All", each _, type nullable table[#"Project ID" = any, #"Person ID" = any, #"Time Status" = any, Date = any]}}),
  startEndDates = Table.AddColumn(grouped, "StartEndDates", each GetStartAndEndDates([All])),
  expanded = Table.ExpandTableColumn(startEndDates, "StartEndDates", {"Start Date", "End Date"}, {"Start Date", "End Date"}),
  removed = Table.RemoveColumns(expanded, {"All"})
in
  removed
2024-07-22
rachel

Solution

 3

If you have a large set of data, the following will execute significantly faster.

It assumes the data is already sorted as you show in your example -- by dates ascending within each Project ID | Person ID. If they are not, then this will need to be sorted which will lengthen the processing time.

It uses an Index column and the 4th and 5th arguments of the Table.Group function to ensure the grouping is by consecutive dates:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Project ID", Int64.Type}, {"Person ID", Int64.Type}, {"Time Status", type text}, {"Date", type date}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),


    #"Grouped Rows" = Table.Group(#"Added Index", {"Project ID", "Person ID", "Time Status","Date", "Index"}, {
        {"Start Date", each List.Min([Date]), type date},
        {"End Date", each List.Max([Date]), type date}
        }, GroupKind.Local,(x,y)=>Duration.Days(y[Date]-x[Date])- (y[Index]-x[Index]) ),
        
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Index"})
in
    #"Removed Columns"
2024-07-22
Ron Rosenfeld

Solution

 2

enter image description here

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Project ID", Int64.Type}, {"Person ID", Int64.Type}, {"Time Status", type text}, {"Date", type date}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Sequence", each try if Value.Is( Record.Field( #"Changed Type"{[Project ID = [Project ID], Date =  Date.AddDays( [Date],-1)] }, "Date") , type date) then null else [Date] otherwise [Date]),
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Sequence"}),
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Project ID", "Person ID", "Time Status", "Sequence"}, {{"Start", each List.Min([Date]), type date}, {"End", each List.Max([Date]), type date}}),
    #"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"Sequence", "Date"}}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type date}})
in
    #"Changed Type1"
2024-07-22
davidebacci