Question

Finding Product Switches on Account Between Dates

I am working on a problem were we have a table that holds id, product code , accountid , date

CREATE TABLE AccountBalanceAndProduct
(
    [ID] [int] NOT NULL IDENTITY(1, 1),
    [EffectiveDate] [date] NULL,
    [AccountID] [varchar] (20) NULL,
    [Product] [varchar] (20)  
)

INSERT INTO AccountBalanceAndProduct (EffectiveDate, Accountid, Product)
VALUES 
    ('2024-07-10', 'Acc1', 'ABA'),
    ('2024-07-11', 'Acc1', 'ABA'),
    ('2024-07-12', 'Acc1', 'ABB'),
    ('2024-07-12', 'Acc1', 'ABA'),
    ('2024-07-13', 'Acc1', 'ABB'),
    ('2024-07-14', 'Acc1', 'ABA'),
    ('2024-07-15', 'Acc1', 'ABC'),
    ('2024-07-16', 'Acc1', 'ABC'),
    ('2024-07-17', 'Acc1', 'ABA'),
    ('2024-07-10', 'Acc2', 'ABA'),
    ('2024-07-11', 'Acc2', 'ABA'),
    ('2024-07-12', 'Acc2', 'ABB'),
    ('2024-07-13', 'Acc2', 'ABB'),
    ('2024-07-14', 'Acc2', 'ABA'),
    ('2024-07-15', 'Acc2', 'ABC'),
    ('2024-07-16', 'Acc2', 'ABC'),
    ('2024-07-17', 'Acc2', 'ABA')

What I am trying to work out is when an account changes to a new product with To and From dates of the change along with To and From Product codes

I have wrote the below code but it does not take into account if an account changes from one product to another then back again. I am struggling to find something online to see how this can be done.

Records are added daily to this table with the current product and a account and can hold two at the same time briefly while a manual operations process removes the old product- normally this happens before the new product is added.

;WITH ProductSwitch AS
(
    SELECT
        a.id,
        a.Accountid,
        a.Product,
        a.EffectiveDate[DateSwitchedTo],
        ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY a.id ASC) [RowNumber]
    FROM 
        (SELECT
             MIN(bal.ID) [id],
             bal.AccountId,
             bal.Product,
             MIN(bal.EffectiveDate) [EffectiveDate]
         FROM 
             AccountBalanceAndProduct bal
         WHERE 
             1 = 1
         GROUP BY 
             bal.Product, bal.Accountid) a
)
SELECT 
    a.AccountID,
    a.Product[ProductFrom],
    a.DateSwitchedTo[FromDate],
    b.Product[ProductTo],
    b.DateSwitchedTo[ToDate],
    b.RowNumber - 1 [TotalNumberOfSwitchesAtTimeOfSwitch]
FROM 
    (SELECT 
         asw.AccountID,
         asw.Product,
         asw.DateSwitchedTo,
         asw.RowNumber
     FROM  
         ProductSwitch asw) a
JOIN 
    (SELECT 
         asw.AccountID,
         asw.Product,
         asw.DateSwitchedTo,
         asw.RowNumber
     FROM
         ProductSwitch asw
     WHERE 
         asw.RowNumber > 1) b ON a.RowNumber = b.RowNumber - 1 
                              AND a.AccountID = b.AccountID

I have created a fiddle https://dbfiddle.uk/8Y7ybsKp

This populates the table with dummy data and shows by current results along with expected result set

I hope that makes sense.

Expected results:

CREATE TABLE ExpectedResults
(
    AccountID  [varchar](20) NULL,
    ProductFrom  [varchar](20) NULL,
    FromDate [date] NULL,
    ProductTo [varchar](20) NULL,
    ToDate [date] NULL,
    TotalNumberOfSwitchesAtTimeOfSwitch int null
)

INSERT INTO ExpectedResults (AccountID, ProductFrom, FromDate, 
                             ProductTo, ToDate, TotalNumberOfSwitchesAtTimeOfSwitch)
VALUES 
  ('Acc1',  'ABA'   ,'2024-07-10'   ,'ABB'  ,'2024-07-12',  1),
   ('Acc1', 'ABB'   ,'2024-07-12'   ,'ABA'  ,'2024-07-14',  2),
   ('Acc1', 'ABA'   ,'2024-07-14'   ,'ABB'  ,'2024-07-16',  3),
    ('Acc1',    'ABA'   ,'2024-07-16'   ,'ABC'  ,'2024-07-17',  4),
    ('Acc2',    'ABA'   ,'2024-07-10'   ,'ABB'  ,'2024-07-12',  1),
   ('Acc2', 'ABB'   ,'2024-07-12'   ,'ABA'  ,'2024-07-14',  2),
   ('Acc2', 'ABA'   ,'2024-07-14'   ,'ABB'  ,'2024-07-16',  3),
    ('Acc2',    'ABA'   ,'2024-07-16'   ,'ABC'  ,'2024-07-17',  4)
 3  93  3
1 Jan 1970

Solution

 3

Your source data is ambiguous when you have date-only columns with duplicate dates for the same account ('2024-07-12', 'Acc1', 'ABB') and ('2024-07-12', 'Acc1', 'ABA'). This could be interpreted either as ABA -> ABB -> ABA -> ABB or ABA -> ABA -> ABB -> ABB. SQL Server does not have an implicit row order other than that which is explicitly defined.

If we ignore that ambiguity, your question is a variation of a gaps-and-islands problem. In this case, the gaps are defined as changes to the Product value for a given AccountID across ascending EffectiveDate values. These can be identified using the LAG() fonction to compare product values to identify changes. For most gaps-and-islands problems, we would then group and combine the rows forming each island between the gaps, but in this case, we can just ignore the rows where Product does not change.

It then becomes a matter of using the LEAD() function to look ahead at the remaining rows to fill in the ProductTo and ToDate values, and a windowed COUNT(*) to get the desired TotalNumberOfSwitchesAtTimeOfSwitch values.

The resulting query would be something like:

WITH Changes AS (
    SELECT *
    FROM (
        SELECT
            *,
            LAG(Product)
                OVER(PARTITION BY AccountID ORDER BY EffectiveDate)
                AS LagProduct
        FROM AccountBalanceAndProduct
    ) A
    WHERE LagProduct IS NULL OR LagProduct <> Product
)
SELECT
    AccountID,
    Product AS ProductFrom,
    EffectiveDate AS FromDate,
    LEAD(Product)
        OVER(PARTITION BY AccountID ORDER BY EffectiveDate)
        AS ProductTo,
    LEAD(EffectiveDate)
        OVER(PARTITION BY AccountID ORDER BY EffectiveDate)
        AS ToDate,
    COUNT(*)
        OVER(PARTITION BY AccountID ORDER BY EffectiveDate
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        AS TotalNumberOfSwitchesAtTimeOfSwitch
FROM Changes
ORDER BY AccountID, EffectiveDate

Results (after excluding the ambiguous 2024-07-12 row):

AccountID ProductFrom FromDate ProductTo ToDate TotalNumberOfSwitches
AtTimeOfSwitch
Acc1 ABA 2024-07-10 ABB 2024-07-12 1
Acc1 ABB 2024-07-12 ABA 2024-07-14 2
Acc1 ABA 2024-07-14 ABC 2024-07-15 3
Acc1 ABC 2024-07-15 ABA 2024-07-17 4
Acc1 ABA 2024-07-17 null null 5
Acc2 ABA 2024-07-10 ABB 2024-07-12 1
Acc2 ABB 2024-07-12 ABA 2024-07-14 2
Acc2 ABA 2024-07-14 ABC 2024-07-15 3
Acc2 ABC 2024-07-15 ABA 2024-07-17 4
Acc2 ABA 2024-07-17 null null 5

Note that the above also includes the final change with starting but no ending values. It also has several other differences from your posted expected results, but I believe the above accurately reflects the input data. Please review your expected results. If the above is not what you are looking for, please expand on your description of your intended logic.

See this db<>fiddle for a demo.

2024-07-19
T N

Solution

 0

Only a partial answer, but it should point you in the right direction.

What I would do is use LAG() instead of Row_Number() to define the window, so that I can filter for rows where the function result does not match the current value:

WITH PriorProducts As
(
    SELECT ID, EffectiveDate, AccountID, Product,
        LAG(Product) OVER (PARTITION BY AccountID ORDER BY EffectiveDate, ID) ProductFrom,
        LAG(EffectiveDate) OVER (PARTITION BY AccountID ORDER BY EffectiveDate, ID) PriorProductEndDate
    FROM AccountBalanceAndProduct
)
SELECT ID, AccountID
    , Product ProductTo
    , EffectiveDate DateStarted
    , ProductFrom
    , PriorProductEndDate
FROM PriorProducts
WHERE ProductFrom <> Product
    AND ProductFrom IS NOT NULL

This isn't a perfect set of what you want (it doesn't handle the full dates of set... rather it shows rows for the boundaries). But it should get you there a little easier.

Otherwise, what you're doing is called gaps and islands, and there's lots of help online for ways to do this.

2024-07-18
Joel Coehoorn

Solution

 0

This is near to approach, shown by @TN.

I would like to consider in more detail the case when there are two rows on the same date. In this case, let's try to minimize the number of transitions from product to product. If one of the lines of this day matches the value in the previous day, put this line first in the list. See rows with id 3 and 4.

ID AccountID EffectiveDate Product rn rnk intraNum newGR grn prevProduct prevEffectiveDate
1 Acc1 2024-07-10 ABA 1 1 0 0 0 ABA 2024-07-10
2 Acc1 2024-07-11 ABA 2 2 0 0 0 ABA 2024-07-10
4 Acc1 2024-07-12 ABA 4 3 0 0 0 ABA 2024-07-11
3 Acc1 2024-07-12 ABB 3 3 1 1 1 ABA 2024-07-12
5 Acc1 2024-07-13 ABB 5 5 1 0 1 ABB 2024-07-12

For rows (4,Acc1,2024-07-12,ABA) and (3,Acc1,2024-07-12,ABB) we take first row (4,Acc1,2024-07-12,ABA), because previous row is with product ABA

See example

with intraRange as(
  select * 
    ,lag(Product,1+rn-rnk,Product)over(partition by AccountId order by EffectiveDate,Id) prevProduct0
    ,case when lag(Product,1+rn-rnk,Product)
          over(partition by AccountId order by EffectiveDate,Id) =Product 
     then 0 
     else 1 
     end intraNum
  from (
    select ID,AccountID, EffectiveDate, Product
      ,row_number()over(partition by AccountId order by EffectiveDate,Id)rn
      ,rank()over(partition by AccountId order by EffectiveDate)rnk
    from AccountBalanceAndProduct
   )t
)
,tgaps as (
select * 
    ,case when lag(Product,1,Product)
          over(partition by AccountId order by EffectiveDate,intraNum) =Product 
     then 0 
     else 1 
     end newGR
from intraRange
)
,tgr as(
select ID, AccountID, EffectiveDate,intraNum,Product
  ,sum(newgr)over(partition by AccountId order by EffectiveDate,intraNum)+1 grn
  ,rn,rnk,newGR
from tgaps
)
,tgr_pn as(
select AccountId,grn,min(Product) Product
  ,min(EffectiveDate) minDate,max(EffectiveDate) maxDate
  ,min(rn)rnMin,max(rn)rnMax
  ,lead(min(EffectiveDate),1,max(EffectiveDate))over(partition by AccountId order by grn)nextDate
  ,lead(min(Product),1,min(Product))over(partition by AccountId order by grn)nextProduct
from tgr
group by AccountId,grn
)
select AccountId,grn
  ,Product FromProduct, minDate FromDate
  ,nextProduct ToProduct,nextDate ToDate
nextProduct
from tgr_pn
order by accountId,grn;

Fiddle

2024-07-19
ValNik