Question
Extracting specific substrings from a text field in SQL
Consider this snippet code to regenerate the result set:
CREATE SCHEMA "clean";
CREATE TABLE "clean"."InvoiceFact" (
"InvoiceTitle" TEXT
);
INSERT INTO "clean"."InvoiceFact" ("InvoiceTitle") VALUES
('Renewal of advanced payment gateway services (12/22/2023 to 01/20/2024) -limoome1| Lemon'),
('The difference in payment and settlement services (05/15/2024 to 07/15/2024) -fxbotshop | Forex bot shop'),
('Advanced payment gateway services (01/15/2024 to 04/15/2024)-Khan.academy|Khan Academy'),
('The difference in payment and settlement portal services (10/24/2023 to 12/21/2023) -Frenchattractivemen2 | Jack J. Merc'),
('Advanced payment gateway services (09/04/2023 to 12/04/2023) -shakibaazemati'),
('The difference in advanced payment gateway services (04/21/2023 to 07/21/2023) - rFdbb | Toure Du France'),
('Renewal of advanced settlement services (05/25/2024 to 06/25/2024) + difference payment until 05/24 + Yalux hotel extension until 04/07-Ariaz|Ariaz Ads'),
('Renewal of advanced payment gateway services (11/23/2023 to 12/22/2023) - Broca'),
('Renewal of payment and settlement portal services (05/10/2024 to 06/09/2024) - inria2 | Educational Research Collection of Basic Education'),
('Renewal of payment portal services (03/22/2024 to 04/20/2024) - YooEnglishtube | Wow English Tube (settlement 0.2%)'),
('Renewal of advanced payment and settlement portal services (from 10/22/2023 to 01/20/2024)-googleads|Google advertising agency (first payment)'),
('Renewal of advanced payment gateway services (from 01/21/2024 to 04/21/2024) - accsell2| Axel store (first payment)'),
('Renewal +0.2% advanced payment and settlement gateway services (12/20/2023 to 03/20/2024) Yasamankumari | Dr. Kumarii training group (second payment)'),
('Advanced payment and settlement portal services (04/15/2023 to 08/13/2023) - YEnglishtube | Wow English Tube (settlement 0.2%)'),
('Renewal +0.2% advanced payment and settlement gateway services - (09/20/2023 to 12/20/2023) -Yasamankumari |Dr. Kumarii''s educational group'),
('Renewal of advanced payment gateway services (07/06/2024 to 10/06/2024) - Tesmino'),
('Renewal +0.2% advanced payment and settlement gateway services - (09/20/2023 to 12/20/2023) -Yasamankumari | Dr. Kumarii''s educational group'),
('settlement0.2% payment gateway services and advanced settlement (from 05/21/2024 to 06/20/2024) - komolife | Kumolife'),
('Advanced payment gateway services (09/13/2022 to 03/20/2024) -awajr9 | Ava Vincent Jr'),
('Advanced payment gateway services (07/15/2024 to 08/14/2024) - Cityfortnite'),
('Advanced payment gateway services (01/14/2024 to 04/14/2024) -Gifkart|Gifkart');
SELECT
"InvoiceTitle",
CASE
WHEN "InvoiceTitle" ILIKE '%difference%' OR
"InvoiceTitle" ILIKE '%first payment%' OR
"InvoiceTitle" ILIKE '%second payment%' THEN 'Installments'
WHEN "InvoiceTitle" ILIKE '%Renewal%' THEN 'Renewal'
ELSE 'New'
END AS ServiceType,
SUBSTRING("InvoiceTitle" FROM '\((.*?) to') AS StartDate,
SUBSTRING("InvoiceTitle" FROM 'to (.*?)\)') AS EndDate,
SUBSTRING("InvoiceTitle" FROM '-(.*?)(\||$)') AS UserName,
SUBSTRING("InvoiceTitle" FROM '\|(.*?)$') AS MerchantName
FROM "clean"."InvoiceFact"
This table has a text field named InvoiceTitle
which contains various pieces of information delimited by special characters. Here's an example of the data:
'Renewal +0.2% advanced payment and settlement gateway services - (09/20/2023 to 12/20/2023) -Yasamankumari |Dr. Kumarii's educational group'
Renewal +0.2% advanced payment and settlement gateway services (12/20/2023 to 03/20/2024) Yasamankumari | Dr. Kumarii training group (second payment)
I'm trying to extract specific substrings from this field, namely the service type, start date, end date, username, and merchant name.
However, I'm facing an issue with the extraction of the username. For records like the example above, the username is extracted as (09/20/2023 to 12/20/2023) -Yasamankumari
, but I want it to be Yasamankumari
.
How can I modify my query to correctly extract the username, as well as MerchantName dose not contains extra info with parenthesis?
Additionally, an invoice may be paid in multiple installments, not just one or two. So, I need to generalize the '%second payment%' condition to support any number of installments.
How can I modify my query to handle any number of installments?
EDIT
Additionally, need to consider such cases:
Advanced payment gateway services - (06/19/1402 to 09/19/1402) -komoliVe | Kumolife
Fixed fee of Square advanced settlement service (4 months) - (KarmaP) - second installment
Advanced payment and settlement portal services - (06/28/1402 to 09/28/1402) - TaraRz |TaraRz