Question

reg_replace in PostgreSql with pattern matching

I have below dataset,

1. '{SHEET,2730377,SHEET,5708283,DATA,3015937}'
2. '{SHEET,2730377,SHEET,5708283,DATA,3015937,DATA,0010965}'
3. '{SHEET,5708283,DATA,3015937,DATA,0010965}'
3. '{SHEET,5708283,DATA,3015937}'

I need the result as,

1. {DATA,3015937}
2. {DATA,3015937,DATA,0010965}
3. {DATA,3015937,DATA,0010965}
4. {DATA,3015937}

Basically I wanted to replace SHEET followed by 7 digit number and extract only DATA followed by 7 digit number.

I tried using regexp_replace('{SHEET,2730377,SHEET,5708283,DATA,3015937}', 'SHEET,(\d\d\d\d\d\d\d),', '' ) the result of this is {SHEET,5708283,DATA,3015937}, as well as using similar to [0-9][0-9][0-9][0-9][0-9][0-9][0-9]

Finding difficulties to get the accurate result, any suggestion would be appreciated

 2  47  2
1 Jan 1970

Solution

 0

Instead of REGEXP_REPLACEing what you want to remove, just SELECT what you want to extract as follows (all of the code below is available on the fiddle here):

CREATE TABLE t
(
  str TEXT NOT NULL
);

populate:

INSERT INTO t (str) VALUES
('{SHEET,2730377,SHEET,5708283,DATA,3015937}'),
('{SHEET,2730377,SHEET,5708283,DATA,3015937,DATA,0010965}'),
('{SHEET,5708283,DATA,3015937,DATA,0010965}'),
('{SHEET,5708283,DATA,3015937}');

and then we run the following:

SELECT
  '{' || ARRAY_TO_STRING(REGEXP_MATCHES(str, '(DATA,.*})+$', 'g'), ',') AS result
FROM
  t;

Result:

result
{DATA,3015937}
{DATA,3015937,DATA,0010965}
{DATA,3015937,DATA,0010965}
{DATA,3015937}

Et voilà!

2024-07-21
Vérace