Question

Sorting data algorithm 900000 rows of data

I have four columns AA, AB, AC, and AD with values that I want to sort after a certain pattern:

Column AA Column AB Column AC Column AD Column AE
Row 1 123 444 6666
Row 2 A tz s4 23
Row 3 1111 56 hh 23
Row 4 D 56 F 4
Row 5 56 F A
Row 6 456 55 3333 23
Row 7 A 333 A56 55555
Row 8 1 555 VBC A
Row 9 A 5899 B6 23
Row 10 2 TZU 98 56
  1. All purely numerical values must be moved to the front before all other alphanumerical values.

  2. Empty fields must be put to the end.

  3. The numerical value with the largest number of characters must be placed in the first column. For example, the value "55555" in Row 7 and Column AD must be after execution of the macro in Row 7 and Column AA.

The final result of the example above should look like this:

Column AA Column AB Column AC Column AD Column AE
Row 1 6666 123 444
Row 2 23 A tz s4
Row 3 1111 56 23 hh
Row 4 56 4 D F
Row 5 56 F A
Row 6 3333 456 55 23
Row 7 55555 333 A A56
Row 8 555 1 VBC A
Row 9 5899 23 A B6
Row 10 98 56 2 TZU

My solution is below. The problem is that I couldn't realize point 2, moving the empty field to the end. Also, I'm using loops, and I want to transform 900000 (!) rows of data. Running the macro like this takes days...any other solution is appreciated. Thank you.

Option Explicit

Sub resort()

    Dim i As Long
    Dim j As Long
    Dim temp As Range
    
    With Worksheets("Tabelle1")
    
    For j = 1 To 10
    
        For i = 2 To 15
        
            If IsNumeric(.Range("AA" & i)) = False And IsNumeric(.Range("AB" & i)) = True Then
            
                .Range("AB" & i).Copy Destination:=.Range("AE" & i)
                .Range("AA" & i).Copy Destination:=.Range("AB" & i)
                .Range("AE" & i).Copy Destination:=.Range("AA" & i)
                .Range("AE" & i).Clear
                            
            End If
        
        Next i
        
        For i = 2 To 15
        
            If IsNumeric(.Range("AB" & i)) = False And IsNumeric(.Range("AC" & i)) = True Then
            
                .Range("AC" & i).Copy Destination:=.Range("AE" & i)
                .Range("AB" & i).Copy Destination:=.Range("AC" & i)
                .Range("AE" & i).Copy Destination:=.Range("AB" & i)
                .Range("AE" & i).Clear
                            
            End If
        
        Next i
        
        For i = 2 To 15
        
            If IsNumeric(.Range("AC" & i)) = False And IsNumeric(.Range("AD" & i)) = True Then
            
                .Range("AD" & i).Copy Destination:=.Range("AE" & i)
                .Range("AC" & i).Copy Destination:=.Range("AD" & i)
                .Range("AE" & i).Copy Destination:=.Range("AC" & i)
                .Range("AE" & i).Clear
                            
            End If
        
        Next i
        
     Next j
        
        
     '++++++++++++++++++++++++++++++++++++++++++++++++++++'
        
     For j = 1 To 10
        
        For i = 2 To 15
        
            If IsNumeric(.Range("AB" & i)) = True Then
            
                If Len(.Range("AB" & i)) > Len(.Range("AA" & i)) Then
            
                    .Range("AB" & i).Copy Destination:=.Range("AE" & i)
                    .Range("AA" & i).Copy Destination:=.Range("AB" & i)
                    .Range("AE" & i).Copy Destination:=.Range("AA" & i)
                    .Range("AE" & i).Clear
                    
                End If
                            
            End If
        
        Next i
    
        For i = 2 To 15
        
            If IsNumeric(.Range("AC" & i)) = True Then
            
                If Len(.Range("AC" & i)) > Len(.Range("AB" & i)) Then
                
                    .Range("AC" & i).Copy Destination:=.Range("AE" & i)
                    .Range("AB" & i).Copy Destination:=.Range("AC" & i)
                    .Range("AE" & i).Copy Destination:=.Range("AB" & i)
                    .Range("AE" & i).Clear
                                
                End If
                
            End If
        
        Next i
      
        For i = 2 To 15

            If IsNumeric(.Range("AD" & i)) = True Then

                If Len(.Range("AD" & i)) > Len(.Range("AC" & i)) Then

                    .Range("AD" & i).Copy Destination:=.Range("AE" & i)
                    .Range("AC" & i).Copy Destination:=.Range("AD" & i)
                    .Range("AE" & i).Copy Destination:=.Range("AC" & i)
                    .Range("AE" & i).Clear

                End If

            End If

        Next i
    
    Next j
        
    End With

End Sub
 3  84  3
1 Jan 1970

Solution

 6

You have 3 tasks to solve:

a) You have a huge amount of data, so you need a fast routine.
This is easy: Read all your data into memory on one go (into a 2-dimensional array). Work on that array. When everything is sorted, write the data back into Excel in one go.

Sub sortMydata()
    With Worksheets("Tabelle1")
        Dim rowcount As Long
        rowcount = .Range("AA1").CurrentRegion.Rows.Count
        ' Read Excel data into 2-dimensional array
        Dim data
        data = .Range("AA1").Resize(rowcount, 4)
        ' Sort all rows
        For row = 2 To rowcount
            sortrow data, row
        Next
        ' Write sorted data back into sheet    
        .Range("AA1").Resize(rowcount, 4) = data
    End With
End Sub

b) You need to sort your data (row by row). For this we need a sorting algorithm. As we always sort only very few values (4 per row), a simple bubble sort is the best option. There are tons of implementations that can be found on the internet. The only thing we need to know is that we want to sort values of one row while most algorithms assume you want to sort data by one (or several) columns.

Sub sortrow(data, row As Long)
    Dim i As Long, j As Long
    ' A simple Bubble Sort to sort the values of one Row
    For i = LBound(data, 2) To UBound(data, 2) - 1
        For j = i To UBound(data, 2)
            If sortBefore(data(row, j), data(row, i)) Then
                Dim tmp As Variant
                tmp = data(row, i)
                data(row, i) = data(row, j)
                data(row, j) = tmp
            End If
        Next
    Next
End Sub

c) You need an algorithm that compares 2 values to decide which one will come "first". The sorting algorithm will use that to sort your data.

Function sortBefore(v1 As Variant, v2 As Variant) As Boolean
    If v1 = "" Then
        sortBefore = False            ' Blanks to the end
    ElseIf IsNumeric(v1) Then
        If IsNumeric(v2) Then
            sortBefore = v1 > v2      ' Compare Numeric values: Larger first
        Else
            sortBefore = True         ' Number before string
        End If
    Else
        If IsNumeric(v2) Then
            sortBefore = False        ' String after numeric
        Else
            sortBefore = LCase(v1) < LCase(v2)      ' Compare string values: Smaller first
        End If
    End If
End Function

It took 1 or 2 seconds to run it with 100.000 rows of data.

2024-07-24
FunThomas

Solution

 2

Other approach could be using Power Query, also this approach should work quickly on big data.

The code to use:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column AA"}, "column name", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"column name"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "sort rank", each try Text.Length(Number.ToText([Value])) otherwise 0),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Column AA"}, {{"row", each _}}),
    sorted = Table.TransformColumns(#"Grouped Rows", {"row", each Table.Sort(_, {"sort rank", Order.Descending})}),
    index_added = Table.TransformColumns(sorted, {"row", each Table.AddIndexColumn(_, "index")}),
    #"Expanded row" = Table.ExpandTableColumn(index_added, "row", {"Value", "index"}, {"Value", "index"}),
    #"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Expanded row", {{"index", type text}}, "en-GB"), List.Distinct(Table.TransformColumnTypes(#"Expanded row", {{"index", type text}}, "en-GB")[index]), "index", "Value")
in
    #"Pivoted Column"

Output: (left: sorted, right: original)

enter image description here

2024-07-24
M&#225;t&#233; Juh&#225;sz

Solution

 0

I used the ArrayList but I am not sure if this is applicable for you.

Option Explicit

Sub SortIt()
    Dim list As Object, listA As Object
    Set list = CreateObject("System.Collections.ArrayList")
    Set listA = CreateObject("System.Collections.ArrayList")

    'Set list = New ArrayList
    'Set listA = New ArrayList

    Dim v As Variant
    Dim i As Long
    Dim sngRow As Range
    Dim wks As Worksheet
    Set wks = ActiveSheet
    
    For Each sngRow In wks.Range("A1:D10").Rows
        v = sngRow.Value
        For i = LBound(v, 2) To UBound(v, 2)
            If Len(v(1, i)) > 0 Then
                If IsNumeric(v(1, i)) Then
                    list.Add v(1, i)
                Else
                    listA.Add v(1, i)
                End If
            End If
        Next
        list.Sort
        list.Reverse
        Debug.Print list.Count, listA.Count
    
        ' write to sheet to the right in column 6
        
        v = list.toarray
        With wks
            .Range(.Cells(sngRow.Row, 6), .Cells(sngRow.Row, 6 + list.Count - 1)) = v
            v = listA.toarray
            .Range(.Cells(sngRow.Row, 6 + list.Count), .Cells(sngRow.Row, 6 + list.Count + listA.Count - 1)) = v
        End With
        list.Clear
        listA.Clear
    Next sngRow
End Sub

As there are still a lot of worksheet accesses it is probably not that fast. But that can be fixed and one could still continue using the approach with the Array List.

Changed it in such a way that the number of worksheet accesses is minimized

Option Explicit

Sub sortMemory()
    Dim list As Object, listA As Object
    Set list = CreateObject("System.Collections.ArrayList")
    Set listA = CreateObject("System.Collections.ArrayList")

    'Set list = New ArrayList
    'Set listA = New ArrayList

    Dim v As Variant
    Dim vDat As Variant
    Dim i As Long, j As Long, k As Long, listCount As Long
    
    Dim wks As Worksheet
    Set wks = ActiveSheet
        
    ' Data in Region of A1
    vDat = wks.Range("A1").CurrentRegion.Value
    
    For j = LBound(vDat, 1) To UBound(vDat, 1)
        For i = LBound(vDat, 2) To UBound(vDat, 2)
            If Len(vDat(j, i)) > 0 Then
                If IsNumeric(vDat(j, i)) Then
                    list.Add vDat(j, i)
                Else
                    listA.Add vDat(j, i)
                End If
            End If
        Next
        list.Sort
        list.Reverse
            
        ' Change array
        v = list.toarray
        For k = LBound(v) To UBound(v)
            vDat(j, k + 1) = v(k)
        Next k
        
        v = listA.toarray
        listCount = list.Count
        
        For k = 1 + listCount To UBound(vDat, 2)
            vDat(j, k) = ""
        Next
        If UBound(v) >= 0 Then
            For k = LBound(v) To UBound(v)
                vDat(j, k + 1 + listCount) = v(k)
            Next k
        End If
        
        list.Clear
        listA.Clear
    Next
    
    ' Output starting in Column 6, adjust accordingly
    With wks
        .Range(.Cells(1, 6), .Cells(UBound(vDat, 1), 6 + UBound(vDat, 2) - 1)) = vDat
    End With

End Sub
2024-07-24
Storax