Question

Haskell list sum generalization

I was working in Python and created an example of how we can generalize the usual sumList function to sum any arbitrary list of lists:

  1. To sum any list of numbers in Python we can write:
def sumList(xs): 
  if xs==[]:  
    return 0 
  else:  
    return xs[0]+sumList(xs[1:])

And in Haskell:

sumList [] = 0 
sumList (x:xs) = x + sumList xs
  1. but to sum an arbitrary nested list, in Python we have:
def sumList2(xs): 
  if xs==[]:  
    return 0 
  elif isinstance(xs,int):  
    return xs 
  else:  
    return sumList2(xs[0])+sumList2(xs[1:])

But the following Haskell code throws a compiler error in the type inference:

sumList [] = 0 
sumList (x:xs) = sumList x + sumList xs 
sumList x = x
<interactive>:2:38: error:
• Couldn't match expected type ‘t’ with actual type ‘[t]’
‘t’ is a rigid type variable bound by
the inferred type of sumList :: t -> [t]
at <interactive>:(1,1)-(3,13)
• In the first argument of ‘sumList’, namely ‘xs’
In the second argument of ‘(+)’, namely ‘sumList xs’
In the expression: sumList x + sumList xs
• Relevant bindings include
xs :: [t] (bound at <interactive>:2:12)
x :: t (bound at <interactive>:2:10)
sumList :: t -> [t] (bound at <interactive>:1:1)

I am using IHaskell, for what I understand, when using sumlist [] = 0 as pattern matching, Haskell inferrs that sumList :: [a] -> number, but then when I pass sumList x in the next line, x is the head of the list, so sumList :: a -> number, so it gets an absurd. What I figured is that I need to match the case where the input is a list and do the 2 first lines, and then the case where the input is an integer and do the last line, but I cannot figure out how to do it.

I am trying to create a sum type of Either Int [a], but I don't know if that is the better answer, because I want to let the compiler figure out everything related to types (ultimately I will write the type definition, but my initial thesis is that contrary to popular belief, it is possible to do quick experimentation in Haskell, and so that means to not be concerned with types in the first stage of programing). Note the shortcomings of the for loop, it would be very difficult or impossible to generalize the sum of the list using for loops, because of the linearity of the loop, I think it is a very compelling argument for the scaling of machine learning software.

 2  75  2
1 Jan 1970

Solution

 5

Int, [Int], [[Int]], and so on are different types, and the type of a function is chosen at the moment it's called. If we expect our function to return an Int, and we have the two clauses

sumList (x:xs) = {- does not matter what goes here -}
sumList x = x

then we're guaranteed that no matter which input type we choose, we have a type error. If we choose Int, then the first clause is an error because x:xs can only match lists; if we choose any list type, then the second clause is an error because it does not return an Int.

However, not all is lost. One thing we can do is split these two clauses into separate functions that happen to share a name using type classes.

class Sums a where sums :: a -> Int
instance Sums Int where sums x = x
instance Sums a => Sums [a] where
    sums [] = 0
    sums (x:xs) = sums x + sums xs

Now when we call sums, we are never calling a function that has both clauses; instead, we are asking the compiler to call one or the other of these functions depending on which concrete type was chosen. The line

    sums (x:xs) = sums x + sums xs

may look suspicious, since we know that x and xs do not have the same type. But we are rescued by the fact that these are not calling the same sums function! Again, the type of a function is chosen at the moment it's called, and here there are two different calls that may choose two different types, at which point the compiler will choose different functions to call.

The other option we have is to make a new type. The trouble with Int, [Int], [[Int]], and so on being different types is that this means we must statically know how deeply nested a list is. If we don't know that, then we need a type that can express the fact that its nesting depth is only known dynamically, and which tracks the data it needs to know to determine how deep that nesting is. One way to do so would be:

data DynList a = NotNested a | Nested [DynList a]

sumDynList :: DynList Int -> Int
sumDynList (NotNested x) = x
sumDynList (Nested []) = 0
sumDynList (Nested (x:xs)) = sumDynList x + sumDynList (Nested xs)

I've written this to match as closely as possible your original code, but in reality I'd probably write the nested case this way:

sumDynList (NotNested x) = x
sumDynList (Nested xs) = sum (map sumDynList xs)

This particular type allows different amounts of nesting at different points in the data structure. If the intent is instead that there is a single nesting depth, and all list elements have that same depth, then we need a slightly more advanced technique.

data SharedDepthList a = Depth0 a | Deeper (SharedDepthList [a])

sumSharedDepthList :: SharedDepthList Int -> Int
sumSharedDepthList = go id where
    go :: (a -> Int) -> SharedDepthList a -> Int
    go f (Depth0 x) = f x
    go f (Deeper xs) = go (sum . map f) xs

I think mostly I do not recommend this. Data structures with this much precision about the allowed possible values get quite unwieldy in practice.

2024-07-24
Daniel Wagner

Solution

 3

An arbitrarily nested list is tree.

data Tree a = Empty | Leaf a | Branch (Tree a) (Tree a)

So, your sumList2 function in Python would become a sumTree function in Haskell.

sumTree :: Tree Int -> Int
sumTree Empty = 0
sumTree (Leaf n) = n
sumTree (Branch l r) = sumTree l + sumTree r

To understand why an arbitrarily nested list is a tree, let's look at some visualizations. Consider the following list in Python.

primes = [2, 3, 5, 7]

In Haskell, we can visualize this list as follows.

 (:)
 / \
2  (:)
   / \
  3  (:)
     / \
    5  (:)
       / \
      7   []

The inner nodes are the cons constructors, i.e. (:). The left hand side of the cons is the head and the right hand side is the tail. The left children of the cons nodes are the values of the list. The rightmost node is the empty list.

As you can see, a list a just a very unbalanced tree.

Now, consider the following arbitrarily nested list in Python.

primes = [[2, [[], 3]], [5, [7, []]]]

In Haskell, we can visualize this list as follows.

     (:)
     / \
    /   \
   /     \
 (:)     (:)
 / \     / \
2  (:)  5  (:)
   / \     / \
 []   3   7   []

As you can see, it looks like a tree. The leaves are either empty or they contain some data. Lists in Haskell can't represent any tree-like structure. Hence, we create a Tree data type. Hence, let's update the visualization.

               Branch
                / \
               /   \
              /     \
             /       \
            /         \
           /           \
     Branch             Branch
      / \                / \
Leaf 2   Branch    Leaf 5   Branch
          / \                / \
     Empty   Leaf 3    Leaf 7   Empty

And, that's why an arbitrarily nested list in Python becomes a tree in Haskell.

2024-07-24
Aadit M Shah

Solution

 2

Note that the native sum library function in Haskell is polymorphic:

$ ghci
GHCi, version 9.4.5: https://www.haskell.org/ghc/  :? for help
ghci> 
ghci> :type  sum
sum :: (Foldable t, Num a) => t a -> a
ghci> 

We need to find some type transformer that:

  1. can represent “generalized lists”
  2. has some Foldable instance.

Fortunately, the Haskell library provides such a thing, in the guise of free monads.

Essentially we have the following type definition:

data Free ft a = Pure a |  Impure (ft (Free ft a))

which is of course quite similar to the one provided in Daniel's excellent answer. It turns out that if ft is a functor, then Free ft is a monad, known as the “free” monad.

For example, the proper rendering of [42,[5,10]] in such a context is:

Impure [ Pure 42, Impure [ Pure 5, Pure 10 ] ]

Let's try, using [] as our functor:

ghci> 
ghci> import Control.Monad.Free
ghci> type GL = Free []
ghci> 
ghci> xss = Impure [ Pure 42, Impure [ Pure 5, Pure 10 ] ]  ::  GL Int
ghci> 
ghci> :instances GL
...
instance [safe] Foldable (Free [])
  -- Defined in ‘Control.Monad.Free’
...
ghci> 
ghci> :type xss
xss :: GL Int
ghci> 

Thus the library does provide us with a Foldable instance, so we are almost done here.

ghci> 
ghci> foldl (+) 0 xss
57
ghci> sum xss
57
ghci> 

Voilà !

2024-07-24
jpmarinier