Tuesday, September 06, 2016

Making Sense Of Symbols In Scala's Collections API

Scala comes in for a lot of criticism for the leeway it gives to the use of symbols. It is a criticism I can understand: if abused, symbols can easily lead to codebases that are difficult to read at first glance.

But even so, I still think the criticism is exaggerated, because when all is said and done, symbols are valid and capable tools for capturing and expressing ideas, especially when they can do so in a crisp, mathematical fashion without the overhead of verbal processing.

"But it is almost impossible to google these symbols"! The complaint goes...

True, but when you think about it, this isn't an inadequacy of Scala but of most search engines, which cannot deal with symbols in search queries.

To deal with this drawback, I have learnt to use http://symbolhound.com/ whenever I happen upon a symbol in Scala that I want to look up. Symbolhound.com can handle special symbols and does not discard them the way Google et al. do.


Also, I have discovered that using the English names of these symbols in search queries yields surprisingly good results more often than not. For example, "underscore plus underscore scala" when searching for "_ + _", or "triple colon scala" when searching for ":::".

Another thing that helps is to figure out the pattern in a symbol and how it relates to the concept it is supposed to represent. Put another way, find a heuristic for deciphering what a symbol represents.

This can be done for most symbols I have encountered in Scala, and that is exactly what this post seeks to do: provide a simple heuristic for making sense of frequently encountered symbols (especially the ones that deal with joining collections) when working with Scala’s collections.

Ever found it difficult to remember what ++, ++:, +:, /:, ::: etc. are meant to be used for and how they differ? Then you will find this post helpful.

To help make sense of this multitude of symbols, let us start by establishing a few rules:

++ is for concatenating two collections

++, double plus, is used for concatenating two collections:

leftOperand ++ rightOperand

When used, both operands need to be collections. For example:

scala> List(1,2,3) ++ List(4,5,6)
res22: List[Int] = List(1, 2, 3, 4, 5, 6)

scala> Set(1,2,3) ++ Set(4,5,6)
res23: scala.collection.immutable.Set[Int] = Set(5, 1, 6, 2, 3, 4)

The result of the ++ operator is always a collection. If the two operands are collections of different types, say a Set and a List, the collection on the left side determines the type of the result (since, when used this way, ++ is a method defined on the left operand - more on this when I touch on infix notation later in the post).
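A quick sketch of the mixed-type case, which can be pasted into the REPL. The value names are made up for illustration:

```scala
// The left operand decides what kind of collection comes back.
val listFirst = List(1, 2, 3) ++ Set(4)      // a List, because the List is on the left
val setFirst  = Set(1, 2, 3) ++ List(3, 4)   // a Set, so the duplicate 3 collapses

println(listFirst)       // List(1, 2, 3, 4)
println(setFirst.size)   // 4
```

Swapping the operands changes more than the order of the elements: it changes which collection's rules (here, Set's uniqueness) apply to the result.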

If the second operand is not a collection, you get a compile error:

 scala> List(1,2,3) ++ 4  
 <console>:11: error: type mismatch;  
  found  : Int(4)  
  required: scala.collection.GenTraversableOnce[?]  
     List(1,2,3) ++ 4  
            ^  

It is not difficult then, to see (and remember) that a "double plus" translates to an operation that involves two (double) collections.

+ is for adding a single element

Whenever + is encountered, it should be interpreted as an operation between a collection and a single element or a tuple of elements. (This differs from ++, which deals with two collections.)

Within the collections API, + is defined on Sets (and on Maps, where the element being added is a key/value pair).

The other symbols we will encounter that are similar to + are :+ and +: (i.e. the plus sign preceded or followed by a colon). But we won’t explore these until we have looked at the part a colon plays in a method/operator name in Scala.
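As a small illustration of + adding exactly one element, here is a sketch using a Set and, as an assumption beyond what the text above covers, a Map. The value names are invented for the example:

```scala
val numbers = Set(1, 2, 3)
val more    = numbers + 4            // adds one element and returns a new Set
val same    = numbers + 3            // 3 is already present, so nothing changes

val ages    = Map("ann" -> 30)
val older   = ages + ("bob" -> 25)   // for a Map, the single "element" is a key/value pair

println(more.size)    // 4
println(older.size)   // 2
```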

= usually depicts mutating operations

An equals sign as part of a symbol indicates that the symbol performs an "in place" update, that is, a mutating operation. So instead of creating a new collection, the existing collection is updated in place and returned.

scala> val buffer = scala.collection.mutable.ArrayBuffer.empty[Int]
buffer: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer()

scala> buffer += 100
res18: buffer.type = ArrayBuffer(100)

buffer += 100 above adds 100 to the ArrayBuffer and returns it.

The effect of = can also be used to easily understand what ++= would do. Since ++ generally represents concatenation of two collections, ++= would mean an "in place" update of a collection with another collection by concatenating the second collection to the first.
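A minimal sketch of ++= on a mutable buffer (the names buffer and returned are just for illustration):

```scala
import scala.collection.mutable.ArrayBuffer

val buffer   = ArrayBuffer(1)
val returned = buffer ++= List(2, 3)   // appends every element of the List, in place

println(buffer)               // ArrayBuffer(1, 2, 3)
println(returned eq buffer)   // true: the very same buffer comes back, not a copy
```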

: at the end of a method name is for right associativity

Having a colon at the end of a symbol signifies right associativity. To understand this, we need to look at what infix notation is and how it is implemented in Scala.

When an expression is written in the form of:

leftOperand operator rightOperand // eg 1 + 2

we have an infix notation.

There is also dot notation, which expresses a method call on an object in the form of:

Object.method(argument) // eg. "scala".startsWith("s")

Scala provides syntactic sugar that allows a dot notation expression to be written in infix notation. So instead of having:

Object.method(argument)

It can be written as:

Object method argument

This means that when you see infix notation in Scala, it is actually de-sugared into a method call on the left operand.

So when we write something like 1 + 2, in effect we are actually calling the method "+" that is defined on the Int type (since 1 is of type Int). That is 1 + 2 is 1.+(2)
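The equivalence can be checked directly. In this sketch the literal 1 is wrapped in parentheses only to keep the dot from being read as part of a number in older Scala versions:

```scala
val infixSum = 1 + 2
val dotSum   = (1).+(2)                    // the same call, spelled as a method

val infixHas = List(1, 2, 3) contains 2    // works for named methods too
val dotHas   = List(1, 2, 3).contains(2)

println(infixSum == dotSum)   // true
println(infixHas == dotHas)   // true
```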

So what has all this got to do with the colon in a symbol?

The first thing to note is that, strictly speaking, Scala does not have operators; it only has methods defined on objects. And when a symbol (that is, a method name) ends with a colon, it is a method on the right operand instead of the left operand.

So:

leftOperand operator: rightOperand 

is actually:

rightOperand.operator:(leftOperand)

If "operator" is alphanumeric, we need an underscore before the colon, for example "operator_:". This follows from a general rule in Scala that alphanumeric and symbolic characters in an identifier must be separated by an underscore.

Now these four symbols (++, +, = and :) can be used to unravel and make sense of the remaining symbols: ++:, +:, :+ etc.

This we do in the remainder of the post.
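To see that the rule is about the method name and not about any particular collection, here is a sketch with a made-up class. MyStack and its +: method are hypothetical, written only for this illustration:

```scala
// MyStack is a hypothetical class; its +: method ends in a colon,
// so in infix position it is called on the right operand.
case class MyStack(items: List[Int]) {
  def +:(item: Int): MyStack = MyStack(item :: items)
}

val viaInfix = 1 +: MyStack(List(2, 3))
val viaDot   = MyStack(List(2, 3)).+:(1)

// A chain groups from the right: 1 +: (2 +: MyStack(Nil))
val chained  = 1 +: 2 +: MyStack(Nil)

println(viaInfix == viaDot)   // true
println(chained)              // MyStack(List(1, 2))
```

The chained call is where "right associativity" earns its name: the rightmost +: runs first, and each result becomes the right operand of the call to its left.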

++: for concatenating two collections but with right associativity

Based on our initial definition of ++, it is a symbol used to indicate concatenation of two collections.

Add to that the knowledge of how : indicates right associativity, and it becomes clear that ++: represents the same concatenation of two collections, but with the concatenating method defined on the right operand. So basically:

scala> List(1,2,3) ++: List(4,5,6)
res11: List[Int] = List(1, 2, 3, 4, 5, 6)

is in essence:

scala> List(4,5,6).++:(List(1,2,3))
res12: List[Int] = List(1, 2, 3, 4, 5, 6)
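With two Lists the difference between ++ and ++: is invisible. A sketch with two different collection types (the names prefix and suffix are illustrative) makes it show up in the result type:

```scala
val prefix = List(1, 2)
val suffix = Vector(3, 4)

val viaRight = prefix ++: suffix   // method of the right operand, so a Vector comes back
val viaLeft  = prefix ++ suffix    // method of the left operand, so a List comes back

println(viaRight)   // Vector(1, 2, 3, 4)
println(viaLeft)    // List(1, 2, 3, 4)
```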

+: and :+

:+ is an appending operation between a collection and a single element (or tuple of elements). Here the colon comes first rather than last, so this is an ordinary method defined on the left operand.

scala> List(1,2,4) :+ 5
res20: List[Int] = List(1, 2, 4, 5)

scala> List(1,2,4) :+ (5,6,7)
res21: List[Any] = List(1, 2, 4, (5,6,7))

It is interesting to note that the result of List(1,2,4) :+ (5,6,7) is List(1, 2, 4, (5,6,7)) and not List(1, 2, 4, 5,6,7), showing that :+ is indeed an append and not a concatenation.

+: is the version that is defined on the right operand, so it prepends the element instead. That is:

scala> 1 +: List(2,3)
res27: List[Int] = List(1, 2, 3)

To demonstrate the right associativity of : in +:, if we flip the operands above, we run into a compile error:

 scala> List(2,3) +: 1  
 <console>:11: error: value +: is not a member of Int  
     List(2,3) +: 1  
          ^  

This is because List(2,3) +: 1 is actually 1.+:(List(2,3)), which is a compile-time error as the Int class (which is what 1 is) does not define a method +:
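A handy way to keep the two apart is that the colon always sits on the collection's side. A short sketch on a Vector, with illustrative names:

```scala
val middle    = Vector(2, 3)
val prepended = 1 +: middle          // middle.+:(1), the element goes in front
val appended  = middle :+ 4          // middle.:+(4), the element goes at the back
val both      = 1 +: (middle :+ 4)

println(prepended)   // Vector(1, 2, 3)
println(appended)    // Vector(2, 3, 4)
println(both)        // Vector(1, 2, 3, 4)
println(middle)      // Vector(2, 3), the original is untouched
```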


:\ and /: Fold Right and Fold Left

:\ represents foldRight. /: represents foldLeft.

scala> (List(1,2,3) :\ 1)(_ - _)
res31: Int = 1

Which is the same as:

scala> List(1,2,3).foldRight(1)(_ - _)
res33: Int = 1

/: ends with a :, so when used in infix notation, it is a method defined on the right operand.

scala> (1 /: List(1,2,3))(_ - _)
res32: Int = -5

Which is the same as

scala> List(1,2,3)./:(1)(_ - _)
res34: Int = -5

Or better still expressed without the use of the symbol.

scala> List(1,2,3).foldLeft(1)(_ - _)
res36: Int = -5

...and by the way, if you look hard enough at the symbols, you will see the visual clue :)

With :\ the backslash looks as if it is folding over the : from the right, hence foldRight.
With /: the forwardslash looks as if it is folding over the : from the left, hence foldLeft.
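Why the same list and start value give -5 one way and 1 the other becomes clearer when the folds are written out by hand. This sketch sticks to the named methods (Scala 2.13 deprecates the symbolic forms in favour of them); the value names are illustrative:

```scala
val xs = List(1, 2, 3)

val left        = xs.foldLeft(1)(_ - _)
val leftByHand  = ((1 - 1) - 2) - 3      // the start value enters from the left

val right       = xs.foldRight(1)(_ - _)
val rightByHand = 1 - (2 - (3 - 1))      // the start value enters from the right

println(left == leftByHand)     // true, both are -5
println(right == rightByHand)   // true, both are 1
```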

Double colon :: and Triple colon :::

:: represents prepending of a single element or tuple of elements to a List
::: represents prepending a List to another List

As both the double colon and triple colon end with a colon, they both refer to a method call on the right operand:

scala> 1 :: List(2,3)
res40: List[Int] = List(1, 2, 3)

Is the same as:

scala> List(2,3).::(1)
res41: List[Int] = List(1, 2, 3)

While:

scala> List(1,2) ::: List(3,4)
res42: List[Int] = List(1, 2, 3, 4)

is the same as:

scala> List(3,4).:::(List(1,2))
res43: List[Int] = List(1, 2, 3, 4)
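Because both methods live on the right operand, chains of them are evaluated from the right, which is why a list literal can be built up against Nil. A small sketch with illustrative names:

```scala
val built   = 1 :: 2 :: 3 :: Nil          // groups as 1 :: (2 :: (3 :: Nil))
val spelled = Nil.::(3).::(2).::(1)       // the same calls in dot notation

val joined  = List(1, 2) ::: List(3) ::: Nil

println(built == spelled)   // true
println(joined)             // List(1, 2, 3)
```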

It should be noted that both :: and ::: are only defined on List (this is largely due to historical reasons, which we will not go into in this post).

List also has a subclass called ::, which can be used to create a new list (yes, we have a method named :: and also a class named ::, and obviously they are two different things, one a method and the other a class, even though they share the same name).

The type signature of the :: constructor gives an inkling of how it can be used to create a new list:

new ::(head: B, tl: List[B])

So we can take an existing List and use it to create a new List, like so:

scala> new ::(1, List(2,3))
res47: scala.collection.immutable.::[Int] = List(1, 2, 3)

Or by using Nil, which is another subtype of List (an object) that represents an empty list. Thus:

scala> new ::(1, Nil)
res48: scala.collection.immutable.::[Int] = List(1)
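In everyday code the :: class tends to show up less through new and more through pattern matching, where it takes a list apart into head and tail. A minimal sketch, with describe being a made-up helper:

```scala
// describe is a hypothetical helper, used only to show the two shapes a List can take.
def describe(xs: List[Int]): String = xs match {
  case head :: tail => s"head=$head, tail=$tail"   // matches the :: class (a non-empty list)
  case Nil          => "empty"                     // matches the Nil object
}

println(describe(List(1, 2, 3)))   // head=1, tail=List(2, 3)
println(describe(Nil))             // empty
```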

In summary...

We can start off with the following axioms:

++ Joins two collections together.
+ Joins a single element (or tuple of elements) to a collection. Defined for a Set (and for a Map, with key/value pairs)
= Indicates a mutating operation. I.e an operation that updates a collection “in place”
: When the colon ends an operator, it means the operator is defined as a method on the right operand.

And then, use them to infer the following:

++: Joins two collections together with the method defined on the right operand
:+ Appends a single element (or tuple of elements) to a collection. Defined on a Seq
+: Prepends a single element (or tuple of elements) to a collection. Defined on a Seq but with right associativity
+= Updates a mutable collection by adding an element (or tuple of elements) to it.
+=: Updates a mutable collection by adding an element (or tuple of elements) with the method defined on the right operand.
++= Updates a mutable collection by adding another collection to it
++=: Updates a mutable collection by adding another collection to it but with the operator defined as a method on the right operand
:\ Fold right
/: Fold left
:: Prepends an element (or tuple of elements) to a list. The operator being defined on the right hand operand
::: Prepends a collection to a list. The operator being defined on the right operand
:: A class which extends List which can be used to create new lists
+ is available on Set (and on Map, for key/value pairs)
:: and ::: are only defined on List
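The mutating symbols in the summary that were not demonstrated earlier can be tried on a single buffer. This is only a sketch, with buf as an illustrative name:

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(3)
buf += 4                 // append one element in place
2 +=: buf                // prepend one element: buf.+=:(2)
List(0, 1) ++=: buf      // prepend a whole collection: buf.++=:(List(0, 1))
buf ++= List(5, 6)       // append a whole collection

println(buf)             // ArrayBuffer(0, 1, 2, 3, 4, 5, 6)
```

Each symbol reads off the axioms: the number of plus signs says element or collection, the = says in place, and a trailing colon says the buffer is on the right.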




1 comment:

Unknown said...

Scala has definitely too many operators. So easy to produce unreadable code.