When it comes to LINQ query syntax, I was a relatively late adopter and never really needed to understand how it works. I'd quite happily work with Select(o => ...) and SelectMany() without knowing how fundamental they are to LINQ query syntax expressions. That is, until I started using LanguageExt (though never mind about that for now).

I won't be talking about LanguageExt in this article, but rather about how types, in general, interact with LINQ. This is, however, crucial when considering how LanguageExt types interact with LINQ.

An important aspect of understanding how the LINQ query syntax works is understanding why some objects can be used in LINQ query expressions while others can't.

In most cases, we seem to be able to use query expressions on objects that implement the IEnumerable<T> interface. Why? Before I answer that, let's see the LINQ query syntax in action:

x = from item in ienumerable
    select item;

We also have this variant:

x = from itemA in IEnumerableA
    from itemB in IEnumerableB
    select itemA + itemB;

Ok, let's try and figure out what these things are doing and why we should worry about them at all.

What are these trying to achieve? Fundamentally, IEnumerable and LINQ expressions let you do the equivalent of this:

foreach(var a in listA)
    {
        foreach(var b in listB)
        {
             yield return Process(a, b);
        }
    }

Cool? Maybe not. Interesting? Definitely. So this means that it's trying to save you from having to keep writing for-each statements like the above. Why?

Because that's boilerplate code: common code that you tend to write over and over again. What's wrong with boilerplate code?

Well, it's tedious, and if we can remove the need to write it, then there's no opportunity to make errors in it. There are other reasons too; another one is that you've replaced a statement with an expression. Never mind that for now.

So if you look at a typical LINQ query again:

x = from itemA in IEnumerableA
    from itemB in IEnumerableB
    select itemA + itemB;

How this relates to the looping constructs we saw earlier is that each "from X in Y" corresponds to a "foreach X in Y".

If you can think of it that way, it'll be much easier. Then, when you're inside that for-each, the next "from X in Y" represents the next-level for-each, so you're in the inner for-each loop:

foreach(var a in listA) // from a in listA
    {
        foreach(var b in listB) // from b in listB
        {
             yield return Process(a, b); // select Process(a, b)
        }
    }

Looked another way:

x = from itemA in IEnumerableA // foreach(var itemA in IEnumerableA) {
    from itemB in IEnumerableB //    foreach(var itemB in IEnumerableB) {
    select itemA + itemB;      //      yield return itemA + itemB;
                               //    }
                               // }

So we've substituted a LINQ expression for the for-each statements.

So we've established that "from X in Y" in a LINQ expression acts like a for-each loop, where Y is an IEnumerable type and X is each item that the loop produces. Ok, that's cool, but where is this for-each stuff that's supposedly being substituted for "from X in Y" actually written? Don't get me wrong, I love not having to write for-each statements, but where is it?

The answer is that this common for-each code lives in the Select() and SelectMany() extension methods for the IEnumerable type. We don't have to write those for-each statements because they have already been written for us inside Select() and SelectMany(). Extension methods are after-the-fact add-ons for types: they are static methods, defined in a static class, that 'hook' onto a type without you needing to incorporate the method into the type's own code/definition. Here is what I mean:

[Diagram: how the Select() and SelectMany() extension methods work on an IEnumerable]

This would be a good time to go for a cup of tea and then come back once this stuff has percolated in your mind (or has started an internal fight in your mind, either way).

Ok, let's break down this diagram.

Firstly, Select() and SelectMany() do roughly the same thing initially, and then SelectMany() just does one extra bit.

Select() and SelectMany() both perform a for-each over the items in the IEnumerable, effectively extracting each item from the IEnumerable, and then run a transformation function on each of those items.

That's basically what Select() does, and the result of that function is then put into a new IEnumerable. SelectMany() is the same up to that point. We'll get to how SelectMany() differs later, but for now there are three observations we need to make about Select()'s implementation:

  1. The function that is run on each item obtained from the IEnumerable is called a mapping function, and it's supplied by the user. You can see it's passed in as a parameter to Select() and called map.
  2. The result of running the mapping function on each item is put in the same position as the original item, but in a new IEnumerable. (The original IEnumerable is untouched. In fact, the IEnumerable interface offers no way to modify the sequence, which is why the results go into a similar but new IEnumerable.)
  3. This has effectively mapped each item in the original IEnumerable to the result of map(item) in a new IEnumerable. We also say that this has transformed the original items and put them into a new IEnumerable similar to the original (but now holding the mapped results instead of the original items).
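
The observations above can be checked with a small sketch. It uses the built-in System.Linq Select() rather than a hand-written one, and the class and method names (SelectObservations, DoubleAll) are made up purely for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SelectObservations
{
    // Doubles every number and returns both the untouched source and the mapped copy.
    public static (List<int> Original, List<int> Mapped) DoubleAll()
    {
        var original = new List<int> { 1, 2, 3 };

        // n => n * 2 is the user-supplied mapping function (the "map" parameter)
        IEnumerable<int> mapped = original.Select(n => n * 2);

        // ToList() runs the for-each and collects the mapped items, in the same order
        return (original, mapped.ToList());
    }
}
```

Calling SelectObservations.DoubleAll() should give back the original list still holding 1, 2, 3 and a new list holding 2, 4, 6.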

See, no magic. You can see the for-each stuff that's used in the LINQ expressions now:

public static class EnumerableExtensions
    {
        public static IEnumerable<B> Select<A, B>(this IEnumerable<A> self, Func<A, B> map)
        {
            foreach(var item in self)
            {
                yield return map(item);
            }
        }       
    }

Note that extension methods like Select() work on the type marked by the this modifier on the first parameter of the extension method. In this instance, the extension method is for the IEnumerable type.
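
If extension methods are new to you, here is a minimal sketch of one on a simpler type. Shout(), StringExtensions and ExtensionDemo are hypothetical names invented for this example; they are not part of .NET:

```csharp
public static class StringExtensions
{
    // "this string self" is what lets Shout() be called as if string had defined it
    public static string Shout(this string self)
    {
        return self.ToUpperInvariant() + "!";
    }
}

public static class ExtensionDemo
{
    public static string Run()
    {
        // Instance-style call: the compiler turns this into the static call below
        string viaInstanceSyntax = "hello".Shout();

        // Plain static call: exactly the same method
        string viaStaticSyntax = StringExtensions.Shout("hello");

        return viaInstanceSyntax == viaStaticSyntax ? viaInstanceSyntax : "mismatch";
    }
}
```

ExtensionDemo.Run() should return "HELLO!", since both calls hit the same static method. Select() hooks onto IEnumerable in just the same way.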

Anyway, all this code is what happens when you see this:

x = from itemA in IEnumerableA  // foreach (var itemA in IEnumerableA)
    select itemA.ToUpper();     // this is basically the mapping function run for each item in IEnumerableA

The select clause (select itemA.ToUpper()) is passed in as the map() function.
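
To make that concrete, here is a sketch that writes the same query both ways, assuming the items are strings. SingleFromDemo and its method names are made up for illustration, and it relies on the built-in System.Linq Select():

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SingleFromDemo
{
    // Query syntax: one from clause and a select clause
    public static IEnumerable<string> QuerySyntax(IEnumerable<string> words) =>
        from word in words
        select word.ToUpperInvariant();

    // Method syntax: the select clause becomes the lambda passed to Select() as the map function
    public static IEnumerable<string> MethodSyntax(IEnumerable<string> words) =>
        words.Select(word => word.ToUpperInvariant());
}
```

Both methods should produce the same sequence for the same input, because the compiler rewrites the first form into the second.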

Ok, but what about when you have two from clauses, as the previous examples had? We've just explained that Select() makes the above work, but that only has one from clause in it.

When you have two from clauses, that's when the SelectMany() function is used:

public static class EnumerableExtensions
    {        
        public static IEnumerable<C> SelectMany<A, B, C>(
            this IEnumerable<A> self, 
            Func<A, IEnumerable<B>> bind, 
            Func<A, B, C> project)
        {
            foreach(var a in self)
            {
                foreach(var b in bind(a))
                {
                    yield return project(a, b);
                }
            }
        }
    }

As I said earlier, SelectMany() starts off the same way that Select() does: it performs a for-each over the items in the IEnumerable and runs a function on each one. However, this function is called a binding function, and its results aren't put straight into the new IEnumerable; there is a little bit more to do:

  1. Again, it gets each item in the IEnumerable (exactly like the Select() implementation).
  2. It runs a bind() function instead of a map() function on each item. Conceptually, this is the same as in the Select() implementation.
  3. The bind() function is provided by the user, and it must return another IEnumerable (though we still won't put the result into the new IEnumerable yet).
  4. Another for-each is performed on the result of the bind() function.
  5. Each of those items (in the newly produced IEnumerable) is passed, together with the original item, into a project() function, another function that the user provides. The results of this second function are what end up in a new IEnumerable similar to the original.

Quick observations about the SelectMany() implementation:

  • You provide two functions to SelectMany() instead of one (as in Select()).
  • The first is like map() in Select() but is called bind(); both operate on each item in the IEnumerable (and both transform it).
  • Where map() returns a value, bind() returns an IEnumerable, and that list of values is further processed by the project() function.
  • The project() function is the second function passed into SelectMany(). Select() doesn't have a second function.
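
Here is a sketch of those points in action. It writes the same two-from query three ways, using the built-in System.Linq SelectMany(); TwoFromDemo and its method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TwoFromDemo
{
    // Query syntax with two from clauses
    public static List<int> QuerySyntax(IEnumerable<int> listA, IEnumerable<int> listB) =>
        (from a in listA
         from b in listB
         select a + b).ToList();

    // The same query as an explicit call:
    //   bind    = a => listB          (returns the inner IEnumerable)
    //   project = (a, b) => a + b     (combines the outer and inner item)
    public static List<int> MethodSyntax(IEnumerable<int> listA, IEnumerable<int> listB) =>
        listA.SelectMany(a => listB, (a, b) => a + b).ToList();

    // The boilerplate that both of the above save you from writing
    public static List<int> NestedLoops(IEnumerable<int> listA, IEnumerable<int> listB)
    {
        var results = new List<int>();
        foreach (var a in listA)
        {
            foreach (var b in listB)
            {
                results.Add(a + b);
            }
        }
        return results;
    }
}
```

With listA = { 1, 2 } and listB = { 10, 20 }, all three should return 11, 21, 12, 22.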

Look at my diagram again and see if this starts to make sense.

So that's fundamentally why LINQ query syntax works on IEnumerables (or any types that have Select() and SelectMany()).
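
To see the "any types" part for yourself, here is a minimal sketch of a type that is not an IEnumerable at all, yet still works in query syntax because it has Select() and SelectMany(). Maybe<T>, MaybeExtensions and MaybeDemo are hypothetical names invented for this example; this is not how any particular library implements it:

```csharp
using System;

// Hypothetical container holding zero or one value. It does NOT implement IEnumerable.
public sealed class Maybe<T>
{
    public bool HasValue { get; }
    public T Value { get; }

    private Maybe(bool hasValue, T value) { HasValue = hasValue; Value = value; }

    public static Maybe<T> Some(T value) => new Maybe<T>(true, value);
    public static Maybe<T> None() => new Maybe<T>(false, default(T));
}

public static class MaybeExtensions
{
    // Same shape as Select() for IEnumerable: run map on the item, if there is one
    public static Maybe<B> Select<A, B>(this Maybe<A> self, Func<A, B> map) =>
        self.HasValue ? Maybe<B>.Some(map(self.Value)) : Maybe<B>.None();

    // Same shape as SelectMany() for IEnumerable: bind, then project
    public static Maybe<C> SelectMany<A, B, C>(
        this Maybe<A> self,
        Func<A, Maybe<B>> bind,
        Func<A, B, C> project)
    {
        if (!self.HasValue) return Maybe<C>.None();   // the "outer loop" has no item
        Maybe<B> inner = bind(self.Value);
        if (!inner.HasValue) return Maybe<C>.None();  // the "inner loop" has no item
        return Maybe<C>.Some(project(self.Value, inner.Value));
    }
}

public static class MaybeDemo
{
    // Compiles only because Maybe<T> has a matching SelectMany()
    public static Maybe<string> Combine(Maybe<string> first, Maybe<string> second) =>
        from x in first
        from y in second
        select x + y;
}
```

Combining two values that are both present should give a Maybe holding the concatenated string, while an empty value on either side should give an empty result, just as a for-each over an empty list never runs its body.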

Why this is important in LanguageExt is because its types behave like IEnumerables: they have also implemented Select() and SelectMany(), and thus can appear in LINQ query syntax like this:

using LanguageExt;
using static LanguageExt.Prelude;
 
   Option<string> optionA = Some("Hello, ");
   Option<string> optionB = Some("World");
   Option<string> optionNone = None;
 
   var result1 = from x in optionA
                 from y in optionB
                 select x + y;       // Some("Hello, World")
 
   var result2 = from x in optionA
                 from y in optionB
                 from z in optionNone
                 select x + y + z;   // None

So Option<T> and Either<L, R> can be viewed as being able to do a for-each (var item in Option<T>) ... because that's exactly what happens in the above when they're used in LINQ query syntax expressions!

In the same way that a for-each gets each item in a list or an IEnumerable, the for-each gets an item from the Option or Either, and its value determines whether the next inner for-each will run or not.

And this article goes a long way towards explaining how these types can be in those LINQ expressions to begin with. In the next instalment in this functional programming series, I'll explain a bit more about how LanguageExt actually works, but with this introduction you can see that it has implemented the Select() and SelectMany() methods so that its types can be used in LINQ queries.

Note: List<T>, arrays (T[]) etc. all implement IEnumerable<T>

