The power of yield (return)#

Yield is one of the coolest, yet underused, new .NET 2.0 features that you may not be familiar with. Yes, I'm talking about a 2.0 feature - even though 3.5 just came out, it is my experience that a large number of developers have still not embraced all that 2.0 gave us. Hopefully this article will help make this powerful, yet initially confusing, feature more approachable.

Some background facts

When you "foreach" over an object (array, collection, etc), it must implement IEnumerable or IEnumerable<T> - the generic version which returns strongly typed objects (for the rest of this article, I'll just refer to IEnumerable).

All arrays, and most collections (ArrayList, List<T>) implement IEnumerable, which is why you can "foreach" over them.

If ALL you need to do to a collection is cycle through its members - you only need to reference it as an IEnumerable. That explains why the following code is perfectly valid:

IEnumerable<int> numbers = new int[] { 1, 2, 3 }; foreach (int i in numbers) { Console.WriteLine(i); }

It is always better to depend on abstract types (interfaces, abstract base classes) rather than concrete types. This allows you to change the implementation of a method without impacting any of its callers.

Consider this example where CallingMethod gets a set of numbers from MethodBeingCalled. It really doesn't care if it gets an array or a List or whatever. It just does a foreach over it, which means it only needs the functionality exposed by the IEnumerable interface.

public void CallingMethod() { List<int> numbers = MethodBeingCalled(); foreach (int i in numbers) { Console.WriteLine(i); } } private List<int> MethodBeingCalled() { List<int> numbers = new List<int>(); numbers.AddRange(new int[] { 1, 2, 3 }); return numbers; }

The problem with the current design is that the CallingMethod declares the list of numbers List<int>. That is a specific implementation detail. If CallingMethod needed to add more numbers to the list after it was returned from MethodBeingCalled, or if individual numbers were being accessed by their index, then it would make sense to declare it as a List<T> (or better, IList<T>). But it really only needs to cycle through the numbers in order. If MethodBeingCalled wanted to change its implementation to return an int[] instead, it would have to change its signature, and CallingMethod would also have to change. However, if CallingMethod had just declared its local variable as IEnumerable<int> (because that is the only functionality it truly depends on), then it would not have to change if MethodBeingCalled changes - as long as it still returns something that implements IEnumerable<int>.

So what does foreach do? It asks the IEnumerable if it has any items. If so, it gets the next item and uses it within the loop. It then asks the IEnumerable again if it has more items, gets the next one, and so on, until there are no more items. The important thing to note here is that foreach never works with the entire collection all it once - it only deals with one item at a time.

Introducing yield

How can we use all of this information? If you have a method that returns a collection of data, and you can be sure that callers are only going to be interested in looping over each item in the collection, one at a time (which is very common), you can declare your method's return value as IEnumerable<T>.

When you declare your method's return value as IEnumerable<T>, something magic happens: the C# compiler will now let you use the yield keyword within the body! So why does that matter? Because it allows you to take advantage of the way that foreach really works. Remember, foreach will just ask for one value at a time. When it does, it will call into your method - your method will run until it encounters a yield return statement. When it reaches a yield return statement, that value is immediately returned to the caller. In the next iteration of the caller's loop, it will ask again if there is another value, your method will start executing again, but this time it will start on the line after the last yield statement that was run. Yes, you read that right: execution alternates between the calling method and the method being called, all the way through the loop. This is completely different from the normal way that methods interact, where one method gives up control to another method, which doesn't return until it is completely finished executing.

This has at least 2 very important implications for the method that returns the IEnumerable<T>:
1) It does not need to build up a big list of all of the items it plans to return, holding them all in memory at the same time. If the list of items is very large, holding them all in memory at once can be a significant burden.

2) It does not necessarily need to do the work to produce all of the items in the list. If the caller breaks out of the foreach loop before the entire collection has been traversed - all of the work needed to produce the remaining items in the colleciton is avoided. Without using the yield keyword, you would have to build up the entire collection to return, even if the caller was only going to look at the first few items.

Demonstration

To illustrate the advantage of this approach, I included sample code at the bottom. You can create a new Console Application and paste this over the default Program.cs.

A method, GetCombinations returns a collection of 2 numbers combined. The caller then loops through the collection, looking for a specific combination: "13".

First, run it using GetCombinations. This is implemented the way most people would write a method that returns a collection, in .NET 1.1. The output willl look something like this (the indented output is from within GetCombinations):

BEFORE call to GetCombinations
        Beginning execution of GetCombinations
        Producing 33
        Producing 23
        Producing 13
        Producing 32
        Producing 22
        Producing 12
        Producing 31
        Producing 21
        Producing 11
        Ending execution of GetCombinations
AFTER call to GetCombinations
Checking 33
Checking 23
Checking 13
Found!
Elapsed: 00:00:08.9931583

Now, comment out the line that calls GetCombinations, and uncomment the line that calls BetterGetCombinations. It is important to note that BetterGetCombinations uses the exact same algorithm as GetCombinations - given the same input, they will both return the exact same list of strings, in the same order (I've emphasized the lines that changed, in the sample code below). But since the caller is only looking for a specific combination, not all combinations need to be produced. Consider the output:

BEFORE call to GetCombinations
AFTER call to GetCombinations
        Beginning execution of BetterGetCombinations <-- notice that BetterGetCombinations doesn't really start until the foreach loop - NOT when the method was first called!

        Producing 33
Checking 33
        Producing 23
Checking 23
        Producing 13
Checking 13
Found!
Elapsed: 00:00:02.9976438

With the improved implementation that took advantage of the yield keyword, the program was able to finish its job in less than half the time! It also used much less memory, as it never had to store all 9 strings in a collection. Now imagine the potential impact if GetCombinations returned a collection with thousands of entries!

Sample Code

Paste the code below over the default Program.cs in a new Console Application:

using System; using System.Collections.Generic; using System.Diagnostics; using System.Threading; namespace YieldDemo { class Program { static void Main() { Stopwatch timer = Stopwatch.StartNew(); Console.WriteLine("BEFORE call to GetCombinations"); //1) First try running using the GetCombinations that returns a List<string> List<string> combinations = GetCombinations(3, 3); //2) After observing the output, comment out the above line, uncomment the next line, and re-run // IEnumerable<string> combinations = BetterGetCombinations(3, 3); Console.WriteLine("AFTER call to GetCombinations"); foreach (string combination in combinations) { Console.WriteLine("Checking " + combination); if (combination == "13") { Console.WriteLine("Found!"); break; } } timer.Stop(); Console.WriteLine("Elapsed: " + timer.Elapsed); } static List<string> GetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of GetCombinations"); List<string> output = new List<string>(); int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination output.Add(newCombinationToProduce); --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of GetCombinations"); return output; } static IEnumerable<string> BetterGetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of BetterGetCombinations"); // NOTE The local List is not necessary in this implementation int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination yield return newCombinationToProduce; // insteading of adding to a list, just return the value --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of BetterGetCombinations"); } } }
Sunday, February 03, 2008 12:02:09 PM (Central Standard Time, UTC-06:00) #    Comments [18]  | 

 

All content © 2010, josh
About this site
Send mail to the author(s) Contact me
Feed your aggregator (RSS 2.0)
Joshua Flanagan
I am a software developer focused on continuous improvement in the .NET community
Los Techies

On this page
Archives
Rest of the world

Acknowledgements

Powered by: newtelligence dasBlog 2.1.8209.14743

Special thanks to LosTechies.com

Site theme based on the essence design by Jelle Druyts

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.