Probabilities in Software

It might seem strange to write about probabilities on a blog about programming but bear with me; it will all make sense soon.

We use probabilities a lot in our daily lives without realising it, to the point where they can seem fairly intuitive. This is not the case: probabilities can be very counter-intuitive. A classic example of this is the Monty Hall Problem (you can look that up yourselves). People also tend to feel that if there is a 1-in-10 chance of something happening, then it should happen at some point in the first 10 tries.

This is a misunderstanding of the fact that these are independent events. Each time the event occurs, it has a 1-in-10 chance regardless of what has come before. If you roll a regular die, there is a 1-in-6 chance of rolling a 6; what has happened in the past is irrelevant. If you roll 5 times and don’t get a 6, the sixth roll still has a 1-in-6 chance of rolling a 6. There are occasions, though, where events are not independent, and this is what I’m going to write about.

An example of dependent events occurs during a lottery draw. (Note that separate draws are independent, as the machine is reset each week, so you are no more likely to win each week; here I am talking about the balls drawn in a single draw.) There are 49 balls in the machine and you have 6 numbers on your ticket. The odds of matching the first number are 6/49, or about 12%. Now, however, the odds change for the second number. There are only 48 balls left in the machine and 5 numbers left on your ticket. These are the dependent events. The odds of matching the second number are 5/48. This continues over the whole ticket. To get the odds of winning the jackpot, we must multiply all of these odds together: (6/49)(5/48)(4/47)(3/46)(2/45)(1/44), or 1 in 13,983,816.
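
As a quick sanity check, the same product can be computed in a few lines of C#:

double odds = 1.0;
for (int i = 0; i < 6; i++)
{
    // 6/49 for the first ball, 5/48 for the second, and so on.
    odds *= (6.0 - i) / (49.0 - i);
}

Console.WriteLine(1.0 / odds); // 13983816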

To bring this back on topic, it could seem strange to use probabilities in software, since software is generally thought of as deterministic: if you provide identical input, it will produce identical output. The idea of using probabilities might seem to contradict that; however, there are cases where trying to calculate something precisely is impossible, or at least computationally inefficient.

The specific case we will look at is that of sensor analysis. Previously I have written about path finding algorithms. These assume that you already have a map of the world. What if you want to build that map? You need some sort of sensor that can see the world. Typically, a sonar sensor can be used to detect walls and obstacles. Ultrasonic sound waves are bounced off the target, and the time they take to return is used to calculate the distance to the target: distance = (time * speed of sound) / 2.
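
In code the conversion is a one-liner; assuming the time is measured in seconds and taking roughly 343 m/s for the speed of sound in air:

const double SpeedOfSound = 343.0; // metres per second, in air

// The sound travels to the target and back, so halve the round trip.
static double EchoTimeToDistance(double seconds)
{
    return (seconds * SpeedOfSound) / 2.0;
}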

Sensor Accuracy

There are problems with sonar analysis though: it will have a margin for error, and the sound travels in a cone, so the response could have come from anywhere in that cone.

Field of view

Here, R represents the maximum range of the sensor and theta is the field of view. To compensate for this, we must first split our environment into nodes. Just like in the path finding problem, a square grid is simplest. We must then assume that any node intersecting the cone could be the target the sonar has detected. It is most likely, however, that the response came from directly in front of the sensor, and from closer to the sensor rather than further away. It is worth noting that some sensors have a minimum distance they can detect.

To handle this cone, we must analyse all of the nodes that are within it. We need to calculate a probability of the sensor’s accuracy, given that the node is occupied, P(s|O). We don’t know if it is occupied at the moment but that doesn’t matter for now.

(R - r) / R

(Beta - alpha) / Beta

P(s|O) = ( (R - r)/R + (Beta - alpha)/Beta ) / 2

Where R is the maximum range of the sensor, Beta is the field of view of the sensor, r is the node’s distance from the sensor and alpha is the node’s angle from the direction the sensor is facing. For every node we can calculate the vector to it from the sensor’s position, and using the dot product we can calculate the angle between the sensor and the node. By taking the ratios of the node’s distance and angle to the maxima of the sensor, we can calculate a value between 0 and 1 to determine a probability for the sensor’s accuracy in detecting the occupancy of a given node.

Stage 1 of the included code demonstrates this part of the calculation.
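
As a rough sketch of that calculation (the names are mine, and the attached code differs in its details):

static double SensorAccuracy(double r, double alpha,
    double maxRange, double fieldOfView)
{
    // (R - r)/R: closer nodes are more likely to be the target.
    double distanceTerm = (maxRange - r) / maxRange;

    // (Beta - alpha)/Beta: nodes near the centre line are more likely.
    double angleTerm = (fieldOfView - alpha) / fieldOfView;

    return (distanceTerm + angleTerm) / 2.0;
}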

Response Probability

In this image, the shading of the squares represents the P(s|O) with white being 1.0 and black being 0.0.

Sensor Model

Now that we can handle the accuracy of the sensor, we need a model for how we are going to interpret the results. The response we get will be a scalar value representing the distance to the obstacle.

Sonar Sensor Model

From this we can calculate the region that the response could have come from. This is Region I in the diagram. This region is centred on the reading from the sensor and extends by the error margin in either direction. Region II represents the area closer than the reading. If this region were occupied, we would have got a response from there instead, so we can assume that it is clear and update the nodes in this region accordingly. We cannot just set these nodes to 0.0; instead we use the function for P(s|O) from the previous section as the probability that the node is empty rather than occupied, P(s|E), and then P(s|O) = 1.0 - P(s|E). Region III is further away than the obstacle, so we cannot see it. As such, we should not update these nodes at all.
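
A sketch of how one reading could be applied to a node, using the regions above (the names and structure here are illustrative rather than lifted from the attached code):

// reading is the distance reported by the sensor, tolerance its error
// margin, r the node's distance and pAccuracy the P(s|O)-style value
// from the previous section. Returns null for nodes we cannot see.
static double? ReadingProbability(double reading, double tolerance,
    double r, double pAccuracy)
{
    if (r > reading + tolerance)
    {
        return null; // Region III: beyond the obstacle, no update.
    }

    if (r < reading - tolerance)
    {
        return 1.0 - pAccuracy; // Region II: probably empty.
    }

    return pAccuracy; // Region I: the response probably came from here.
}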

Bayes’ Theorem

Finally, we need to bring it all together. We can use Bayes’ Theorem of conditional probabilities to calculate P(O|s) for each node. We can also use a variant of this to aggregate multiple readings of the same node and further improve the accuracy of the map. The final formula we use is:

P(O|s(n)) = P(s(n)|O) * P(O|s(n-1)) / ( P(s(n)|O) * P(O|s(n-1)) + P(s(n)|E) * P(E|s(n-1)) )

This allows us to aggregate previous readings with P(O|s(n-1)). We just need an initial value for this before the first reading; we can simply set it to 0.5.
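
In code, the update for a single node could be a sketch like this, where prior is the node’s current P(O|s(n-1)), initially 0.5:

static double BayesUpdate(double pSGivenO, double prior)
{
    // P(s|E) is the complement of P(s|O), and P(E|s(n-1)) of the prior.
    double pSGivenE = 1.0 - pSGivenO;
    double numerator = pSGivenO * prior;

    return numerator / (numerator + (pSGivenE * (1.0 - prior)));
}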

With this model we will never get probabilities of 1.0 and 0.0 for a node, so thresholds would need to be defined to determine if a node should be considered empty or occupied.

Stage 2 of the attached code attempts to model this; however, I wasn’t going to try to simulate an ultrasonic sensor, so you just type in values for the sensor. I’m sure you can easily mess it up with contradictory readings, but if you are fairly consistent, it should show it working quite well.

The code can be found at https://github.com/a-jackson/bayesian-sensor-analysis. It is by no means perfect, especially in stage 2, but I think it gets the point across.

Object Orientation

Most of the software I see at work is written in C#, which is an object-oriented language, but it is written without any understanding of what that means and how it affects your software design. Some of it might as well have been written in C: the classes are used to separate different functional areas, but that is about it.

In this article I’m going to try to explain some of the advantages of OO and highlight software designs that make appropriate use of it.

Terminology

Before I start, I’m going to cover some terminology, with a short example after the list. I often find there is some confusion when talking about fields, members and properties.

  • Member – This is any item that is within a class such as fields, functions, events, properties and constructors.
  • Field – This is a variable in the class scope. Not to be confused with a property.
  • Property – This is a .Net specific construct. It is essentially just a get and set method but with the syntax of a field. A field is still required to store the value in the class. C# allows for an “auto property”, this is a property without a backing field. However, the compiler just creates a backing field and a standard get and set method to access it so there is still a field.
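
A small, purely illustrative example showing all three:

public class Person
{
    // Field: a variable in class scope, usually private.
    private string name;

    // Property: get and set methods with field-like syntax,
    // backed by the field above.
    public string Name
    {
        get { return this.name; }
        set { this.name = value; }
    }

    // Auto property: the compiler generates the backing field.
    public int Age { get; set; }

    // The fields, properties and this constructor are all members.
    public Person(string name)
    {
        this.name = name;
    }
}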

Encapsulation

The first topic to discuss is encapsulation. When we design a class we can choose what access is available to the class members. The purpose of encapsulation is to hide inside the class the aspects of how it works and only expose the members that are required to interact with it. For example, it is normal that all of your fields are private and only properties or methods are exposed publicly.

This ensures that users of the class have to use the interface that you have created, and you can validate all data in and out. If you need to expose data directly, you can create properties. These allow you to validate sets before accepting them. It is also possible to have different access modifiers on the get and set components, so only gets are possible externally. If you have complete control over all the data stored in the fields, you can be confident that your methods will run without error. You should treat each class as a self-contained unit of code. If it is impossible for a field to be set to null, you can be sure when using it that it isn’t null, regardless of anything else that may be happening in the rest of your system.
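
For example, a hypothetical Order class might validate a set before accepting it, and expose another value with a private set:

public class Order
{
    private int quantity;

    // Validate the value before accepting it.
    public int Quantity
    {
        get { return this.quantity; }
        set
        {
            if (value < 1)
            {
                throw new ArgumentOutOfRangeException("value");
            }

            this.quantity = value;
        }
    }

    // Readable externally, but only the class itself can set it.
    public decimal Total { get; private set; }
}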

Many bugs are caused by fields having a value different to what is expected. If you apply these simple rules to your entire system, it should be far less likely that you will get invalid data in your fields, and you will subsequently have fewer bugs in your system.

Another aspect of encapsulation is to only include things that are relevant to the class, in the class. Each class should be self-contained, manage its own data, and provide a useful interface for users of the class to perform its actions. This keeps each class fairly simple, and therefore manageable. Again, if your code is manageable, it should have far fewer bugs.

Inheritance

One of the fundamental concepts of object orientation is that of class inheritance. This allows you to create an “is-a” relationship between 2 classes. For example, you could have a Shape class; then a Rectangle is-a Shape. This is helpful when you have areas of your program where you don’t care about the specifics of a class and so can work with a more generalised form. For example, you could have a List<Shape> in your program and store in it Rectangle, Triangle and Circle objects. This is far more useful than having 3 different lists.
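
As a minimal sketch, the is-a relationship and the single list could look like this:

public abstract class Shape { }

public class Rectangle : Shape
{
    public double Width { get; set; }
    public double Height { get; set; }
}

// Rectangle, Triangle and Circle objects can all go in the same list.
List<Shape> shapes = new List<Shape>();
shapes.Add(new Rectangle { Width = 2.0, Height = 3.0 });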

When one class is derived from another, it inherits all of its fields and methods. The parent class can still control its own data through private fields, but may allow more access to its children than it would to anything else. This is done through the “protected” access modifier.

Classes can override their parent’s functions and have them work differently. In some cases, the parent class can make this mandatory by declaring the function as abstract. This allows the method to take advantage of additional data the child class may have, or simply to operate differently because the class represents something different from the parent.

While inheritance allows for code reuse, it is important not to overuse it. You should not derive from a class just to gain easy access to its methods. In this case it is far better to reference another instance of the class. If that is not practical in your design, then it may be time to revisit the design. Inheritance increases class coupling, which can be a bad thing (see below). It is often better to have an associative (has-a) relationship with a class rather than an inheritance relationship.

Polymorphism

Polymorphism is the dynamic determination of which version of an overridden method should be called.

Keeping with the shapes example, the Shape class may define an abstract method to calculate the area. It is unable to perform this function itself because all shapes are different; however, by defining it here, we can force all shapes to implement it, and users of the Shape class can call it without needing to know exactly what type of shape it is.

public abstract double CalculateArea();

The Rectangle class can then implement it differently from the Circle class.

// Rectangle class
public override double CalculateArea()
{
    return Width * Height;
}

// Circle class
public override double CalculateArea()
{
    return Math.PI * Radius * Radius;
}

When the code runs, it will look at the type of the object and run the method in that type, or the closest ancestor to that type that implements it. In order to trigger this search, the method must be marked as abstract or virtual in the base class and as override in all subsequent classes.

Cohesion and Coupling

Cohesion refers to how much the members of a class work together to achieve the class’s function. A good class design has high cohesion among its members. All of the members of the class should be relevant to its function. If the class needs some functionality that it does not provide itself, it can call another class to achieve that.

Coupling refers to how much classes are tied to each other. It is the interactions between classes that make a program out of a library; however, they should be kept to a minimum. Highly coupled classes are complicated to maintain, as a change in one can have impacts all over the program. Generally, high cohesion and low coupling go together, as highly cohesive classes have less need for coupling.

Coupling can be reduced further by having less specific method parameters. For example, if you have a method that iterates through a collection of strings and prints them to the console, you could write:

public void PrintStrings(List<string> strings)
{
    foreach (string str in strings)
    {
        Console.WriteLine(str);
    }
}

But it is better to be less specific about the type of the parameter. In this case, you don’t need a List<string>; the code would work perfectly well with a string[], an ObservableCollection<string>, or any other collection of strings. So why be so restrictive?

public void PrintStrings(IEnumerable<string> strings)
{
    . . .
}

You should not do the same with return types, however. Be as specific as possible with return types. The caller can use the result generally if they want, but if they need the specific type, you are forcing them to perform a cast. For example, if you have a method that creates a Rectangle object, don’t return a Shape object. If the caller only needs a Shape, they can treat it as one with no issue. But if you return a Shape and they need a Rectangle, they now have to check that it is a Rectangle and cast it.
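
For example, with hypothetical factory methods:

// Good: the caller gets the most specific type available.
public Rectangle CreateRectangle(double width, double height)
{
    return new Rectangle { Width = width, Height = height };
}

// Less useful: callers needing a Rectangle must check and cast.
public Shape CreateShape(double width, double height)
{
    return new Rectangle { Width = width, Height = height };
}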

Summary

This article has touched on the high level concepts of class design in an object-oriented language. How to use each of these concepts successfully to produce a stable system is left for another article. It is useful, though, to think about the relationship between classes: is it a has-a relationship or an is-a relationship? That will usually tell you how to couple the classes.

Iterator Blocks

One question I am frequently asked: “What’s a yield return?”

This is when the function is an iterator block. An earlier post used iterator blocks to demonstrate Linq methods but I never really explained what they are or how they work.

What are they?

Iterator blocks are methods that return a collection. The part that makes them special is that the collection is only iterated on demand, one item at a time. Execution of the block is then yielded to the caller until the next item is requested.

Take the following method.

private IEnumerable<int> GetNumbers()
{
    List<int> numbers = new List<int>();
    for (int i = 0; i < 10; i++)
    {
        numbers.Add(i);
    }
    return numbers;
}

This method returns a list of integers; however, we had to make a List<int>, fill it with data, and then return the list. This list will sit on the heap until it is garbage collected. The caller may not need the whole list; they may only want the first element, in which case we have built an entire list redundantly. Or it may not be needed yet, in which case it will take up memory until it is used. By using an iterator block, we can return the same collection without having to create the list and have it sitting around in memory until it is needed. Each item is calculated on demand, when it is needed.

private IEnumerable<int> GetNumbers()
{
    for (int i = 0; i < 10; i++)
    {
        yield return i;
    }
}

The result of the call to this method is a query (see Deferred Execution). The code is not executed until the returned IEnumerable<int> is iterated over. Each call to the enumerator’s MoveNext() method will return execution to the GetNumbers() method, which will carry on where it left off until a new yield return is reached. This yields execution back to the caller until the next iteration is required.

Why would I use them?

Iterator blocks have several advantages and disadvantages that are worth considering.

Advantages

  • Only the exact number of iterations required are executed. If the caller only uses .First() then the function will only run to the first yield return.
  • No need to create an entire collection in memory. This can be useful with incredibly large data sets. It is possible to create a series of methods that will load the data, process the data and output the data, while keeping all the stages in separate methods and only storing a single value in memory at a time.
  • The caller is free to store the data in whatever collection they need. If the data was being used on a user interface, the caller may need an ObservableCollection<T>. With the iterator block they can pass the result to the constructor and create the collection without having made another List first that wasn’t needed.
var collection = new ObservableCollection<int>(
    GetNumbers());

The point about working with large data sets should be covered in more detail. Take the following program.

static void Main(string[] args)
{
    var file = LoadFile("In.txt");
    var data = ProcessData(file);
    SaveData(data, "Out.txt");
}

static IEnumerable<int> LoadFile(string filename)
{
    string line;
    var stream = new StreamReader(File.Open(filename, FileMode.Open));
    while ((line = stream.ReadLine()) != null)
    {
        yield return int.Parse(line);
    }
    stream.Close();
}

static IEnumerable<int> ProcessData(IEnumerable<int> data)
{
    foreach (var item in data)
    {
        yield return item * item;
    }
}

static void SaveData(IEnumerable<int> data, string filename)
{
    var stream = new StreamWriter(File.Open(filename, FileMode.Create));
    foreach (var item in data)
    {
        stream.WriteLine(item);
    }
    stream.Close();
}

Here we have 3 methods. The first loads data from a file, the second processes that data and the third writes it to a file again. Since the first 2 are iterator blocks, SaveData is the first method to be executed: it opens the stream and starts to iterate over the data. In order to get the data, ProcessData executes and starts to iterate its own data collection. In order to get that data, LoadFile executes and reads the first line of the file. Execution then yields to ProcessData, which squares the value, and then to SaveData, which writes it to the file.

I tested this program using an input file with 7,500,000 lines and it used approximately 6MB of memory. Very little memory is used because the data is streamed directly from one file to the other and has no need to be retained any longer than that.

I then added .ToList() to the LoadFile and ProcessData calls. This forces the program to load the entire file to memory, then square all the values in memory, before writing the whole file. This used approximately 10x more memory. While I did not profile the application properly, this demonstrates the power of iterator blocks. It keeps the code simple and segregated while allowing for quite powerful data management.

Disadvantages

  • Exception handling can be strange.
  • Double execution

Since the iterator block is not executed until it is iterated, this can make exception handling a little strange. Take the following program.

static void Main(string[] args)
{
    IEnumerable<int> collection = null;
    try
    {
        collection = Exception();
    }
    catch (Exception e)
    {
       // Handle exception
       return;
    }
    foreach (var item in collection)
    {
       // Process
    }
}
static IEnumerable<int> Exception()
{
    yield return 1;
    throw new Exception();
}

Here we have tried to catch the exception in the call to Exception; however, the exception is not thrown until we are halfway through the foreach loop, meaning that the exception is unhandled. This is not a problem if we are aware it could happen; however, with the example above where several iterators are chained together, you could be getting exceptions from a completely different area of the program while looping over a collection.

The issue of double execution is one to be careful of. The problem arises because the values are not stored. Therefore, if the collection is iterated over twice, the iterator is executed twice and the data calculated twice. If the iterator is particularly expensive, then you probably want to avoid this. Even seemingly simple calls like Count() cause the collection to be iterated, and the following code suddenly takes twice as long as necessary.

var collection = GetNumbers();
if (collection.Count() > 0)
{
    foreach (var number in collection)
    {
        // Process
    }
}
else
{
    // Handle no numbers
}

The easiest solution to this is to call ToList() first and then change the Count() call to the Count property. An alternative is to change the Count() > 0 to Any(). Any() will return true as soon as an item is found, whereas Count() must iterate the entire collection to get a count.
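
The reworked version executes the query exactly once:

var collection = GetNumbers().ToList();
if (collection.Count > 0) // The Count property; the list is already built.
{
    foreach (var number in collection)
    {
        // Process
    }
}
else
{
    // Handle no numbers
}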

How do they work?

The iterator block might seem strange because the code jumps back and forth between different methods, and it seems to defy the standard programming model of executing all code in sequence. It does this with some clever compiler tricks. The compiled code still executes as you would expect.

The entire method is rewritten into a class. The local variables and parameters are stored in fields, and the yield return calls are replaced with a state machine using goto statements.

When MoveNext is called, execution jumps to where it last left off using a goto. The values of all of the locals and parameters have been retained, as they are now fields, and the state counter is updated so that the next call will jump to the right place.
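
As a heavily simplified sketch of the kind of class the compiler generates for the GetNumbers() method above (the real generated class also implements IEnumerable<int> and IEnumerator<int>, and has extra states for disposal):

class GetNumbersStateMachine
{
    private int state; // Which point to resume from.
    private int i;     // The local variable, hoisted to a field.

    public int Current { get; private set; }

    public bool MoveNext()
    {
        switch (this.state)
        {
            case 0: // First call: start the loop.
                this.i = 0;
                goto case 1;
            case 1: // Later calls: resume at the loop test.
                if (this.i < 10)
                {
                    this.Current = this.i;
                    this.i++;
                    this.state = 1;
                    return true;
                }
                return false;
            default:
                return false;
        }
    }
}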

You can decompile a program to see the full code but there is a lot of it.

Object Equality

One area of C# that seems to be confusing is object equality. It is a very common operation that is frequently misunderstood. With the primitive types, it works as expected: “a” == “a”, 5 == 5. The issues come when it is used with custom types. If nothing is done, the default comparison for reference type objects is essentially an address comparison. If the 2 variables refer to the same object in memory, then the comparison will return true. If there are 2 objects in memory with every field equal, the comparison will still return false. (This is not true for value type objects; the default comparison compares all of the fields with reflection. It should still be overridden, as this is not the most efficient.)

In order to get the expected behaviour, 2 methods have to be overridden: bool Equals(object) and int GetHashCode(). The implementation of Equals is fairly straightforward: compare the current object to the parameter on whatever fields you wish; if they are equal, return true, otherwise return false. The confusion often comes with GetHashCode. This should always be overridden in a pair with Equals. The methods should use the same fields so that equal objects have the same hash code.
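
A typical pair of overrides, sketched for a simple Point class:

public class Point
{
    public int X { get; set; }
    public int Y { get; set; }

    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        if (other == null)
        {
            return false;
        }

        return this.X == other.X && this.Y == other.Y;
    }

    public override int GetHashCode()
    {
        // Combine the same fields used by Equals; multiplying by a
        // prime helps spread the hash codes.
        return (this.X * 397) ^ this.Y;
    }
}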

There are several rules regarding the hash codes:

  1. Equal objects should have the same hash code.
  2. The hash code should be consistent within an object as long as the object does not change.
  3. Hash codes for a type should be evenly distributed. Ideally, with few duplicates.

We will look at each of these rules in detail.

Rule 1, that equal objects should have the same hash code, is important because otherwise dictionaries will not work as expected. The hash code of the key is used for storing objects in the dictionary. If you added an object keyed with a string, and an equal string had a different hash code, you might not be able to retrieve the object from the dictionary.

The above is also a reason why hash codes should remain consistent for an object. If hash codes are different each time it is calculated, then it would be impossible to retrieve objects from a dictionary. That said, they only have to be consistent within a single execution of the application. Hash codes should not be stored between executions or transmitted between computers.

Point 3 is mostly about performance. If the hash codes are evenly distributed with few duplicates, then dictionaries will perform much better. That does not mean that duplicates are not allowed; it is only necessary that equal objects have the same hash code. Unequal objects do not require different hash codes. The more duplicates there are, though, the slower dictionaries will perform. The dictionary stores objects in a bucket based on the hash code. The fewer items in the same bucket, the faster lookups are. The Equals method is used to compare a key to all objects in the bucket to find the right one, so the smaller that collection is, the faster the dictionary will run.

I did make this mistake once with an error in my GetHashCode method. The object had 2 Int16 values and I computed a hash code as:

 (a << 16) & b; 

I should have written:

 (a << 16) | b;

I had about 100,000 of these objects and was creating a dictionary keyed with this. With the AND operation, every hash code was 0. It took more than 3 minutes to create the dictionary. I changed to OR and had no duplicates; the dictionary was populated in milliseconds.

When overriding Equals, it is a good idea to implement the interface IEquatable<T>. This provides the method bool Equals(T), which allows equality comparison without casting to Object first. This is especially important with value types, as using the standard bool Equals(object) method would force boxing.
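
For a value type, a sketch might look like this:

public struct Coordinate : IEquatable<Coordinate>
{
    public int X;
    public int Y;

    // Strongly typed, so no boxing occurs.
    public bool Equals(Coordinate other)
    {
        return this.X == other.X && this.Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is Coordinate && this.Equals((Coordinate)obj);
    }

    public override int GetHashCode()
    {
        return (this.X * 397) ^ this.Y;
    }
}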

A lot more information on hash codes can be found on MSDN here.

Multithreading

When writing GUI applications, it is often necessary to execute long running tasks on a different thread. This is done because when the GUI thread isn’t polling the message queue, the GUI becomes unresponsive, and it appears as though the application has crashed.

You may also need to use another thread for tasks that may wait a long time, such as network code. This would wait for data to arrive before it can read it.

The code to create a thread and run code on it is fairly straightforward. The issues come when you need to have cross-thread communication. You can have some shared area of memory where one thread writes data and another thread reads it. You have to be careful with this for several reasons.

  1. The producer may not have finished writing a piece of data when the consumer reads it, this can lead to data corruption.
  2. The producer is working faster than the consumer meaning a value is overwritten that hasn’t yet been read.
  3. The consumer is working faster than the producer and gets the same value twice.

All of these issues can be resolved with a thread safe queue. This is a queue where the enqueue and dequeue methods use synchronisation locks to prevent simultaneous access and the dequeue method checks if the queue is empty and will wait for data if it is.

A simple implementation could be:

public class ThreadSafeQueue<T>
{
    private readonly object syncRoot = new object();
    private readonly Queue<T> queue;

    public ThreadSafeQueue()
    {
        this.queue = new Queue<T>();
    }

    public void Enqueue(T item)
    {
        lock (this.syncRoot)
        {
            this.queue.Enqueue(item);

            // Wake a thread that is waiting in Dequeue.
            Monitor.Pulse(this.syncRoot);
        }
    }

    public T Dequeue()
    {
        lock (this.syncRoot)
        {
            // A while loop, not an if: another consumer may have taken
            // the item between the Pulse and this thread resuming.
            while (this.queue.Count == 0)
            {
                Monitor.Wait(this.syncRoot);
            }

            return this.queue.Dequeue();
        }
    }
}

In this queue, the lock keyword is used to create an area of code that does not allow simultaneous access. This prevents more than one thread from running inside the zone at once. In order to proceed, the thread must own the lock. If the lock is unavailable, then the thread must wait until it is.

In the Dequeue method, we check if the queue has any items in it; if it does not, then the thread will sleep, using the Wait method, until there is one. When an item is inserted into the queue, we wake any sleeping threads using the Pulse method.

With this class, multiple threads can enqueue and dequeue without worrying about simultaneous memory access or whether there is anything in the queue to read. The threads will simply sleep until they can proceed.
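
Usage is then straightforward; for example, a producer thread and a consumer thread sharing one queue:

var queue = new ThreadSafeQueue<string>();

var producer = new Thread(() =>
{
    for (int i = 0; i < 10; i++)
    {
        queue.Enqueue("Message " + i);
    }
});

var consumer = new Thread(() =>
{
    for (int i = 0; i < 10; i++)
    {
        // Blocks until an item is available.
        Console.WriteLine(queue.Dequeue());
    }
});

producer.Start();
consumer.Start();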

Communication with a GUI Thread

If you have ever tried to update a GUI in a multi-threaded application, you will no doubt have seen the exception “Cross-thread operation not valid: Control accessed from a thread other than the thread it was created on”. You cannot update the user interface from any thread other than the GUI thread. This is annoying when you are running a task in the background and want to update a log or progress indicator on the GUI as it is running.

The solution is to use the Invoke method. This is a method of the Control class, so all controls have it. It takes a delegate as a parameter and will invoke that method on the control’s thread. This lets you run your task on a different thread so the GUI remains responsive, while simultaneously providing feedback to the user on the progress of that task.

private void Start()
{
    Thread thread = new Thread(new ThreadStart(RunThread));
    thread.Start();
}

private void RunThread()
{
    for (int i = 0; i < 100; i++)
    {
        Thread.Sleep(50);
        this.progressBar.Invoke(new Action(() => this.progressBar.Value = i));
    }
}

As this example shows, it is very easy to invoke a delegate to be run on the control’s thread and allow the task to continue in the background.

One final option is to use events. You can fire events from your thread class and catch them in your GUI; the event handler can then update the GUI as necessary. The code still needs to be invoked, as your event handlers are still executed on your worker thread. A generic method can be written to handle invoking event handlers where necessary.

private bool InvokeCheck<TEventArgs>(Action<object, TEventArgs> handler, object sender, TEventArgs e)
    where TEventArgs : EventArgs
{
    if (this.InvokeRequired)
    {
        this.Invoke(handler, sender, e);
        return false;
    }

    return true;
}

private void TaskOnProgress(object sender, ProgressEventArgs e)
{
    if (this.InvokeCheck(this.TaskOnProgress, sender, e))
    {
        this.progressBar.Value = e.Progress;
    }
}

Here, the InvokeCheck will see if an invoke is required and if it is, invoke the event handler and return false. If not, it simply returns true. If the method returns true, we know that it is safe to update the GUI, if not, the method will be invoked again. This is useful because it now does not matter if we are on the GUI thread or not, the event handler can always be executed safely.

More Photography

It’s been a few weeks since I got my camera and I’ve taken a whole lot more photos. I still don’t have many that I would consider really good, but these 2 are amongst my favourites.

Trent & Mersey Canal
River Trent

They were taken a couple of weeks apart, but both are just down the road from me. The river and canal run parallel to one another with the tow path between them. Both of these pictures were taken from that tow path.

Another picture I like was taken last weekend in Cannock Chase. Again, walking distance from me. I suppose I’m lucky to have so many places for photography so close to me.

Cannock Chase

I think I might start to branch out and try a few different kinds of photography. Maybe take my camera into town and try some urban shots or something.

Reference Types and Value Types

All types in C# are either a value type or a reference type. It is the type that determines where an object will be created, rather than how it is instantiated. This differs from C++, where it is the instantiation of an object that determines where it is created.

Classes are reference types while structs are value types. Reference types are instantiated on the heap and value types on the stack.

With reference types, your variable is a pointer to the object on the heap. If you pass it into a method or copy it, you are only copying the pointer; it is still the same object on the heap.

public static void Main(string[] args)
{
    List<string> strings = new List<string>();
    AddItems(strings);
    Console.WriteLine(strings.Count); // 2
}

private static void AddItems(List<string> strings)
{
    strings.Add("String 1");
    strings.Add("String 2");
}

In this example, there is only 1 list and so after the method call, the items added by the method are in the list.

With a value type, the value is copied and so modifications are not preserved.
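
For example, with an int (a value type), the method receives its own copy:

public static void Main(string[] args)
{
    int number = 1;
    Increment(number);
    Console.WriteLine(number); // Still 1.
}

private static void Increment(int value)
{
    value++; // Only the copy is modified.
}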

With reference types, it is important to note that you are copying the pointer to the list. While there is only one object, there are 2 pointers. This means that if the pointer is reassigned to a new object, this assignment is not preserved.

public static void Main(string[] args)
{
    List<string> strings = null;
    AddItems(strings);
    Console.WriteLine(strings == null); // True
}

private static void AddItems(List<string> strings)
{
    strings = new List<string>();
    strings.Add("String 1");
    strings.Add("String 2");
}

Passing by Reference

Using the ref and out keywords on the parameters, we can make it so that reassignments of the value are preserved. The difference between these is that ref can be used for passing a value in and out of the method, while out is used just for passing a value out.

There are some rules governing the use of these keywords.

  1. Ref parameters must be initialised before they can be used, while out parameters do not need to be.
  2. Out parameters must be initialised before the function can exit. All out parameters must have a value assigned in all code paths (see the sketch below).
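
For example, a hypothetical TryHalve method using an out parameter:

static bool TryHalve(int value, out int half)
{
    if (value % 2 != 0)
    {
        half = 0; // Out parameters must be assigned on every path.
        return false;
    }

    half = value / 2;
    return true;
}

int result;
if (TryHalve(10, out result)) // result did not need initialising first.
{
    Console.WriteLine(result); // 5
}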

Both of these keywords can be used on either value types or reference types. The difference is that, with reference types, it is the pointer that is passed by reference, not the value.

public static void Main(string[] args)
{
    List<string> strings1 = new List<string>();
    List<string> strings2 = strings1;
    strings2.Add("String 1");
    AddItems(ref strings2);
    Console.WriteLine(strings1.Count); // 1
    Console.WriteLine(strings2.Count); // 2
}

private static void AddItems(ref List<string> strings)
{
    strings = new List<string>();
    strings.Add("String 1");
    strings.Add("String 2");
}

As this example shows, the pointer in strings1 is duplicated into strings2. This means that when the string is added via strings2, it goes into the same list that strings1 refers to. However, when strings2 is passed to the method and reassigned, only the strings2 variable is updated to reference the new list. The strings1 variable and its list are unaffected.

Boxed Value Types

There is an exception to the above. This is when value types are boxed. Boxing occurs because all types are derived from System.Object, which is a reference type. This means that any type can be referenced by an Object type variable.

int a = 5;
object b = a;

This is perfectly valid code; however, the b variable is a reference type and so must act like one. In order to do this, the value is boxed in an object and copied to the heap. The value of a on the stack is unaffected. The variable b can be passed to a method and will still continue to reference the same value on the heap.

The issue comes when we try to unbox the value. Since it is an int (System.Int32), we must cast to an int to unbox. Even though casting an int to a long is a perfectly valid cast, the value must be unboxed before this can happen.

int c = (int)b; // Valid. The value will be copied from the heap to c.
long d = (long)b; // Invalid. This will throw an exception at runtime as the type does not match.
long e = (long)((int)b); // Valid. We unbox to an int, then cast to a long.

This can be a problem, as it is tempting to write a method that takes an object parameter to make it flexible and then cast to a general type if the exact one doesn’t matter. This is not possible; a better solution would be to overload the method.
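
For example, rather than a single method that takes object and casts, provide an overload per supported type:

// Tempting but broken: throws InvalidCastException for a boxed int.
static void PrintBroken(object value)
{
    Console.WriteLine((long)value);
}

// Better: one overload per supported type, so no unboxing surprises.
static void Print(int value)
{
    Console.WriteLine(value);
}

static void Print(long value)
{
    Console.WriteLine(value);
}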

Deferred Execution

I decided I would start to write posts about the problems that people frequently ask me about when I’m at work. One thing that comes up a lot is problems caused by not realising that LINQ queries use deferred execution.

When you write a LINQ query, the result of that expression is the query itself, not the result of executing the query. This seems to be one of the most frequently misunderstood things about LINQ.

var query = collection.Where(x => x < 10);
var query = from x in collection where x < 10 select x;

Both of these queries are the same, the result is a query that hasn’t yet been executed, just defined. The query is not executed until it is required, for example, iterating over the result will then execute the query.

Modified Variables

One of the problems that I see with this is using variables in the query. The value of the variable may change by the time the query is executed. This gives a different result to what was expected.

int a = 10;
var query = collection.Where(x => x < a);
a = 5;

Now, when the query is executed, it will return values less than 5, not less than 10. Even worse, if the value is changed during execution then it will use whatever the value was at the time each item is evaluated.

If the use of a variable is required but you want to avoid it changing prior to execution, the simplest solution is to copy the value first. Then, only use the copied value in the query and modify the original.

int a = 10;
int b = a;
var query = collection.Where(x => x < b);
a = 5;

Double Execution

We are often working with large datasets, so the LINQ queries can be quite CPU intensive to compute. A common error is executing the query twice. This is easily done by looping through the query twice. Each time, the query will be executed, and all where clauses, groupings etc. must be recalculated. This effectively doubles the execution time.

This misunderstanding arises because if you are not aware of deferred execution, you may think the query has already executed and that you are just looping through the results twice. This is not the case: each time you iterate over the query, usually with a foreach loop, you execute the query again. This can have serious performance implications, depending on where the data is coming from.

The easiest solution is to simply add .ToList() to the end of your query. This will execute the query immediately and make a List object in memory with the results. You then iterate over the list. This does still require iterating over the data twice, but it only executes the query once; it may not always be the best option.

Website Security

One thing that really winds me up is the terrible security of websites. I am especially annoyed when they email you your own password. That should not be possible. If they can email you your password, that means they have it stored in plain text. It makes me wonder what the rest of their information security is like. The worst of it, though, is that they are then sending your clear text password across an unsecured network to your inbox.

Call me paranoid, but it is things like this that lead me to use KeePass and have different passwords on every website. All too often there are news stories of some site being hacked and all their accounts compromised. All their users must then spend hours going around dozens of other sites and changing their passwords.

I remember a few years back, one website I had registered on (I forget now which one) would send out a weekly email. At the bottom, they helpfully reminded me of my own username and password. I tried to get the site to stop emailing me; I didn’t want the emails anyway. There was no option to stop them. No option to even change my password. I don’t remember what I did in the end, but that is probably one of the worst examples of security I have ever seen. Is it any wonder that popular sites are being broken into on an increasingly regular basis?

Not only do I use different passwords, I also use a different email address alias on each website I register with. Recently I started getting spam; I almost never get any spam. All of it was sent to the same alias, the one I used to register on Twitter. I don’t know how that address got out; presumably their site was compromised again. My email isn’t public or anything.

I don’t claim to be a security expert, far from it, but as a programmer who has written systems that require user authentication there are some basic things that need to be considered. First and foremost should be: don’t store passwords in plain text in your database. There are a lot of tell-tale signs that a website is doing this.

  1. They email you your password.
  2. Their “Forgot your password” wizard then just tells you what your password is rather than requiring a reset.
  3. They have a maximum length on their passwords. If they are hashing it, the length doesn’t matter. This makes me think that they have a varchar(20) in the database or something.

Granted, storing password hashes isn’t completely secure, but it is a start. I know that there are hash databases now with trillions of hashes in them, so salt your hashes and use a different salt for each user. This forces the cracking to be done separately for each account. In addition, use SSL for the login page; this prevents the user’s password being transmitted in clear text to the server. In fact, use SSL for the whole site: the session cookie is just as vulnerable as the password.
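
As a minimal sketch in C# (assuming the framework’s Rfc2898DeriveBytes, which implements PBKDF2 and generates a random salt; iteration counts and storage formats are left to the reader):

using System.Security.Cryptography;

static void HashPassword(string password, out byte[] salt, out byte[] hash)
{
    using (var deriveBytes = new Rfc2898DeriveBytes(password, 16, 10000))
    {
        salt = deriveBytes.Salt;         // A new random 16-byte salt per user.
        hash = deriveBytes.GetBytes(32); // Store the salt and hash, never the password.
    }
}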

In a world where cyber-crime is on the rise and there are open security standards that can protect us, is there any excuse for not making your site secure?

Photography

Last month I went on holiday to Mauritius and spent the week snapping away with my old point and click camera. When I got back, I went through the photos, and they weren’t very good. So I started to read up on photography techniques to find out why. I read pages and pages on composition, exposure, lighting techniques… I then went out hiking in Cannock Chase and tried to apply everything I had just learned. It didn’t really work; the camera didn’t let me change half the settings. My pictures were a little disappointing, but I did enjoy taking them, so I treated myself to a new DSLR. I’ve had one on my Amazon wishlist for years. I did a little research and settled on a Canon EOS 1100D. Not the one that was on my wishlist, but that was a lot more expensive and now quite an old model.

I bought it with the 18-55mm f/3.5-5.6 kit lens, and currently that is the only lens that I have. The camera arrived on Monday and I’ve been snapping away most of the week. I haven’t yet had the opportunity to go back out hiking again; I’m hoping to do that tomorrow if the weather is good.

One thing I have been doing all week, though, is astrophotography. I tried this a couple of years ago with my old camera but didn’t get very good results. I’ve been out twice this week to snap pictures of the sky and have been really enjoying it. It is very complicated and has a steep learning curve; I still have a long way to go, but I think I’ll get there. I would like to buy a T-ring adapter so I can attach the camera to my telescope, and then get a tracking motor for long exposures.

Lyra 07/09/2013

This was taken last night. It is my best so far, but as I type, I am re-stacking the image with Deep Sky Stacker to try and get a better result. The photo was taken at 18mm f/3.5 and is made of 50 twelve-second exposures, plus 30 or so each of dark, flat, flat dark and bias frames. If anything better comes out, I’ll post that as well. All my night sky pictures are in my Flickr account here.