[Translation] Choosing the right data structure in Swift

Hello again. Before you leave for the weekend I want to share with you the translation of material which was prepared especially for the basic course “iOS developer”.


To decide which data structure to use to represent a given set of values, it is often much harder than it looks. Because each type of data structures optimized for a certain number of use cases, selection of the correct matching for each set of data can often have a big impact on the efficiency of our code.

Swift standard library comes with three basic data structures — Array, Dictionary and Set, each of which has its own set of optimizations, the pros and cons. Let’s look at some of their characteristics, and also cases where we may need to go beyond the standard library to find the right data structure for our purposes.

The linearity of the arrays

Array (array) is one of the most frequently used data structures in Swift, and for good reason. It stores its elements sequentially, they easily move in a predictable way, and it can store any value from entities to instances of classes and other collections.

For example, here we use an array to store a collection of shapes placed on the Canvas in a drawing application. Then when you need to visualize a canvas picture, we just go through the array to draw each item using DrawingContext as follows:

struct Canvas { var shapes: [Shape] func render() -> Image { let context = DrawingContext()
shapes.forEach(context.draw) return context.makeImage()
}
}

When it comes to plotting all of our figures, as we did above, the array is ideal. The arrays not only store their items in a very effective way, they also have a guaranteed order of enumeration, that provides predictable procedure for rendering without having to perform any additional work.

However, like all other data structures, arrays have their drawbacks. In our case we will face one of its drawbacks when you want to remove the figure from the canvas. Because the array elements are stored according to the index, we always need to first find what index is this figure before we can delete it:

extension Canvas { mutating func remove(_ shape: Shape) { guard let index = shapes.firstIndex(of: shape) else {
return
} shapes.remove(at: index)
}
}

At first, the code above may seem not so difficult, but it can become a bottleneck in terms of performance for any canvas that contains a large number of figures, as firstIndex is linear (O(N)) from the point of view of time complexity.

Although we can circumvent this limitation where we use our Canvas HTML element. For example, always referring to the figures by index, not by value or ID that would make our code more complex and more fragile, because we always would have to be sure that our indexes will not expire when the canvas with which we work will change.

Speed sets

Instead, let’s see if we can optimize itself Canvasby changing its underlying data structure. Considering the above-mentioned problem, one of our original ideas could be to use Set (set) instead of Array. As we have already discussed in the article The power of sets in Swift (the Power sets in the Swift), one of the significant advantages of sets over arrays is that inserting and deleting can always be done in constant (O(1)) time, since elements are stored by hash value and not by index.

Updating the Canvas so that it uses the sets, we get the following:

struct Canvas { var shapes: Set<Shape> func render() -> Image { let context = DrawingContext()
shapes.forEach(context.draw) return context.makeImage()
} mutating func remove(_ shape: Shape) {
shapes.remove(shape)
}
}

Again, the above code may look correct and it even compiles without problems. However, although we solved the problem of removal, we also lost our stable order of drawing — because, unlike arrays, many do not give us guaranteed order of enumeration — that is a stumbling block in this situation, since it seems that we are going to draw custom shapes in a random order.

The index of indexes

Let’s continue to experiment. Now let’s see if we can optimize the Canvasby typing the Dictionary (vocabulary) that allows us to search the index of any shape based on its ID. We’ll start with the fact that change the level of access to our array of shapes to privateto be able to control the insertion of an element using the new method add. And every time you add a new figure, we also add its index in our dictionary:

struct Canvas { private var shapes = [Shape]() private var indexes = [Shape.ID : Int]() func render() -> Image { let context = DrawingContext()
shapes.forEach(context.draw) return context.makeImage()
} mutating func add(_ shape: Shape) { let index = shapes.count indexes[shape.id] = index
shapes.append(shape)
}
}

Since we now know what the index is stored in this figure, we can quickly perform a deletion in constant time, as using the set:

extension Canvas { mutating func remove(_ shape: Shape) { guard let index = indexes[shape.id] else {
return
} shapes.remove(at: index) indexes[shape.id] = nil
}
}

However, in our new implementation Canvas is a pretty serious bug. Every time we remove a piece, we actually cancel all the indexes, which is higher than the one we just removed — since each of these elements will move one step to the beginning of the array. We could solve this problem by adjusting those indexes after every deletion, but that again would return us to the territory of the O(N), which we tried to avoid from the beginning.

Our last implementation has its advantages. In General, the use of a combination of two data structures can be a great idea in these situations because we can often use the strengths of one data structure to offset the disadvantages of another and Vice versa.

So, let’s try again with another combination, but this time start with consideration of our real needs:

  • We need to insert and remove had a constant time complexity, and should be able to remove a piece without knowing its underlying index.
  • We need a guaranteed order of enumeration, to be able to maintain a stable layer order.

Looking at the above requirements, it turns out that although we need a stable order too much, we really don’t need indexes. This would make a linked list ideally suited for our use case.

Linked lists consists of nodes, where each node contains a reference (or link) to the next node in the list, which means that it is possible to sort in a predictable manner — without the need for any updates to the index when deleting the element. However, the standard library for Swift (yet) does not contain the type linked list, so if we want to use it, you first need to create.

Create a linked list

Let’s start with the structure Declaration List (list) that will keep track of first and last nodes in our list. We will do both of these properties are read-only outside of our type, to ensure data consistency:

struct List<Value> { private(set) var firstNode: Node? private(set) var lastNode: Node?
}

Next, let’s create our type of the Node (node), which we will do a class, because we want to be able to refer to nodes by reference and not by value. Our list will dvuhsvetny, which means that each node will contain a link to its next neighbor, and on the previous. Each node will also store the value like so:

extension List { class Node { var value: Value fileprivate(set) weak var previous: Node? fileprivate(set) var next: Node? init(value: Value) { self.value = value
}
}
}

The reason why the previous property we make weak, is to avoid retain cycles, which would appear if we had maintained strong links in both directions. To learn more about how to avoid retain cycles, see

“Memory Management” (memory Management).

It’s actually all the code that we need in order for our linked list could be stored values. But this is only the first part of the puzzle, as in any other collection, we also want to be able to touch it and change its contents. Let’s start with the iterations, due to the very Protocol-oriented design in Swift, it’s easy to implement, ensuring compliance to Protocol Sequence and by implementing the method makeIterator:

extension List: Sequence { func makeIterator() -> AnyIterator<Value> { var node = firstNode return AnyIterator { // Loops through all nodes, consistently moving to the next, and extracts the value: let value = node?.value node = node?.next return value
}
}
}

Since the above iteration is very simple, we use AnyIterator standard library, to avoid having to implement a custom iterator type. For more complex scenarios it can be implemented by adding a line with IteratorProtocol.

Next, let’s add API to modify our linked list, starting with the inserts. We will expand the List using the method append, which adds a new node for the inserted values, and then returns this node like so:

extension List {
@discardableResult mutating func append(_ value: Value) -> Node { let node = Node(value: value) node.previous = lastNode lastNode?.next = node lastNode = node if firstNode == nil { firstNode = node
} return node
}
}

Above we use the attribute @discardableResult, which tells the compiler not to generate any warnings if the result of calling our method was not used because we are not always interested in the node that was created.

Since linked lists are not based on indexes, and on the maintenance of the chain of values by reference, implementation of the deletions is just a matter of updating the next and previous neighbors of the deleted nodes, so instead they pointed to each other:

extension List { mutating func remove(_ node: node) { node.previous?.next = node.next node.next?.previous = node.previous // Using the "triple parity", we can compare two instances of class identity, not by value: if firstNode === node { firstNode = node.next
} if lastNode === node { lastNode = node.previous
}
}
}

Based on the foregoing, the initial version of our List complete and we are ready to try it out. Let’s update the Canvas to use our new list as well as a dictionary that allows us to quickly search for which node corresponds to the ID of the figure:

struct Canvas { private var shapes = List<Shape>() private var nodes = [Shape.ID : List<Shape>.Node]() func render() -> Image { let context = DrawingContext()
shapes.forEach(context.draw) return context.makeImage()
} mutating func add(_ shape: Shape) { nodes[shape.id] = shapes.append(shape)
} mutating func remove(_ shape: Shape) { guard let node = nodes.removeValue(forKey: shape.id) else {
return
} shapes.remove(node)
}
}

Now we have fast insertion and deletion, and predictable search order, without the need to add additional complexity to the process of challenge is pretty awesome! And, as our new List is fully generic type, we can now use it whenever we again need to store values without an index in a linear manner.

Conclusion

Despite the fact that data structures are so fundamental that they can be found in all types of programming languages, the decision about which one to use in each situation might require a significant amount of thought, testing and experimentation, especially if we want our code to be efficient with the growth of the dataset.

It’s also very likely that a suitable data structure for each particular situation may eventually change as our requirements change, and sometimes the use of a combination of several data structures, not just one, may be the way to achieve the desired performance characteristics.

We will continue to explore the world of data structures in the following articles, paying special attention to those that are not yet implemented in the standard library. As in many other things, sometimes we need to expand our thinking beyond Swift to choose the right data structure for each situation.

You can find me on Twitter or email me if you have any questions, comments, or feedback.

Thank you for your attention!

Source

Leave a Reply

Your email address will not be published. Required fields are marked *