This is the second post in a series on common Go footguns. This post demonstrates the performance impacts of appending a known number of elements to a slice without doing any pre-allocation. This is one of the most common and easily fixed performance issues I see in pull requests on a regular basis.

Scenario

Consider the relatively common scenario of converting one iterable type into a slice. A simple example is to collect a slice of all keys in a map.

func Keys(in map[string]string) []string {
    out := []string{}
    for key := range in {
        out = append(out, key)
    }
    return out
}

The Keys function correctly creates a slice of all keys in the map, but is surprisingly slow.

Initial benchmarks

func BenchmarkKeys(b *testing.B) {
    const size int = 1000000
    bigMap := createBigMap(size)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        got := Keys(bigMap)
        if len(got) != size {
            b.Fail()
        }
    }
}

func createBigMap(size int) map[string]string {
    out := make(map[string]string, size)
    for i := 0; i < size; i++ {
        out[fmt.Sprintf("key%d", i)] = fmt.Sprintf("value%d", i)
    }
    return out
}

The results on my machine, notice the 38 allocations/operation here.

cpu: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
BenchmarkKeys-12              19      58427273 ns/op    88017264 B/op          38 allocs/op
PASS

The adjustment

If we simply pre-allocate our slice’s maximum size, we’ll see a significant performance improvement.

func Keys(in map[string]string) []string {
    // create a slice of length 0 with the underlying array 
    // having an initial max size of len(in)
    out := make([]string, 0, len(in)) 
    for key := range in {
        out = append(out, key)
    }
    return out
}

Re-running the benchmarks, we see a roughly 70% performance improvement with only a single allocation per operation.

cpu: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
BenchmarkKeys-12              93      17152376 ns/op    16007169 B/op           1 allocs/op
PASS

Why is it like this?

Slices are really just dynamic arrays. Under the hood slices are backed by simple arrays. Slices have a length and a capacity. The length is the number of elements a slice contains. The capacity is the maximum number of elements the slice can contain before needing to be resized. You can check the capacity of a slice with cap(sliceVarHere). If the current maximum capacity of a slice is N elements, and you try to append N+1 elements then the entire backing array must be replaced.

For example, let’s say you start with the following maximally populated array:

[val1][val2][val3]

This example array has a capacity of 3, and a length of 3.

Appending to this array requires a larger array. This means we must first allocate a new array, copy all of the values from the first array, then we can add the new item to the first open spot. A typical approach to implementing a dynamic array is to double the maximum size of the array every time we run out of space.

New Array:

[][][][][][]

With old array copied over:

[val1][val2][val3][][][]

With the appended value:

[val1][val2][val3][val4][][]

After appending, the final dynamic array has a capacity of 6 and a length of 4.

You can clearly see that not pre-allocating your slice is far more expensive than some would expect.