Last night, I optimized a Go module I wrote and got more significant results than I expected, so I’m documenting them here.

  1. When creating a slice with make, you can specify the initial capacity of its backing array. When the eventual size of the slice is known in advance, this can reduce array reallocations. I went through the code and added capacity hints wherever I could, but profiling didn’t show any noticeable performance change.
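
A minimal sketch of what this looks like (the `fill` helper is hypothetical, just to illustrate the pattern):

```go
package main

import "fmt"

// fill builds a slice of the first n ints, preallocating the
// backing array so append never has to regrow it. Without the
// capacity hint, append would reallocate roughly log2(n) times.
func fill(n int) []int {
	s := make([]int, 0, n) // length 0, capacity n
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	s := fill(10000)
	fmt.Println(len(s), cap(s)) // 10000 10000
}
```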

  2. Out of OOP habit, some logic that could have been plain functions had instead been written as structs with methods. I changed them to functions to avoid unnecessary heap usage. Again, there was no noticeable difference.
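
A sketch of the kind of refactor this refers to (`Summer` is a hypothetical example, not the actual code). Note that Go’s escape analysis often keeps small short-lived structs on the stack anyway, which would explain why the change made no measurable difference:

```go
package main

import "fmt"

// Before (OOP habit): a struct with no state of its own,
// instantiated only to call a single method.
type Summer struct{}

func (Summer) Sum(xs []int) int {
	t := 0
	for _, x := range xs {
		t += x
	}
	return t
}

// After: a plain function; there is nothing to allocate at all.
func Sum(xs []int) int {
	t := 0
	for _, x := range xs {
		t += x
	}
	return t
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3})) // 6
}
```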

  3. I reduced the number of memory allocations by using unsafe.Pointer to convert byte slices to strings without copying.
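
The classic version of this trick looks like the sketch below. It works because a slice header’s first two fields (data pointer, length) line up with a string header, but it is only safe if the byte slice is never mutated afterwards, since Go assumes strings are immutable. (On Go 1.20+, `unsafe.String(unsafe.SliceData(b), len(b))` expresses the same thing more explicitly.)

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b's bytes as a string without copying.
// UNSAFE: the caller must guarantee b is not modified afterwards.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	b := []byte("hello")
	fmt.Println(bytesToString(b)) // hello
}
```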

  4. I had code that cached frequently used entities on disk, but I changed it to an in-memory cache. The cached objects were small enough to justify keeping them in memory. That removed the need to serialize them every time and led to a significant reduction in CPU usage.
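
The post doesn’t show the cache, but a minimal in-memory version might look like this sketch (names and the generic shape are my assumptions). Because values are stored as live objects, reads skip serialization entirely, which is where the CPU savings come from:

```go
package main

import (
	"fmt"
	"sync"
)

// memCache is a minimal concurrency-safe in-memory cache.
// A real one would likely add eviction and TTLs.
type memCache[V any] struct {
	mu sync.RWMutex
	m  map[string]V
}

func newMemCache[V any]() *memCache[V] {
	return &memCache[V]{m: make(map[string]V)}
}

func (c *memCache[V]) Get(k string) (V, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[k]
	return v, ok
}

func (c *memCache[V]) Set(k string, v V) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

func main() {
	c := newMemCache[int]()
	c.Set("answer", 42)
	v, ok := c.Get("answer")
	fmt.Println(v, ok) // 42 true
}
```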

  5. I applied Uber’s automaxprocs library, which sets GOMAXPROCS to match the container’s CPU quota instead of the host’s core count.

  6. I replaced the rate limiting library I had been using. I had originally been using one created by Uber, but when I took a closer look at the algorithm, I realized it didn’t match our requirements. After replacing it, we saw a significant increase in throughput.

  7. I tried sync.Pool on some frequently created structs, but it didn’t improve performance as much as I expected, and I could see it causing subtle bugs if mishandled in the future, so I reverted the change.
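
For reference, a typical sync.Pool usage looks like the sketch below (pooling bytes.Buffer is my example, not the actual structs). The failure modes alluded to above are real: forgetting the Reset, or keeping a reference to a pooled object after putting it back, silently corrupts data:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers instead of allocating
// a fresh one per call.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset() // without this, the previous user's bytes leak in
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("world")) // hello, world
}
```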