Performance Improvements

2020-10-31

This post describes a performance guide I presented to my team last week. We ran into issues with one of our services after a configuration change that quadrupled the average processing time for a single message. To be more precise, the change happened in the data mapping DSL, which helps analysts describe how input data is mapped to output data. Read more about it here. I took this week to make the service run at a reasonable speed again. Below you’ll find the approach I take when trying to improve application performance.

Disclaimer: I make a lot of assumptions here, and the guide often talks about message processing. I understand that this doesn’t represent every application, but the core principles mentioned in the “Prerequisites” section still stand. I’m mostly using the term “application” here, but I really mean any type of software, be it a backend service, a frontend app or a simple script.

Motivation

[Figure: graph showing average processing time per message]

There is a wide range of reasons why the performance of an application might need to be improved:

The application would be able to run more efficiently [1].
The application needs to run faster to be economical, that is, to be worth running at all.
Performance is a competitive advantage [2].
The high standards of the developer aren’t met.
And many others: more stable and predictable runtime behaviour, headroom for increased future load, a strong basis for future features.

Prerequisites

To actually make an impact on application performance, you’ll need to fulfil some requirements. The first is measurability: you need to be able to measure the metric you want to move up or down. You can record it by running local benchmarks or by reporting metrics to a metrics backend (Prometheus, CloudWatch or similar), though the latter might be too coarse.
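A local benchmark doesn’t have to be fancy. Here is a minimal sketch in Python, assuming a hypothetical process_message function that stands in for the real per-message work. The names and the sample data are made up for illustration; the point is only that the metric (seconds per message) comes out as a number you can write down.

```python
import statistics
import time


def process_message(message: str) -> int:
    """Hypothetical stand-in for the real per-message work."""
    return sum(ord(ch) for ch in message)


def benchmark(messages: list[str], repeats: int = 5) -> dict[str, float]:
    """Process all messages `repeats` times and report seconds per message."""
    per_message = []
    for _ in range(repeats):
        start = time.perf_counter()
        for message in messages:
            process_message(message)
        elapsed = time.perf_counter() - start
        per_message.append(elapsed / len(messages))
    return {
        "median_s": statistics.median(per_message),
        "min_s": min(per_message),
        "max_s": max(per_message),
    }


if __name__ == "__main__":
    # Made-up sample input; in practice this would be recorded real messages.
    sample = [f"message-{i}" for i in range(10_000)]
    print(benchmark(sample))
```

Repeating the run and looking at the median and the spread gives a first hint of how noisy the measurement is.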

The next thing you want is reproducibility: you’ll need to be able to reproduce your measured results. Your application should process the same n messages in roughly the same amount of time, and replaying the same input data should be effortless. This can be pretty tough if your application depends on an external system like a database, a message broker or another cloud-thingy; I’ll get to that down the line.

The final piece is profiling, which lets you drill down into what your application spends its time on. You need to be able to find out which components are worth improving: if your app is spending 80% of its CPU time on JSON parsing, you should probably focus on that area. Maybe switch to a faster parser or a different serialisation format like protobuf. Don’t start with the other 20%; improvements there won’t have much impact. Flame graphs are a good choice for profiling; for smaller scripts you might be able to instrument the code yourself.
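As a rough illustration of profiling, here is a sketch using cProfile and pstats from the Python standard library. It prints a text report sorted by cumulative time rather than drawing a flame graph, and the parse and handle functions are hypothetical placeholders for real application code.

```python
import cProfile
import io
import json
import pstats


def parse(raw: str) -> dict:
    """Hypothetical parsing step."""
    return json.loads(raw)


def handle(raw: str) -> int:
    """Hypothetical message handler: parse, then do a bit of work."""
    record = parse(raw)
    return len(record["items"])


def profile_run(messages: list[str], top: int = 5) -> str:
    """Profile handling of all messages and return a report of the top entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    for raw in messages:
        handle(raw)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up input data for the sketch.
    messages = [json.dumps({"items": list(range(50))}) for _ in range(2_000)]
    print(profile_run(messages))
```

The functions at the top of the report are where the time goes, and therefore where an improvement can actually be felt.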

To summarise: you want to measure, reproduce and profile your application’s runtime behaviour.

Approach

Set yourself a goal and a time frame. Both can be rough estimates, but you should stick to them nevertheless. Working on performance can be a tricky rabbit hole filled with micro-optimisations and intensive swearing. Having a plan keeps you focused on the big wins. This can be as straightforward as: “This week I’m going to improve message throughput (a measurable metric!) by at least 10 percent”. On the other hand, you shouldn’t create unnecessary pressure for yourself. If the performance issue can be mitigated by a reasonable amount of additional hardware, go for it! The amount of hardware you can buy for the cost of a few hours of work is substantial. This just shouldn’t become your default solution.

It’s time to get your hands dirty. You should be able to start your application locally and put data into it. Preferably you want to cut out external systems like remote databases. Can you extract your core business logic from the dirty grip of remote data fetching? Great, set up a standalone version of your app which can read input from a file or a local database. Otherwise do your best to at least make your input data deterministic: work with a frozen database table or re-read that Kafka topic from a known point. If you are having a hard time with that, it might be wise to spend some time upfront on a better separation of concerns within your application. Knowledge about where your data lives and where it gets sent off to shouldn’t leak into the business side of things; push those concerns to the edges of your application.

The next item on the list is to establish a baseline measurement, something you can later compare against. I’m a fan of documenting those measurements in a spreadsheet. It helps with calculating differences and allows you to document future changes.
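To make the idea of a standalone version more concrete, here is a small sketch. It assumes the input can be frozen into a file with one JSON document per line; map_record is a hypothetical placeholder for the core business logic, and the file name is made up. The important part is that map_record knows nothing about where the data comes from.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def map_record(record: dict) -> dict:
    """Hypothetical core business logic: pure, no I/O."""
    return {"id": record["id"], "total": sum(record["values"])}


def read_messages(path: Path) -> Iterator[dict]:
    """Edge of the application: replay frozen input, one JSON document per line."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def run_standalone(records: Iterable[dict]) -> list[dict]:
    """Feed any source of records through the core logic."""
    return [map_record(record) for record in records]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Made-up frozen input; in practice a dump of real messages.
        path = Path(tmp) / "frozen_input.jsonl"
        path.write_text(
            '{"id": 1, "values": [1, 2, 3]}\n{"id": 2, "values": [4, 5]}\n',
            encoding="utf-8",
        )
        print(run_standalone(read_messages(path)))
```

Because run_standalone accepts any iterable, the same core logic can be fed from a file in the benchmark and from the real message source in production.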

Now you can finally get into things: run the benchmark or standalone version and attach the profiler. You probably already have some suspicions about what might slow the application down; confirm those with actual numbers. There might also be a lot of low-hanging fruit, which will only take a few minutes to fix but will add up to a noticeable impact. If you find other performance sins, document them and prioritise them according to their predicted impact and the effort of the fix. Now just rinse and repeat: fix an issue, validate the improvement, profile and start over.
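Validating an improvement boils down to comparing a new measurement against the baseline. A spreadsheet does this just fine; the sketch below shows the same arithmetic in Python. All numbers and labels are made up for illustration.

```python
def relative_change(baseline: float, current: float) -> float:
    """Change of `current` relative to `baseline`, in percent (negative means lower)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical measurements: milliseconds per message after each fix.
    baseline_ms = 40.0
    runs = [("precompile regex", 31.0), ("index records by id", 24.0)]

    for label, current_ms in runs:
        change = relative_change(baseline_ms, current_ms)
        print(f"{label}: {current_ms:.1f} ms ({change:+.1f}% vs. baseline)")
```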

Typical Pitfalls

Some common things you might encounter when trying to improve application performance:

Values that are effectively static but get recalculated very often, and are surprisingly expensive to calculate. I found a regular expression that was recompiled from a string over 500 times per input message, even though the source string was known at application start-up.
A collection of things is calculated, but most of the results are discarded later on. If you iterate over a collection to calculate something for each item and later do a collection.find(...) call, you might want to use a lazy collection.
If a single item in a collection is always looked up by a given key, use a data structure with O(1) access, like a hash map, instead of a sequence. Generally speaking: use the correct data structure for the job at hand. Converting a JSON array into a map has an upfront cost, but that cost amortises surprisingly fast.
The Complete Rewrite™ is probably not the solution to your performance issues.
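The first three pitfalls above can be sketched in a few lines of Python. The pattern, function names and record layout are hypothetical. Note also that Python’s re module caches recently compiled patterns, so the regex effect is smaller there than in languages without such a cache, but the principle of doing start-up work once still applies.

```python
import re

# Pitfall 1: compile once at start-up instead of once per message.
ORDER_ID = re.compile(r"order-(\d+)")  # hypothetical pattern


def extract_order_ids(message: str) -> list[str]:
    """Reuse the precompiled pattern for every message."""
    return ORDER_ID.findall(message)


# Pitfall 2: compute lazily when only the first match is needed.
def expensive_score(item: int) -> int:
    """Hypothetical stand-in for a costly per-item calculation."""
    return item * item


def first_score_above(items: list[int], threshold: int):
    """Stop computing scores as soon as one exceeds the threshold."""
    scores = (expensive_score(item) for item in items)  # generator, nothing computed yet
    return next((score for score in scores if score > threshold), None)


# Pitfall 3: build an index once, then look up by key in O(1) on average.
def index_by_id(records: list[dict]) -> dict:
    """Turn a list of records into a dict keyed by the hypothetical "id" field."""
    return {record["id"]: record for record in records}


if __name__ == "__main__":
    print(extract_order_ids("order-12 shipped, order-7 pending"))
    print(first_score_above([1, 2, 3, 4], 5))
    records = [{"id": "a", "value": 1}, {"id": "b", "value": 2}]
    print(index_by_id(records)["b"])
```

In first_score_above, the generator means expensive_score is never called for items after the first hit, which is exactly the work a lazy collection saves.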

Conclusion

Performance work is fun and rewarding. Establishing a reproducible benchmark you can run locally is a hurdle, but it will pay off. It is also useful for reproducing some of those nastier bugs. Improving the core performance of your application will give you deep insights into your programming language of choice and make you more proficient with it. Performance work isn’t a one-off task: you’ll need to continually monitor and measure your application, as it will change over time.

Notes

[1] Efficient meaning less resource consumption (CPU, memory, network etc.) for the same result, or the same resource consumption for a better result, for example more frames per second in a UI app.

[2] “Speed still matters”; “How One Second Could Cost Amazon $1.6 Billion In Sales”; “Amazon Found Every 100ms of Latency Cost them 1% in Sales”
