loyalty.dev

Optimising Code for Large Batch Runs

Our program enables the exchange of loyalty points, and one of the methods we utilise is SFTP. In order to accommodate a new SFTP partner with a higher exchange volume than usual, we conducted load testing and optimisation on our system.

However, during the initial attempt to generate a points transfer file of approximately 80,000 rows and upload it to an SFTP server, our program failed to complete the process. This highlighted the need for code optimisation so that the program could handle such a substantial volume of loyalty points efficiently.

The primary objectives of optimisation are to enhance the program's performance by speeding up execution time, minimising memory usage, and promoting efficient resource consumption.

How to identify what to optimise?

Identifying which areas require optimisation is essential before starting the optimisation process. To do this, we used three tools.

Application performance monitoring

  • Appsignal
    • Real-time APM provider that offers insights into errors and performance issues, plus host monitoring and a custom metrics platform.
    • We can leverage container host metrics, such as CPU usage, load average, memory usage, disk usage, disk I/O, and network traffic.
    • Additionally, we can assess the performance of actions using Performance > Issue List, which provides an event timeline with allocations and duration.
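
To make an expensive step show up as a distinct entry in that event timeline, the AppSignal Ruby gem supports custom instrumentation blocks. Below is a minimal sketch, assuming the appsignal gem is installed and the work runs inside a monitored transaction such as a web request or background job; the process_rows and export_row names and the event name are hypothetical:

require "appsignal"

def process_rows(rows)
  rows.each do |row|
    # each instrumented block appears in the event timeline with its duration and allocations
    Appsignal.instrument("process_row.batch_export") do
      export_row(row)
    end
  end
end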

Metrics visualisation platform

  • Grafana & Prometheus
    • Data visualisation platform that helps users understand and interpret their data via charts and graphs consolidated into one dashboard.
    • It can also monitor a wide range of other system and application metrics, such as memory usage, network traffic, disk usage, and response times.
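
If we also want application-level numbers, such as how long each batch run takes, alongside those system metrics, one option is to export a custom metric with the prometheus-client Ruby gem and chart it in Grafana. This is only a sketch, assuming a recent version of the gem and an already-scraped /metrics endpoint; the metric name and the process_batch call are hypothetical:

require "prometheus/client"

registry = Prometheus::Client.registry

# histogram of how long each batch run takes, in seconds
batch_duration = Prometheus::Client::Histogram.new(
  :batch_run_duration_seconds,
  docstring: "Time spent processing one points transfer batch"
)
registry.register(batch_duration)

started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
process_batch # hypothetical batch work
batch_duration.observe(Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at)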

RubyProf

  • Profiler that helps track down slow or memory-intensive methods.
  • It allows us to determine where the application spent its time and which functions called other functions during execution.
  • We primarily use the call stack report (a sketch of generating one follows this list), and this is how to interpret a line of it:

    e.g.: 33.41% (99.99%) Hanami::Interactor::Interface#call [1 calls, 50003 total]

    • 33.41% - This is the percentage of time spent in the method Hanami::Interactor::Interface#call out of the total time spent in the entire program.
    • (99.99%) - This is the percentage of time spent in the method Hanami::Interactor::Interface#call out of the total time spent in the method and its child methods.
    • Hanami::Interactor::Interface#call - This is the name of the method being profiled.
    • 1 calls - This is the number of times the method was called during the profiling period.
    • 50003 total - This is the total amount of time (in microseconds or seconds) spent in the method, including any time spent in child methods. It is important to note that this is not the same as the time spent exclusively in this method, as it includes the time spent in child methods that were called by this method.
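
To produce a report like the one above, RubyProf can wrap the code path under investigation and write out an HTML call stack. A minimal sketch, assuming the ruby-prof gem is installed; generate_transfer_file stands in for whatever code is being profiled:

require "ruby-prof"

# profile the block and collect timing and allocation data
result = RubyProf.profile do
  generate_transfer_file # hypothetical code under investigation
end

# write the call stack report as an HTML file
printer = RubyProf::CallStackPrinter.new(result)
File.open("call_stack.html", "w") { |file| printer.print(file) }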

Optimisation 1 - algorithm / application complexity

Using the RubyProf profiler, we were able to detect some significant O(N) and O(N^2) hotspots that we could optimise. Reducing the number of operations required to complete the program's work improves the algorithm's time complexity, resulting in a faster and more efficient program.

O(N) → O(1): Eliminating repeatable computations

Code example of O(N):

def check_delivery_availability?
  # reads the settings from the Redis-backed cache on every call
  @delivery_options.settings.service_available? &&
  @delivery_options.settings.supported_providers.include?(@provider_code)
end

The number of times the check_delivery_availability? method is called is directly proportional to the number of rows in the file: with 80,000 rows, the method is called 80,000 times. However, since it retrieves the same settings from the cache for every row, there is no need to recompute the result repeatedly.

We utilised Appsignal to gain insights into errors and performance issues. Through the event timeline analysis, we discovered that our program was spending around 49 seconds accessing Redis, an in-memory data structure store. However, we only needed to access it once for the check_delivery_availability? method.

This inefficiency was resolved by optimising our program to minimise unnecessary access to Redis. After memoising the method, we reduced this check from O(N) to O(1), speeding up execution time.

Code example of O(1):

def check_delivery_availability?
  # memoise the settings check so subsequent calls skip the Redis lookup
  # (note: ||= re-evaluates when the cached value is false or nil)
  @check_delivery_availability ||=
    @delivery_options.settings.service_available? &&
    @delivery_options.settings.supported_providers.include?(@provider_code)
end

By ensuring that this method runs in constant time (O(1)), we can significantly improve the program's performance.

O(N^2) → O(N): Avoid O(N^2) with better data structure

Code example of O(N^2):

def prepare_batch_updates(batch_id, batch_updates)
  batch = repository.find_batch_with_items(batch_id).first

  batch.items.inject({}) do |result, item| # <-- first loop
    update = batch_updates.detect { |u| u[:id] == item.id } # <-- second loop

    next result unless update

    result.merge!(
      item.id => {
        status: update[:status],
        response: update[:response]
      }
    )
  end
end

The code above shows an example of nested loops that iterate over a collection of elements. detect and reject both belong to the Enumerable module, which is used to work with collections of objects; each call scans the collection linearly, so invoking detect inside another loop makes the overall work O(N^2).

Code example of O(N):

def prepare_batch_updates(batch_id, batch_updates)
  batch = repository.find_batch_with_items(batch_id).first

  updates_hash = batch_updates.to_h { [_1[:id], _1] } # <-- index the updates by id once

  batch.items.inject({}) do |result, item|
    update = updates_hash[item.id] # <-- O(1) hash table lookup

    next result unless update

    result.merge!(
      item.id => {
        status: update[:status],
        response: update[:response]
      }
    )
  end
end

After removing the detect call and using a hash table lookup instead, we managed to optimise from O(N^2) to O(N).

Optimisation 2 - memory allocations

In an attempt to be more “functional” in how we processed data, we decided in the early days of our data services that every input and output is immutable and cannot be changed once created. This meant that each new service in a workflow would instantiate new instances of the data record to work on, ensuring that functions and operations are free from side effects that impact each other.

However, all of those new copies of data collectively make the workflow inefficient in terms of performance. Our next optimisation is therefore to minimise unnecessary data copies and memory allocations.

Minimising unnecessary copies of data

The map method in Ruby is useful for transforming the elements of an array using a block. It is a non-destructive method that creates a new array containing the transformed elements, without modifying the original array. Essentially, the map method applies the block to each element of the original array and returns a new array with the results.

def initialize(batch)
  @batch = batch
  # map builds a brand-new array of wrapped entities alongside the original items
  @batch_with_items = batch.items.map do |item|
    Entities::BatchWithItems.new item
  end
end

In our case, the batch.items array has 80,000 elements, and using map to transform the elements creates another copy of the array that we don't actually need. This can slow down our code and consume unnecessary memory.

However, we can optimise the code above by using map! instead. map! is a Ruby method that modifies the original array in place without creating a new copy. By using map!, we can avoid the unnecessary creation of a new array and reduce memory usage, resulting in more efficient code.
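
For illustration, here is the same initialiser rewritten with map!, a sketch that assumes batch.items returns a plain Array and that no other code relies on its original, unwrapped elements:

def initialize(batch)
  @batch = batch
  # map! replaces each element of batch.items in place,
  # so no second 80,000-element array is allocated
  @batch_with_items = batch.items.map! do |item|
    Entities::BatchWithItems.new item
  end
end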

Before using the map, collect, and select methods in Ruby, it's important to consider whether modifying the original data in place would cause unwanted side effects. If modification is permissible, it's best to use the corresponding bang methods: map!, collect!, and select!. By doing so, you can optimise your program and minimise the unnecessary creation of new copies of the original array.

Proper use of transaction callbacks

Transactions are protective blocks where SQL statements are only made permanent if they can all succeed as one atomic action. Transactions enforce the integrity of the database and guard the data against program errors or database breakdowns. Basically, we should use transaction blocks whenever we have a number of statements that must be executed together, or not at all.

In our particular scenario, we generate numerous new Ruby objects within a transaction block, which can exhaust the available memory. As the code snippet below illustrates, non-database work, namely the instantiation of the Batch entity and all of its items, was included in the transaction block.

def create_with_items(batch, items)
  transaction do
    batch = create(batch)
    items = create(items)

    Batch.new find_with_items(batch.id, items.collect(&:id)) # <-- allocates many Ruby objects inside the transaction
  end
end

When allocating a significant number of objects within a transaction, transaction callbacks can have an adverse effect on memory usage. This is because references to these objects are held until the transaction is either committed or rolled back, making it impossible for Ruby to garbage collect them beforehand.

In addition to negatively impacting memory usage, a transaction block also ties up one of the database connections until all the code inside it finishes executing. This can be problematic if the block contains a slow process, such as an API call, as it can hold a database connection for an extended period.

By understanding the potential impact of transaction callbacks on memory usage and database connections, developers can take steps to optimise their code and prevent performance issues.
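
As a sketch of that optimisation, assuming only the two create calls need to be atomic and that the transaction helper returns the block's value, the entity construction can be moved outside the transaction so the connection is released sooner and the allocated objects are no longer pinned by it:

def create_with_items(batch, items)
  # keep only the statements that must succeed or fail together inside the transaction
  created_batch, created_items = transaction do
    [create(batch), create(items)]
  end

  # build the large in-memory entity graph outside the transaction,
  # after the database connection has been released
  Batch.new find_with_items(created_batch.id, created_items.collect(&:id))
end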

To wrap it all up like a burrito

In conclusion, optimising code plays a crucial role in enhancing software performance. To begin with, it is important to identify areas that require optimisation. We utilised various tools, such as application performance monitoring, data visualisation platforms, and profilers like RubyProf, to aid in this process.

Next, we focused on optimising algorithmic or application complexity. By eliminating redundant computations and avoiding O(N^2) scenarios through the use of more efficient data structures, we achieved significant improvements in the program's time complexity. This, in turn, led to faster execution times and more efficient resource utilisation.

Lastly, striking a balance between a “functional” programming style and memory efficiency is vital. We accomplished this by minimising unnecessary data copies and ensuring the proper use of transaction callbacks.

Overall, by following these steps and strategies, we were able to successfully enhance the performance of the software.