How We Improved Our Android Apps' Performance by Up to 24%

Performance Engineering can be as complex as cooking. The difference is that cooking almost always has a recipe to follow.

Victor Oliveira
Mercado Libre Tech


Intro

Have you ever tried to cook? Le Cordon Bleu, considered one of the best culinary arts schools in the world, was founded in Paris in 1895 by Marthe Distel, journalist and publisher of the magazine La Cuisinière Cordon Bleu. Even though many want to graduate there to master extremely difficult techniques, become great chefs and achieve extraordinary things, most of them probably started small: at home, on a daily basis, they might have cooked at least once without following a recipe. As they develop a love for the culinary practice, they realize that many complex dishes can't be prepared without the commitment to follow rigorous steps, from the oven's temperature to ingredients precisely weighed on a scale. Cooking, then, can be as hard as Performance Engineering: we may stick to general guidelines, but there are no specific recipes to follow.

Context

Here at Mercado Libre, performance really matters, so we're always trying to improve it. Over the past few months, part of our Performance team has focused on reducing the startup time of our Fintech (aka Mercado Pago) and Marketplace (aka Mercado Libre) apps. During this journey we've faced many challenges, and that's what I would like to share with you today.

When talking about performance issues, the first thing we need is metrics to understand the current situation. Unfortunately, at the time this article was written we weren't yet on Android Gradle Plugin 7, so we couldn't use Google's Macrobenchmark to triage performance issues. Still, our startup demand couldn't wait and we had to improvise. Macrobenchmark was just one of the tools we had to evaluate in order to understand what was feasible for reaching our goal. Baseline Profiles, for example, which are claimed to improve startup time by up to 40%, were also not an option, since Macrobenchmark is required for their implementation. For us, analyzing performance had to mean fully understanding what our startup was doing under the hood, and Android Studio Profiler and Perfetto helped us with this evaluation.

Issues

MercadoLibre Inc. is known for making e-commerce more democratic in Latin America. According to the Google Play Console, at the time this article was written around 19% of our users did not have high-end Android devices, yet they equally expected to enjoy a fast startup. Therefore, our main criterion throughout this journey has been to benchmark on the lowest Android API possible. The main goal was to reduce cold startup time on these devices, assuming that our algorithm refactoring would positively impact users by making the most of their hardware.

From an implementation perspective, we analyzed what made sense to develop by weighing the cost-benefit between performance gains and Time To Market (TTM), and then we took action on five topics:

  1. Excessive iterations over lists
  2. Non-critical configurations at startup
  3. Too many coroutine scope launches
  4. Little usage of cached threads
  5. Too early startup tracking

Excessive iterations over lists

Big O Notation is well known as a way to assess the performance of algorithms. It lets us state, in simple terms, that we have a linear startup performance or, in other words, an O(n) algorithm: our apps' startup time is directly proportional to the number of setups we need to run before enabling users' input.
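As a rough illustration only (not our actual implementation), a startup that runs each setup in sequence grows linearly with the number of setups:

// Illustrative sketch: with n setups of roughly constant cost,
// executing them one after another makes startup time O(n).
fun runStartup(setups: List<() -> Unit>) {
    setups.forEach { setup -> setup() }
}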

The major difference from our previous algorithm is that we no longer stack many tasks before executing them. Stacking used to be required so that tasks could be sorted to avoid crashes at app startup, since some of them depend on the state of others for proper execution.

/**
 *
 * Old algorithm example of a high-order function
 * being stored for later execution.
 *
 **/
class ConfiguratorManager {

    internal val configurables = mutableListOf<ConfiguratorData<*>>()

    infix fun <T : Configurable> ConfiguratorManager.configure(
        configurable: () -> T
    ) = ConfiguratorDefaultData(configurable).also {
        configurables += it
    }
}
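For contrast, here is a minimal sketch, with hypothetical names such as Configurable and EagerConfiguratorManager, of the direction we moved towards: the configuration block runs as soon as it is declared instead of being stored for later sorting and execution.

/**
 *
 * Sketch only: the high-order function is executed immediately,
 * so there is no list of pending configurables left to iterate
 * over at startup.
 *
 **/
interface Configurable {
    fun setup()
}

class EagerConfiguratorManager {

    infix fun <T : Configurable> configure(configurable: () -> T): T =
        configurable().also { it.setup() }
}

With this shape, a call like manager configure { SomeFeatureConfig() } runs right away, which is what removes the excessive iterations over a stored list.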

Non-critical configurations at startup

There is normally no need to subclass Application, and when it is required, the implementation should be as quick as possible. Still, with a large code base and countless teams building user experiences in the same app, it's pretty common to lose sight of this principle. Since our Splash Screen is currently not optimized, we end up forced to block the main thread to make sure every essential setup has finished before launching the app.

It’s in our roadmap to take full advantage of the Splash Screen. With that in mind, to help in a future migration while boosting the app’s performance immediately, we took off several non-critical feature configurations from being executed on the main thread. We even managed to adjust some of them so that they could be dynamically delivered.

Too many coroutine scope launches

Coroutines are a powerful concurrency design pattern used to simplify code that executes asynchronously. Every background task used to be processed by creating a new coroutine scope and launching it immediately. Although executing startup tasks in parallel is generally considered good practice, we realized that in our scenario it was best to scale this approach back.

Creating a coroutine here means declaring and initializing it. During our Proof of Concept, the cost of this operation was mostly irrelevant. Launching coroutine scopes, on the other hand, can be a very expensive operation during app startup. In our scenario, we observed that creating and launching a single coroutine scope, thus giving up parallelism and keeping only an asynchronous execution, could make this stretch of code run up to 9% faster.

/**
 *
 * Old algorithm responsible for creating
 * a parallel execution.
 *
 **/
internal class ConfiguratorEnqueueImpl(
    private val coroutineScope: ConfiguratorCoroutineScope
) : ConfiguratorEnqueue {

    private val scope: CoroutineScope by lazy {
        coroutineScope.newScope()
    }

    private val jobs = mutableListOf<Job>()

    override fun launch(
        dispatcher: CoroutineDispatcher,
        block: suspend CoroutineScope.() -> Unit
    ) {
        scope.launch(dispatcher, block = block).also { jobs += it }
    }

    override suspend fun enqueue() { jobs.joinAll() }
}
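The single-scope direction can be sketched, again with hypothetical names, roughly as follows: one scope launches one coroutine, and the blocks run sequentially inside it.

import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch

/**
 *
 * Sketch only: a single launch executes all blocks one after
 * another, trading parallelism for a cheaper, purely
 * asynchronous execution.
 *
 **/
internal class SequentialConfiguratorEnqueue(
    private val scope: CoroutineScope
) {

    fun enqueue(
        dispatcher: CoroutineDispatcher,
        blocks: List<suspend CoroutineScope.() -> Unit>
    ): Job = scope.launch(dispatcher) {
        // Each block starts only after the previous one finishes.
        blocks.forEach { block -> block() }
    }
}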

Little usage of cached threads

Perfetto UI is an outstanding performance tool frequently presented by Google engineers for benchmarking Android applications, and it can also be used through the command line. A trace sample in Perfetto displays information about the kernel, memory usage and the many different services running on a device. While benchmarking cold startups, we started to wonder how we could prevent other apps in the background from being allocated on a core during our setup process.

A coroutine's context includes a CoroutineDispatcher that determines which thread(s) the coroutine uses for its execution. Whenever processing an asynchronous setup, Mercado Pago and Mercado Libre now use a custom implementation that caches threads within a pool through an ExecutorService. These threads are set to maximum priority so that the device can complete anything related to our startup process as fast as possible.
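A minimal sketch of that dispatcher, assuming kotlinx.coroutines and using illustrative names rather than our exact production values:

import java.util.concurrent.Executors
import java.util.concurrent.ThreadFactory
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.asCoroutineDispatcher

/**
 *
 * Sketch only: a cached thread pool whose threads run at maximum
 * priority, exposed as a CoroutineDispatcher for startup work.
 *
 **/
private val startupThreadFactory = ThreadFactory { runnable ->
    Thread(runnable, "startup-worker").apply {
        priority = Thread.MAX_PRIORITY
    }
}

val startupDispatcher: CoroutineDispatcher =
    Executors.newCachedThreadPool(startupThreadFactory)
        .asCoroutineDispatcher()

Any setup launched with this dispatcher, for example scope.launch(startupDispatcher) { ... }, then reuses the cached, high-priority threads instead of spinning up new ones.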

Too early startup tracking

As stated previously, our startup has more tasks than we originally wanted, and some of them are executed in the background. Even when processed asynchronously, a task still needs to be allocated to a core.

This initial context is important to understand why we postponed the export of our startup tracking events. We used to process these events while executing tasks; we now enqueue them in a buffer and export them later, making sure no resources are wasted. Nothing other than critical startup tasks is processed, whether on the main thread or not. Five seconds is the period we chose to postpone the export, since that covers the vast majority (p90) of our users' startup times according to the Google Play Console.
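A minimal sketch of this buffering, with hypothetical names and assuming kotlinx.coroutines:

import java.util.concurrent.ConcurrentLinkedQueue
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

/**
 *
 * Sketch only: tracking events are buffered during startup and
 * exported after a fixed delay, so the export never competes with
 * critical startup tasks for CPU time.
 *
 **/
class DeferredStartupTracker(
    private val export: (List<String>) -> Unit,
    private val exportDelayMs: Long = 5_000L,
    private val scope: CoroutineScope =
        CoroutineScope(SupervisorJob() + Dispatchers.Default)
) {

    private val buffer = ConcurrentLinkedQueue<String>()

    fun track(event: String) {
        buffer.add(event) // cheap enqueue, no exporting work during startup
    }

    fun scheduleExport() {
        scope.launch {
            delay(exportDelayMs)     // roughly our users' p90 startup window
            export(buffer.toList())  // flush everything collected so far
            buffer.clear()
        }
    }
}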

Results

The results collected for Mercado Pago and Mercado Libre were somewhat similar, although they scaled in different ways: the first app had a bigger performance boost than the second. This can be attributed to the differences in the tasks each one runs, and not only in a quantitative sense; it's important to keep in mind that their implementations might diverge completely. To give a concrete picture: both Mercado Pago and Mercado Libre set up the Notifications module as a core task at startup, yet Mercado Libre's algorithm is not the same as Mercado Pago's, and it loads faster.

To match our install base, the benchmarks presented here were run on two completely different chipsets running Android APIs 24 and 29. Considering only the improvements to our new algorithm, which represents a fraction of the entire startup duration, Mercado Pago showed a performance boost ranging from 23% to 58%, while Mercado Libre stayed at 17% on both APIs. The overall cold startup benchmark showed an improvement of up to 5% for Mercado Libre, while Mercado Pago improved by close to 24%.

Devices comparison and their maximum benchmark performance improvement regarding the overall startup time

Production samples followed what we had proven internally. Compared to three months earlier, Mercado Pago's p90 cold startup time improved by up to 24% according to Firebase, while Mercado Libre improved by up to 16%.

Mercado Libre and Mercado Pago startup time presented by Firebase in a comparison with the last 3 months

Non-compliant cold startup sessions at Mercado Pago were cut in half in a seven-day comparison, and the 30-day metric decreased by 29%. Mercado Libre also improved by approximately 7% and 6%, respectively.

Percentage of sessions that fall short of the optimal startup time according to the Play Store

Converting these metrics into numbers, Mercado Pago now has more than 1.5M users per day starting the app in less than 5 seconds, while Mercado Libre reaches 500K of these faster initializations.

Performance improvement achieved in Mercado Libre and Mercado Pago

Conclusion

We’re aware that there’s still plenty to be done in our apps, especially when it comes to startup time. From removing any unnecessary implementations to optimizing the splash screen, we still have margin for enhancements. They’re surely harder to achieve since they require speculation of what hinders the app, full comprehension of algorithms’ execution and creativity to get around these issues.

We must also say that we're very excited to see how Baseline Profiles can further improve our initialization performance, because whether we take advantage of Google's libraries or move forward with our own ideas, what matters most to us is delivering the best User Experience to our customers.

And you? What’s your personal or professional performance story? Tell us more about your experiences in the comments below and stay tuned to learn more about how we keep improving the performance of our mobile apps!

Medium References

  1. Le Cordon Bleu Story
  2. Top 5 Culinary Schools In The World
  3. Android Vitals — Profiling App Startup
  4. App Startup Time
  5. Time To Market
  6. Ace Your Coding Interview By Understanding Big O Notation
