Fun with ArrayMaps (Android Performance Patterns Season 3 ep1)


[MUSIC PLAYING] Optimizing your Android
application performance has a lot to do with how
you’re managing your memory. And more often than
not, issues here can come from using collections
that may not be frugal when it comes to this resource. My name is Colt McAnlis. And to help you write more
performant applications, the Android system
provides a set of collections built especially
for mobile development. Consider the commonly
used hashmap object. Totally useful from
a language perspective, but also a complete memory hog. See, a typical hashmap system
works a little something like this. You take in a key object
and apply a hash to it, which gives you an
index into a large array. At that index location,
you place the value object, which means the biggest problem
you have to design around is collisions, or rather, when multiple keys hash
to the same location in that array. Now, small arrays mean that you’re going to end up
with a lot more hash collisions. I mean, mapping 20,000
objects to 10 locations is, by definition, going
to get a little bit messy. As such, most hashmaps end up
allocating a large array in order to reduce the
potential for collisions, and then go about adding
some other crazy logic in case those things do happen,
like chaining algorithms and whatnot. So obviously, this
whole large array that’s only sparsley
populated thing isn’t really ideal from the perspective
of a memory-minimal device, which is why the Android Runtime
provides an alternate container class, which is more memory efficient,
ArrayMap. ArrayMap provides the identical
functionality as a hashmap, but avoids all of its crazy overhead by using two small arrays
instead of one large one. The first array contains the hashes
of the given keys in sorted order. The second array stores
the key and value objects that have been inserted into the
collection, interwoven together according to the ordering
of the sorted key array. When you want to fetch a value,
we create a hash for the key and then binary search
the hash array to find its index. We can use that
index directly then to find the location in the key
value pair in the interwoven array. Now, if the key in the second array
isn’t equal to the one that we submitted when we
were searching for things, then we assume that
there’s been a collision. To resolve this, we then linearly
walk the keys in both directions, trying to find the correct match. These two things together mean that as the number of objects
grows in our container, so does the time required
to access a single object. Basically, you’re trading off
smaller memory overhead for more expensive
runtime access. Now, since these arrays
are contiguous in memory, there’s a few things to keep in mind
with respect to their usage. Dominantly is understanding how deleting
and adding to the container works. Deleting elements
can fall into two paths. Either you get lucky|
and only need a compaction step, which means shifting the
deleted items to the end and then all the
other values forward. Or you can end up in the slowest path which actually requires a resize
and copy of the elements to eliminate the value in question. Insertions work
on the other side of this coin. If the array has been compacted,
then we can reuse those allocated blocks, and then just shift around everything
to keep our sorted order. However, the slow path here does require a complete resize
of the contiguous array in order to make space… plus copying and moving everything,
on top of that. In general, this means that
insertions and deletions from array maps are going to
cost a little bit more from a performance perspective. But if you keep the number
of objects in it small, like in hundreds of items, then this really
isn’t anything to worry about. See, these small contiguous arrays mean that when the number
of values is pretty low, you get a lot of savings
versus the standard hashmap. For empty maps, there’s no allocations
hanging around taking up space. And for small amounts of objects,
it’s pretty memory-optimal. Oh, and by the way, another great feature
of these containers is that you can iterate
over them using indexing. Compare that to the hashmap container, which requires you to use
the iterator pattern, which, of course,
is significantly slower and takes up more memory to do. But obviously, it’s not wise to use these optimized containers
in every case. But there are some perfect situations
that you should consider. Number one, situations where you have
a small number of items, in the hundreds or so,
with lots of accesses, or the insertions and deletions
are infrequent enough that the overhead of doing so
isn’t really noticed. Number two, situations
where we have containers of maps, like maps of maps, where the submaps tend to have
a low number of items and you’re often
iterating over them a lot of time. If your use case doesn’t
fall into either of those two buckets, then it might be best to stick
with the hashmap class which actually highlights
a very interesting point. Optimizing performance
is a constant trade-off between finding the right container
for the right usage pattern for the right memory pattern. And as someone said,
there is no silver bullet, which is why you need to check out the rest of the Android
Performance Patterns content, to get more information about
these types of trade-offs. Oh, and don’t forget to join
the Google+ community as well to hear other war stories from developers who may be facing
similar situations. So keep calm, profile your code,
and always remember, perfmatters. [MUSIC PLAYING]

About the author

Comments

  1. No API to control the growth of arrays?
    and i guess it's susceptible to more slowing attacks, like making it grow while also making a specific hash collide more.

  2. After seeing this, I see basically no reason to use an ArrayMap.
    You say It's efficient for small containers but in that case I feel a normal ArrayList with a Pair would suffice. Searching such a one with binary or linear search is basically equal in time for small containers.
    The beauty of a HashMap is that it has fast access speed, which this container trades away.

    I also don't think I agree that an iterator being "significantly" slower than a normal for-loop. It would be better to say x % faster/slower (if that is measurable), but with a warmed up JIT I'm not even sure there is notable a difference. But sure there is an ever so slight memory overhead, but thinking about that one is over-optimizing in every situation if you ask me.

    A for each loop has added benefits which is worth mentioning as well. You are sure yo always get an element from the collection and you are also sure not to go out of bounds. 

    Fun with new episodes though 🙂 I'm off seeing the next one later tonight!

Leave a Reply

Your email address will not be published. Required fields are marked *