Troubleshooting multi-threading problems related to Android’s onResume()

This post is a continuation of a previous post I wrote about best practices for using onResume(). I found a particularly nasty bug that cost me two painful hours to track down. The tricky part was that it only showed up when no debugger was attached, which immediately suggested a threading problem. I suspected that the debugger slowed things down just enough for all the threads to complete in the expected order, whereas on a device running in stand-alone mode they completed in a different order.

The test case. This is actually a very common workflow, and perhaps so common that we just don’t think about it much:

  • Cold start the application without a debugger attached. By cold start I mean that the app was in a completely stopped, non-cached state.
  • Minimize the app like you are going to do some other task.
  • Open the app again to ensure that onResume() gets called.

Now, fortunately I already had good error handling built-in. Both logcat and a toast message kept telling me that a java.lang.NullPointerException was occurring. What happened next was troubleshooting a multi-threaded app without the benefit of a debugger. Not fun. I knew I had to do it because of the visibility of the use case. I couldn’t let this one go.

How to narrow down the problem. The pattern I used to hunt down the bug was to wrap each line of code or code block with Log messages like this:

Log.d("Test","Test1");
setLocationListener(true, true);
Log.d("Test","Test2");

Then I used the following methodology, starting inside the method where the NullPointerException was occurring. I did this step-by-step, app rebuild by app rebuild, through the next 250 lines of related code:

  1. Click debug in Eclipse to build the new version of the app that included any new logging code as shown above, and load it on the device.
  2. Wait until the application was running, then shutdown the debug session through Eclipse.
  3. Restart the app on device. Note: debugger was shutdown so it wouldn’t re-attach.
  4. Watch the messages in Logcat.
  5. If I saw one message, such as Test1, followed by the NullPointerException with no test message after it, then I knew the code between the two Log calls was the offending code block, method or property. If it was a method, then I followed the same pattern through the individual lines of code inside that method. This was very much like step-through debugging, except done manually. Ugh.
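
The bracketing pattern can be wrapped in a small helper so every step gets its before/after pair automatically. Here is a minimal sketch in plain Java, not code from the actual app: StepTracer and its methods are hypothetical names, and it keeps the messages in a list and prints them where an Android app would call Log.d(). If a step throws, its "after" line never gets written, which is the same signal described in step 5.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that brackets a block of code with "before" and "after" log lines. */
final class StepTracer {
    // Collected messages; on a device these would go to Log.d() instead.
    private final List<String> messages = new ArrayList<>();

    /** Runs the step, logging before and after. The "after" line is skipped if the step throws. */
    void step(String name, Runnable body) {
        log(name + ": before");
        body.run();
        log(name + ": after");
    }

    private synchronized void log(String message) {
        messages.add(message);
        System.out.println("Test " + message);
    }

    /** Returns a copy of everything logged so far. */
    synchronized List<String> messages() {
        return new ArrayList<>(messages);
    }

    /** Name of the step whose "before" was the last thing logged, or null if the last step finished. */
    synchronized String lastUnfinishedStep() {
        if (messages.isEmpty()) {
            return null;
        }
        String last = messages.get(messages.size() - 1);
        String suffix = ": before";
        return last.endsWith(suffix)
                ? last.substring(0, last.length() - suffix.length())
                : null;
    }
}
```

Usage would look like tracer.step("setLocationListener", () -> setLocationListener(true, true)); for each line or block under suspicion.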

What caused the problem? As time went on, and I was surprised that I had to keep going deeper and deeper into the code, I became very curious. It turned out to be a multi-threading bug in a third-party library: the library wasn’t fully initialized even though its method for checking initialization reported that it was. The boolean state property was plainly wrong. One portion of the library wasn’t taken into account when declaring initialization complete, and I was trying to access a property that wasn’t initialized yet. Oops…now that’s a bug.

The workaround? To work around the problem I simply wrapped the access to the offending property in a try/catch block. Then, using the pattern I described in the previous blog post, I was able to keep running verification checks until the property was either correctly initialized or a certain number of attempts had failed. This isn’t 100% ideal, yet it let me keep going forward with the project until the vendor fixes the bug.
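
To make the workaround concrete, here is a rough sketch of the idea in plain Java. It is not the code from my project: RetryUntilReady and poll() are hypothetical names, and the Supplier stands in for whatever third-party property throws while it is uninitialized. Note that this version blocks with Thread.sleep(), so on Android it would have to run off the main thread or be re-expressed with Handler.postDelayed().

```java
import java.util.function.Supplier;

/** Hypothetical helper: keep checking a value until it is usable or the attempts run out. */
final class RetryUntilReady {
    /**
     * Calls read up to maxAttempts times, sleeping delayMillis between attempts.
     * Returns the first non-null value, or null if every attempt failed.
     */
    static <T> T poll(Supplier<T> read, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T value = read.get();
                if (value != null) {
                    return value;
                }
            } catch (NullPointerException notReadyYet) {
                // The property is not initialized yet; fall through and retry.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return null;
                }
            }
        }
        return null;
    }
}
```

The caller still has to decide what to do when null comes back, for example showing an error to the user.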

Lessons Learned. I’ve done kernel level debugging on Windows applications, but I really didn’t feel like learning how to do it with one or multiple Android devices. I was determined to try and narrow down the bug using the rather primitive tools at hand. The good news is it only took two hours. For me, it reaffirmed my own practice of implementing good error handling because I knew immediately where to start looking. I had multiple libraries and several thousand lines of code to work with. And, as I’ve seen before there are some bugs in Android that simply fail with little meaningful information. By doubling down and taking it step-by-step I was able to mitigate a very visible bug.

Where’s that OS update for my Android?

I’ve talked with many Android users and developers and the question that comes up the most is: why can’t the cell provider or hardware manufacturer provide an OS update for my phone? For example, I doubt that my primary developer phone, a Motorola Atrix, is ever going to see an OS update beyond Android 2.3.6. My idea is reinforced by the fact that its much touted successor, the Atrix 2, only supports v2.3.7 and, as of this writing, Android’s latest release is at 4.0.x.

Why update at all? In my case, there are an increasing number of software requirements that simply can’t be met on the older OS versions without significant work-arounds, for example building dynamic UIs. That’s not to say that upgrading, in itself, doesn’t also involve additional coding to support certain levels of backward compatibility. More on that in a minute. On that note, I’ve also blogged before on the versions of the Android OS being used by the majority of phone users.

In the case of the consumer, you may simply want a new feature like a better camera, or more battery life. Or, perhaps your two year cell plan is expiring and your phone battery isn’t lasting as long, or the phone seems to be getting slower and you would like something new.

The twist. Okay, back to the OS updates. First of all, as customers roll off their two year contract an educated guess is that they’ll want to buy the latest generation phone rather than hang onto their old one which isn’t upgradeable. So many things change in two years that my old phones look ancient, and sometimes downright clunky, compared to the latest and greatest. You may have experienced similar thoughts. This creates huge demand for the “new”. It’s not much different than cars in a way.

Here’s what makes things interesting. It’s a fact that Android makes these updates publicly available. However, the carrier who sold you your phone and the hardware manufacturer that built it most likely made changes to the operating system and phone firmware that essentially create a specialized branch of code. So, the official Android update may not necessarily work on your phone. In essence, you’re stuck because the source for these customized OSes isn’t open.

Reality is it costs money to maintain these unique Android code branches through support resources and software developers to make and test bug fixes. And, it costs even more money to maintain backwards compatibility for older versions of phone firmware or software in parallel with support for all new subsequent releases. If you aren’t familiar with firmware, it provides the lowest level of control on your phone and it’s provided by the phone manufacturer not Android. When new releases of software or firmware happen, you have to make sure you don’t break anything. I have a hunch that it also takes more time for hardware manufacturers to catch up on supporting new Android features than it does for Android to add features. Making hardware changes takes time via a manufacturing process.

This is where phones are different than cars. I’m not aware of a huge aftermarket for upgrading the OS on older phones, whereas car parts and service are a big business. So my analogy isn’t 100% perfect.

I also suspect that neither the cell providers, nor the hardware manufacturers, want to be in the software business. That certainly does not appear to be their core competency. After all, the people behind Android are the experts who continue to innovate at lightning speed.

So, putting all of these concepts together makes me a realist, I suppose. I think the odds show little incentive for the stakeholders of my Atrix to bequeath an Android OS update.

Summary. At a national level, adoption of the newer OSes happens in longer cycles because of cell contracts, which affect how often consumers can update. Supporting older phones costs money. And since we consumers push for the latest and greatest, the carriers and phone manufacturers respond by being wholly focused on getting new technology into our hands.

On one hand, I cringe at the fact of giving up a decently working phone that I’m very familiar with and possibly relegating it to the back of a rarely used desk drawer. But, on the other hand, what I’ll gain from a pure consumer point-of-view, seems to significantly outweigh that simple fact. The hardware benefits of a new phone include: a much better screen and camera, better battery life management, more powerful CPU and onboard memory. From a software perspective I get support for the latest version of Android which includes the new user interface capabilities along with other behind-the-scenes improvements.

[Edited 6/11/12: Database crashed. Restored Blog Post.]

Best Practices for using onResume() in Android Apps

onResume() is a tricky part of an Android application’s life cycle that is called after onRestoreInstanceState(bundle), onRestart(), or onPause(). Its typical usage looks like this inside an Activity:

@Override
protected void onResume() {
	super.onResume();
	//do something
}

There are two things to be aware of when using onResume():

1)      The application may not be visible yet to the user.

2)      Code that you want to access may not be fully initialized yet.

It seems very simple on the surface. When an application resumes it’s really no different than when you wake up in the morning. It may take some time to get going and there may be certain necessary rituals to be completed. For example, some people need a few cups of coffee (or tea), and applications are the same way. Of course, applications don’t drink coffee or tea (yet). But, anyway, it takes time and there may be certain rituals that need to be done for certain aspects of your application to spin back up. This is especially true when you have implemented your own threads.

It’s important to note: onResume() does not indicate that the operating system knows anything about the state of your application, and this is where you can get into trouble. This event is, for the most part, just an announcement by the operating system that it has resumed your Activity and that you can start accessing your app or hardware items such as the camera. What makes this confusing is that some aspects of your Activity, such as user interface components, will come back to life without your help, while other aspects of your app, such as any custom threads you built, will not.

So, some key items to consider in your code are:

1)      If you are concerned about visibility then override onWindowFocusChanged() and check its hasFocus argument. You can do this using the pattern described below for #3 and #4.

2)      Did you pause any threads prior to the onResume() event? If you did, you’ll need to unpause them. If you don’t unpause them they won’t start back up again automatically.

3)      Do you have anything that takes additional time to re-initialize? An example of this might be an RSS refresh request that has been kicked off, but whose response payload hasn’t been received and processed yet, and you want to synchronize that with other methods.

4)      If the device is under load when your application resumes, the methods you attempt to access and their responses, as well as any event handling, may be sluggish. Examples of a device under load include limited memory conditions, high CPU usage, and high-bandwidth usage. If you don’t handle this properly the app can crash.
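
For item #2, here is one way a pausable thread could look. This is only a sketch in plain Java under my own assumptions: PausableWorker and its methods are hypothetical names, the task is simply repeated in a loop, and a pause only takes effect between two runs of the task, not in the middle of one.

```java
/** Hypothetical worker that repeats a task until stopped and can be paused between runs. */
final class PausableWorker {
    private final Object lock = new Object();
    private final Runnable task;
    private final Thread thread;
    private boolean paused;  // guarded by lock
    private boolean stopped; // guarded by lock

    PausableWorker(Runnable task) {
        this.task = task;
        this.thread = new Thread(this::loop, "pausable-worker");
        this.thread.setDaemon(true);
    }

    void start() {
        thread.start();
    }

    /** Call from onPause(): the worker parks itself before its next run. */
    void pause() {
        synchronized (lock) {
            paused = true;
        }
    }

    /** Call from onResume(): nothing wakes the worker up unless you do. */
    void unpause() {
        synchronized (lock) {
            paused = false;
            lock.notifyAll();
        }
    }

    /** Stops the loop and waits for the worker thread to finish. */
    void stop() {
        synchronized (lock) {
            stopped = true;
            lock.notifyAll();
        }
        try {
            thread.join();
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    private void loop() {
        while (true) {
            synchronized (lock) {
                while (paused && !stopped) {
                    try {
                        lock.wait(); // sleep until unpause() or stop() notifies
                    } catch (InterruptedException interrupted) {
                        return;
                    }
                }
                if (stopped) {
                    return;
                }
            }
            task.run();
        }
    }
}
```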

To work around items #3 and #4, there are a couple of relatively easy ways to help prevent your app from crashing: Handlers and AsyncTasks. Use a Handler or an AsyncTask to manage aspects of your application that don’t or can’t spring immediately back to life. If you aren’t familiar with Handlers or AsyncTask, they give you an easy way to off-load time-consuming or intensive tasks from the main user interface thread, and they also provide easy methods for re-syncing messages, methods or objects back into the main thread. The idea is that the end user can continue working with a responsive user interface that still accepts input while these tasks run in the background, and control returns to the main thread when the tasks are done.
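
If you want to see the shape of that hand-off without any Android classes, here is a minimal plain Java sketch. It is an analogy, not the Handler or AsyncTask API: BackgroundThenMain is a hypothetical name, and the "main" executor is just a stand-in for the user interface thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: do the slow part in the background, deliver the result on "main". */
final class BackgroundThenMain {
    /**
     * Runs work on the background executor, then passes its result to onResult
     * on the main executor. The returned future completes after onResult has run.
     */
    static <T> CompletableFuture<Void> run(Supplier<T> work,
                                           Executor background,
                                           Executor main,
                                           Consumer<T> onResult) {
        return CompletableFuture
                .supplyAsync(work, background)   // slow work, off the main thread
                .thenAcceptAsync(onResult, main); // result handling, back on main
    }
}
```

The point of the design is that the caller never blocks: the main thread stays free to handle input while the work is in flight.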

There are plenty of posts that explain Handlers and AsyncTasks and show how to fully implement them, so I’m not going to cover that. I will, however, show you one example to demonstrate what I’d consider a best practice to cover you on items #3 and #4. In this example, and in the context of the application being resumed, it must now wait until the RSS feed has been retrieved before running an analysis on the feed. Both the RSS HTTP request/response and the analysis can be time consuming, and the analysis could still be running in the background while another RSS feed request is taking place. By using background threads, we can better manage this scenario and reduce the chance of an application crash.

public boolean getRSSPayloadReady(){
	boolean rssReceived;
	//determine if RSS has been received and processed.
	....
	return rssReceived;
}
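
The body of getRSSPayloadReady() is elided above, and the following is not a reconstruction of it. It is just one possible way to keep a "ready" check honest across threads: derive the answer from the payload itself rather than from a separate boolean that can drift out of sync. RssPayloadHolder and its methods are hypothetical names, and the payload is a plain String for simplicity.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder whose "ready" state is derived from the payload itself. */
final class RssPayloadHolder {
    // Null until a feed has been received and fully processed.
    private final AtomicReference<String> payload = new AtomicReference<>();

    /** Called by the background thread once the feed is fully processed. */
    void publish(String processedFeed) {
        payload.set(processedFeed);
    }

    /** Called when a new refresh starts, so stale data is not reported as ready. */
    void reset() {
        payload.set(null);
    }

    /** True only if there is a payload to work with right now. */
    boolean isPayloadReady() {
        return payload.get() != null;
    }

    /** The current payload, or null if none is available. */
    String payloadOrNull() {
        return payload.get();
    }
}
```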

@Override
protected void onResume() {
	super.onResume();
	rssQueue.unpause(); //threaded method for retrieving RSS
	delayedStart(15000);
}

/**
* Use with onResume().
* Check for RSS update in the background after the specified delay.
* @param delay how long to wait in milliseconds
*/
public void delayedStart(final int delay){
	final Handler handler = new Handler();

	Runnable rssTask = new Runnable() {

		@Override
		public void run() {

			try{
				handler.postDelayed(new Runnable() {

					@Override
					public void run() {

						try{
							boolean test = getRSSPayloadReady(); //Has RSS refresh completed?

							if(test){
								//this algorithm runs as an AsyncTask
								runParsingAlgorythm(); //won't work if RSS payload = null
							}
							else{
								sendToastMessageRSSFailed(); //Let user know there was a problem.
							}
						}
						catch(Exception exc){
							Log.d("Test","delayedStart(): " + exc.getMessage());
						}
					}
				}, delay);
			}
			catch(Exception exc){
				Log.d("Test","delayedStart(): " + exc.getMessage());
			}
		}
	};

	Thread rssThread = new Thread(rssTask);
	rssThread.start();
}
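
The example above checks once after a fixed delay. If you would rather keep checking and give up after a certain number of attempts, the same idea can be sketched without blocking a thread. This is plain Java with a ScheduledExecutorService standing in for the Handler, so it can be tried off-device; DelayedReadyCheck and its parameters are hypothetical names, not part of the app above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: re-check a condition after a delay, up to a fixed number of times. */
final class DelayedReadyCheck {
    /**
     * After delayMillis, runs onReady if isReady reports true. Otherwise it
     * schedules another check, and runs onGiveUp once attemptsLeft is used up.
     */
    static void start(ScheduledExecutorService scheduler,
                      long delayMillis,
                      int attemptsLeft,
                      BooleanSupplier isReady,
                      Runnable onReady,
                      Runnable onGiveUp) {
        scheduler.schedule(() -> {
            if (isReady.getAsBoolean()) {
                onReady.run();
            } else if (attemptsLeft > 1) {
                // Not ready yet: re-arm the check with one fewer attempt.
                start(scheduler, delayMillis, attemptsLeft - 1, isReady, onReady, onGiveUp);
            } else {
                onGiveUp.run();
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

In terms of the example above, isReady would wrap getRSSPayloadReady(), onReady would kick off the parsing, and onGiveUp would show the toast.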

Reference:

Android Activity and Application Life Cycle