Safari on iOS 5 Randomly Switches Images, Part 3

We are still digging into the image bug we described in part 1 and part 2.

We’ve not been able to create a synthetic setup that triggers the bug, but we have managed to automate detection, alerting and logging of it in our production environment.

We’ve modified our code so that on every pageview the client loops through all the images and uses JavaScript to check whether the dimensions of each image match those of its surrounding container.

If any image does not match, it is marked in red and the client posts as much information as possible about the problem to a server-side script that logs to Splunk.
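A minimal sketch of what such a client-side check could look like. This is not the production code: the function names, the 1-pixel tolerance, the choice of comparing naturalWidth/naturalHeight against the parent element's clientWidth/clientHeight, and the /log-image-mismatch endpoint are all assumptions made for illustration.

```javascript
// Sketch only: all names below are hypothetical, not taken from production code.

// Pure comparison: returns the measurements whose image size differs from the
// container size by more than `tolerance` pixels (default 1, an assumption).
function findMismatches(measurements, tolerance) {
  var limit = typeof tolerance === "number" ? tolerance : 1;
  return measurements.filter(function (m) {
    return Math.abs(m.imgWidth - m.boxWidth) > limit ||
           Math.abs(m.imgHeight - m.boxHeight) > limit;
  });
}

// Measure every <img> against its parent element. Assumes the parent is a
// block-level container that the image is meant to fill exactly.
function measureImages(doc) {
  return Array.prototype.map.call(doc.images, function (img) {
    var box = img.parentNode;
    return {
      element: img,
      src: img.src,
      imgWidth: img.naturalWidth,
      imgHeight: img.naturalHeight,
      boxWidth: box.clientWidth,
      boxHeight: box.clientHeight
    };
  });
}

// Mark failing images in red and post the details to a logging endpoint.
// "/log-image-mismatch" is a placeholder URL.
function reportMismatches(doc, win) {
  var failed = findMismatches(measureImages(doc));
  if (failed.length === 0) { return failed; }
  failed.forEach(function (m) { m.element.style.outline = "3px solid red"; });
  var payload = JSON.stringify({
    page: win.location.href,
    userAgent: win.navigator.userAgent,
    images: failed.map(function (m) {
      return {
        src: m.src,
        img: [m.imgWidth, m.imgHeight],
        box: [m.boxWidth, m.boxHeight]
      };
    })
  });
  var xhr = new win.XMLHttpRequest();
  xhr.open("POST", "/log-image-mismatch", true);
  xhr.setRequestHeader("Content-Type", "application/json");
  xhr.send(payload);
  return failed;
}

// Only wire this up when running in a browser.
if (typeof window !== "undefined") {
  window.addEventListener("load", function () {
    reportMismatches(document, window);
  });
}
```

Keeping the comparison in a pure function separate from the DOM measurement makes it easy to test the logic outside a browser.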

Using Splunk, we have tried to figure out which types of clients trigger the bug. This is what we have found so far:

  • It seems to be a problem on all browsers that have pipelining enabled
  • Opera Mini does funky stuff with images by design, so it’s a false positive
  • iOS 5 is overrepresented
  • Opera on Android (and Symbian) has all kinds of issues
  • The native Android browser has issues, but at a much lower rate than iOS and Opera

Here is the result of a Splunk query on the user agent of all browsers that triggered the bug in the last 24 hours.

place  useragent  count  percent
1      iPhone     1340   75.791855
2      Android    247    13.970588
3      Opera      148    8.371041
4      Symbian    33     1.866516
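
The percent column is each group's share of the 1768 reports in the table. As a sanity check, the small sketch below recomputes it from the counts. It is plain JavaScript with a hypothetical helper name (addPercent), not the Splunk query itself.

```javascript
// Counts copied from the table above; addPercent is a hypothetical helper.
var rows = [
  { useragent: "iPhone", count: 1340 },
  { useragent: "Android", count: 247 },
  { useragent: "Opera", count: 148 },
  { useragent: "Symbian", count: 33 }
];

// Adds a `percent` field: this row's count as a share of the total count.
function addPercent(rows) {
  var total = rows.reduce(function (sum, r) { return sum + r.count; }, 0);
  return rows.map(function (r) {
    return {
      useragent: r.useragent,
      count: r.count,
      percent: 100 * r.count / total
    };
  });
}

addPercent(rows).forEach(function (r) {
  console.log(r.useragent, r.count, r.percent.toFixed(6));
});
```

Rounded to six decimals, the computed shares match the percent column in the table.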

At this point we set up a test environment to test all variants:

  • Hardware: MacBook Air
  • Operating system: OS X Lion
  • Chromium, latest daily snapshot, with pipelining turned on using chrome://flags
  • Firefox 9.0.1 with pipelining turned on using about:config
  • Opera 10.60 (pipelining enabled by default)
  • iOS Simulator 5.0 from the iPhone SDK
  • Android Emulator from the Android SDK
  • Network Link Conditioner (from Xcode on Lion) to emulate different types of networks
  • Wireshark listening on port 80
  • A test page that uses a single host for all images to maximise occurrence of the problem (the parameter ?time=hammer makes the page reload until it fails)
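
The reload-until-failure behaviour behind ?time=hammer could be sketched roughly as follows. This is an assumption about how such a page might work, not the code of the actual test page: isHammerMode, hammerOnce and countMismatches are hypothetical names, and countMismatches stands in for whatever check detects a wrong image.

```javascript
// Sketch only: hypothetical names, not the code of the real test page.

// True when the query string contains the parameter time=hammer.
function isHammerMode(search) {
  return /(?:^\?|&)time=hammer(?:&|$)/.test(search);
}

// Reloads when in hammer mode and no mismatch was found; otherwise does
// nothing, so a failing page stays on screen for inspection.
// Returns true if a reload was requested.
function hammerOnce(win, countMismatches) {
  if (isHammerMode(win.location.search) && countMismatches() === 0) {
    win.location.reload();
    return true;
  }
  return false;
}

// In a browser this would be called once the page has loaded, for example:
//   window.onload = function () { hammerOnce(window, myCheck); };
// where myCheck is a function returning the number of mismatched images.
```

Passing the window object and the check in as arguments keeps the decision logic testable without a browser.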
This machine has been running at home and in the office, on wired LAN, on wireless LAN, and on wireless LAN through OpenVPN (SSL-VPN).

With the iPhone emulator, we’ve managed to trigger the bug under all network conditions except when running through OpenVPN (SSL-VPN). Lowering the network quality seems to increase the bug rate.

We have not managed to trigger the bug in the Android emulator or any of the other browsers.

This is the normal setup, which shows the error:

Since the two other major newspapers in Norway have reported the same problem and they don’t use Varnish, we suspected that load balancing in general might be the triggering factor. To narrow down the problem, we put a Varnish instance directly on the internet with a public IP and hammered it with all the different browsers in our test environment.

The only browser in which we consistently managed to trigger the error was the iPhone emulator running iOS 5.0.1. It took anywhere from 15 to 1000 reloads to trigger, averaging around 170 reloads.

For anyone interested in diving into why this happens: here is an instance of the bug, triggered pretty early on a wired network without any traffic shaping.

Screenshot when the bug triggers:

Screenshot of the pictures that failed:

  • PCAP file – taken client side (all 37 attempts) using Wireshark in the test environment

The first is the correct picture (of the soccer guy cheering):

The next image is supposed to be a picture of a guy who bought lots of planes for the Norwegian Airline, but it is replaced with the image above:

In Wireshark, look at eq 60 to see where things go wrong. In this case it seems like the browser actually requests the image twice before the reply arrives, but that does not always seem to be the case.

Feedback appreciated!
