Previous A/B result invalid due to cookie bug

In the previous post I wrote about an A/B test where I compared a template bought from ThemeForest against one I created myself. The test ran for several months, and ended with ThemeForest being the winner.

After that post I was contacted by a developer called Max from Germany who told me that he had noticed there was something odd going on with how I was choosing whether to show the new ThemeForest template or my old hand-coded template.

Cookie trouble: lpab

Max correctly noticed that the lpab cookie determines which template to show. It would be set to "new" to show the new ThemeForest template, or to "old" to show the hand-coded template.

Or so it should be, but he told me that sometimes he would see this cookie set to "old" but would still be served the new template, and sometimes it would be "new" but he would still get the old template. Uh-oh.

His report sounded solid, and he had tested extensively to confirm it was happening, but I was unable to reproduce it despite clearing the cookie dozens of times.

Accomplice: session_id cookie

Now I might have given up trying to track down the bug, but only minutes later Max gave me a critical hint: to reproduce the bug, it wasn't enough to just delete lpab. Another cookie called session_id also needed to be cleared.

That sounded really odd to me, as these two cookies were not related to each other at all. But when I tried to do as he instructed, lo and behold the A/B test pinning stopped working. I hadn't been able to reproduce it before, because I had only tried deleting the lpab cookie, while he had probably been deleting all cookies.

Long-winded explanation of the cause

Reading through the code, I was quickly able to find the issue.

(1) First, the lpab cookie is set randomly to "new" or "old". Then the template is served based on the result. No problem there.

(2) Next, if session_id doesn't exist, it is initialized and saved to a cookie. This would be fine too, except... The way I saved the cookie was to directly alter the Set-Cookie header to a completely new value, erasing any cookies set during the same response, including the cookie from (1).

In other words, (1) says "in the response, write header Set-Cookie such that lpab is set to 'old' or 'new' randomly". (2) says "actually forget about lpab, overwrite Set-Cookie so it only sets session_id".
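
To make the overwrite concrete, here is a minimal Python sketch of the two steps. It is not the site's actual code: the Response class, the function names, and the use of a plain dict for headers are all assumptions made for illustration. The only thing it takes from the description above is that step (2) assigns a whole new Set-Cookie value instead of adding to what step (1) wrote.

```python
import random
import uuid


class Response:
    """Hypothetical response object: headers are a dict, one value per name."""

    def __init__(self):
        self.headers = {}


def choose_template(request_cookies, response):
    # Step (1): reuse the stored variant, or pick one and ask the
    # browser to remember it.
    variant = request_cookies.get("lpab")
    if variant is None:
        variant = random.choice(["new", "old"])
        response.headers["Set-Cookie"] = f"lpab={variant}"
    return variant


def init_session(request_cookies, response):
    # Step (2): the buggy part. Plain assignment replaces whatever
    # step (1) wrote, so the lpab cookie never reaches the browser.
    if "session_id" not in request_cookies:
        response.headers["Set-Cookie"] = f"session_id={uuid.uuid4().hex}"


def handle_request(request_cookies):
    response = Response()
    variant = choose_template(request_cookies, response)
    init_session(request_cookies, response)
    return variant, response


variant, response = handle_request({})  # brand-new visitor, no cookies
print(variant, response.headers)
```

For a visitor with no cookies, the printed headers contain only a session_id cookie, even though a template variant was chosen and served.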

So lpab is never set because setting session_id immediately overwrites it. But in my tests I was always seeing that lpab did end up getting some value. Where was that coming from?

Well, it just so happens that an AJAX call fires ~100ms later and runs the same code again. This time lpab is still not set, so it is set randomly to "new" or "old" again. But now session_id IS set, so this time nothing overwrites the new lpab value.

So yes, you end up having a random lpab cookie (yay), but it does not reflect the template that was actually shown (argh). Often you will have a cookie that says "new" but still get the old template, or vice versa.
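
The sequence of page load followed by AJAX call can be simulated in a few lines of Python. This is a rough sketch under stated assumptions: the serve and first_visit functions are hypothetical, the session value is a placeholder, and the 50/50 split between "new" and "old" is assumed, since all that is known is that the choice is random.

```python
import random


def serve(cookies, rng):
    """One request through the buggy flow.

    Returns (variant used for this request, cookies the browser receives).
    """
    received = {}
    variant = cookies.get("lpab")
    if variant is None:
        variant = rng.choice(["new", "old"])  # assumed 50/50 split
        received = {"lpab": variant}
    if "session_id" not in cookies:
        # Overwrites the header, so the lpab cookie from above is dropped.
        received = {"session_id": "placeholder"}
    return variant, received


def first_visit(rng):
    browser = {}                           # new visitor: empty cookie jar
    shown, received = serve(browser, rng)  # page load picks the template
    browser.update(received)
    _, received = serve(browser, rng)      # AJAX call shortly afterwards
    browser.update(received)
    return shown, browser["lpab"]


rng = random.Random(0)
visits = [first_visit(rng) for _ in range(10_000)]
mismatches = sum(shown != stored for shown, stored in visits)
print(f"{mismatches / len(visits):.0%} of first visits stored the wrong variant")
```

Because the two draws are independent, under the 50/50 assumption the stored cookie disagrees with the template that was shown on roughly half of all first visits.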

Why would you even do that?

Why would I set the Set-Cookie header in a way that overwrites everything else set before it?

I wrote the session_id part of the code ~6 years ago, with the full intention of storing everything in a Session object. It did not matter if I deleted all other cookies, as there would be no other ones if I just put everything in the Session.

However, later on I forgot about this intention, and started putting things into separate cookies instead.

When using these separate cookies I was a bit smarter after having learned how to use my framework properly, such that I was no longer overwriting other cookies set during the response. But the old session_id code was still there, like a time bomb, just waiting for me to dare try to set cookies before session initialization.

What prevented me from noticing the issue was that it would only be apparent to new users with no cookies set. I am decidedly not a new user, and while I did test my code by deleting cookies relating to any code I was working on at the time, it didn't occur to me to test deleting all cookies.
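
For comparison, here is a sketch of the non-overwriting approach together with the "no cookies at all" check that would have caught the bug. It assumes response headers are kept as a list of (name, value) pairs, as WSGI does, so the same header name can appear more than once. The add_cookie and handle_request names are hypothetical and this is not the framework code used on the site.

```python
import random
import uuid


def add_cookie(headers, name, value):
    # Append a separate Set-Cookie header instead of replacing an existing one.
    headers.append(("Set-Cookie", f"{name}={value}; Path=/"))


def handle_request(request_cookies):
    headers = []  # list of (name, value) pairs, so repeated names are allowed
    variant = request_cookies.get("lpab")
    if variant is None:
        variant = random.choice(["new", "old"])
        add_cookie(headers, "lpab", variant)
    if "session_id" not in request_cookies:
        add_cookie(headers, "session_id", uuid.uuid4().hex)
    return variant, headers


# Simulate a brand-new visitor: no cookies at all, not just a missing lpab.
variant, headers = handle_request({})
set_cookies = [value for name, value in headers if name == "Set-Cookie"]

# Both cookies must survive, and lpab must match the template served.
assert any(c.startswith(f"lpab={variant};") for c in set_cookies)
assert any(c.startswith("session_id=") for c in set_cookies)
```

The important part of the check is the empty cookie jar. Passing in a request that already has session_id would hide the problem, which is what happened when only lpab was deleted.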

Conclusion

This makes the previous A/B test result invalid. Worse yet, new visitors had often been served a different template on their first and second visit, which must have been quite confusing. On the positive side, it is great to know this (even if slightly embarrassing), as I can now re-run the test correctly. But it will take at least another 3 months to know the real result.

I ate a cookie while writing this post to make myself feel better.