I am now stuck at home, like many of us. The weather outside is beautiful. So I’ll take the opportunity of the extra time to try to write more about the app development. One source of inspiration that never seems to end is chasing bugs in the app! So while I have the time at home, I’ll try to detail them as I chase them! Starting with some database woes…
Delay in activities being available, or the importance of database optimisation
The first problem, which started happening for a few users, was that activities would sometimes take quite a while to become available in ConnectStats. This wasn’t affecting everyone, and of course by the time I looked at the issue the activity was there.
I also noticed that the queue was quite regularly behind by up to a minute. It is supposed to process up to 5 activities at a time in parallel, so that my server does not get overwhelmed, and it used to always process everything within a few seconds.
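To make the “up to 5 activities at a time” idea concrete, here is a minimal sketch in Python of a queue drained by a bounded pool of workers. It is not the actual ConnectStats server code, which is written in PHP; process_activity, process_queue and MAX_PARALLEL are hypothetical names made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # the limit quoted above; the name itself is an assumption


def process_activity(activity_id):
    # Hypothetical stand-in for the real per-activity work
    # (parsing the upload, writing to the database, ...).
    return f"processed {activity_id}"


def process_queue(activity_ids, handler=process_activity, max_parallel=MAX_PARALLEL):
    """Run handler on every queued id, never more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps the results in the same order as the input ids
        return list(pool.map(handler, activity_ids))


results = process_queue(range(12))
```

With a cap like this, one slow step inside the handler is enough for the queue to fall behind: each slow activity holds one of the 5 slots for longer, and new activities wait for a slot to free up.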
My first thought was “Oh no, I need to upgrade my server again or quickly finish the migration to AWS!”. But of course, one should always investigate in a bit more detail first.
I have quite a sophisticated testing setup in place for my server, so I can replicate the main server on a local server and test code changes, etc. No matter how sophisticated your testing setup is, though, it’s never exactly the same as the production server the app actually uses, and problems that users see regularly fail to show up on the testing setup. But this time the problem did show up there!!! Some jobs were also very slow on the testing server, which made my life a bit easier.
Debugging an iPhone app is made quite easy by a lot of really advanced tools that let you run the code line by line, produce all kinds of diagnostics on the code as it runs, etc. Debugging server code is not quite as easy. There are some tools that let you step into the code of a PHP server like the ConnectStats one and get diagnostics dynamically, but I am not aware of any really simple and efficient ones. So I fell back on the old-school binary search, or “divide and conquer”, approach to debugging…
The server does a good dozen queries or operations on the database when it processes an activity received from a device. These operations are quite small, and I took care to keep them as fast as possible by limiting their number and making sure I organise the data with the proper database indexes. With the binary search approach, I managed to narrow the problem down to a single one of these operations that was very slow.
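The divide and conquer approach can be sketched roughly as follows, in Python rather than the server’s PHP. This is only an illustration under strong assumptions: exactly one step is slow, and every step can safely be run again on its own. find_slow_step, fast_step and slow_step are hypothetical names, not functions from the real server.

```python
import time


def find_slow_step(steps, threshold_seconds):
    """Bisect a list of callables to find the index of the single slow one.

    Assumes exactly one step is slow and that steps can be re-run safely.
    """
    low, high = 0, len(steps)
    while high - low > 1:
        mid = (low + high) // 2
        started = time.perf_counter()
        for step in steps[low:mid]:  # time only the first half of the range
            step()
        elapsed = time.perf_counter() - started
        if elapsed >= threshold_seconds:
            high = mid  # the slow step is in the half we just ran
        else:
            low = mid   # otherwise it must be in the other half
    return low


def fast_step():
    pass  # stand-in for a quick, well-indexed query


def slow_step():
    time.sleep(0.05)  # stand-in for the one query that scans everything


steps = [fast_step] * 12   # "a good dozen" operations
steps[7] = slow_step       # hide the slow one somewhere in the middle
slow_index = find_slow_step(steps, threshold_seconds=0.02)
```

With a dozen operations, about four rounds of timing are enough to isolate the culprit, instead of instrumenting each operation one by one.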
It was all because one of these operations was doing a search on all the activities in the database without the right database index! The database isn’t even that big yet, it has about 500,000 activities at the time of this writing, but already one unoptimised query was slowing everything down!
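The effect of a missing index is easy to reproduce on a small scale. The sketch below uses Python with an in-memory SQLite database purely as a stand-in, since the real server’s database and schema are not shown here; the activities table, its columns and the idx_activities_user index are all invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column is the readable detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities ("
    "activity_id INTEGER PRIMARY KEY, user_id INTEGER, start_time TEXT)"
)
# Made-up data: 50,000 activities spread over 1,000 users
conn.executemany(
    "INSERT INTO activities (user_id, start_time) VALUES (?, ?)",
    ((i % 1000, f"2020-03-{i % 28 + 1:02d}") for i in range(50000)),
)

lookup = "SELECT activity_id FROM activities WHERE user_id = ?"

# Without an index on user_id, the plan is a scan of the whole table
plan_before = query_plan(conn, lookup, (42,))

conn.execute("CREATE INDEX idx_activities_user ON activities (user_id)")

# With the index, the plan becomes a search through idx_activities_user
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the change from a scan of the table to a search using the index is the kind of difference that made the slow operation fast again. Most databases offer a similar command to inspect the plan of a query.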
Everything is now fast and shining on the server!