Startup companies race to complete features and ship them as soon as possible. We don’t do that, because we know that new features are never quite right at first. There is always a period of tweaking after a release, to match what we built to how people are actually using it.
The way we work is that the development team has a list of big projects, which they work on for a few months. Each project tends to come to a natural close around six months into development, so we tidy up all the new projects and announce them together as a new release.
The current release candidate will be our tenth official release since we started about five years ago, and will have three major products in it: a stronger Dynamic Scheduler (route optimisation for multiple vehicles, departments, etc.), strong Xero integration (both Public and Private API versions), and a new Purchase Orders system.
We’ve already planned out the bones of what’s coming in the six months after this, and we think it will be our biggest and most important release. That’s not because of the new products in it (we’re thinking QuickBooks and multiple languages, for a start), but because we’re dedicating most of our development resources to making sure the system is scalable (we want millions of people using it), stays fast even with all those users, never goes down for any reason, and is so simple to use that you’ll wonder why we bother making tutorial videos!
We’re already at the stage where most parts of the system have redundancy built in, so if a server or two (or even a full datacentre – hello Digital Ocean San Francisco, I’m looking at you…) goes offline for a while, we can reroute people to different areas that are still up and running. In most cases, there won’t even be a pause (like when SF went offline, and there wasn’t even a moment where anyone but our development team noticed), but sometimes we will have a short period where parts of the system are offline.
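For the technically curious, here is a minimal sketch of the rerouting idea in Python. It is not our actual routing code: the region names and the `pick_region` function are made up for illustration, and it assumes some separate health check has already told us which regions are up.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical region names, listed in order of preference.
REGION_PREFERENCE: Sequence[str] = ("san-francisco", "new-york", "london")


def pick_region(
    health: Mapping[str, bool],
    preference: Sequence[str] = REGION_PREFERENCE,
) -> Optional[str]:
    """Return the first preferred region reported as healthy, or None.

    `health` maps a region name to True (up) or False (down). A region
    missing from the map is treated as down, to stay on the safe side.
    """
    for region in preference:
        if health.get(region, False):
            return region
    return None


if __name__ == "__main__":
    # San Francisco is offline, so requests fall through to the next region.
    status = {"san-francisco": False, "new-york": True, "london": True}
    print(pick_region(status))
```

When every region is down the function returns `None`, which corresponds to the rarer case where part of the system really is offline for a short period.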
In our case, it usually goes like this: we see an issue, we solve it in a quick way that’s not perfect but gets the job done for now, and then we solve it in a more permanent way that takes longer but is more robust.
Yesterday, for example, we had an issue where people logging into one particular server in our network were experiencing a large delay in their requests. We looked at this, couldn’t see the immediate cause, and figured the best thing to do at that moment was to increase memory and CPU (we multiplied the number of CPUs by 4, and the amount of RAM by 16 – hey, go big or go home!) to see if that fixed the immediate issue. It did, although the upgrade forced people using that server (some customer relationship management software users, no mobile workers) to log in again, which was the price of getting the problem solved at the time. The more permanent solution in this case is that over the next six-month development period, we will be tearing that server type apart into separate “microarchitecture” pieces that can be managed separately, and making sure that each of those new pieces can be turned on or off without affecting people who are logged in. Obviously, this will take time to plan and build, so the quick solution (throw more memory and CPU at it!) was the right answer in the meantime.
Yesterday’s issue took that particular server offline for three minutes (I was timing it) while we upgraded it. Three minutes is better than how long Google Drive was out last week, or how long the entire country of Japan was offline on August 25th (thanks again, Google), or how long Melbourne’s train network was shut down on July 13th (hours).
We’re at that stage in a company’s life where we’re pretty happy that we have the product we need, and there is no need to keep adding more and more shiny stuff to it. That’s why we’re happy to spend the next six months or more just making sure it is the absolute strongest, fastest, most reliable, and best all-round cloud field service management software.