So you’re using something like CouchDB for your architecture, maybe Couchbase on the server with Couchbase Lite or PouchDB in your client. Great! You and I should get together sometime to exchange migraine medication tips.
Offline applications have a very complex way of dealing with data. You might not have noticed, but when you start up an offline-first application, it usually shows the data it loaded the last time you used it, and only then notifies you that there is updated data to display, if you like. For example, here’s a screen grab that I pulled from loading up the LinkedIn app, literally just now:
Check out that message at the top that lets me know there are New Updates. Those aren’t yet loaded in the view, but they’re definitely stored on my device. When I click that button, the newsfeed is updated with the latest data, and my screen scrolls to the top of the list.
That’s a UX consideration with highly technical programmatic implications, even if it doesn’t seem that way. In this case, yes, the app will always load to the news feed, and that’s all good and gravy for a very predictable data dependency. But that might not always be the case, especially in applications with very heavy workflows and multiple users. When the local data store updates, how does the webapp know that the newly synced information is important to the view, and worth interrupting the reader’s experience?
Offline-First Data Sync
So when we start talking about building offline-first applications, we have to think about data in two forms. The first is the local store, which might be an instance of Couchbase Lite (which is what we use on Predix Mobile), or maybe PouchDB, or if you’re a hard worker you might use the native browser IndexedDB or WebSQL. Whatever your plan is, your webapp is going to consider that data store as its one and only source of truth – the all-knowing, all-seeing keeper of data.
The second form is the mysterious server-side data set. It’s literally the job of the webapp not to have any care in the world for the server-side database, as the webapp will never have any direct interaction with the server at all. Let’s take a look at the architecture behind a typical replicated data store:
Here you can see two separate relationships being maintained. In some instances you might have a go-between – i.e. the Sync Gateway in a Couchbase workflow. The point is, regardless of the actual number of boxes, the webapp only knows about the local data store, and the local data store only contains what it can sync with the server. It’s one handsome two-timing devil.
The webapp data flow
As the webapp loads up, it’s going to pull whatever data it needs from the local database. This is kind of a no-brainer, but it will get a bit hairy in a few minutes, I promise.
As a consequence of only dealing with the local store, the webapp has absolutely no idea what’s going on with the server-side database. Regardless of how many updates the server might receive, the webapp only wants to know what’s happening in the local data store.
Once the webapp has requested and successfully loaded the active view’s data from the local store, it’s really not too concerned with what happens after that, unless otherwise instructed. It will maintain a view model for as long as the view is active, and will only change the data displayed when the user requests a new view model.
We don’t often put too much thought into this, but this kind of data load is what I’ve come to call the view’s data dependency. The view is dependent on data, and requests that data as it loads. In a RESTful application, this would be the API call to retrieve data, and we generally wouldn’t expect that to change unless we’re working with a websocket connection, a la Meteor. However, in offline-first, that view model is mapped to what’s in the local data store, which can change at any moment. So it’s up to us to instruct the webapp on what to do when that happens.
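To make that concrete, here is a minimal sketch of a view model going stale. Everything in it is an assumption for illustration: `InMemoryLocalStore` is a hypothetical stand-in for whatever local database you use (it is not the Couchbase Lite or PouchDB API), and `TaskDoc`, `loadTaskView`, and `isStale` are names I made up.

```typescript
// Hypothetical in-memory stand-in for a local data store.
// Not a real Couchbase Lite or PouchDB API; just enough to show the idea.
interface TaskDoc {
  id: string;
  rev: number; // bumped on every write, acting as a simple revision marker
  title: string;
  complete: boolean;
}

class InMemoryLocalStore {
  private docs: Record<string, TaskDoc> = {};

  get(id: string): TaskDoc | undefined {
    const doc = this.docs[id];
    return doc ? { ...doc } : undefined; // hand out copies, never live references
  }

  // Called by the user's own actions or by the sync process.
  put(doc: Omit<TaskDoc, "rev">): TaskDoc {
    const prev = this.docs[doc.id];
    const next: TaskDoc = { ...doc, rev: prev ? prev.rev + 1 : 1 };
    this.docs[doc.id] = next;
    return { ...next };
  }
}

// The view model is a snapshot taken when the view loads.
interface TaskViewModel {
  dependsOn: string; // the view's data dependency: one document id
  snapshot: TaskDoc;
}

function loadTaskView(store: InMemoryLocalStore, id: string): TaskViewModel {
  const doc = store.get(id);
  if (!doc) throw new Error(`No local document with id ${id}`);
  return { dependsOn: id, snapshot: doc };
}

// True when the local store has moved on since the view loaded.
function isStale(store: InMemoryLocalStore, vm: TaskViewModel): boolean {
  const current = store.get(vm.dependsOn);
  return current !== undefined && current.rev !== vm.snapshot.rev;
}

const localStore = new InMemoryLocalStore();
localStore.put({ id: "task-42", title: "Second-story framing", complete: false });
const taskView = loadTaskView(localStore, "task-42");

// A sync lands after the view has already loaded.
localStore.put({ id: "task-42", title: "Second-story framing", complete: true });

console.log(isStale(localStore, taskView)); // true
console.log(taskView.snapshot.complete); // false: the view still shows old data
```

The point of the sketch is only that nothing tells the view its snapshot is out of date unless we build that in ourselves.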
The local data store’s sync process
Regardless of what’s happening with the webapp, the local data store will continually try to sync its data with the server-side database – which one might refer to as the application’s master database. Posts, puts, and deletes are generally intuitive from a sync point of view, as they’re all processes created by the user, and the user only needs to be notified if their last action was successful. Probably a topic for a different day.
But in a complex app with multiple users all posting, putting, and deleting, there’s plenty of opportunity for other users to end up with new data pertinent to their active view, which might be vital when making decisions in, for example, an application used for tracking tasks on an industrial job site. The last thing we want to happen is for the active view and its data dependencies to get out of sync with the local data store. Misinformation can cost a company thousands, even millions in lost productivity. So it’s best to have the latest information whenever possible.
Data Dependency Mapping
Ultimately, we’ll want to have some sort of data dependency declared within each view, so that as the view loads, it can expose to a data service some information that will determine whether or not the user will be interested in the latest incoming data.
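One possible shape for that, sketched under my own assumptions: each view registers the document ids it depends on as it loads, and releases them when it goes away. `DependencyRegistry` and its method names are hypothetical and not part of any framework or sync library.

```typescript
// Hypothetical sketch: views tell a data service which document ids they depend on.
class DependencyRegistry {
  private byView: Record<string, string[]> = {};

  // Called as a view loads.
  register(viewId: string, docIds: string[]): void {
    this.byView[viewId] = docIds.slice();
  }

  // Called when the view is torn down.
  release(viewId: string): void {
    delete this.byView[viewId];
  }

  // Which currently loaded views depend on this document?
  viewsAffectedBy(docId: string): string[] {
    return Object.keys(this.byView).filter(
      (viewId) => this.byView[viewId].indexOf(docId) !== -1
    );
  }
}

const registry = new DependencyRegistry();
registry.register("task-detail", ["task-42"]);
registry.register("crew-roster", ["crew-7", "task-42"]);

console.log(registry.viewsAffectedBy("task-42")); // both view ids
console.log(registry.viewsAffectedBy("task-99")); // empty array: nobody cares

registry.release("task-detail");
console.log(registry.viewsAffectedBy("task-42")); // only "crew-roster" remains
```

Keying on document ids is the simplest option. A real app might also need dependencies on query results or document types, which this sketch does not cover.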
Notifying the user of updated data
For example, let’s say that I’m managing a construction job site with my fancy iPad and a nice new application, as outfitted specifically for me and my team by corporate HQ. Great stuff. As I’m glossing over things during my lunch break, I notice that a task that was supposed to be complete that morning – let’s say second-story framing on a house – hasn’t been marked complete by the construction crew. So I start getting pissed, and get ready to give the guys on the job a piece of my mind. Meanwhile, two seconds after I load that task in the view, an update comes in from the server, letting my local data store know that the task is in fact successfully complete.
In this case, two things can happen. The first, which is what we want, is that I’m notified that the data contained within my active view has changed. As the user receiving that notification, I should probably have some way of updating that view so that I can see that my crew has completed the task, albeit right at the buzzer.
The other option is that my local data update isn’t handled, and instead of knowing to refresh, I finish my lunch looking at outdated data (might even be an hour later by the time I’m done eating), and write up a report causing at least one head to roll. That’s just not what you want; nor is it what that rolling head wants.
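The first outcome needs some way for the local store to tell the view that something changed. Here is a rough sketch of that wiring, assuming a hypothetical `ChangeFeed` that the sync process calls after each local write. Real stores expose their own change listeners; this class is only a stand-in, and `taskViewState` is an invented name for whatever your view binds to.

```typescript
// Hypothetical change feed, standing in for whatever your local store offers.
type ChangeListener = (docId: string) => void;

class ChangeFeed {
  private listeners: ChangeListener[] = [];

  // Returns a function that removes the listener again.
  subscribe(listener: ChangeListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // The sync process would call this after writing a document locally.
  emit(docId: string): void {
    for (const listener of this.listeners.slice()) listener(docId);
  }
}

// State the task view would bind its "this data has changed" notice to.
const taskViewState = { docId: "task-42", updateAvailable: false };

const feed = new ChangeFeed();
const unsubscribe = feed.subscribe((docId) => {
  if (docId === taskViewState.docId) taskViewState.updateAvailable = true;
});

// The crew marks framing complete; sync writes it to the local store.
feed.emit("task-42");
console.log(taskViewState.updateAvailable); // true

// On leaving the view, stop listening.
unsubscribe();
```

Note that the listener only raises a flag. Whether and when the view actually reloads stays in the user's hands.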
Selective update notifications
From a broad-strokes perspective, we can easily say that any time there’s been updated data, we’ll want to let the user know. That’s a solid MVP or first release solution, but it’s not necessarily a very good one, or one that we want to brag about.
Consider the same situation, where I’m managing a construction site, iPad in hand, eating lunch, and I’ve just navigated to my about-to-be overdue second-story framing task. The good devs at corporate HQ have thought about notifying me when there are local data updates, so as I’m looking at that task, I get the notification.
Well, what if the data that synced locally wasn’t pertinent to the task I’m looking at? What if that update was for a different task? Would I, as a user, really care? Whenever I navigate to a task and load that task’s data into the view, it’ll always pull the latest from my local data store; and so when I navigate to that other task, I’ll pull that latest local data anyway. But I’m not on that task, I’m on the one that’s overdue. And despite telling me that I have new data, nothing looks updated.
Thus we have a need for data dependency mapping. If my app tells my user that data has been updated, but they reload the view and nothing looks new, they’re going to stop trusting the app, start throwing iPads into the cement mixer, and ultimately think less of the devs at corporate HQ. That’s bad for you and me.
Determining data dependencies
Essentially we’re talking about a programmatic flowchart:
Again, this seems super straightforward. But the question that we have to ask as we build out our applications is how we’re supposed to know – programmatically – how to build that decision point in the middle. In your current application, is there a way for your data service to determine whether or not updates impact the current view?
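As a starting point, the decision itself can be sketched as a small pure function. The shapes below are assumptions of mine, not any library's types: a synced change carries a document id and a type, and the active view declares the ids and types it cares about.

```typescript
// Hypothetical shapes; none of these names come from a real library.
interface SyncedChange {
  id: string;
  type: string; // e.g. "task" or "crew"
}

// A view can depend on specific documents, on a whole type of document, or both.
interface ActiveViewDependency {
  docIds: string[];
  docTypes: string[];
}

type Decision = "notify" | "ignore";

function decide(change: SyncedChange, dep: ActiveViewDependency | null): Decision {
  if (dep === null) return "ignore"; // the active view declared no dependency
  if (dep.docIds.indexOf(change.id) !== -1) return "notify";
  if (dep.docTypes.indexOf(change.type) !== -1) return "notify";
  return "ignore";
}

// A detail view for one task, and a list view showing every task.
const framingTaskView: ActiveViewDependency = { docIds: ["task-42"], docTypes: [] };
const taskListView: ActiveViewDependency = { docIds: [], docTypes: ["task"] };

console.log(decide({ id: "task-42", type: "task" }, framingTaskView)); // "notify"
console.log(decide({ id: "task-77", type: "task" }, framingTaskView)); // "ignore"
console.log(decide({ id: "task-77", type: "task" }, taskListView)); // "notify"
```

The hard part in a real app is not this function but getting every view to declare its dependency honestly, especially for views built from queries rather than single documents.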
Personally I don’t know anything about React, but I can tell you from experience that in Angular 1.x, this isn’t easy. Most of our general understanding of front-end programming was built with RESTful APIs in mind. I mean, it wasn’t that long ago you could put it on a resume as a way of guaranteeing a second interview.
Devil’s advocate on auto-refresh
It’s worth asking – as this definitely came up when I mentioned this problem in a demo a few weeks ago – why not just automatically refresh the view when there’s new data to be seen? That’s a fine devil’s advocate position to take, but it’s easily shot down. If you’ve ever been to a website that auto-refreshes, chances are it wasn’t a pleasant experience for you. I generally tend to read nbcnews.com once a day, and if I’m scrolling down the page, reading headlines as I go, usually I’ll get to the Tech section right when the page auto-refreshes and I lose my place. It’s infuriating. Don’t do that to your users.
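The alternative to auto-refresh is to queue up relevant changes and let the user decide when to apply them, much like a "New Updates" banner. A minimal sketch, with `PendingUpdates` and its methods being hypothetical names of my own:

```typescript
// Hypothetical sketch of "notify, don't auto-refresh".
class PendingUpdates {
  private pendingIds: string[] = [];

  // Called for each synced change that is relevant to the active view.
  add(docId: string): void {
    if (this.pendingIds.indexOf(docId) === -1) this.pendingIds.push(docId);
  }

  // Text for a banner; null means "show nothing".
  bannerText(): string | null {
    const n = this.pendingIds.length;
    if (n === 0) return null;
    return n === 1 ? "1 new update" : `${n} new updates`;
  }

  // Called only when the user taps the banner.
  // Returns the ids to reload from the local store and clears the queue.
  apply(): string[] {
    const ids = this.pendingIds;
    this.pendingIds = [];
    return ids;
  }
}

const pending = new PendingUpdates();
pending.add("task-42");
pending.add("task-42"); // the same document synced twice still counts once
console.log(pending.bannerText()); // "1 new update"

const toReload = pending.apply(); // the user taps the banner
console.log(toReload.length); // 1
console.log(pending.bannerText()); // null
```

The view never changes underneath the reader. It only changes when they ask for it, so nobody loses their place mid-scroll.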
As we get further into offline-first development, we’ll have to think of update events like this, and determine how we want to handle them. The UX folks I work with are working away at processes like this. We as developers need to be hard at work as well. If you have any thoughts on this, or have any experience with data-sync workflows, feel free to leave a comment.