DynamoDB can be really powerful for serving an API to mobile clients that are offline by design. In this post, I’ll describe how one such API was built.
Single Table Design is a really intriguing concept. It allows access patterns not normally available in a traditional RDBMS, one of which is fetching multiple types of entities within a single query.
Before digging in further, let’s talk sync.
There are multiple ways (probably endless, honestly) to approach sync. Considering all the business requirements, this solution may or may not be right for you, but it was right for this app and worked really well. This particular use case recently had the following requirements:
- Support an offline-by-design mobile app.
- When coming back online, clients could handle any conflicts themselves. This leaves room to surface conflict-resolution UX if needed.
- Bulk ingestion of all data when signing in, or for periodic drift correction.
- Resumable - sync needed to be processable incrementally, able to be paused at any time, and able to resume where it left off.
Given these requirements, the design that was engineered ends up looking a lot like Apple’s CloudKit, at least with regard to change tokens.
The API Design
For this particular use case we utilized GraphQL, and I found it to fit naturally. There were some entities not intended to be synchronized, and being able to utilize GraphQL Interfaces allowed us to designate that through the API schema - only the types we wanted to sync implemented the SynchronizedItem interface.
Here’s what the SynchronizedItem interface looks like:
interface SynchronizedItem {
  id: ID!
  changeToken: String!
  createdAt: Time!
  updatedAt: Time!
}
It mostly has some pretty typical stuff you might see in any sync API, namely id, createdAt and updatedAt. The one that you might not be familiar with is changeToken. changeToken is used to track a revision of an entity. Its purpose will be shown as we work our way through below.
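On the backend, every type that implements this interface ends up carrying the same four fields. A minimal sketch of how that might look in Go (the post doesn’t prescribe an implementation language, so these types are purely illustrative):

import "time"

// SyncFields holds the fields every synchronized entity shares.
type SyncFields struct {
  ID          string    `json:"id"`
  ChangeToken string    `json:"changeToken"`
  CreatedAt   time.Time `json:"createdAt"`
  UpdatedAt   time.Time `json:"updatedAt"`
}

// Person satisfies the SynchronizedItem interface by embedding SyncFields.
type Person struct {
  SyncFields
  GivenName  string `json:"givenName"`
  FamilyName string `json:"familyName"`
  // ...remaining Person fields elided
}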
Creating/Updating Entities
When we create or update an entity, we want to “rev” it - that is, add a new revision to the entity in question. For this particular business use case we don’t care about all of the historical changes at a per-entity level; we just need to know whether we have the correct version or not. We track this with the change token, which is a ULID. In a ULID’s canonical 26-character encoding, the first 10 characters encode the timestamp and the remaining 16 characters are the entropy (or unique part) of the id. This trait of ULIDs means that not only are they sortable by time, they are directly comparable as well. By generating a new ULID on every creation and update, seeded with the current time, we now have a string that we can compare simply with the < and > operators to see which came first. This will be important when fetching the changed entities from Dynamo, as we’ll see later.
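As a concrete sketch, here’s roughly what minting and comparing change tokens could look like in Go, assuming the oklog/ulid library (the post doesn’t name a specific ULID implementation):

package main

import (
  "crypto/rand"
  "fmt"
  "time"

  "github.com/oklog/ulid/v2"
)

// newChangeToken mints a fresh revision token seeded with the current time.
func newChangeToken() string {
  entropy := ulid.Monotonic(rand.Reader, 0)
  return ulid.MustNew(ulid.Timestamp(time.Now()), entropy).String()
}

func main() {
  older := newChangeToken()
  time.Sleep(2 * time.Millisecond)
  newer := newChangeToken()

  // The timestamp leads the encoded string, so plain lexicographic
  // comparison tells us which revision came first.
  fmt.Println(older < newer) // true
}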
What about attempting to update an entity that has itself changed through some other means, causing a conflict? As mentioned previously, this particular business use case calls for a solution that allows the client to handle conflicts. So, not only do you get a new changeToken on every update of an entity, but you have to provide one too. Here’s an example update mutation:
input UpdatePersonPartial {
  givenName: String
  familyName: String
  dateOfBirth: Date
  relationship: RelationshipType
  phoneNumber: String
  emailAddress: String
  profileColor: String @hexColor
  occupation: String
}

input UpdatePersonInput {
  tenantID: ID!
  id: ID!
  changeToken: String!
  changes: UpdatePersonPartial!
}

type UpdatePersonPayload {
  person: Person!
}

extend type Mutation {
  updatePerson(input: UpdatePersonInput!): UpdatePersonPayload! @isAuthenticated
}
The way an update works is that the server first fetches the entity you are attempting to modify from the database and checks that entity’s current changeToken against the changeToken the client passed as a parameter. If the client is up to date, the update is allowed to proceed. However, if the server entity’s token is ahead of the token the client passed, the request is rejected with an explicit error code that the client can capture and use for handling the conflict.
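Roughly sketched in Go, reusing the hypothetical Person type and newChangeToken helper from the earlier sketches (the store interface and error value here are illustrative, not the post’s actual code):

var ErrConflictingChangeToken = errors.New("CONFLICTING_CHANGE_TOKEN")

type PersonStore interface {
  GetPerson(ctx context.Context, tenantID, id string) (*Person, error)
  PutPerson(ctx context.Context, p *Person) error
}

func UpdatePerson(ctx context.Context, store PersonStore, tenantID, id, clientToken string, apply func(*Person)) (*Person, error) {
  current, err := store.GetPerson(ctx, tenantID, id)
  if err != nil {
    return nil, err
  }

  // The server being ahead of the client means the entity changed since the
  // client last saw it; reject with an explicit code the client can act on.
  if current.ChangeToken > clientToken {
    return nil, ErrConflictingChangeToken
  }

  apply(current)                         // apply the partial changes
  current.ChangeToken = newChangeToken() // rev the entity
  current.UpdatedAt = time.Now()

  return current, store.PutPerson(ctx, current)
}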
Deleting Entities
Deleting entities can be done a handful of ways:
- Soft delete - set some state on the record to signify its deletion, and filter it out of any queries. The issue is the filtering; it’s hard to remember all the various places the deleted items need to be filtered out of the results, and in DynamoDB filter operations happen after the query has already read the items.
- Deletion sentinel - a record written to signify a deletion, with metadata around the deletion event.
The latter is the approach we’ll opt for here; take a look at its GraphQL schema:
type DeletedItem implements SynchronizedItem {
  id: ID!
  tenantID: ID!
  changeToken: String!
  type: DeletedItemType!
  createdAt: Time!
  updatedAt: Time!
}
You'll notice a few things:
- id is the id of the deleted item, not a unique id for this instance.
- tenantID is, again, the scope or the account under which this record was deleted.
- changeToken exists here as well, to satisfy the SynchronizedItem interface but also to act as a sentinel for a deletion that happened at a point in time.
- type is purely metadata around the type of entity that was deleted.
- createdAt and updatedAt also exist to satisfy the SynchronizedItem interface, but also declare when this record was created, or rather, when the entity in question was deleted.
When we delete an item, we will create a DeletedItem record for the record being deleted, and save it indefinitely. Any client that knows about the deleted record can then receive this sentinel and process it, deleting the record from its local store. As changes are processed from the server, the client’s local changeToken is rev’ed, and this change never has to be processed again.
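Here’s a rough sketch of that delete path, continuing with the illustrative Go types from above (the store methods are hypothetical, and in practice the two writes would likely be wrapped in a transaction):

// DeletedItem mirrors the GraphQL type above.
type DeletedItem struct {
  ID          string // id of the entity that was deleted, not a new id
  TenantID    string
  ChangeToken string
  Type        string
  CreatedAt   time.Time
  UpdatedAt   time.Time
}

type DeletionStore interface {
  DeletePerson(ctx context.Context, tenantID, id string) error
  PutDeletedItem(ctx context.Context, d *DeletedItem) error
}

func DeletePerson(ctx context.Context, store DeletionStore, tenantID, id string) error {
  // Remove the entity itself...
  if err := store.DeletePerson(ctx, tenantID, id); err != nil {
    return err
  }

  // ...and write a sentinel that will flow to clients through the change
  // feed so they can purge the entity from their local stores.
  now := time.Now()
  return store.PutDeletedItem(ctx, &DeletedItem{
    ID:          id,
    TenantID:    tenantID,
    ChangeToken: newChangeToken(),
    Type:        "PERSON",
    CreatedAt:   now,
    UpdatedAt:   now,
  })
}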
Fetching Changes
Our approach to fetching the list of changes allows a client to specify its current, most recent revision of the data and get back any changes since that point in time. This is great for a mobile app because it means less data over the wire; if nothing has changed, only a simple identifier is passed over the wire, not an entire data dump.
Here is what the changes query in GraphQL looks like:
changes(tenantID: ID!, since: ID, cursor: String): SynchronizedItemList!
First off, tenantID is the scope for which you are fetching changes. This could be anything in your domain - think of it like an account. The since parameter accepts a change token, and you’ll notice that it’s optional. By passing null, the client is saying that it doesn’t have any current revision and would like all of the data in one bulk, paginated operation. If, however, a changeToken is passed as a parameter, the client is requesting anything that has a revision later than that value. On the backend, the server will use that value as the starting point when querying and filtering the synchronized items. As the client receives changes, which are sorted by change token, it can persist the latest revision it has synced with the server. For example, if a client is on revision A and the server is on revision D, the client will receive the changes for B, C and D. At each step the client can first process the change and then persist the completion of that revision before moving on, which is what gives the system its resumability and fault tolerance; eventually it arrives at D and saves its state there.
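To make that resumability concrete, here’s a hedged client-side sketch in Go; the ChangesPage shape and the fetch/apply/token helpers are hypothetical stand-ins for the GraphQL call and the client’s local persistence:

type SyncedItem struct {
  ID          string
  ChangeToken string
  // type-specific payload elided
}

type ChangesPage struct {
  Items      []SyncedItem
  NextCursor *string
}

func SyncChanges(
  ctx context.Context,
  tenantID string,
  fetch func(ctx context.Context, tenantID string, since, cursor *string) (*ChangesPage, error),
  apply func(SyncedItem) error,
  loadToken func() *string, // nil on first sync -> bulk, paginated fetch
  saveToken func(string),
) error {
  since := loadToken()
  var cursor *string

  for {
    page, err := fetch(ctx, tenantID, since, cursor)
    if err != nil {
      return err
    }

    for _, item := range page.Items {
      // Apply the change locally first...
      if err := apply(item); err != nil {
        return err
      }
      // ...then persist the token, so an interrupted sync resumes from the
      // last change that was fully processed.
      saveToken(item.ChangeToken)
    }

    if page.NextCursor == nil {
      return nil
    }
    cursor = page.NextCursor
  }
}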
Next: DynamoDB
In Part 2 we’ll take these concepts and apply them to the queries in DynamoDB, and see how DynamoDB makes this almost too easy for us to put together.