Keith Hirst

Santa Claus is coming to cloud

Talking about Christmas in January - am I mad? Well, yes, probably, but there is a good reason, I promise. Santa runs a complicated operation. He has to deliver presents to potentially 2.2 billion children in up to 195 countries. Not only that, but he employs a workforce of elves to build these presents. He also allows every child to write him a list to request specific gifts. And Santa is a champion of good behaviour, so he has to decide how many toys a child should get depending on how good they have been.

Let’s put this in the context of software. Santa has a bunch of common business problems that have been solved by software many times before - only Santa has them at a scale that you probably haven't seen before. And THAT makes it interesting, or at least to me it does.

This blog is based on a talk that I did last December with a few of my colleagues. We used our imagination a little, took a few liberties, and pretended that Santa had come to Infinity Works and asked us for help. It was a fun talk to do and gave me a little insight into pieces of tech or business processes that I may not have looked in detail at before. With this post, I hope to show how we got to the architecture that we ended up with and what it might cost to run parts of Santa's operation in the cloud - specifically, AWS.

The problem

Before we did anything, we had to come up with a reason why Santa would have come to us. We would then gather his requirements and think about how we, as a consultancy, could help solve his problems. We figured that Santa’s biggest challenge in 2020 would be handling data and getting consent to hold that data. It's a big challenge facing many companies. It doesn't matter how fictitious you are, if you are storing data, people have a right to know what information you are storing!

So what were the high-level requirements? What does Santa need to do to stay GDPR-compliant?

  • We need a way of identifying potentially all children in the world, every year. That includes getting the right consent and also finding out where each child lives (or rather, where their chimney is located).
  • Santa is also responsible for keeping track of the kids who are being 'Naughty and Nice' and storing their wish lists so they can be presented if anyone raises a 'Subject Access Request'.
  • How can we design a scale-optimised, efficient system and keep the costs low at such a high load?

Research

One of the first things I learned doing this project was that you really can find out anything you want on Google. I thought I would need to make up a load of data about how many elves Santa needs to make all the toys, but no, someone has already worked that all out for me. And in great depth too!

Let’s suppose that each elf can make 4 toys an hour and works for up to 16 hours a day. That works out to about 80,000 elves required to make toys for 2.2 billion children. He would also need about 20,000 elves to carry out other duties such as receiving and processing letters and probably picking up the elves that have collapsed from exhaustion. But don't fret, Santa pays his elves handsomely in cookies. I didn't find out how many cookies an elf needs per day, but for the sake of completeness I found out that a pack of 8 M&S cookies costs about £1.50 ($2), which would mean that Santa's total yearly workforce bill would be around £72m ($97m). That may sound like a lot, but £720 per employee per year isn’t.
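As a quick sanity check on the cookie payroll, here is a minimal sketch that works backwards from the figures quoted above. The cookies-per-day number is derived from those figures rather than taken from any research, so treat it as an illustration only.

```python
# Figures quoted in the post.
TOY_MAKING_ELVES = 80_000
SUPPORT_ELVES = 20_000
ANNUAL_COOKIE_BILL_GBP = 72_000_000
PACK_PRICE_GBP = 1.50
COOKIES_PER_PACK = 8

total_elves = TOY_MAKING_ELVES + SUPPORT_ELVES
cost_per_elf = ANNUAL_COOKIE_BILL_GBP / total_elves

# Derived, not from the post: what that budget buys each elf.
packs_per_elf = cost_per_elf / PACK_PRICE_GBP
cookies_per_elf_per_day = packs_per_elf * COOKIES_PER_PACK / 365

print(f"Elves on the payroll: {total_elves:,}")
print(f"Cost per elf per year: £{cost_per_elf:,.0f}")
print(f"Cookies per elf per day: {cookies_per_elf_per_day:.1f}")
```

So the £720 per elf implies a little over ten cookies a day, which seems a fair wage for a 16-hour shift.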

Identification

Let’s look at the first requirement. We have to be able to identify and store 2.2 billion identities and also be able to geo-locate each chimney. Santa gives gifts to children in lots of different countries. This means we need to make sure that we have an app that can be accessed by every child in every language.

We could build a static site using a framework that supports static site generation, like Next.js. We could then host it on a global CDN so it could be accessed quickly by people on all sorts of devices.

We also thought about some non-functional requirements. For example, Santa faces unpredictable loads throughout the year when new toys/games consoles get released. Also, for 10 months of the year, most of his new cloud services won't be used much, so paying for wasted compute time is something we would need to try to avoid.

When you think of identification and AWS, your first thought is probably going to be Cognito. It's a service you can hook straight into, and you wouldn't need to worry about storing all of the data and keeping it safe. That is a sound conclusion, and in most situations, it would probably be the correct one. But when we actually crunched the numbers, we found that we could literally save millions by building a custom solution in DynamoDB. It is staggering how much difference there is.

Cognito charges you a reasonable £0.0019 per user per month. So for 2.2 billion users that would total £4,180,000 ($5,650,000) per month, which means it would cost over £50m for the first year to use Cognito. When you look at DynamoDB, we could spend about £430 ($580) per month, but we only pay for the reads and writes that are performed. That means that if we do have 10 months of greatly reduced use, then DynamoDB would be really cost-effective. Assuming 2 months of active usage, we estimated DynamoDB to cost just over £20k ($27K) a year. That would mean that Cognito is 250,000% more expensive than DynamoDB (in this scenario, at least).
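The comparison is easy to reproduce. This is a minimal sketch using only the prices quoted above; the DynamoDB figure is the post's own annual estimate rounded down to £20k, not something recalculated from AWS's price list.

```python
USERS = 2_200_000_000
COGNITO_PER_USER_PER_MONTH_GBP = 0.0019
DYNAMO_ANNUAL_ESTIMATE_GBP = 20_000  # "just over £20k" in the post, rounded down

cognito_monthly = USERS * COGNITO_PER_USER_PER_MONTH_GBP
cognito_annual = cognito_monthly * 12

# How many times more expensive Cognito is in this scenario.
ratio = cognito_annual / DYNAMO_ANNUAL_ESTIMATE_GBP

print(f"Cognito per month: £{cognito_monthly:,.0f}")
print(f"Cognito per year:  £{cognito_annual:,.0f}")
print(f"Cognito costs about {ratio:,.0f}x the DynamoDB estimate")
```

A ratio of roughly 2,500x is where the "250,000%" figure comes from.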

Geolocations are usually represented as latitudes and longitudes (51.520847, -0.19552100). However, they are hard to remember. We thought about this problem and decided to use what3words. what3words are a company that has created a location grid over the whole world and assigned a three-word combination for each square (filled.count.soap). They provide an API to resolve an address or latitude and longitude to a single what3words address. It turns out that their API is actually restricted to 75k requests each month which obviously wouldn't really work for us with 2.2bn users. We tried to contact what3words to discuss this and they actually responded, so great respect is due to them for replying to such a ridiculous request! This is their response:

Hi Keith,

Thanks for your very festive message 🎅 Hopefully I can give you (and Santa?) enough detail for your presentation! We offer free API usage to many NGOs for educational, community and innovative projects. I agree that Santa (and all his amazingly charitable work delivering presents) probably fits into this category, so costs would be low (we only charge a £20 set up fee). More info here.

Worth mentioning that in this API plan, we offer unlimited calls to our API to convert GPS coordinates to what3words addresses. We do however have a max of 75,000 convert to coordinates calls per month in this plan, so I would recommend instead of Santa using the lat longs, he should navigate to each location using what3words addresses!

As this is hypothetical, I won't go into too much detail about our fair usage policy but if you (or Santa) fancies taking look you can read about it here.

I hope this is helpful! 🎄 You can have our first "Merry Christmas" of the year!

Thanks, what3words, that was really helpful. We decided that we could also use the what3words location of the chimney as a login for each child/household to make it easier for them to remember.
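If the chimney's three-word address doubles as a login, the app would want to check the shape of what a child types before looking anything up. The sketch below is an assumption about how that could work: the function names are hypothetical, it does not call the what3words API, and it only checks the format, not whether the address really exists.

```python
def looks_like_three_word_address(address: str) -> bool:
    """Return True if the text is three dot-separated words made of letters."""
    parts = address.strip().split(".")
    return len(parts) == 3 and all(part.isalpha() for part in parts)


def login_key(address: str) -> str:
    """Normalise a three-word address into a lookup key (hypothetical helper)."""
    if not looks_like_three_word_address(address):
        raise ValueError(f"not a three-word address: {address!r}")
    return address.strip().lower()


print(login_key(" Filled.Count.Soap "))
print(looks_like_three_word_address("51.520847, -0.19552100"))
```

Using `str.isalpha()` rather than an ASCII-only pattern keeps the check friendly to words written in other alphabets, which matters if every child in every language is meant to log in.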

There's one potential problem that was highlighted in the Q&A after our talk. Someone mentioned that what3words terms say that you are not allowed to store the what3words address. I've since checked the terms, and I think it is just if you are trying to subvert an API call. So for the purposes of this, we will assume that it will be OK. At the end of the day, this is only a proposal, and userID generation has many solutions we could adapt.

The list

So how do we help Santa with storing the wish lists... S3, surely? Well, let's explore. If we have 2.2bn children, then we need to have at least 2.2bn lists. We know from many Christmas songs and carols that Santa checks his list twice which would be 4.4bn requests. There will also be a spike of traffic around December time. One of the issues we will need to tackle is that we estimate that a third of all wish lists will be handwritten, so we need a way to parse these.

Comparing S3 with Dynamo

To start, we would need to have a simple Lambda to sit in front of our chosen AWS storage service. That would just receive the parsed list and then store it. Next up was deciding what we actually wanted to use. We needed to know how big a wish list might be. We did a quick experiment and some maths and created a simple text file with a wishlist and looked at the size. We then doubled it because we wanted to assume that kids might get quite greedy. This came to about 300 bytes which allowed us to work out storage costs. Initially, we set out thinking about using S3. It turns out though that at high volumes S3 is really expensive. It would cost nearly £10k ($13.6k) to store and retrieve the data we needed.
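To put the 300-byte estimate in context, here is a rough sketch of the volumes involved. It uses only numbers from this post and applies no AWS prices, so it shows scale rather than cost.

```python
CHILDREN = 2_200_000_000
BYTES_PER_LIST = 300   # doubled sample wish list, from the experiment above
READS_PER_LIST = 2     # Santa checks his list twice

total_bytes = CHILDREN * BYTES_PER_LIST
total_gb = total_bytes / 1e9  # decimal gigabytes

writes = CHILDREN
reads = CHILDREN * READS_PER_LIST

print(f"Total storage: {total_gb:,.0f} GB")
print(f"Writes: {writes:,}  Reads: {reads:,}")
```

All of the wish lists together come to about 660 GB, which is not a lot of data. That suggests it is the billions of requests, rather than the bytes stored, that are likely to drive the bill.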

We started looking at alternatives, namely DynamoDB. Now, I wrote off DynamoDB because when I calculated the storage costs, it was over 10 times the price of S3. But the problem was that I gave up far too quickly. With S3, you could use the AWS pricing calculator. DynamoDB's pricing page just gives you the numbers, which looked higher than S3's. It wasn't until I overheard a colleague saying that DynamoDB billed you per million requests that I realised I should revisit my calculations. When I went back to it with this new knowledge, it made the "Read" and "Write" requests a lot easier to swallow. Overall, it was a smidge less than £5k ($6.8K) cheaper to implement DynamoDB as a storage medium for wish lists.

DynamoDB: 2 - Obvious choice: 0

The most interesting thing about this is that at lower volumes, we probably wouldn't have even considered DynamoDB to store a list. Which brings me back to the point of why Santa is worth talking about in January. The key requirement was cost at high volumes.

So what about the handwritten letters? Well, AWS has a service called Textract which can parse handwriting of up to 3000 words a page. That comes in at about £1,100 ($1,500) per million pages. We expect a third of all children to use this service which would bring the total up to a staggering £297K ($406K) a year.

Phew, all of that maths has made me tired. Let's take a step back and focus on what's important here. How do we present that information to Santa? He clearly doesn't have a concept of money at the North Pole, as he pays his elves in cookies from M&S! For Santa to understand this, we need to talk in terms of cold hard elves...

S3 is the same cost as 17.5 Elves - Dynamo is the same cost as 8.5 Elves

OK, sorry about that, I really enjoyed making that slide and wanted to get it in here somehow...

He's going to find out who is naughty or nice

Keeping track of children's behaviour is surprisingly also something we can migrate to the cloud.

With our app, parents or elves will tell us whether a child has been naughty or nice on any given day. Each kid would receive a point for being good or two for being selfless. They would also have a point deducted for being naughty. Their score at the end of the year determines whether they receive one gift from their list or multiple. It's a simple process. Often you see kids in movies getting miracles at Christmas, and that's also decided by their score. If they are selfless for more than three quarters of the year, they can get snow, a dog or anything else that happens in a movie.
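The scoring rules are simple enough to sketch in a few lines. The names below are hypothetical, and the score at which a child moves from one gift to several is deliberately left out because we never pinned that number down.

```python
from collections import Counter

# Points per daily report, as described above.
POINTS = {"nice": 1, "selfless": 2, "naughty": -1}


def yearly_score(daily_reports):
    """Add up the points for every reported day."""
    return sum(POINTS[report] for report in daily_reports)


def qualifies_for_miracle(daily_reports, days_in_year=365):
    """Selfless for more than three quarters of the year earns a movie miracle."""
    selfless_days = Counter(daily_reports)["selfless"]
    return selfless_days > 0.75 * days_in_year


year = ["nice"] * 300 + ["selfless"] * 50 + ["naughty"] * 15
print(yearly_score(year), qualifies_for_miracle(year))
```

For that example year, the child ends on 385 points and, with only 50 selfless days, does not get snow or a dog.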

So how should we keep track of that in the cloud? By far, the simplest solution would be to reuse our DynamoDB instance from before. We already know that writing to DynamoDB 2.2bn times costs £2,470 ($3,380). The difference this time is that it's calculated daily.

£2,470*365 ~= £900k ($1.2M)

Naughty and Nice list straight to DynamoDB vs Consolidating Calls

We could reduce this cost by adding an SQS queue in the middle. We could then use one of the new "High Resource Lambdas" to consolidate all of the calls from the previous month. That would reduce the cost down to £29k ($40k) which is quite the saving.
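The consolidation step itself is just a group-and-sum over the queued reports, so each child needs one write instead of one per day. The sketch below is a plain Python illustration with hypothetical names and no SQS or Lambda wiring. It also shows one way the saving could be estimated: twelve monthly passes at the per-pass write price quoted earlier. That this is how the consolidated figure was reached is an assumption, and it ignores the SQS and Lambda charges.

```python
from collections import defaultdict

WRITE_PASS_COST_GBP = 2_470  # cost of writing 2.2bn items once, from the post


def consolidate(events):
    """Sum (child_id, points) events so each child needs a single write."""
    totals = defaultdict(int)
    for child_id, points in events:
        totals[child_id] += points
    return dict(totals)


queued = [
    ("filled.count.soap", 1),
    ("filled.count.soap", -1),
    ("filled.count.soap", 2),
    ("other.chimney.here", 1),
]
writes = consolidate(queued)
print(f"{len(queued)} queued events become {len(writes)} writes: {writes}")

daily_passes_per_year = WRITE_PASS_COST_GBP * 365
monthly_passes_per_year = WRITE_PASS_COST_GBP * 12
print(f"Daily writes:   £{daily_passes_per_year:,}")
print(f"Monthly writes: £{monthly_passes_per_year:,}")
```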

But, actually, we decided that it wasn't worth it. In the grand scheme of things, it's a lot of engineering effort for a small saving. Most of the costs will come from the non-digital side. So focusing on using that effort to make a complete move to digital would have significantly more effect on the overall bill. But we are skipping ahead a little bit. Before I take you through the final architecture, let me share one optimisation we did make.

We decided that by default, everyone was well behaved. That way we will cut out a large portion of our calls reducing the costs significantly. Upon further googling, we found an article stating that about a third of parents threaten their kids with the naughty list, so our savings could be up to two thirds since not every kid can be that girl from "Miracle on 34th Street".
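Defaulting everyone to well behaved means we only store the exceptions and fill in the rest when we calculate the score. Here is a minimal sketch of that idea with hypothetical names. The cost estimate assumes that the share of parents who use the naughty list translates directly into the share of writes we still have to make, which is a simplification.

```python
ANNUAL_DAILY_WRITE_COST_GBP = 2_470 * 365  # every child, every day
REPORTED_SHARE = 1 / 3                     # share of writes we assume remain


def score_with_default(exceptions, days_elapsed, default_points=1):
    """Score a child when only the non-default days were written.

    `exceptions` maps a day number to the points reported for that day.
    Every other day counts as the default, well-behaved point.
    """
    unreported_days = days_elapsed - len(exceptions)
    return unreported_days * default_points + sum(exceptions.values())


# One naughty day and one selfless day in an otherwise quiet year.
print(score_with_default({10: -1, 200: 2}, days_elapsed=365))

remaining_cost = ANNUAL_DAILY_WRITE_COST_GBP * REPORTED_SHARE
saving = ANNUAL_DAILY_WRITE_COST_GBP - remaining_cost
print(f"Remaining write cost: £{remaining_cost:,.0f} (saving £{saving:,.0f})")
```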

Proposed solution

So here is our proposed solution. To make it accessible to everyone in the world, we need 3 points of entry. We have:

  1. Voice Calls - Children can ring into regional Amazon Connect call centres to create and update their lists. If they need help, they may even end up speaking to an elf directly.
  2. Digital Access - We hope that most children would be able to self-serve on the website/app, and their GDPR opt-in would be stored with their list in Dynamo.
  3. Letters - Finally, kids could send their letters to Santa as they have done for centuries. The elves would use a custom mobile app that would use the Amazon Textract SDK to get the lists into Dynamo.
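All three entry points end up doing the same thing: handing a parsed list to a small Lambda that writes it to Dynamo alongside the GDPR opt-in. We never built this, so the following is only a sketch of what that Lambda might look like. The table name, the event fields and the `table` parameter (there so the handler can be tried without AWS) are all assumptions.

```python
import time

TABLE_NAME = "SantaWishLists"  # hypothetical table name
SOURCES = {"voice", "digital", "letter"}


def build_item(login, wishes, gdpr_opt_in, source):
    """Build the record to store, refusing anything without consent."""
    if source not in SOURCES:
        raise ValueError(f"unknown entry point: {source!r}")
    if not gdpr_opt_in:
        raise ValueError("cannot store a wish list without GDPR consent")
    return {
        "login": login,  # e.g. the chimney's three-word address
        "wishes": list(wishes),
        "gdpr_opt_in": True,
        "source": source,
        "updated_at": int(time.time()),
    }


def handler(event, context, table=None):
    """Lambda entry point: store one parsed wish list."""
    if table is None:
        import boto3  # only needed when running against real AWS
        table = boto3.resource("dynamodb").Table(TABLE_NAME)
    item = build_item(
        event["login"], event["wishes"], event["gdpr_opt_in"], event["source"]
    )
    table.put_item(Item=item)
    return {"statusCode": 200, "stored": item["login"]}
```

Keeping the consent check in `build_item` means a list cannot reach the table without an opt-in, whichever of the three routes it arrived by.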

Final bill

Final Bill - £49,711,945. Digital - £377,905. Non-Digital - £49,334,040

So there you go, Santa's annual AWS bill would come to roughly £49,711,945 ($67,633,051). I've rounded most figures off in this post, but I thought I would leave you with that number in its full glory! Here it is again in a big font!
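For anyone who wants to check my adding up, here is the split between the two halves of the bill as a short sketch, using the figures from the final bill.

```python
DIGITAL_GBP = 377_905
NON_DIGITAL_GBP = 49_334_040

total = DIGITAL_GBP + NON_DIGITAL_GBP
digital_share = DIGITAL_GBP / total

print(f"Total: £{total:,}")
print(f"Digital share of the bill: {digital_share:.2%}")
```

The digital side is well under 1% of the total, so nearly all of the bill comes from the non-digital routes.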

£49,711,945

Have you ever seen an AWS bill like that before? If you have, then I would love for you to contact me. We have a relatively simple architecture but it's just the sheer quantity of potential users in this case which makes this value so high. Realistically (in the mythical being sense), all 2.2bn children won't believe in Santa. Also, a lot of those children will be far too young to write wishlists to Santa. So the chances of it costing this amount are slim, but it's much better to get a worst-case picture because then you can work on it.

If we break this down, it's actually understandable where these values are coming from too. The digital part is just requests and a bit of infrastructure that you need. Let's compare this to Twitter because that puts this amount into a whole different light. Twitter has about 500M tweets a day, so if we compare just the PUT/Write requests then Santa will have about 4.4x as many calls per day as Twitter. So if you owned Twitter, would you be upset with a yearly bill of £73k ($99K) for 182,500,000,000 Tweets on your platform?

The non-digital side is really where the costs come from. Even then, it is Amazon Connect (AWS's call centre service) with the highest value, which when you think about it, is quite reasonable. We are essentially spinning up a call centre for 195 countries expecting 2.5m calls a month in each of those locations, and there are lots of overheads in running a conventional call centre.

So what could Santa do about this cost? As far as we are aware, he doesn't make any money from selling these toys. He still has a workforce to buy cookies for. Santa is unlikely to be able to afford that kind of bill. Well, at this scale, coming to an AWS Partner is probably the first thing you should do. Because of our AWS partnership, we are able to offer several tiers of discounts that would reduce Santa's AWS spend.

Secondly, if Santa managed to get to 100% digital traffic then he would only be paying around £400K a year, which is a massive saving. Digital usage would probably grow organically year on year, but hey, Santa could use some of those 20K elves he doesn't need anymore to bridge the digital divide and roll out a budget phone around the world (the Elfone anyone??)

Conclusion

This was a really entertaining talk to present and I got the opportunity to work with some really excellent people at Infinity Works on it. I wanted to do this talk so that I could get practice making a proposal for a client. I also wanted to learn more about AWS, specifically CDK. Now, we didn't get around to building this architecture, but I learned a lot of valuable lessons that I wouldn't have learnt by doing a project of an everyday scale. And now I have a complete architecture to work from to build this app in the future, so I will still use it to learn.

One of the most valuable lessons I learned was that I shouldn't be afraid of exploring more complex solutions to problems instead of off the shelf services. Sometimes the difference in operational cost offsets the cost/time to build a custom solution like with our DynamoDB Identity Management Solution.

Also, something that's worth mentioning is that although the yearly bill is huge, it's just a stepping stone for achieving a fully digital platform. Ever since hearing the phrase "Do things that don't scale" by Paul Graham, I have really tried to live that. It just makes sense: things don't have to stay the same, and sometimes you need to make a sacrifice to get to where you want to be. In this case, Santa needs a digital solution to reply to the Subject Access Requests. He has a client base who don't all have access to the internet, so he needs to have a way to get them into the system. He could spend years trying to bridge the digital divide before implementing the solution, but this would be at the risk of large fines from not being able to comply with the various regulations. Or he could build an expensive, short-term solution that would free up resources for him to tackle the digital divide with less risk. Risk is the factor that the phrase "Do things that don't scale" tries to mitigate, which is ultimately what we all want.

And with that, I wish you a Merry Christmas! (if January isn’t too early)