Colorado Edge Infrastructure

Nithya Shree Rajashekhar
Published in Smarkets HQ
Sep 11, 2023

If you operate in a regulated industry where wagering operations must take place within a restricted geographical area, and you know the difficulty of maintaining physical hardware to host services built on a modern architecture, this blog might interest you.

We will share our experience of making one of our colocations, Colorado, more reliable by combining the benefits of the cloud with a resilient architecture.

AWS Local Zones

An AWS Local Zone provides edge infrastructure capabilities in places where there is no full region, by extending the network of a parent region. In our case we were aiming to use a Denver Local Zone subnet within our us-west-2 VPC, so we started by opting into the Denver Local Zone (availability zone us-west-2-den-1a).
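
A minimal Terraform sketch of that step, assuming an existing us-west-2 VPC; the resource names, VPC reference and CIDR below are placeholders rather than our actual configuration:

```hcl
# Opt the account into the Denver Local Zone group before subnets can use it.
resource "aws_ec2_availability_zone_group" "denver" {
  group_name    = "us-west-2-den-1"
  opt_in_status = "opted-in"
}

# Subnet carved out of the existing us-west-2 VPC, pinned to the Denver
# Local Zone so instances launched into it run at the edge location.
resource "aws_subnet" "denver_local_zone" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.64.0/24"
  availability_zone = "us-west-2-den-1a"

  tags = {
    Name = "colorado-edge"
  }
}
```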

Not all AWS services or configurations are available in every Local Zone; however, we use cloud-agnostic bootstrap tooling on top of EC2, so everything worked out of the box.

The main benefits we observed were pay-as-you-go pricing, on-demand scaling flexibility and the ability to implement a well-architected setup while still complying with the stringent data residency requirements.

Some of the limitations we faced were:

  1. Our network performance benchmarks from a server in Colorado to Ireland showed ~29ms of additional latency, as the peering connection goes via us-west-2, which adds another hop
  2. Limited location availability, hindering our goal of standardisation across colos
  3. Only a selection of instance types is available
  4. Not the latest storage capabilities, with only gp2 volumes available at the time of writing
  5. Additional data transfer charges incurred for internet traffic, as a NAT gateway is not available in the Denver AZ (see the routing sketch after this list)
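
On that last point, one way to handle internet egress is to route the Local Zone subnet straight to the VPC's internet gateway and give instances public IPs, instead of hairpinning through a NAT gateway in the parent region. A rough sketch with placeholder names, not necessarily our exact routing:

```hcl
# Send internet-bound traffic from the Local Zone subnet directly to the
# VPC's internet gateway; instances then need public IP addresses.
resource "aws_route_table" "denver_public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "denver_public" {
  subnet_id      = aws_subnet.denver_local_zone.id
  route_table_id = aws_route_table.denver_public.id
}
```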

k3s

Once we were able to provision instances with Terraform, we went looking for a lightweight orchestration engine to deploy our microservices. Our expectation was that it would work well with our IaC and stay consistent with our other jurisdictions (as we only had a few mandatory services to run there, we didn't want to over-pay or over-complicate things, but still wanted something production grade).

After some comparisons, k3s stood out for our needs.

Combining server/agent initialisation in the cloud-init userdata with node failure handling through autoscaling groups over launch templates helped us build a fault-tolerant system. For cluster-level high availability, we followed the k3s Fixed Registration Address architecture pattern, where the k3s servers sit behind a load balancer that the agents register through. A rough sketch of this setup follows.
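
The sketch below shows a launch template whose userdata bootstraps a k3s server through a fixed registration address (an internal load balancer referenced here as aws_lb.k3s_registration), with an autoscaling group replacing failed nodes. The AMI, instance type, token handling and resource names are illustrative assumptions, not our exact configuration; the addon-disable flags are explained in the next paragraph.

```hcl
# k3s server nodes: the userdata installs k3s and registers through the
# fixed registration address (the load balancer's DNS name). The very
# first server would instead be bootstrapped with --cluster-init.
resource "aws_launch_template" "k3s_server" {
  name_prefix   = "k3s-server-"
  image_id      = var.ami_id
  instance_type = "m5.xlarge"

  user_data = base64encode(<<-EOF
    #!/bin/bash
    curl -sfL https://get.k3s.io | K3S_TOKEN="${var.k3s_token}" sh -s - server \
      --server "https://${aws_lb.k3s_registration.dns_name}:6443" \
      --tls-san "${aws_lb.k3s_registration.dns_name}" \
      --disable traefik --disable servicelb --disable local-storage
  EOF
  )
}

# The autoscaling group keeps the desired number of servers running and
# replaces any node that fails its health checks.
resource "aws_autoscaling_group" "k3s_servers" {
  name                = "k3s-servers"
  min_size            = 3
  max_size            = 3
  desired_capacity    = 3
  vpc_zone_identifier = [aws_subnet.denver_local_zone.id]

  launch_template {
    id      = aws_launch_template.k3s_server.id
    version = "$Latest"
  }
}
```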

With the bundled addons traefik, servicelb and local-storage disabled while initialising the cluster, the EBS CSI driver works seamlessly, and ingress-nginx's LoadBalancer service is created once the Cloud Controller Manager (CCM) is working properly.
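
As one possible way to install those replacements from the same Terraform codebase, using the helm provider and the upstream charts; the release names and namespaces below are illustrative rather than our exact setup:

```hcl
# EBS CSI driver for persistent volumes, and ingress-nginx exposed through
# a Service of type LoadBalancer that the CCM turns into a real load balancer.
resource "helm_release" "aws_ebs_csi_driver" {
  name       = "aws-ebs-csi-driver"
  namespace  = "kube-system"
  repository = "https://kubernetes-sigs.github.io/aws-ebs-csi-driver"
  chart      = "aws-ebs-csi-driver"
}

resource "helm_release" "ingress_nginx" {
  name             = "ingress-nginx"
  namespace        = "ingress-nginx"
  create_namespace = true
  repository       = "https://kubernetes.github.io/ingress-nginx"
  chart            = "ingress-nginx"

  set {
    name  = "controller.service.type"
    value = "LoadBalancer"
  }
}
```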

For future improvements, we are looking into:

  1. Potentially using an external datastore in place of embedded etcd to further decouple the system components
  2. Exploring the use of spot instances and Karpenter

Importantly, the solution also turned out to be cost effective for us!
