First Friday Playlist: Muserk’s Wes Jones

This month’s First Friday playlist comes from Wes Jones – Senior Software Developer.

We talk about DSPs and artist storefronts a lot at Muserk. It’s no secret that TikTok is probably the most influential medium for art, fashion, culture, and music right now. For me, it’s completely changed how I find new music.

Years ago, before I left for college, I played in a metal band, so I spent a lot of time playing with other artists, traveling, and meeting new people, and that was most often how I heard about new music. Maybe it was a recommendation from someone I met, an opening act, or finding out a member of this band also played in that band. A few years ago you could kind of curate avenues for new music to be force-fed to you based on what you were interested in. Pandora was a huge help, and then we finally got Spotify in the US, but this was still a pretty active approach to finding new music. Remember “NOW That’s What I Call Music”? I kind of ripped off that idea: I would spend a ton of time every year finding new music and adding it to a playlist called something like NOW 30XX, which would be all the music I found that year. I still do this today, and I’m on NOW 3021 (I make it the current year, but 1000 years into the future).

Now with TikTok I’m able to take a more passive approach to finding new music, as it’s kind of fed to me while I’m consuming other content. Here is a playlist of various songs and artists that I was first introduced to through TikTok.

Cattle Not Pets

When I first heard the term “cattle not pets,” it was the perfect metaphor for a concept I had always been aware of when developing for the cloud but never had the words for: remove individuality from your cloud infrastructure and treat your resources as nameless and dynamic, like cattle. Resources come and go all the time, so there is no time to name, feed, and care for them as if they were pets.

I’m sure many of us have been somewhere that has a fleet of servers named after superheroes, Disney characters, or something exceedingly nerdy like Dr. Who villains. When we start talking about scalability, though, characters can’t be imagined fast enough. Not to mention the hand-feeding required to spin up new instances of an application over and over again. As we were developing our cloud infrastructure to scale for Muserk, our first goal was to never connect directly to an instance again. This felt like a great starting point for answering the question of how we deploy applications, manage state, and debug issues as they arise. This is mostly a qualitative look at how we began to scale our operations in the early days of Muserk, so for you super nerds out there we won’t go into detail about things like load balancing, caching, or infrastructure as code. If you’re looking for that type of thing, stay tuned!

Deploying Applications

Probably the most important aspect of scaling is being able to deploy an application programmatically. Once we can do that, everything else is just a facility. The obvious answer here is Docker. The more advanced answer involves Kubernetes or Terraform, but that’s a topic for another day. With a containerized application we can control dependencies, versions, the operating system, and any configuration that needs to be done ahead of time. So all we need is a platform to run our container. The advantage is that this platform can be anything! The container will run exactly what we need, the exact same way, anywhere that supports Docker. Once the process of starting one of these containers is automated, we are free to start up as many as we would like, allowing a load balancer to route traffic appropriately.
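To make “programmatically” concrete, here is a minimal sketch using the Docker SDK for Python to start a container the same way every time. The image name, port mapping, and environment values are hypothetical placeholders, not anything from our actual setup.

# pip install docker
import docker

def launch_instance(image: str, host_port: int) -> str:
    """Start one container and return its ID; every instance started this way is identical."""
    client = docker.from_env()  # connect to the local Docker daemon
    container = client.containers.run(
        image,                                   # placeholder, e.g. "ingest-api:1.4.2"
        detach=True,                             # return immediately; the container keeps running
        ports={"8080/tcp": host_port},           # expose the app on a port the load balancer can target
        environment={"APP_ENV": "production"},   # configuration handled ahead of time
        restart_policy={"Name": "on-failure"},
    )
    return container.id

if __name__ == "__main__":
    # Cattle, not pets: start as many identical, nameless instances as we like.
    for offset in range(3):
        print(launch_instance("ingest-api:latest", 8080 + offset))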

Managing State

Next there is the problem of how to manage state on a server instance that is essentially disposable. Writing to local disk is out of the question because all of that information would be lost from instance to instance. Well, what about NFS? This could be a plausible solution, but it’s too slow without provisioned IOPS (which are expensive in the cloud). Besides, we should do better!

In fact, this was the starting point for really honing our data model, and it forced us to come up with a first pass at some sort of ETL. As we ingest data, how do we store it so that our applications can access it in a consistent way? Once all of our data is in one place, we can use it as our Single Source of Truth (SSOT). Using a database as an SSOT brings its own complexity. The real lesson for managing state across a scalable infrastructure is to AVOID state when you can.
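As a toy illustration of that lesson, here is a sketch of the difference between instance-local state and state pushed to a shared database that every instance can reach. The connection details and table are made up for the example and are not our actual schema.

# pip install psycopg2-binary
import psycopg2

# Anti-pattern on disposable instances: local files disappear with the instance.
def record_usage_local(event: str) -> None:
    with open("/tmp/usage.log", "a") as f:
        f.write(event + "\n")

# Better: write to the shared store acting as the single source of truth.
def record_usage_shared(event: str) -> None:
    conn = psycopg2.connect(host="db.internal", dbname="app", user="app", password="***")  # placeholder DSN
    try:
        with conn, conn.cursor() as cur:  # commits on success, rolls back on error
            cur.execute(
                "INSERT INTO usage_events (event, created_at) VALUES (%s, now())",  # hypothetical table
                (event,),
            )
    finally:
        conn.close()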

Debugging Issues

The most common reason for needing to log into an individual instance is to figure out what went wrong. As resources start to scale this gets increasingly difficult anyway, because an error could have occurred on any one of 4, 10, or, theoretically, n instances. So how do we figure out where problems are happening and how to fix them? There are all sorts of things to monitor across our applications: user experience, resource trends, and load times are a few examples. Most important, in my opinion, are the error logs.

When an error occurs, we want to be made aware of it. As a first pass, you should be using a logger. A logger lets us standardize how we create new logs by assigning a category to each type of log and ordering them by severity. Some common categories include DEBUG, INFO, and ERROR. In this example, DEBUG-level logs may be information that would be helpful when figuring out what happened, but not something we need to look through all the time. INFO-level logs are a step up in severity; these are messages we may always want to see so that we can follow usage in real time. ERROR logs, being the most severe, can be alerted on. We can configure our system to report when an ERROR has been logged so that we can take immediate action, and then use the INFO and DEBUG logs to determine what happened. If we’ve done it correctly, these logs will identify the unique machine the application is running on so we can handle hardware-specific problems. Once we are collecting logs from all machines across all applications, we can begin to build dashboards around each application. Combined with usage and hardware metrics, we have a central location to view all relevant information.
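Here is a minimal sketch of that setup using Python’s standard logging module. The service name and log messages are placeholders, and the handler that would ship these records to a central collector for dashboards and alerting is left out.

import logging
import socket

# Tag every record with the machine it came from so hardware-specific problems stand out.
logging.basicConfig(
    level=logging.DEBUG,  # capture everything here; filtering can happen downstream
    format=f"%(asctime)s {socket.gethostname()} %(name)s %(levelname)s %(message)s",
)
log = logging.getLogger("ingest-api")  # placeholder service name

log.debug("Parsed rows from the incoming report")  # useful when reconstructing what happened
log.info("Report ingested successfully")           # always-on usage signal
try:
    1 / 0
except ZeroDivisionError:
    log.exception("Report ingestion failed")        # records at ERROR level; alerting keys off this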

I hope this was in some way helpful for thinking about your own cloud infrastructure. We have come a long way in the past several years, and still have a long way to go. As we continue to improve our architecture, we hope to have more to share. We are evolving our technology every day and are working hard to improve our ETL workflows and their integration with the substantial amount of processing we do on the data we generate. In the meantime, we will continue to backfill posts with what we have learned and implemented along the way on this journey into the final frontier.

Update: Working as a Distributed Team

It has been a wild few months here at Muserk. We’ve onboarded some major players as customers, started a joint venture in Japan to combat piracy, and successfully executed our largest payout to date. To say things have been moving fast would be an understatement. Taking on this rapid growth as a distributed team has been its own effort so we wanted to check in and give an update on some things that have and have not worked for us.

Weekly All-Hands Meeting

This has been a crucial meeting for us to not only get an update on each of our silos, but to have an opportunity to discuss what is on the horizon. This is a practice we would continue even in an office.

Retrospective 

Taking time to reflect on the lessons we’ve learned each week has always been key to our teams’ successes at Muserk. While the decision to work remotely has only amplified the utility of the retrospective, it has also changed how it gets conducted. This meeting used to feel predictable at times, and usually we were just filling out a template. These days it’s more ad hoc, but it has somehow become more constructive. It feels more qualitative than quantitative and, in turn, more successful.

Virtual White Boards

I think we were a little skeptical about how useful this was going to be when we ordered tablets. It sounded nice, but it could quickly have become just a novelty. As it turns out, we use a tablet as a whiteboard in almost every meeting! When we decided to move remote, we took the whiteboard for granted; in hindsight it was probably one of our most used tools.

Scheduled Social Time

We went through a few iterations of scheduled social time, like virtual coffee or happy hour. This ended up not being something that stuck. Instead, these happen intermittently whenever schedules allow. Have we accidentally socialized for the entire duration of a scheduled meeting? Plenty of times. More often, however, meetings turn into social events once we’re finished, or that quick question you need to ask your would-be neighbor becomes a 45-minute catch-up.

A lot of how our collaboration has changed at Muserk is due to how much less face-to-face time we get. We’ve lost the personal aspect of working in an office, so meetings can sometimes feel like a social event. This makes the time box we allocate for meetings feel more relaxed as more social interaction comes into play. This sheds light on a larger stance we take on culture at Muserk: in a sense, culture may not be something you can force on a decentralized team. While it can, and should, in some ways be guided by leadership, it’s something that we are finding sometimes has to grow naturally within the company, within the different teams, and within individual relationships.

Working Remotely

How Our Office Prepared Us To Be A Remote Team

As many startups do, Muserk began as a fully remote team. Once our business solidified, workflows increased and collaboration became increasingly important. The logical next step was to get as many people as possible into one place. With team members all over the world, however, we couldn’t expect the entire company to move to Nashville.

In the throes of COVID-19 we were forced to move the team remote. We had gotten used to office life, and the team had grown significantly. Were those distant memories of a fully remote team lost? We couldn’t be sure how big of a challenge this would be to overcome. Luckily we still had members of the team outside of Nashville, and all along we had accidentally been preparing for this.

The office has always served as a hub for us. Once a quarter we assemble in Nashville for a week to share everything the company has been doing, and the division between each group’s efforts is obvious. Each team is working on their own thing, and some of that knowledge doesn’t come across day to day, because it doesn’t need to. Communication is key to facilitating a remote team, and, like any good team, we should even strive to over-communicate. That over-communication can quickly become a distraction, however, if it’s not effective.

Working in an office, we figured out which information streams matter to us, how to separate them, and how to tap into the ones we care about. In order to subscribe to all of the conversations happening at Muserk, and mute them when they get in the way, we make our chat channels as granular as possible. Discussions often happen in chat rather than in person, and we are in the habit of posting the results for those who were not there. We send casual meeting invites to those who may or may not care about the topic in case they want to be involved. Scheduled meetings auto-create video links within our calendars, and we join with tablets to use as whiteboards. When COVID-19 kept us from getting to the office, we worried about how it would hinder collaboration, but in hindsight we had been preparing all along. Without even knowing it, we had fostered a remote-first culture that even our new hires, with no remote experience, were able to adapt to seamlessly.

So has the office become unnecessary? Without it we wouldn’t have figured out who we are as a team, and the lessons about communication might have required more effort or taken longer to develop. We inadvertently learned a valuable lesson about disaster preparedness that we can carry into the future. When we go back to the office, this moment will remain in the back of our minds. We may never be in this situation again, but if we are, the transition will be just as seamless.