Hi—I have some free Azure credits and would like to use them to host a personal Lemmy instance. I know Lemmy is containerised, but is there a preferred choice for hosting in Azure—AKS, Azure Container Apps, Container Instances? Also, any guidance on appropriate PostgreSQL configuration—I know there are some options around that.
Also, can anyone point me at what resource utilisation will look like for a Lemmy instance? I imagine disk space is more of a concern than compute usage.
I’m currently trying to get Lemmy working on Azure Container Apps and Azure Postgres Flexible Server. I’ve got it all deployed, but I’m having some issues with the reverse proxy.
Regarding the ‘best’ choice - well it depends on what you mean by ‘best’. AKS will be the most flexible and ACI will probably be the simplest (if it will even work for Lemmy - I haven’t looked at ACI in years). Container Apps will probably be somewhere in the middle. Container Apps is just an abstraction over Kubernetes, so in theory you should get the scalability and flexibility of k8s without the overhead of managing a cluster.
I got Lemmy up and running on my home Raspberry Pi microk8s cluster pretty easily, so it will work fine on AKS for sure.
I’m looking at Container Apps just as a pet project because I’ve been waiting for a product like this for years. Kubernetes is awesome, but has always been too complicated for the average software developer to use. It needs a layer of abstraction and that’s what Container Apps is. So anyways I figured running Lemmy on it would be a good way to test drive it.
As I said though, I’ve run into some issues and am almost at the point of asking for help. If anyone’s interested, I can post links to my GitHub repos with my Terraform code and all that.
“some free Azure credits”
That’s probably not enough for a 3-node AKS cluster (it used to be though), but even with one or two nodes, having a familiar API is a plus. If you’re already experienced with k8s or already have an AKS cluster for other dev/fiddle stuff, that would be the obvious solution.
I haven’t even decided if I’ll run lemmy or kbin. Jerry Bell is currently running both.
Will you only be supporting yourself and maybe a small subset of users? If you don’t need your instance to scale, you can (shameless self plug) try my deployment script to get yourself running.
It just uses the recommended Postgres configuration as seen in the deployment files in Lemmy’s official repo. It would just be in a Docker volume on disk, so if you had thoughts of scaling in the future, and wanted to use a managed Postgres service, I would not recommend using my script.
I run an instance just for myself, and CPU usage is so low that pretty much anything you can get in the cloud will be good enough. Disk space is a much more important factor. In terms of just Lemmy-created data, my personal instance, which is 10 days old, has stored about 6.2GB of data. 2.4GB of this is just thumbnails. Note that this does not include other things that consume disk space, such as my Docker images or my Docker build cache, which I clear manually.
So, that is roughly 640MB of new data generated per day. Your experience will vary depending on how many communities you subscribe to, but that’s a good rough estimate. Round it up to 700MB to have a safer estimate. But remember, this is with Lemmy’s current rate of activity. If the number of posts and comments doubles or triples in the future, my storage requirements will likely go up considerably.
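As a rough sketch of that arithmetic (not an official sizing tool), the few lines of Python below work out the daily rate from the figures above and project it forward. The function names, the 1 GB = 1024 MB convention, and the activity multiplier are assumptions made for illustration; the multiplier simply scales the daily rate linearly, which real growth may not follow.

```python
MB_PER_GB = 1024  # assumption: binary units, which puts 6.2GB over 10 days near 640MB/day


def daily_rate_mb(total_gb: float, days: int) -> float:
    """Average new data per day, in MB, given the total stored so far."""
    return total_gb * MB_PER_GB / days


def projected_gb(rate_mb_per_day: float, days: int, activity_multiplier: float = 1.0) -> float:
    """Linear projection of storage (in GB) after `days` at a constant daily rate.

    `activity_multiplier` is a hypothetical knob: 2.0 models activity doubling.
    """
    return rate_mb_per_day * activity_multiplier * days / MB_PER_GB


if __name__ == "__main__":
    measured = daily_rate_mb(6.2, 10)
    print(f"measured rate: {measured:.1f} MB/day")                        # 634.9
    print(f"one year at 700 MB/day: {projected_gb(700, 365):.1f} GB")     # 249.5
    print(f"same, activity doubled: {projected_gb(700, 365, 2.0):.1f} GB")  # 499.0
```

Plug in your own measured total and instance age to get a rate for your subscriptions. The one-year figure is only as good as the assumption that activity stays flat.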
I am genuinely not sure what long-term Lemmy maintenance looks like in terms of releasing disk space. I can clear my thumbnail data and be fine, but I wonder what’s going to happen with the postgres database. Is there some way to prune old data out of it to save space? Will my cloud storage costs become so unreasonable in a year that I’ll have to stop hosting Lemmy? These are the questions I don’t have answers to yet.
If there is something clever you can do to plan ahead and save yourself disk space costs in the future (like, are managed Postgres services cheaper to host than on-disk ones?), I’d recommend doing that.
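For anyone who wants to track the same kind of numbers on their own box, here is a minimal sketch that adds up the size of a directory tree and breaks it down by top-level subdirectory. It assumes your Lemmy data lives in directories you can read (for example, wherever your Docker volumes are mounted); the function names are made up for this example and the path you pass in is up to you.

```python
import os
import sys


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path`, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it in this rough tally.
                pass
    return total


def size_by_subdir(path: str) -> dict:
    """Map each top-level subdirectory name of `path` to its total size in bytes."""
    sizes = {}
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = dir_size_bytes(entry.path)
    return sizes


if __name__ == "__main__":
    # Usage: python volume_sizes.py /path/to/volumes
    for target in sys.argv[1:]:
        print(f"{target}: {dir_size_bytes(target) / 1024 ** 2:.1f} MB total")
        for name, size in sorted(size_by_subdir(target).items()):
            print(f"  {name}: {size / 1024 ** 2:.1f} MB")
```

Running it once a day and logging the output would show how much of the growth is thumbnails versus the database. Reading Docker's own volume directory usually needs root.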
Thanks for the great reply. I’ll take a look at your deployment script to see if that fits my needs. I only plan to use the instance for me and a handful of friends. Like you say, data retention is probably my biggest concern, so I’ll look at the most sensible way to budget for that in Azure. Are there any numbers available from the major Lemmy instances? The lack of consideration for retention policies seems like a bit of an oversight. I might do some reading to see what the plan is here.
I’m not sure if the other instances have published their numbers; I can only see what my Docker volumes look like.
But if it helps you plan, you should know that federation only involves new data. When you set up a new instance, and federate with/subscribe to a community, it will only fetch an initial 20 posts (if that). From that point forward, you will receive a copy of all posts/comments posted to that community, but you will not have anything from before you federated. So you don’t have to worry about mirroring the entirety of a community’s history - I’d probably be out of disk space 3 times over if that were the case.
There are ways for users to retrieve “old” posts, but it’s done on an individual basis, not in bulk.