First major SaaS production release in Go

Hi Everyone!

We have officially 'soft launched' our SaaS product, MailDB.io.

The entire backend of the project is coded in Go with the frontend in React. Multiple supporting tools have been created as well for this project in Go, such as our web crawler and goque.

We love Go and wanted to show everyone here a real project built with it, meant to make money, and running in production (and get a bit of that sweet exposure).

If you have any questions or feedback or anything, feel free to comment below!

Comments:

__snake_likeGopher:

Did you start with Go or did you try any other languages first? If so, could you elaborate on this a bit? I would also love to hear about any challenges you faced, as well as some of the library/framework choices used in your application. Thanks.

beeker1121:

I actually started out with PHP back in 2006, just learning web development in general with it and how to build dynamic web sites that use a database. Kept progressing and eventually built an affiliate network which took a year or so.

After that I built my first true SaaS product, Navilytics, which was also in PHP. Soon after running that, I started to look into new languages and first sought out Node to use as the backend, but luckily found Go before I got too deep into the project. Built a couple very small projects to learn Go and React (like resounden, the structure of this app is completely different, though) and then finally started building MailDB, and here we are :)

Good questions. The toughest part Go-wise was just figuring out the structure of the application and how to separate everything. Originally, MVC seemed like the way to go (the Node way), but after reading some articles from Ben Johnson and learning Go a bit better, I eventually went for a sort of domain-service type structure. This isn't perfect in my opinion, but the way everything is separated out, and eventually pulled together where needed, makes it very simple to use, modify, and navigate imho. Will be writing an article on this soon.

For framework and library choices - this project wouldn't be working if not for this brilliant port of LevelDB in Go. The search functions on the site just wouldn't return in the hundreds of milliseconds as they do, for any domain, if not for LevelDB. Other than that everything is pretty standard.
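As a rough illustration of why a sorted key-value store suits per-domain lookups (a sketch only, not MailDB's actual code): LevelDB keeps keys in sorted order, so if every address for a domain shares a key prefix, a search becomes one seek plus a short sequential read. The key layout (`domain\x00local`), the function names, and the in-memory sorted slice standing in for the on-disk store are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// emailKey builds a hypothetical key of the form "domain\x00local" so that
// all addresses for one domain sit next to each other in sorted order.
// The NUL separator keeps "example.co" from matching "example.com".
func emailKey(email string) (string, bool) {
	at := strings.LastIndex(email, "@")
	if at <= 0 || at == len(email)-1 {
		return "", false // no local part or no domain
	}
	local, domain := email[:at], strings.ToLower(email[at+1:])
	return domain + "\x00" + local, true
}

// scanDomain returns the local parts stored under domain. keys must be
// sorted; the binary search stands in for an iterator seek in a sorted store.
func scanDomain(keys []string, domain string) []string {
	prefix := strings.ToLower(domain) + "\x00"
	var out []string
	for i := sort.SearchStrings(keys, prefix); i < len(keys) && strings.HasPrefix(keys[i], prefix); i++ {
		out = append(out, keys[i][len(prefix):])
	}
	return out
}

func main() {
	var keys []string
	for _, e := range []string{"john@example.com", "amy@example.org", "john.doe@example.com", "bob@Example.com"} {
		if k, ok := emailKey(e); ok {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // a sorted store maintains this order for you
	fmt.Println(scanDomain(keys, "example.com"))
}
```

The cost of a lookup here depends on how many addresses the domain has, not on the total size of the dataset, which is the property that makes reads straight from disk workable.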

Thank you for the questions!

sethammons:

Can you go into more detail on your choice of the LevelDB port, and why you chose a key-value store over a relational database? I'm curious about the performance you are talking about. I work with a popular email SaaS, and we search through dozens of sharded datasets when looking up unsubscribed emails, and if it took hundreds of milliseconds, that would not work for us. Plus it seems like you store metadata with each record.

My first, naive thought on what your schema should be is a domain table that has your domain metadata, then an email table which would have a two column index on the email prefix and foreign key domain. You could likely have another table that links potential similar records like john@example.com, john.doe@example.com, and john+linkedin@example.com.

And to be compliant with GDPR, you might have each unique email address actually stored in the relational dataset as a hashed value. You might need to maintain a table of GDPR redactions (just the hashed value) to ensure you don't re-populate that email address into your dataset on the next crawling (assuming you are crawling for the data).
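A minimal sketch of that redaction-list idea, assuming SHA-256 over a lowercased, trimmed address (a keyed hash such as HMAC could be swapped in to make guessing harder). The type and function names are hypothetical, and lowercasing the local part is itself an assumption, since mail servers may treat it as case-sensitive.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashEmail normalizes an address and returns its SHA-256 digest as hex,
// so the redaction list never has to hold the address itself.
func hashEmail(email string) string {
	norm := strings.ToLower(strings.TrimSpace(email))
	sum := sha256.Sum256([]byte(norm))
	return hex.EncodeToString(sum[:])
}

// redactions is a set of hashed addresses that must not be stored again.
type redactions map[string]struct{}

// redact records the hash of an address that was removed on request.
func (r redactions) redact(email string) {
	r[hashEmail(email)] = struct{}{}
}

// allowed reports whether a newly crawled address may be stored.
func (r redactions) allowed(email string) bool {
	_, blocked := r[hashEmail(email)]
	return !blocked
}

func main() {
	r := redactions{}
	r.redact("John@Example.com")

	// On the next crawl, check each address before writing it back.
	for _, found := range []string{"john@example.com", "jane@example.com"} {
		fmt.Println(found, "store:", r.allowed(found))
	}
}
```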

This is all to say that I'm curious about how you set up your datastore, why you chose what you did, etc. While I feel there is room for a KV store, much of the data feels relational on initial thought.

beeker1121:

Very good points and thoughts.

One of the main goals was to use Go to try and keep the costs of the whole infrastructure as low as possible.

We're storing hundreds of millions of email addresses. Along with each of those email addresses could be hundreds of source URLs that particular address was found on. As you can imagine, the size of the database required to do this adds up very quickly.

MySQL was tested, along with PostgreSQL and even MongoDB. Sharding would have been required just to meet speed requirements if using MySQL or PostgreSQL, and given the size of the database, we would need either a few really powerful servers with huge memory capacity or dozens of less powerful machines. Either way, the costs would add up.

Just doing some quick maths on the GCE calculator, we could get 24 n1-standard-1 instances each with 3.75 GB for a total of 90 GB for $582.54 per month. 90 GB wouldn't even come close to our database size. Instead, the entire email database and application runs on a single n1-standard-1 server with 2 cores and a 1 TB SSD. We get these 100-200 ms response times pulling everything from disk.

The crawler coded in Go is built with the same ideas. Crawl billions of web pages on a single machine, quickly, using just a couple gigs of memory and an SSD.

kl0nos:

we could get 24 n1-standard-1 instances each with 3.75 GB for a total of 90 GB for $582.54 per month

Dedicated server:

CPU: Intel 2x Xeon E5-2650v3 - 20c/40t - 2.3GHz /3GHz

RAM: 256GB DDR4 ECC 2133 MHz

HDD: 2x450GB SSD NVMe (or 2 x 1.2TB SSD NVMe for 30 euro more)

Price: 315 euro, and this is a dedicated machine, NOT a shared one, so all IOPS and CPU are yours.

Stop using those overpriced "clouds" if you don't use their other services like databases etc. It just doesn't add up at all.

beeker1121:

True.

Going with GCE though, everything was easier. Management of the server in particular, such as creating backup snapshots, and even early on testing when I would create and delete VMs all the time - it made everything much better to deal with.

Plus, even with the dedicated server you posted, if we went with MySQL or another database and structure, that still wouldn't be enough memory, and we're still cheaper on the cloud. This was the major benefit of Go for us, which ties into performance. We could make it even more affordable by switching everything over to dedicated servers, but with the huge gains from software optimizations over hardware, it's not really an issue for us.

whizack:

1 big server hosted locally in your networking closet isn't the same as 24 instances in geo-redundant availability zones on a huge global backbone provider. There are far more costs to building a proper datacenter than just buying the machine to run it on.

sethammons:

That makes sense. I'm spoiled with big, beefy database nodes. If you are running on one server, how are you handling backups? Or is that something GCE handles?

beeker1121:

GCE offers some guarantees with disk redundancy, but it's very easy to just create an image (or snapshot as Google calls it) of the disk on the GCE platform and that serves as the backup.

You can even do this while the disk is attached to the instance, and the backup is incremental and automatically compressed for you.

__snake_likeGopher:

Thanks for the response!

Oliver_Fish:

Your signup appears to be broken, when I click the email verification link I get a 500 internal server error.

beeker1121:

Looks like it's an error from the MailChimp API, and it looks like the email you used should be valid... so this error is surprising coming from them.

I'm going to add in a check for it though and just return an invalid email error if we get a 400 response from them. Thank you!

Edit: OK, found out that while the email is valid in format, MailChimp considers it a fake address and rejects it, and I didn't have a check for that. Adding it to the verification page now. Thanks again!
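For anyone hitting the same thing, the check can be sketched roughly like this. This is not the actual MailDB handler: the function names, the error value, and the choice of 422 and 502 for our own responses are assumptions for the example. The idea is simply to turn a 400 from the upstream API into a clean "invalid email" error instead of letting it surface as a 500.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// ErrInvalidEmail is a hypothetical sentinel error for rejected addresses.
var ErrInvalidEmail = errors.New("invalid email address")

// checkSubscribeStatus maps the status code returned by the upstream
// mailing list API to an error our own code can act on.
func checkSubscribeStatus(status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status == http.StatusBadRequest:
		// The upstream rejected the address (for example, as fake).
		return ErrInvalidEmail
	default:
		return fmt.Errorf("mailing list API returned status %d", status)
	}
}

// httpStatusFor picks the status our own handler should send back.
func httpStatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrInvalidEmail):
		return http.StatusUnprocessableEntity
	default:
		return http.StatusBadGateway
	}
}

func main() {
	for _, upstream := range []int{200, 400, 503} {
		err := checkSubscribeStatus(upstream)
		fmt.Println(upstream, "->", httpStatusFor(err), err)
	}
}
```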

Second edit: Fixed :)
