This post is the first in a series. The aim is to build a service that depends on multiple storage backends (Postgres, S3) but behaves as if there were only one: recent data is kept in Postgres, while older data is transferred to S3.
The rationale behind such a service is to minimize costs; this post investigates whether such a service is justified.
Use case
We pretend that customers have transaction data; this data is immutable and downloaded once a month as CSV.
The file size varies between 100 KB and 1.5 GB, with large files being less frequent than small ones. The data is stored in an AWS RDS PostgreSQL database and backed up once a month. The total data size is below 10 TB.
For the calculation, I will focus mainly on storage size and usage-related costs; additional VM costs are omitted.
[Figure: User data size distribution]
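The original chart is not reproduced here, so as a stand-in, here is a minimal sketch of how such a cohort could be simulated. The log-uniform shape and the sampling code below are assumptions; only the 100 KB to 1.5 GB range and the "large files are rarer" observation come from the post.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

const (
	minFileKB = 100.0             // 100 KB lower bound from the post
	maxFileKB = 1.5 * 1024 * 1024 // 1.5 GB upper bound from the post
)

// sampleFileKB draws one monthly file size in KB. The log-uniform shape is an
// assumption; it simply makes large files much rarer than small ones.
func sampleFileKB(r *rand.Rand) float64 {
	logMin, logMax := math.Log(minFileKB), math.Log(maxFileKB)
	return math.Exp(logMin + r.Float64()*(logMax-logMin))
}

// totalGB estimates the total data (in GB) added by a cohort of users in one month.
func totalGB(users int, r *rand.Rand) float64 {
	var sumKB float64
	for i := 0; i < users; i++ {
		sumKB += sampleFileKB(r)
	}
	return sumKB / (1024 * 1024)
}

func main() {
	r := rand.New(rand.NewSource(1))
	for _, n := range []int{1000, 5000, 10000, 100000} {
		fmt.Printf("%6d users: ~%.1f GB/month of new data\n", n, totalGB(n, r))
	}
}
```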
AWS RDS PostgreSQL, Multi-AZ, us-east
storage costs: $0.23 per GB-month
backup storage: $0.010 per GB
data transfer: $0.09 per GB
S3 Standard storage
storage costs: $0.023 per GB-month
select costs: $0.0004 per 1,000 requests
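As a rough sanity check on the table below, here is a minimal sketch that applies the prices listed above. The per-cohort stored and downloaded volumes in `main` are assumptions (the exact inputs behind the table are not shown here), so the output only approximates the figures in the table.

```go
package main

import "fmt"

// Prices taken from the lists above (USD).
const (
	rdsStorageGBMonth = 0.23   // RDS Multi-AZ storage, per GB-month
	rdsBackupGB       = 0.010  // RDS backup storage, per GB
	rdsTransferGB     = 0.09   // data transfer out, per GB
	s3StorageGBMonth  = 0.023  // S3 Standard storage, per GB-month
	s3Per1kRequests   = 0.0004 // S3 select/GET requests, per 1,000 requests
)

// postgresMonthly returns the monthly RDS cost for storedGB of data,
// backedUpGB of backups and downloadedGB transferred out.
func postgresMonthly(storedGB, backedUpGB, downloadedGB float64) float64 {
	return storedGB*rdsStorageGBMonth + backedUpGB*rdsBackupGB + downloadedGB*rdsTransferGB
}

// s3Monthly returns the monthly S3 cost for storedGB of data and the given
// number of requests (one download per user per month in this model).
func s3Monthly(storedGB float64, requests int) float64 {
	return storedGB*s3StorageGBMonth + float64(requests)/1000.0*s3Per1kRequests
}

func main() {
	// Assumed cohort: 1K users storing ~400 GB in total, each downloading
	// their data once a month. These inputs are illustrative only.
	stored, users := 400.0, 1000
	fmt.Printf("Postgres: $%.2f\n", postgresMonthly(stored, stored, stored))
	fmt.Printf("S3:       $%.2f\n", s3Monthly(stored, users))
}
```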
Cost overview per month
| Users | Postgres  | S3      |
|-------|-----------|---------|
| 1K    | $137.61   | $9.15   |
| 5K    | $669.60   | $44.53  |
| 10K   | $1344.17  | $89.38  |
| 100K  | $13539.74 | $900.20 |
Conclusion
It is obvious that S3 storage is significantly cheaper, even when we duplicate the data as a backup; on top of that, additional VM costs would have to be added for Postgres. Depending on the (for now fictional) data access patterns, it seems plausible to go with a hybrid approach or, if the non-functional requirements allow it, to depend fully on S3 for our storage needs.
To make things interesting, we will go for the hybrid approach in the next blog post, using Go.
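As a teaser, one possible shape for that service is a single read interface with a Postgres-backed store for recent data and an S3-backed store for archived data. The type names and the cutoff-based routing below are hypothetical, not the final design.

```go
package storage

import (
	"context"
	"io"
	"time"
)

// Store is a hypothetical read interface the service could expose regardless
// of where a transaction file actually lives.
type Store interface {
	// Fetch streams the monthly transaction CSV for a customer.
	Fetch(ctx context.Context, customerID string, month time.Time) (io.ReadCloser, error)
}

// HybridStore serves recent months from Postgres and older months from S3.
type HybridStore struct {
	Recent  Store         // e.g. backed by RDS PostgreSQL
	Archive Store         // e.g. backed by S3
	Cutoff  time.Duration // data older than this is expected in the archive
}

// Fetch picks a backend based on the requested month and falls back to the
// archive if the recent store does not have the data (e.g. already migrated).
func (h *HybridStore) Fetch(ctx context.Context, customerID string, month time.Time) (io.ReadCloser, error) {
	if time.Since(month) > h.Cutoff {
		return h.Archive.Fetch(ctx, customerID, month)
	}
	rc, err := h.Recent.Fetch(ctx, customerID, month)
	if err != nil {
		return h.Archive.Fetch(ctx, customerID, month)
	}
	return rc, nil
}
```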