The concern is documenting issues with storage sizes on AWS Redshift.
1. Explanation and 2 ways of upscaling
- With a traditional database, you usually upscale vertically (move to a bigger machine).
- Redshift is a distributed columnar data warehouse solution.
- Unlike traditional databases, Redshift is designed to scale out by adding nodes to the cluster.
- Adding nodes adds computing power.
- When storing data in Redshift, you should choose a distribution key (column or set of columns) that will evenly distribute your data across different nodes.
- As a general principle, use the same set of columns for the distribution key across all your tables, so that rows joined on those columns sit on the same node.
- Note that tables configured with distribution style ALL are replicated in full to every node; limit distribution style ALL to dimension tables only.
- There are different node types to choose from, depending on your requirements.
- DC1 nodes are compute-optimized; they have smaller but faster SSD drives.
- DS1 nodes will provide you with significantly higher disk space per node.
- When you add nodes to your Redshift cluster, Redshift will re-distribute your data across all nodes as specified in the distribution style for each of your tables.
- Dense Storage (DS) nodes use magnetic hard disk drives (HDD).
- Dense Compute (DC) nodes come with SSD storage.
- DC is faster, but comes with less storage space compared to DS.
- Also, DS nodes are more expensive per node than DC ones, although they cost less per terabyte of storage.
- Redshift (like EMR) runs on EC2 infrastructure; however, the Redshift instance families (ds1, dc1, ds2, dc2) aren’t available as standalone EC2 instances, only as Redshift nodes, and they are subject to Redshift restrictions regarding Reserved purchases.
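
To make the notes on distribution keys and distribution style ALL more concrete, here is a minimal Python sketch of how rows end up on nodes. It is an illustration only: the function names are hypothetical, and the MD5-modulo hash is an assumption standing in for Redshift's internal hashing, so only the overall pattern (even vs. skewed vs. fully replicated) should be read from it.

```python
import hashlib
from collections import Counter


def node_for_key(key, num_nodes):
    """Map a distribution-key value to a node index with a stable hash.

    Assumption: MD5 modulo the node count is a stand-in for Redshift's
    internal hash function; it is not the real algorithm.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def rows_per_node_key(keys, num_nodes):
    """Rows stored on each node when rows are placed by distribution key."""
    counts = Counter(node_for_key(k, num_nodes) for k in keys)
    return [counts.get(node, 0) for node in range(num_nodes)]


def rows_per_node_all(row_count, num_nodes):
    """Rows stored on each node with distribution style ALL.

    Every node keeps a full copy, so total storage grows with node count.
    """
    return [row_count] * num_nodes


if __name__ == "__main__":
    nodes = 4
    # High-cardinality key: many distinct (hypothetical) order IDs.
    order_ids = range(100_000)
    # Low-cardinality key: only three distinct values across all rows.
    statuses = ["open", "shipped", "closed"] * 33_334

    print("order_id key:", rows_per_node_key(order_ids, nodes))
    print("status key:  ", rows_per_node_key(statuses, nodes))
    print("ALL, 4 nodes:", rows_per_node_all(1_000, 4))
    print("ALL, 8 nodes:", rows_per_node_all(1_000, 8))
```

With a key that has many distinct values, the per-node counts should come out close to each other. A key with only three distinct values can occupy at most three nodes, so at least one of four nodes stays empty while others fill up. The ALL case shows why that style is best kept to small dimension tables: the stored row count is multiplied by the number of nodes, so adding nodes adds no free space for those tables.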