A bank uses Amazon MSK to feed real-time data into a data lake, the brokers' storage is almost full, and the data must remain accessible for 24 hours. The most appropriate solution is:
✅ "B. Create an Amazon CloudWatch alarm that monitors the KafkaDataLogsDiskUsed metric. Automatically flush the oldest messages when the value of this metric exceeds 85%."
✅ Here's why:

KafkaDataLogsDiskUsed is the per-broker Amazon MSK metric, published to CloudWatch, that reports the percentage of disk space used for data logs. Alarming when it exceeds 85% responds to actual disk pressure: the oldest messages are flushed before a broker runs out of space, while the most recent data, which covers the 24-hour access requirement, stays available.
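As a concrete illustration, the alarm from option B maps onto CloudWatch's standard `PutMetricAlarm` parameters. Below is a minimal sketch; the cluster name, broker ID, and alarm name are illustrative assumptions, and the remediation action (an SNS topic or Lambda that performs the flush) is left as a placeholder:

```python
# Sketch of a CloudWatch alarm definition for MSK broker disk usage.
# "bank-msk-cluster" and broker ID "1" are hypothetical example values.
alarm_params = {
    "AlarmName": "msk-broker-1-disk-usage-high",
    "Namespace": "AWS/Kafka",                    # namespace for Amazon MSK metrics
    "MetricName": "KafkaDataLogsDiskUsed",       # % of data-log disk space used
    "Dimensions": [
        {"Name": "Cluster Name", "Value": "bank-msk-cluster"},
        {"Name": "Broker ID", "Value": "1"},
    ],
    "Statistic": "Maximum",
    "Period": 300,                # evaluate over 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 85.0,            # fire when disk usage exceeds 85%
    "ComparisonOperator": "GreaterThanThreshold",
    # "AlarmActions": [...],      # e.g. an SNS topic wired to a remediation step
}

# With AWS credentials configured, the alarm would be created with:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```

One alarm per broker (or a metric-math alarm across brokers) is typical, since disk usage can diverge between brokers when partition leadership is uneven.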
🔴 Now, let's examine why the other options are not the best choice:
❌ A. Use the Amazon MSK console to triple the broker storage and restart the cluster: Tripling the broker storage is a costly stopgap rather than a fix. It does nothing to manage disk space, so the larger volumes would eventually fill up as well, and the extra capacity adds unnecessary cost.
❌ C. Create a custom Amazon MSK configuration. Set the log retention hours parameter to 48. Update the cluster with the new configuration file: A fixed 48-hour retention window does not react to actual disk usage. If message throughput grows, the brokers can still fill up well before the retention period expires, so this does not reliably relieve the storage pressure.
❌ D. Triple the number of consumers to ensure that data is consumed as soon as it is added to a topic: In Kafka, consuming a message does not delete it from the broker's log; messages are retained until the retention policy removes them, so the number of consumers has no effect on storage usage in an Amazon MSK cluster. Tripling the consumers would only introduce unnecessary cost and complexity.
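For reference, the custom configuration that option C describes is simply a Kafka broker properties file uploaded to MSK and applied to the cluster. A minimal sketch of its contents (the property name is a standard Kafka broker setting):

```properties
# Amazon MSK custom configuration fragment (Kafka server property).
# Retains messages for 48 hours instead of Kafka's 168-hour default.
log.retention.hours=48
```

Mechanically this works, but as noted above it trims data on a fixed schedule rather than in response to disk usage, which is why option B is the better fit here.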