AWS Kinesis
Guillaume Simard
- Coveo
- Big data
Record application usage
122M
235M
1M
much data
such throughput
so wow
AWS Kinesis Streams
real-time streaming data
Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data, and also providing the ability for you to build custom streaming data applications for specialized needs.
Core concepts
Record
{
  "Records": [
    {
      "Data": "XzxkYXRhPl8w",
      "PartitionKey": "partitionKey1"
    },
    {
      "Data": "AbceddeRFfg12asd",
      "PartitionKey": "partitionKey1"
    },
    {
      "Data": "KFpcd98+7nd1",
      "PartitionKey": "partitionKey3"
    }
  ],
  "StreamName": "myStream"
}
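In a PutRecords request like the one above, each "Data" field carries the record payload as a base64-encoded blob. The AWS SDKs do this encoding for you when you pass a ByteBuffer. Below is a minimal JDK-only sketch of the round trip on the first record. The class and method names are hypothetical, and reading the bytes as UTF-8 text is an assumption: Kinesis itself treats the payload as opaque bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RecordData {
    // Encode a payload the way it appears in the "Data" field of the JSON request
    static String encode(String payload) {
        return Base64.getEncoder().encodeToString(payload.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a "Data" field back into its bytes, read here as UTF-8 text (an assumption)
    static String decode(String data) {
        return new String(Base64.getDecoder().decode(data), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("XzxkYXRhPl8w")); // prints _<data>_0
        System.out.println(encode("_<data>_0"));    // prints XzxkYXRhPl8w
    }
}
```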
Core concepts
Shard
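A shard is a unit of capacity. It accepts writes of up to 1 MB/s or 1,000 records/s and serves reads of up to 2 MB/s. Each shard owns a contiguous range of a 128-bit hash key space. Kinesis hashes a record's partition key with MD5 and routes the record to the shard whose range contains the result, so records that share a partition key (like the two "partitionKey1" records above) land on the same shard.

The sketch below makes two assumptions. First, the shards split the space into equal ranges, as the shards of a newly created stream typically do. Second, the key is hashed as UTF-8. For a real stream, read the boundaries from each shard's HashKeyRange in DescribeStream. ShardRouting and its methods are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShardRouting {
    // 2^128: size of the hash key space shared by a stream's shards
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    // MD5 of the partition key, read as an unsigned 128-bit integer
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical: assumes shardCount equal, contiguous hash key ranges
    static int shardIndex(String partitionKey, int shardCount) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1); // last shard absorbs any remainder
    }

    public static void main(String[] args) {
        for (String key : new String[] {"partitionKey1", "partitionKey1", "partitionKey3"}) {
            System.out.println(key + " -> shard " + shardIndex(key, 4));
        }
    }
}
```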
Core concepts
Shard iterator
{
"ShardIterator": "AAAAAAAAAAETYyAYzd665+8e0X7JTsASDM/Hr2rSwc0X2qz93iuA3udrjTH+ikQvpQk/1ZcMMLzRdAesqwBGPnsthzU0/CBlM/U8/8oEqGwX3pKw0XyeDNRAAZyXBo3MqkQtCpXhr942BRTjvWKhFz7OmCb2Ncfr8Tl2cBktooi6kJhr+djN5WYkB38Rr3akRgCl9qaU4dY="
}
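A shard iterator is an opaque token that marks a position in one shard. The iterator type passed to GetShardIterator decides where that position starts, and each GetRecords call returns the next iterator. An iterator expires 5 minutes after it is returned.

The toy model below shows where reading starts for four iterator types. All names are hypothetical, and sequence numbers are simplified to longs (real ones are large numeric strings). The AT_TIMESTAMP type also exists but is left out here.

```java
import java.util.List;

public class ShardIteratorModel {
    // Hypothetical model: a shard is an ordered list of sequence numbers
    enum IteratorType { TRIM_HORIZON, LATEST, AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER }

    // Index of the first record a read would return for this iterator type
    static int startIndex(List<Long> shard, IteratorType type, Long sequenceNumber) {
        switch (type) {
            case TRIM_HORIZON:
                return 0; // oldest record still within the retention period
            case LATEST:
                return shard.size(); // only records written after the iterator is created
            case AT_SEQUENCE_NUMBER:
            case AFTER_SEQUENCE_NUMBER:
                int i = shard.indexOf(sequenceNumber);
                if (i < 0) {
                    throw new IllegalArgumentException("unknown sequence number " + sequenceNumber);
                }
                return type == IteratorType.AT_SEQUENCE_NUMBER ? i : i + 1;
            default:
                throw new IllegalArgumentException("unsupported type " + type);
        }
    }

    public static void main(String[] args) {
        List<Long> shard = List.of(100L, 101L, 102L);
        for (IteratorType type : IteratorType.values()) {
            System.out.println(type + " -> starts at index " + startIndex(shard, type, 101L));
        }
    }
}
```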
Core concepts
Data retention
24 hours by default, extendable up to 168 hours (7 days)
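Records older than the retention period are trimmed from the shard, so TRIM_HORIZON always points at the oldest record still inside the window. Below is a small java.time sketch of that window. The bounds mirror the slide. The class name, and the exact cutoff semantics (readable up to and including the expiry instant), are assumptions.

```java
import java.time.Duration;
import java.time.Instant;

public class RetentionWindow {
    // Retention bounds from the slide: 24 hours (default) to 168 hours (7 days)
    static final int MIN_HOURS = 24;
    static final int MAX_HOURS = 168;

    static Duration retention(int hours) {
        if (hours < MIN_HOURS || hours > MAX_HOURS) {
            throw new IllegalArgumentException("retention must be 24 to 168 hours, got " + hours);
        }
        return Duration.ofHours(hours);
    }

    // Assumed model: a record stays readable until its age exceeds the retention period
    static boolean stillReadable(Instant writtenAt, Instant now, Duration retention) {
        return !now.isAfter(writtenAt.plus(retention));
    }

    public static void main(String[] args) {
        Instant written = Instant.parse("2016-01-01T00:00:00Z");
        Instant twoDaysLater = written.plus(Duration.ofHours(48));
        System.out.println(stillReadable(written, twoDaysLater, retention(24)));  // false
        System.out.println(stillReadable(written, twoDaysLater, retention(168))); // true
    }
}
```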
Producers
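import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordsRequest;
import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
import com.amazonaws.services.kinesis.model.PutRecordsResult;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// credentialsProvider is an AWSCredentialsProvider defined elsewhere; the region comes from the default region chain
AmazonKinesis amazonKinesisClient = AmazonKinesisClientBuilder.standard()
        .withCredentials(credentialsProvider)
        .build();
PutRecordsRequest putRecordsRequest = new PutRecordsRequest();
putRecordsRequest.setStreamName("DataStream");
List<PutRecordsRequestEntry> putRecordsRequestEntryList = new ArrayList<>();
for (int i = 0; i < 100; i++) {
    PutRecordsRequestEntry putRecordsRequestEntry = new PutRecordsRequestEntry();
    putRecordsRequestEntry.setData(ByteBuffer.wrap(String.valueOf(i).getBytes(StandardCharsets.UTF_8)));
    // The partition key decides which shard receives the record
    putRecordsRequestEntry.setPartitionKey(String.format("partitionKey-%d", i));
    putRecordsRequestEntryList.add(putRecordsRequestEntry);
}
putRecordsRequest.setRecords(putRecordsRequestEntryList);
PutRecordsResult putRecordsResult = amazonKinesisClient.putRecords(putRecordsRequest);
// PutRecords is not atomic: check for individual failed records and retry them
System.out.println("Put Result: " + putRecordsResult + ", failed records: " + putRecordsResult.getFailedRecordCount());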
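PutRecords can partially fail. The response reports a FailedRecordCount, and each failed entry carries an ErrorCode such as ProvisionedThroughputExceededException or InternalFailure. Each result sits in the same position as its request entry. Below is a minimal sketch of picking out the entries to resend. It uses a hypothetical Result class in place of the SDK's response type, so it runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

public class PutRecordsRetry {
    // Hypothetical stand-in for one entry of a PutRecords response:
    // errorCode is null when the record was accepted
    static final class Result {
        final String sequenceNumber;
        final String errorCode;

        Result(String sequenceNumber, String errorCode) {
            this.sequenceNumber = sequenceNumber;
            this.errorCode = errorCode;
        }
    }

    // Results come back in request order, so failures are matched to requests by index
    static <T> List<T> entriesToRetry(List<T> requestEntries, List<Result> results) {
        if (requestEntries.size() != results.size()) {
            throw new IllegalArgumentException("one result is expected per request entry");
        }
        List<T> retry = new ArrayList<>();
        for (int i = 0; i < requestEntries.size(); i++) {
            if (results.get(i).errorCode != null) {
                retry.add(requestEntries.get(i));
            }
        }
        return retry;
    }

    public static void main(String[] args) {
        List<String> sent = List.of("record-0", "record-1", "record-2");
        List<Result> results = List.of(
                new Result("seq-0", null),
                new Result(null, "ProvisionedThroughputExceededException"),
                new Result("seq-2", null));
        System.out.println(entriesToRetry(sent, results)); // prints [record-1]
    }
}
```

Real code would usually back off before resending. A throughput error means the target shard is already at capacity.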
Consumers
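import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;
import java.util.List;

// client (AmazonKinesis), myStreamName and shard (e.g. from describeStream) are defined elsewhere
// Create a shard iterator starting at the oldest record still retained in the shard
GetShardIteratorRequest getShardIteratorRequest = new GetShardIteratorRequest();
getShardIteratorRequest.setStreamName(myStreamName);
getShardIteratorRequest.setShardId(shard.getShardId());
getShardIteratorRequest.setShardIteratorType("TRIM_HORIZON");
String shardIterator = client.getShardIterator(getShardIteratorRequest).getShardIterator();

// A null next iterator means the shard was closed (split or merged) and fully read
while (shardIterator != null) {
    // Get the records
    GetRecordsRequest getRecordsRequest = new GetRecordsRequest();
    getRecordsRequest.setShardIterator(shardIterator);
    getRecordsRequest.setLimit(25);
    GetRecordsResult getRecordsResult = client.getRecords(getRecordsRequest);
    List<Record> records = getRecordsResult.getRecords();
    // ... process records ...
    // Get the next shard iterator
    shardIterator = getRecordsResult.getNextShardIterator();
    // Stay under the limit of 5 GetRecords calls per second per shard (handle InterruptedException in the caller)
    Thread.sleep(1000);
}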
KCL
#! /usr/bin/env ruby
require 'aws/kclrb'

class SampleRecordProcessor < Aws::KCLrb::RecordProcessorBase
  def init_processor(shard_id)
    # initialize
  end

  def process_records(records, checkpointer)
    # process batch of records
  end

  def shutdown(checkpointer, reason)
    # cleanup
  end
end

if __FILE__ == $0
  # Start the main processing loop
  record_processor = SampleRecordProcessor.new
  driver = Aws::KCLrb::KCLProcess.new(record_processor)
  driver.run
end
KCL
- Connects to the stream
- Enumerates the shards
- Coordinates shard associations with other workers (if any)
- Instantiates a record processor for every shard it manages
- Pulls data records from the stream
- Pushes the records to the corresponding record processor
- Checkpoints processed records
- Balances shard-worker associations when the worker instance count changes
- Balances shard-worker associations when shards are split or merged
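The KCL tracks which worker owns which shard through leases stored in a DynamoDB table, and it rebalances those leases as workers come and go. The sketch below is a deliberately simplified stand-in, not the KCL's lease algorithm. It deals shards out round-robin so that no worker holds more than one shard more than any other, then recomputes the assignment when a third worker joins. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardBalancing {
    // Simplified round-robin assignment: worker loads differ by at most one shard
    static Map<String, List<String>> assign(List<String> shardIds, List<String> workerIds) {
        if (workerIds.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String worker : workerIds) {
            assignment.put(worker, new ArrayList<>());
        }
        for (int i = 0; i < shardIds.size(); i++) {
            String worker = workerIds.get(i % workerIds.size());
            assignment.get(worker).add(shardIds.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> shards = List.of("shard-0", "shard-1", "shard-2", "shard-3");
        // Rebalance when the worker count changes: 2 workers, then 3
        System.out.println(assign(shards, List.of("worker-A", "worker-B")));
        System.out.println(assign(shards, List.of("worker-A", "worker-B", "worker-C")));
    }
}
```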
Pricing
| Item | Price (USD) |
|---|---|
| Shard Hour (1 MB/second ingress, 2 MB/second egress) | $0.015 |
| PUT Payload Units, per 1,000,000 units | $0.014 |
| Extended Data Retention (up to 7 days), per Shard Hour | $0.020 |
Pricing
For $1.68 per day ($52.14 per 31-day month), a 4-shard stream can ingest 4 MB of data per second, or about 337 GB of data per day.
Increasing the stream's data retention period from 24 hours to up to 7 days costs an extra $59.52 per month (4 shards).
https://aws.amazon.com/kinesis/streams/pricing/
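The figures above can be reproduced from the price table under three assumptions, none of which is stated on the slide:

- The workload is 100 records per second of 35 KB each.
- A month has 31 days.
- PUT payload units are counted per started 25 KB chunk of each record.

Under these assumptions, each record costs 2 units, so the stream uses 200 units per second. The hypothetical helper class below does the arithmetic.

```java
import java.util.Locale;

public class KinesisCostEstimate {
    // Prices from the table above (USD)
    static final double SHARD_HOUR = 0.015;
    static final double PUT_UNITS_PER_MILLION = 0.014;
    static final double EXTENDED_RETENTION_SHARD_HOUR = 0.020;

    // Assumption: one PUT payload unit per started 25 KB chunk of a record
    static long putUnitsPerRecord(int recordSizeKb) {
        return (recordSizeKb + 24) / 25;
    }

    static double dailyCost(int shards, long putUnitsPerSecond) {
        double shardCost = shards * SHARD_HOUR * 24;
        double putCost = putUnitsPerSecond * 86_400 / 1_000_000.0 * PUT_UNITS_PER_MILLION;
        return shardCost + putCost;
    }

    static double extendedRetentionCost(int shards, int days) {
        return shards * EXTENDED_RETENTION_SHARD_HOUR * 24 * days;
    }

    public static void main(String[] args) {
        int shards = 4;
        long unitsPerSecond = 100 * putUnitsPerRecord(35); // assumed: 100 records/s of 35 KB
        double perDay = dailyCost(shards, unitsPerSecond);
        System.out.printf(Locale.ROOT, "per day: $%.2f%n", perDay);
        System.out.printf(Locale.ROOT, "per 31-day month: $%.2f%n", perDay * 31);
        System.out.printf(Locale.ROOT, "extended retention, 31 days: $%.2f%n",
                extendedRetentionCost(shards, 31));
    }
}
```

Running main prints $1.68, $52.14 and $59.52, which match the slide. Locale.ROOT keeps the decimal point stable regardless of the machine's locale.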
Kinesis Firehose
Easy integration with AWS
- S3
- Redshift
- DynamoDB
- API Gateway
Fully managed
TL;DR
Kinesis is pretty cool
careers.coveo.com
?