Kafka with Spring Boot Part 2
4 common use cases of the Kafka APIs
- Source -> Kafka. (Producer API) (Kafka Connect Source)
- Kafka -> Kafka (Producer/ Consumer API) (Kafka Streams)
- Kafka -> Sink (Consumer API) (Kafka Connect Sink)
- Kafka -> App (Consumer API)
Kafka Connect - High Level
- Source Connectors to get data from common data sources
- Sink Connectors to publish that data into common data stores
- Makes it easy to get data reliably into and out of Kafka
- Part of an ETL pipeline
Kafka Connect Concepts
- A Kafka Connect cluster has multiple loaded connectors
- Each connector is a re-usable piece of code (Java JARs)
- Most connectors are available in the open source world
- Connector + User Configuration = Tasks
- Tasks are executed by Kafka Connect workers (servers)
Getting Confluent
- Go to https://docs.confluent.io/platform/current/installation/installing_cp/zip-tar.html
- Download the zip file of the latest version, e.g. http://packages.confluent.io/archive/6.1/confluent-6.1.0.zip
- Extract the zip file once it is downloaded
SetUp for Kafka Connect
- Go inside the bin folder and start ZooKeeper
- Start the Kafka broker
- Start the Schema Registry
./zookeeper-server-start ../etc/kafka/zookeeper.properties
./kafka-server-start ../etc/kafka/server.properties
./schema-registry-start ../etc/schema-registry/schema-registry.properties
Reading file data with connect
We require two configuration files to start a FileStreamSourceConnector that reads data from a file and writes it to Kafka
- ./etc/schema-registry/connect-avro-standalone.properties : stores the data conversion format (AvroConverter)
- ./etc/kafka/connect-file-source.properties : stores the file-related configuration telling the connector which file to read (a sample is sketched below)
- Create a file test.txt and specify its absolute path in the file property of connect-file-source.properties
- Enter some data in the test.txt file
- Start the Kafka Connect instance
- Start a console consumer in another terminal to inspect the contents of the topic
./connect-standalone ../etc/schema-registry/connect-avro-standalone.properties \
../etc/kafka/connect-file-source.properties
./kafka-avro-console-consumer --bootstrap-server localhost:9092 \
--topic connect-test --from-beginning
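A minimal connect-file-source.properties typically looks like the following; the connector name is illustrative, the file path should be the absolute path of your test.txt, and the topic connect-test matches the consumer command above:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/absolute/path/to/test.txt
topic=connect-test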
Writing file data with connect
- Create a file test.sink.txt
- Open the file ./etc/kafka/connect-file-sink.properties (a sample is sketched below)
- In the file property mention the absolute path of test.sink.txt
- Run the command below to run the source and sink file connectors together
- Now enter data in test.txt and you will observe that the data is persisted in test.sink.txt
./connect-standalone ../etc/schema-registry/connect-avro-standalone.properties \
../etc/kafka/connect-file-source.properties ../etc/kafka/connect-file-sink.properties
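A minimal connect-file-sink.properties typically looks like the following; again the connector name is illustrative, the file path should be the absolute path of test.sink.txt, and topics points at the connect-test topic populated by the source connector:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/absolute/path/to/test.sink.txt
topics=connect-test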
Kafka Connector for a MySQL Source
- Download the MySQL JDBC driver jar from https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.23/mysql-connector-java-8.0.23.jar
- Install the Kafka Connect JDBC connector (command below)
- Now paste the MySQL jar file into: confluent-6.1.0/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib
- Create a folder kafka-connect-jdbc in confluent-6.1.0/etc/
./confluent-hub install confluentinc/kafka-connect-jdbc:latest
- Inside the kafka-connect-jdbc folder create a file source-quickstart-mysql.properties
- Enter the following properties in the file
- Create a table and insert some values
name=test-source-mysql-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://127.0.0.1:3306/mydb?user=root&password=
mode=incrementing
incrementing.column.name=rollno
topic.prefix=test-mysql-jdbc-
table.whitelist=user4
create table user4(
fname varchar(30),
rollno int(6) primary key);
insert into user4 values ("sunny",1);
insert into user4 values ("ginny",2);
- Start the worker for the JDBC source connector
- List the topics; the source connector creates a topic named topic.prefix + table name, i.e. test-mysql-jdbc-user4
- Launch a consumer to inspect the data
./connect-standalone ../etc/schema-registry/connect-avro-standalone.properties \
../etc/kafka-connect-jdbc/source-quickstart-mysql.properties
./kafka-topics --bootstrap-server localhost:9092 --list
./kafka-avro-console-consumer --bootstrap-server localhost:9092 \
--topic test-mysql-jdbc-user4 --from-beginning
JDBC Sink Connector
- Create sink-quickstart-mysql.properties file in /etc/kafka-connect-jdbc/ folder
- Add the following properties in the properties file
name=test-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=test-mysql-jdbc-user4
connection.url=jdbc:mysql://127.0.0.1:3306/mydb?user=root&password=
auto.create=true
- Run the command below to read data from the topic and write it to the database
- Here we are reading from the topic we created earlier. In the database you will see that a table named test-mysql-jdbc-user4 has been created, and the data read from the topic has been inserted into it
./connect-standalone ../etc/schema-registry/connect-avro-standalone.properties \
../etc/kafka-connect-jdbc/sink-quickstart-mysql.properties
Running 2 standalone workers for the JDBC Source and Sink Connectors
- Open a new terminal and create a copy of /etc/schema-registry/connect-avro-standalone.properties with the name /etc/schema-registry/connect-avro-standalone-1.properties
- In connect-avro-standalone-1.properties set the property rest.port=8084 so it does not clash with the first worker's REST port
- Run the following command to start the JDBC source worker
- Now you will observe that when you insert data into user4, that data is also reflected in the sink table: select * from `test-mysql-jdbc-user4`;
./connect-standalone ../etc/schema-registry/connect-avro-standalone-1.properties \
../etc/kafka-connect-jdbc/source-quickstart-mysql.properties
Install and run Mongo 3.6 using Docker
- Pull the Docker image
- Start the Docker container with a replica set
- Enter the container
- Connect with the mongo shell
- Initiate the replica set
- Create the db
docker image pull mongo:3.6
docker container run --name mongodb3.6 -d -p 27018:27017 mongo:3.6 mongod --replSet my-mongo-set
docker exec -it mongodb3.6 /bin/bash
mongo --host 127.0.0.1:27017
rs.initiate()
use kafka-topics
Mongo Sink Connector
- Install the MongoDB Kafka connector
- Create a file /etc/kafka-connect-mongo/sink-quickstart-mongodb.properties and enter the following properties in it
- Run the Mongo sink connector worker
- Inspect the data in the mongo console
./confluent-hub install mongodb/kafka-connect-mongodb:1.4.0
tasks.max=1
connection.uri=mongodb://localhost:27018
database=kafka-topics
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
topics=test-mysql-jdbc-user4
name=mongo-sink-worker
collection=user
./connect-standalone ../etc/schema-registry/connect-avro-standalone.properties \
../etc/kafka-connect-mongo/sink-quickstart-mongodb.properties
db.user.find()
MySQL Source to Mongo Sink
- In addition to the Mongo sink worker, also spin up the MySQL source worker for the same topic
- Now if you insert a record into the user4 table of the mydb MySQL database, you will observe that the same record is reflected in the MongoDB user collection under the kafka-topics database
./connect-standalone ../etc/schema-registry/connect-avro-standalone-1.properties \
../etc/kafka-connect-jdbc/source-quickstart-mysql.properties
Exercise 1 (Do any two)
- Push the data of one file to another file using Kafka Connect with the File source and sink connectors.
- Reflect the inserts in one MySQL table into another MySQL table using the JDBC source and sink connectors.
- Reflect the inserts in one MySQL table into a MongoDB collection.
Kafka Streams
- It is a stream processing framework
- It is an alternative to Apache Spark, NiFi, and Flink
- It reads data from one topic and places it in a different topic after some transformation
Stream Processors
- A stream processor processes an incoming data stream
- A stream processor can create a new output stream
- Data flows from parent to child
- A child stream processor can define another child
Building blocks of a topology:
- Streams
- Stream Processors
- Source Processors
- Sink Processors
Source Processor
- Does not have an upstream processor
- Consumes from one or more Kafka topics
- Forwards data downstream
Sink Processor
- Does not have a downstream processor
- Receives data from upstream processors
- Sends data to a specific Kafka topic
KStreams
- Ordered sequence of messages
- Unbounded
- Insert-only (data is appended)
- Use case:
- Topic is not log compacted
- Data is partial information (e.g. a bank transaction)
KTable
- Unbounded
- Inserts and updates based on the key
- Deletes on a null value
- Topic is log compacted
- Data is self-sufficient (e.g. a bank balance)
Log Compaction
- A Kafka admin (broker-side) process
- Keeps at least the latest value for each key and deletes the older ones
- Based on the record key
- Useful if we only need the latest snapshot
- Configured when creating topics (example below)
- Preserves ordering
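For example, compaction can be enabled at topic-creation time via the cleanup.policy config (an illustrative command; the topic name is a placeholder):
./kafka-topics --create --topic <compacted-topic> --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3 --config cleanup.policy=compact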
Create 2 topics
- promotion-code
- promotion-code-upper
- Spin up a producer for promotion-code
- Spin up a consumer for promotion-code-upper
./kafka-topics --create --topic promotion-code-upper --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3
./kafka-topics --create --topic promotion-code --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3
./kafka-console-producer --topic promotion-code --broker-list localhost:9092 \
--property parse.key=true --property key.separator="-"
./kafka-console-consumer --topic promotion-code-upper --bootstrap-server localhost:9092 \
--from-beginning \
--property print.key=true
Note : Command to delete the topic
./kafka-topics --zookeeper localhost:2181 --delete --topic <topic-name>
Set Up
- Go to https://start.spring.io/
- Create a Gradle project with the following dependencies
- Spring for Apache Kafka
- Spring for Apache Kafka Streams
- Lombok
- Spring Boot DevTools
- Download the zip file for the project
- Extract the project and import it into IntelliJ IDEA (a sketch of the resulting Gradle dependencies is shown below)
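For reference, the dependencies block that Spring Initializr generates for this selection looks roughly like the following; a sketch only, since versions are managed by the Spring Boot Gradle plugin and the exact generated file may differ:
dependencies {
    implementation 'org.springframework.kafka:spring-kafka'
    implementation 'org.apache.kafka:kafka-streams'
    compileOnly 'org.projectlombok:lombok'
    annotationProcessor 'org.projectlombok:lombok'
    developmentOnly 'org.springframework.boot:spring-boot-devtools'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
}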
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.annotation.EnableKafkaStreams;
@SpringBootApplication
@EnableKafkaStreams
@EnableKafka
public class KafkaStreamApplication {
public static void main(String[] args) {
SpringApplication.run(KafkaStreamApplication.class, args);
}
}
Enable Kafka and Kafka Streams for the Spring Boot application
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaStreamsDefaultConfiguration;
import org.springframework.kafka.config.KafkaStreamsConfiguration;
import java.util.HashMap;
import java.util.Map;
@Configuration
public class KafkaStreamPropertyConfiguration {
@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
public KafkaStreamsConfiguration kafkaStreamsConfiguration(){
Map<String,Object> props = new HashMap<>();
props.put(StreamsConfig.APPLICATION_ID_CONFIG,"kafka-stream");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG,0);
return new KafkaStreamsConfiguration(props);
}
}
Setup KafkaStreamsConfigurations
Transform the promotion codes from the promotion-code topic to upper case and place them in the promotion-code-upper topic
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Printed;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class KafkaStreamPromotionCode {
@Bean
public KStream<String,String> kStreamPromotionUppercase(StreamsBuilder streamsBuilder){
KStream<String,String > sourceStream = streamsBuilder
.stream("promotion-code", Consumed.with(Serdes.String(),Serdes.String()));
KStream<String,String> uppercaseStream = sourceStream.mapValues(e->e.toUpperCase());
uppercaseStream.to("promotion-code-upper");
sourceStream.print(Printed.<String, String>toSysOut().withLabel("code"));
uppercaseStream.print(Printed.<String, String>toSysOut().withLabel("Upper-Case-Code"));
return sourceStream;
}
}
On publishing data with the promotion-code topic producer, you will see that the value has been converted to upper case and read by the consumer of the promotion-code-upper topic
{code:"asdnas"} => {CODE:"ASDNAS"}
Our code has also changed the code attribute name to upper case, but we only want to upper-case its value.
Let's see how we do it.
Create Promotion code class
public class PromotionCode {
private String code;
public String getCode() {
return code;
}
public void setCode(String code) {
this.code = code;
}
@Override
public String toString() {
return "PromotionCode{" +
"code='" + code + '\'' +
'}';
}
}
Transformation using Spring JSON Serde
import com.course.kafka.kafkastream.entity.PromotionCode;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.Produced;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.support.serializer.JsonSerde;
@Configuration
public class KafkaStreamJSONPromotionCode {
@Bean
public KStream<String,PromotionCode> kStreamPromotionUppercase(StreamsBuilder streamsBuilder){
KStream<String,PromotionCode > sourceStream = streamsBuilder
.stream("promotion-code", Consumed.with(Serdes.String(),new JsonSerde<>(PromotionCode.class)));
KStream<String,PromotionCode> uppercaseStream = sourceStream.mapValues(this::uppercasePromotionCode);
uppercaseStream.to("promotion-code-upper", Produced.with(Serdes.String(),new JsonSerde<>(PromotionCode.class)));
sourceStream.print(Printed.<String, PromotionCode>toSysOut().withLabel("code"));
uppercaseStream.print(Printed.<String, PromotionCode>toSysOut().withLabel("Upper-Case-Code"));
return sourceStream;
}
private PromotionCode uppercasePromotionCode(PromotionCode promotionCode){
promotionCode.setCode(promotionCode.getCode().toUpperCase());
return promotionCode;
}
}
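With the JSON Serde in place, only the value of the code attribute is upper-cased and the attribute name itself stays unchanged, e.g. {"code":"asdnas"} => {"code":"ASDNAS"}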
Keys and Partitions
- Keys and partitions are related
- Partitions are allocated according to the key
- Whenever the key of a message changes, it triggers repartitioning
- Operations on a Kafka stream which change the key trigger repartitioning
- As soon as an operation can possibly change the key, the stream is marked for repartitioning
- map
- flatMap
- selectKey
- Repartitioning is done seamlessly behind the scenes but incurs a performance cost (see the sketch below)
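A minimal sketch of this behaviour, assuming a StreamsBuilder named streamsBuilder, String serdes, and an illustrative topic name; a key-changing operation followed by an aggregation is exactly the pattern that makes Kafka Streams create an internal repartition topic:
KStream<String, String> lines = streamsBuilder.stream("some-input-topic");
lines.selectKey((key, value) -> value) // changing the key marks the stream for repartitioning
    .groupByKey() // grouping on the new key makes Kafka Streams write an internal "-repartition" topic
    .count();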
Kafka Stream Operations
- mapValues
- Takes one record, produces one record
- Does not change the key
- Affects only the value
- Does not trigger repartition
- Intermediate Operation
- Available on KStream and KTable
stream.mapValues(v->v+10);
- map
- Takes one record and produces one record
- Change key
- Change value
- Trigger repartition
- Intermediate Operation
- KStream
stream.map((k,v)->KeyValue.pair("X"+k,v*5));
- filter
- Takes one record, produces one or zero records
- Produces records that match the condition
- Does not change key and value
- Does not trigger repartition
- Intermediate Operation
- KStream and KTable
stream.filter((k,v)->v%2==0)
- filterNot
- Takes one record, produces one or zero records
- Produces records that do not match the condition
- Does not change key and value
- Does not trigger repartition
- Intermediate Operation
- KStream and KTable
stream.filterNot((k,v)->v%2==0)
- branch
- Splits the stream based on predicates
- Evaluates predicates in order
- A record is only placed in the first stream whose predicate matches; unmatched records are dropped
- Returns an array of KStreams
- Intermediate operation
- KStream
stream.branch(
(k,v)->v>100,
(k,v)->v>20,
(k,v)->v>10
);
- selectKey
- Takes one record and produces one record
- Sets / replaces the record key
- Possible to change the key data type
- Triggers repartitioning
- Does not change the value
- Intermediate Operation
- KStream
stream.selectKey((k,v)->"A"+k)
- flatMapValues
- Takes one record, produces zero or more records
- Does not change the key
- Affects only the value
- Does not trigger repartitioning
- Intermediate Operation
- KStream
//split a sentence into words
sentencesStream.flatMapValues(value->Arrays.asList(value.split("\\s+")));
// (alice, alice is nice) transforms to (alice,alice), (alice,is), (alice, nice)
- flatMap
- Takes one record, produces zero or more records
- Change key
- Change value
- Trigger Repartitioning
- Intermediate Operation
- KStream
KStream<Long, String> stream = ...;
KStream<String, Integer> transformed = stream.flatMap(
// Here, we generate two output records for each input record.
// We also change the key and value types.
// Example: (345L, "Hello") -> ("HELLO", 1000), ("hello", 9000)
(key, value) -> {
List<KeyValue<String, Integer>> result = new LinkedList<>();
result.add(KeyValue.pair(value.toUpperCase(), 1000));
result.add(KeyValue.pair(value.toLowerCase(), 9000));
return result;
}
);
- groupByKey
- Intermediate Operation
- group records by existing key
stream.groupByKey()
- groupBy
- Intermediate Operation
- group records by new key
stream.groupBy((k,v)->v%2)
- foreach
- Terminal Operation
- Takes one record, produces none
- KStream and KTable
stream.foreach((k,v)->insertToDatabase(v))
- Peek
- Produces an unchanged stream
- The resulting stream can be further processed
- Intermediate Operation
- KStream
stream.peek((k,v)->insertIntoDatabase(v)).[nextProcess]
- print
- Terminal operation
- Print each record
- print to file or console
- KStream
stream.print(Printed.toSysOut())
- to
- Terminal operation
- Write the stream to destination topic
- KStream
stream.to("output-topic")
- through
- Intermediate operation
- Write stream to destination topic
- Continue record processing
- KStream
stream.through("output-topic").[nextProcess]
Problem statement
Find the word count from a stream of sentences
- Create a sentence topic
- Create a word-count topic
- Spin up a producer for the sentence topic
- Spin up a consumer for the word-count topic
./kafka-topics --create --topic sentence --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3
./kafka-topics --create --topic word-count --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3
./kafka-console-producer --topic sentence --broker-list localhost:9092
./kafka-console-consumer --topic word-count --bootstrap-server localhost:9092 \
--property print.key=true \
--property key.separator="-"
High-level DSL to find the word count from sentences
- Stream from kafka <null, Kafka Kafka Stream>
- MapValues lowercase <null, kafka kafka stream>
- FlatMapValues split by space <null,kafka><null,kafka><null,stream>
- SelectKey <kafka,kafka><kafka,kafka><stream,stream>
- GroupByKey (<kafka,kafka><kafka,kafka>) (<stream,stream>)
- Count occurrence in each group <kafka,2> <stream,1>
Spring kafka sentence to word-count stream transformation
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.Arrays;
@Configuration
public class KafkaSentenceToWordCountStreamConfig {
@Bean
KStream<String,String> sentenceToWordsStreamProcessor(StreamsBuilder streamsBuilder){
KStream<String,String> kStream = streamsBuilder.stream("sentence");
kStream
.mapValues(s->s.toLowerCase())
.flatMapValues(s-> Arrays.asList(s.split("\\s+")))
.selectKey((k,v)->v)
.groupByKey()
.count()
.toStream()
.mapValues(e->e.toString())
.peek((key,value)-> System.out.println(String.format("Key :: %s, Value :: %s",key, value)))
.to("word-count");
return kStream;
}
}
Most Favourite Colour with KTable
- Create colour topic
- Create a producer for colour topic
./kafka-topics --create --topic colour --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3
./kafka-console-producer --topic colour --broker-list localhost:9092 \
--property parse.key=true --property key.separator="-"
Reading from topic as KTable
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class KafkaColourCountStream {
@Bean
KStream<String,String> colourCount(StreamsBuilder streamsBuilder){
return streamsBuilder.table("colour")
.groupBy((key, value) -> KeyValue.pair(value,value) )
.count()
.toStream()
.map((k,v)->KeyValue.pair(k.toString(),v.toString()))
.peek((key, value) -> System.out.println(String.format("Key :: %s, Value :: %s",key,value)));
}
}
Start producing messages with producer
>a-yellow
>b-yellow
>a-green
>b-green
>a-blue
>b-blue
Problem Statement
On the basis of bank transactions, calculate the bank balance
- Create a bank-transaction topic
- Spin up a producer for the topic
./kafka-topics --create --topic bank-transaction --zookeeper localhost:2181 \
--replication-factor 1 --partitions 3
./kafka-console-producer --topic bank-transaction --broker-list localhost:9092
Create BankTransaction Entity
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;
@NoArgsConstructor
@AllArgsConstructor
@Data
@ToString
public class BankTransaction {
private String name;
private Long amount;
}
Create Stream to calculate total balance
import com.course.kafka.kafkastream.entity.BankTransaction;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Printed;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.support.serializer.JsonSerde;
@Configuration
public class KafkaBankTransactionStream {
@Bean
public KStream<String,BankTransaction> bankTransactionKStream(StreamsBuilder streamsBuilder) {
KStream<String,BankTransaction> sourceBankTransactionKStream=streamsBuilder.stream("bank-transaction", Consumed.with(Serdes.String(),new JsonSerde<>(BankTransaction.class)));
sourceBankTransactionKStream
.groupBy((k, v) -> v.getName(), Grouped.with(Serdes.String(), new JsonSerde<>(BankTransaction.class))).aggregate(
() -> 0L,
(k, v, a) -> {
a=a+v.getAmount();
return a;
},Materialized.with(Serdes.String(),Serdes.Long())).toStream()
.print(Printed.toSysOut());
return sourceBankTransactionKStream;
}
}
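For testing, you can type JSON messages like the following into the bank-transaction console producer; the field names match the BankTransaction entity, and the names and amounts are just sample values:
{"name":"sunny","amount":100}
{"name":"sunny","amount":250}
{"name":"ginny","amount":400}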
Exercise 2
- Create a topic employee and produce data to the topic in the form of JSON, e.g.
{"id":"1","gender":"Male","age":"32","name":"Sunny","salary":"500000"}
- Use a Kafka Streams configuration file which does the following operations
- For all the employees with age greater than 30
- If the gender is male then prepend the name with Mr, or else prepend it with Ms/Mrs
- Assuming that the salary is in rupees, convert it to dollars
- Publish the transformed data to the updated-employee topic
Exactly once semantic
- Exactly once is the ability to guarantee that the processing of each message happens only once. The following scenario explains the problem with at-least-once semantics:
(Diagram) A Kafka Streams application, producer or consumer interacting with Kafka: 1. Receive message from Kafka -> 2. Send output to Kafka -> 3. Receive ack -> 4. Commit offset
- If the application (or Kafka) restarts after step 3, before the offset is committed in step 4, the same message will be received and processed again. (A duplicate as a Kafka consumer)
- If a network error occurs and the ack in step 3 never reaches the producer, the producer retries and the same message is sent twice. (A duplicate as a Kafka producer)
How does Kafka solve this problem?
- The producer is now idempotent: if the same message is sent twice or more, Kafka will make sure to keep only one copy of it.
- You can write multiple messages to different Kafka topics as part of one transaction: either all are written or none is.
- For Kafka Streams, we just need to add the following property to the KafkaStreamsConfiguration:
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,StreamsConfig.EXACTLY_ONCE);
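For context, the underlying producer transactions API that these guarantees build on looks roughly like this. A minimal sketch, assuming String keys/values and illustrative topic names and transactional id; it is separate from the Spring Kafka Streams configuration above:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Setting a transactional.id also enables the idempotent producer
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-transactional-id");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // Both writes are committed atomically: either both topics get the record or neither does
            producer.send(new ProducerRecord<>("topic-a", "key", "value-for-a"));
            producer.send(new ProducerRecord<>("topic-b", "key", "value-for-b"));
            producer.commitTransaction();
            // In production code, catch exceptions and call producer.abortTransaction()
        }
    }
}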
Kafka with Spring Boot Part 2
By Pulkit Pushkarna