Kafka

Intro

Kafka is a distributed streaming platform.

  • Fast
  • High accuracy
  • Resilient
  • Fault tolerant
  • Decouples system dependencies
  • High throughput

Before Kafka

[Diagram: A user interacts with a single application that reads and writes a database.]

[Diagram: The application grows into multiple services, all sharing the same database.]

Service communication

[Diagram: The services also call each other directly, so every service has to know about the others.]

With Kafka

[Diagram: The services communicate via Kafka instead of calling each other directly. Each service acts as a producer and/or consumer, publishing (pub) and subscribing (sub) to topics.]
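
To make the pub/sub idea concrete, here is a minimal kafkajs sketch; the broker address, topic name, and service names are all assumptions for illustration. One service publishes an event, and two independent services (in different consumer groups) each receive it without knowing about each other.

import { Kafka } from 'kafkajs';

// Minimal sketch, assuming a single local broker; topic and group names are made up.
const kafka = new Kafka({ clientId: 'pubsub-demo', brokers: ['localhost:9092'] });

async function publishOrder() {
  const producer = kafka.producer();
  await producer.connect();

  // The producer only knows the topic, not who will consume the event.
  await producer.send({
    topic: 'orders',
    messages: [{ key: 'order-1', value: JSON.stringify({ amount: 100 }) }],
  });

  await producer.disconnect();
}

async function subscribe(groupId: string) {
  // Each consumer group gets its own copy of every event on the topic.
  const consumer = kafka.consumer({ groupId });
  await consumer.connect();
  await consumer.subscribe({ topic: 'orders', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ message }) => {
      console.log(`[${groupId}] received`, message.key?.toString(), message.value?.toString());
    },
  });
}

Promise.all([subscribe('billing-service'), subscribe('analytics-service')])
  .then(publishOrder);

Because both groups subscribe independently, a new downstream service can be added later without touching the producer.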

A producer sends events to Kafka. Each event (message) carries metadata: a key, a value, and a timestamp. Events (messages) are stored on disk.
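
As a rough sketch of that event structure (topic name, payload, and header are invented for illustration), a kafkajs producer sends a message whose key and value you choose; the timestamp can be set explicitly or left for the broker to assign.

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'event-demo', brokers: ['localhost:9092'] });
const producer = kafka.producer();

async function sendEvent() {
  await producer.connect();

  await producer.send({
    topic: 'demo-events',
    messages: [
      {
        key: 'article-42',                         // identifies the entity; also drives partitioning
        value: JSON.stringify({ action: 'view' }), // the payload
        timestamp: Date.now().toString(),          // optional; if omitted, the broker assigns one
        headers: { source: 'web' },                // optional extra metadata
      },
    ],
  });

  await producer.disconnect();
}

sendEvent();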

[Diagram: A Kafka broker hosts multiple topics (topic 1, topic 2, topic 3, ...). Producers write events to topics and consumers read from them.]
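
A broker's topics can be created up front with the kafkajs admin client. This is a minimal sketch; the topic names, partition counts, and replication factor are assumptions.

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'topic-setup', brokers: ['localhost:9092'] });

async function createTopics() {
  const admin = kafka.admin();
  await admin.connect();

  // Create three topics on the broker; partition/replication values are illustrative only.
  await admin.createTopics({
    topics: [
      { topic: 'topic-1', numPartitions: 3, replicationFactor: 1 },
      { topic: 'topic-2', numPartitions: 3, replicationFactor: 1 },
      { topic: 'topic-3', numPartitions: 1, replicationFactor: 1 },
    ],
  });

  console.log(await admin.listTopics()); // e.g. ['topic-1', 'topic-2', 'topic-3', ...]

  await admin.disconnect();
}

createTopics();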

[Diagram: A Kafka cluster consists of multiple brokers (broker 1, broker 2, broker 3). ZooKeeper manages the brokers.]
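
From a client's point of view, you only list some brokers as a bootstrap set; the cluster membership that ZooKeeper manages behind the scenes can be inspected with the admin client. A rough sketch, assuming three local brokers:

import { Kafka } from 'kafkajs';

// Any subset of brokers works as the bootstrap list; the client discovers the rest of the cluster.
const kafka = new Kafka({
  clientId: 'cluster-inspector',
  brokers: ['localhost:9092', 'localhost:9093', 'localhost:9094'],
});

async function describeCluster() {
  const admin = kafka.admin();
  await admin.connect();

  // Returns the brokers in the cluster and which one is currently the controller.
  const { brokers, controller, clusterId } = await admin.describeCluster();
  console.log({ clusterId, controller, brokers });

  await admin.disconnect();
}

describeCluster();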

Case

Article Views

Scenario

We need to record the article view count on every request to the server, but in a scalable system this behavior becomes a bottleneck.

When you update the view counter in the database, it locks the data row (or the whole table, with a bad index configuration), which hurts query performance.
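
For comparison, the naive approach looks something like this TypeORM sketch, using the ArticleView entity that appears later in this post (the import path is an assumption): every single request issues an UPDATE that has to take a row lock.

import { getManager } from 'typeorm';
import { ArticleView } from './entities/ArticleView'; // entity import path assumed

// Naive approach: hit the database on every single page view.
export async function recordArticleViewNaive(articleId: string): Promise<void> {
  const manager = getManager();

  // Each call runs UPDATE ... SET views = views + 1, which holds a row lock
  // (or worse, with a bad index) for the duration of the statement.
  await manager.increment(ArticleView, { ArticleId: articleId }, 'views', 1);
}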

Easy Solution

  1. Cache the view counter value in memory
  2. Write it back to the database at a long interval (see the sketch below)
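
A minimal sketch of that easy solution (the flush interval and the entity import path are assumptions): keep the counters in process memory and flush them to the database on a timer.

import { getManager } from 'typeorm';
import { ArticleView } from './entities/ArticleView'; // entity import path assumed

// In-memory counters, keyed by article id.
const pendingViews = new Map<string, number>();

export function countView(articleId: string): void {
  pendingViews.set(articleId, (pendingViews.get(articleId) ?? 0) + 1);
}

// Write the accumulated counts back every 5 minutes (interval is an arbitrary choice).
setInterval(async () => {
  const manager = getManager();

  for (const [articleId, delta] of pendingViews) {
    pendingViews.delete(articleId);
    await manager.increment(ArticleView, { ArticleId: articleId }, 'views', delta);
  }
}, 5 * 60 * 1000);

This is exactly where the risks below come from: the pending counts live only in memory, and each application node keeps its own map.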

Risk

  • Accidental shutdown (in-memory counts that have not been written back are lost)
  • Long-term value synchronization issues between multiple application nodes

Fantastic Solution

Stream Processing

In-memory database

In my case

[Diagram: On each article view, the application updates the cached counter in Redis (updated every time) and produces a Kafka message. The Views Counter consumer subscribes to the topic, appends every event to a full-log table (ArticleViewLogs), and periodically updates the aggregated dedicated table (ArticleViews). Reads get the cached view count from Redis.]

import redis from 'redis';
import { getManager } from 'typeorm';
import { ArticleView } from './entities/ArticleView'; // entity import path depends on the project

const REDIS_URI = process.env.REDIS_URI || 'redis://127.0.0.1:6379';
const redisClient = redis.createClient({ url: REDIS_URI });

const CACHE_EXPIRE_SECONDS = 4 * 60 * 60; // 4 hours

export async function getArticleViews(articleId: string): Promise<number> {
  return new Promise((resolve) => {
    redisClient.get(articleId, async (_, views) => {
      if (views) {
        // Cache hit: return the cached counter directly.
        resolve(Number(views));
      } else {
        // Cache miss: fall back to the aggregated table and warm the cache.
        const manager = getManager();

        const articleViewRecord = await manager.findOne(ArticleView, { ArticleId: articleId });

        if (articleViewRecord) {
          redisClient.set(articleId, articleViewRecord.views.toString());
          redisClient.expire(articleId, CACHE_EXPIRE_SECONDS);

          resolve(articleViewRecord.views);
        } else {
          // No record yet: treat it as zero views and cache that.
          redisClient.set(articleId, '0');
          redisClient.expire(articleId, CACHE_EXPIRE_SECONDS);

          resolve(0);
        }
      }
    });
  });
}

Get from cache

On a cache miss, fall back to the aggregated table.

import debug from 'debug';
import { Kafka, logLevel } from 'kafkajs';

const debugKafkaProducer = debug('Wealth:KafkaProducer'); // debug namespace assumed

const KAFKA_CLIENT_ID = process.env.KAFKA_CLIENT_ID || 'wealth-usage-listener';
const KAFKA_BROKER_URI = process.env.KAFKA_BROKER_URI || 'localhost:7092';
const KAFKA_ARTICLE_VIEW_TOPIC = process.env.KAFKA_ARTICLE_VIEW_TOPIC || 'wealth-article-views';

const kafka = new Kafka({
  logLevel: logLevel.INFO,
  brokers: [KAFKA_BROKER_URI],
  clientId: KAFKA_CLIENT_ID,
});

const producer = kafka.producer();

interface QueueRecord {
  articleId: string;
  memberId?: string;
}

// Records produced before the connection is ready are queued and replayed on connect.
const initQueue: QueueRecord[] = [];

let isInitializing = false;
let isConnected = false;

export default async function init() {
  if (isInitializing || isConnected) {
    return;
  }

  debugKafkaProducer('Connecting Producer...');

  isInitializing = true;

  producer.on('producer.connect', () => {
    // Replay the queued records in order once the producer is connected.
    initQueue.map((record) => async () => {
      await producer.send({
        topic: KAFKA_ARTICLE_VIEW_TOPIC,
        messages: [
          {
            key: 'id',
            value: record.articleId,
            ...(record.memberId ? {
              headers: {
                memberId: record.memberId,
              },
            } : {}),
          },
        ],
      });
    }).reduce((prev, next) => prev.then(next), Promise.resolve())
      .then(() => {
        // Avoid re-sending the same records if the producer reconnects later.
        initQueue.length = 0;
      });
  });

  await producer.connect();

  isConnected = true;
}

Create Producer

export async function recordArticleView(
  articleId: string,
  memberId?: string,
): Promise<void> {
  // Bump the cached counter immediately so reads stay fresh.
  redisClient.get(articleId, (_, views) => {
    if (views) {
      redisClient.set(articleId, (Number(views) + 1).toString());
    } else {
      redisClient.set(articleId, '1');
    }

    redisClient.expire(articleId, CACHE_EXPIRE_SECONDS);
  });

  if (!isConnected) {
    debugKafkaProducer('Kafka Producer is not connected.');

    // Queue the record; it will be replayed once the producer connects.
    initQueue.push({ articleId, memberId });

    if (!isInitializing) {
      init();
    }

    return;
  }

  await producer.send({
    topic: KAFKA_ARTICLE_VIEW_TOPIC,
    messages: [
      {
        key: 'id',
        value: articleId,
        ...(memberId ? {
          headers: {
            memberId,
          },
        } : {}),
      },
    ],
  });
}

Producer

import debug from 'debug';
import { getManager } from 'typeorm';
import { ArticleView } from './entities/ArticleView';       // entity import paths depend on the project
import { ArticleViewLog } from './entities/ArticleViewLog';

const debugKafkaConsumer = debug('Wealth:KafkaConsumer'); // debug namespace assumed

const KAFKA_CONSUMER_GROUP_ID = process.env.KAFKA_CONSUMER_GROUP_ID || 'wealth-consumer';
const KAFKA_ARTICLE_VIEW_TOPIC = process.env.KAFKA_ARTICLE_VIEW_TOPIC || 'wealth-article-views';
const ARTICLE_VIEW_UPDATE_FEQ_IN_MS = Number(process.env.ARTICLE_VIEW_UPDATE_FEQ_IN_MS || '30000'); // 30 sec

// `kafka` is the shared Kafka instance created in the producer setup above.
const consumer = kafka.consumer({ groupId: KAFKA_CONSUMER_GROUP_ID });

export default async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: KAFKA_ARTICLE_VIEW_TOPIC });

  let lastSync = Date.now();

  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      const articleId = message.value!.toString();

      // Append every view event to the full-log table.
      if (topic === KAFKA_ARTICLE_VIEW_TOPIC && message?.key?.toString() === 'id') {
        const manager = getManager();

        const log = manager.create(ArticleViewLog, {
          ArticleId: articleId,
          MemberId: message.headers?.memberId?.toString() ?? undefined,
        });

        await manager.save(log);
      }

      // Periodically aggregate the logs into the dedicated ArticleView table.
      if (ARTICLE_VIEW_UPDATE_FEQ_IN_MS
        && (Date.now() - lastSync) > ARTICLE_VIEW_UPDATE_FEQ_IN_MS) {
        const manager = getManager();

        const count = await manager.count(ArticleViewLog, { ArticleId: articleId });

        const viewRecord = await manager.findOne(ArticleView, articleId);

        if (viewRecord) {
          viewRecord.views = count;
          viewRecord.updatedAt = new Date();

          await manager.save(viewRecord);
        } else {
          const newRecord = manager.create(ArticleView, {
            ArticleId: articleId,
            views: count,
          });

          await manager.save(newRecord);
        }

        lastSync = Date.now();

        debugKafkaConsumer(`Article ${articleId} updated views to ${count}`);
      }
    },
  });
}

Create Consumer

import debug from 'debug';
import Koa from 'koa';
import { createConnection } from 'typeorm';
import { getArticleViews } from './worker';

const debugServer = debug('Wealth:UsageListenerServer');
const USAGE_LISTENER_PORT = Number(process.env.USAGE_LISTENER_PORT || '6068');

const app = new Koa();

app.use(async (ctx) => {
  const articleId = ctx.url.replace(/^\//, '');

  ctx.body = await getArticleViews(articleId);
});

createConnection().then(() => {
  app.listen(USAGE_LISTENER_PORT, () => {
    debugServer(`Usage Listener Server listen on ${USAGE_LISTENER_PORT}`);
  });
});

Export Service

[Diagram: Traditionally, the user interacts with the application and the application updates data in the database. With event streaming, the application instead logs an ordered sequence of events, each carrying metadata: a key, a value, and a timestamp.]

Technically speaking, event streaming is the practice of capturing data in real time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events.
