Kafka
Design
Motivation
Kafka is designed to act as a unified platform for handling all the real-time data feeds a large company might have. That requires it to:
- provide high throughput, to support high-volume event streams
- deal gracefully with large data backlogs, to support periodic data loads from offline systems
- deliver with low latency, to handle more traditional messaging use cases
Persistence
Don't fear the filesystem
Using the filesystem and relying on pagecache is superior to maintaining an in-memory cache or other structure. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk; in effect this just means it is transferred into the kernel's pagecache.
Constant Time Suffices
Instead of an O(log N) BTree, Kafka uses an O(1) append-only log, and reads do not block writes.
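A minimal sketch of the idea (not Kafka's actual code): an in-memory stand-in for the on-disk log where appends and offset-based reads are both O(1).

```python
# Hypothetical sketch: an append-only log. Appends always go to the tail,
# and a read is a direct index by offset, so neither is O(log N) and
# readers never have to block writers.
class AppendOnlyLog:
    def __init__(self):
        self._messages = []  # in the real system this is a file on disk

    def append(self, msg):
        """O(1): write at the tail; return the new message's offset."""
        self._messages.append(msg)
        return len(self._messages) - 1

    def read(self, offset):
        """O(1): direct index by offset, no tree traversal."""
        return self._messages[offset]

log = AppendOnlyLog()
first = log.append(b"hello")
log.append(b"world")
assert log.read(first) == b"hello"
```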
Efficiency
Besides sequential disk access, there are two further optimizations:
- Many small I/O operations: the message-set abstraction groups messages into batches, so the producer no longer sends one message per request.
- Excessive byte copying: solved with the sendfile system call, which transfers data from the pagecache to a socket efficiently. See "Efficient data transfer through zero copy":
- https://mp.weixin.qq.com/s/Hy3npWsrJg6w9gvgkRD89Q
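A hypothetical sketch of the message-set idea (the framing here is illustrative, not Kafka's wire format): many small messages are packed into one buffer, so a single network write replaces many tiny ones.

```python
# Hypothetical framing: each message is a 4-byte big-endian length prefix
# followed by its payload; a message set is just the concatenation.
def encode_message_set(messages):
    parts = []
    for m in messages:
        parts.append(len(m).to_bytes(4, "big") + m)
    return b"".join(parts)

def decode_message_set(buf):
    messages, i = [], 0
    while i < len(buf):
        n = int.from_bytes(buf[i:i + 4], "big")
        messages.append(buf[i + 4:i + 4 + n])
        i += 4 + n
    return messages

batch = encode_message_set([b"a", b"bb", b"ccc"])
assert decode_message_set(batch) == [b"a", b"bb", b"ccc"]
```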
End-to-end Batch Compression
In some cases the bottleneck is actually not CPU or disk but network bandwidth. The application could compress messages itself, but that is not as good: efficient compression requires compressing multiple messages together rather than compressing each message individually.
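A quick illustration with the standard zlib module: compressing a batch of similar messages together gives a much better ratio than compressing each one individually, because cross-message redundancy is only visible to the compressor inside one compression context.

```python
import zlib

# 100 structurally similar messages, like a typical event stream.
messages = [b'{"user": "u%d", "event": "click"}' % i for i in range(100)]

# Compress each message on its own vs. the whole batch at once.
individually = sum(len(zlib.compress(m)) for m in messages)
together = len(zlib.compress(b"".join(messages)))

assert together < individually  # batch compression wins
```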
Replication
The unit of replication is the partition. Under non-failure conditions, each partition has one leader and zero or more followers. All reads and writes go to the leader; followers act as consumers, fetching messages from the leader to stay in sync.
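A toy sketch of that flow (names hypothetical, leader election and ISR tracking omitted): writes hit the leader's log, and each follower catches up by fetching the suffix it is missing, exactly like a consumer would.

```python
# Hypothetical model: one leader replica, N follower replicas per partition.
class Replica:
    def __init__(self):
        self.log = []

class Partition:
    def __init__(self, n_followers):
        self.leader = Replica()
        self.followers = [Replica() for _ in range(n_followers)]

    def produce(self, msg):
        self.leader.log.append(msg)  # all writes go to the leader

    def replicate(self):
        # Each follower fetches whatever it does not have yet.
        for f in self.followers:
            f.log.extend(self.leader.log[len(f.log):])

p = Partition(2)
p.produce(b"m1")
p.produce(b"m2")
p.replicate()
assert all(f.log == p.leader.log for f in p.followers)
```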
Replicated Logs: Quorums, ISRs, and State Machines (Oh my!)
Unclean leader election: What if they all die?
Availability and Durability Guarantees
Replica Management
Message Delivery Semantics
At most once / at least once / exactly once. If the producer sends a message and receives no acknowledgment:
- Before 0.11.0.0, its only option was to resend, which could create duplicates.
- Since 0.11.0.0, the broker assigns each producer an id, and the producer attaches a sequence number to every message, so the broker can deduplicate retries. This version also introduced producer transactions spanning partitions: either all messages in the transaction are written successfully or none are. The consumer has to cooperate for this to be useful.
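A hypothetical sketch of the 0.11.0.0 dedup idea (not the real broker code): the broker remembers the highest sequence number accepted per producer id and discards anything it has already seen, so a retried send cannot duplicate a message.

```python
# Hypothetical broker-side dedup keyed on (producer id, sequence number).
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer id -> highest sequence accepted

    def append(self, producer_id, seq, msg):
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # duplicate retry: drop it
        self.last_seq[producer_id] = seq
        self.log.append(msg)
        return True

b = Broker()
assert b.append("p1", 0, b"m0")
assert not b.append("p1", 0, b"m0")  # retry is deduplicated
assert b.append("p1", 1, b"m1")
assert b.log == [b"m0", b"m1"]
```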
To get exactly-once, the consumer writes its current offset as a message to a topic, so the offset and the processing output can be committed in the same transaction. The isolation level (isolation.level) for this is called read_committed; the default, read_uncommitted, means the consumer may also consume messages that were rolled back by a failed transaction.
When results are written to an external system, keeping the stored offset consistent with the stored result would require a two-phase commit. But this can be handled more simply and generally by letting the consumer store its offset in the same place as its output.
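A minimal sketch of storing the offset alongside the output (file name and layout are made up for illustration): both land in one record, committed with a single atomic rename, so no two-phase commit across systems is needed.

```python
import json
import os
import tempfile

def commit(path, offset, result):
    """Write offset and output together, then atomically swap into place."""
    state = {"offset": offset, "result": result}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX: offset and output move together

def recover(path):
    """On restart, offset and output are guaranteed to be consistent."""
    with open(path) as f:
        return json.load(f)

commit("state.json", 42, "processed-through-42")
assert recover("state.json")["offset"] == 42
```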
Learning resources
Operations
reassign partition:
- http://activemq.apache.org/enterprise-integration-patterns.html
- http://camel.apache.org/
- Fixing a topic whose leader shows as none:
get /controller_epoch
set /brokers/topics/jliu-test/partitions/0/state {"controller_epoch":20,"leader":103,"version":1,"leader_epoch":2,"isr":[102,104]}
Installation
- https://raw.githubusercontent.com/wurstmeister/kafka-docker/master/docker-compose.yml
- https://github.com/wurstmeister/kafka-docker
Start both ZooKeeper and Kafka via docker-compose. Kafka registers itself in ZooKeeper under the address localhost; a client looks up this registered address in ZooKeeper and then connects to it.