Why is Kafka so fast?

What are the reasons for Kafka's high throughput?

1. Sequential reads and writes to the hard disk

Kafka continually appends messages to the end of a file. This lets Kafka take full advantage of the hard disk's sequential read and write performance. Sequential access needs no head seek time and only a little sector rotation time, so it is much faster than random reads and writes.
Kafka's official test figures (RAID-5, 7200 rpm):
Sequential I/O: 600 MB/s
Random I/O: 100 KB/s
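
The append-only idea can be sketched in a few lines of Python. This is only an illustration, not Kafka's code: the `AppendOnlyLog` class, the `read_records` helper and the 4-byte length-prefix record format are all hypothetical. The point is that every write lands at the current end of the file and reads walk the file front to back.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy append-only log: each record is a 4-byte length prefix plus payload."""

    def __init__(self, path):
        # "ab" opens the file for appending: every write goes to the end of file.
        self._file = open(path, "ab")

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte position where it starts."""
        position = self._file.tell()
        self._file.write(struct.pack(">I", len(payload)) + payload)
        return position

    def close(self):
        self._file.close()


def read_records(path):
    """Read all records back sequentially from the start of the file."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">I", header)
            records.append(f.read(length))
    return records


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.log")
    log = AppendOnlyLog(path)
    positions = [log.append(b"first"), log.append(b"second")]
    log.close()
    return positions, read_records(path)
```

Calling `demo()` writes two records and reads them back in the order they were written. No record is ever modified in place.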

2. Zero-copy

Starting with the Linux 2.2 kernel, a system call mechanism known as "zero-copy" became available: data is no longer copied into a user-mode buffer on its way out.
The number of context switches is reduced to two, which can double performance.
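
A rough sketch of the zero-copy path, again in Python for illustration only. On Linux the system call in question is `sendfile`, and Python's `socket.sendfile()` uses it where the platform provides it, falling back to ordinary `send()` calls elsewhere. The `send_file_zero_copy` and `demo` functions are hypothetical names, and a local socket pair stands in for the connection between broker and consumer.

```python
import socket
import tempfile


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected socket and return the bytes sent.

    Where os.sendfile() is available, the kernel moves the data from the
    page cache to the socket without copying it into a user-space buffer.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo() -> bytes:
    # Write a small stand-in for a log file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"message-1\nmessage-2\n")
        path = tmp.name

    # A connected socket pair plays the role of the network connection.
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(sender, path)
        received = b""
        while len(received) < sent:
            chunk = receiver.recv(sent - len(received))
            if not chunk:
                break
            received += chunk
        return received
    finally:
        sender.close()
        receiver.close()
```

The application never calls `read()` on the file, so the payload does not pass through a buffer owned by the Python process when the `sendfile` path is taken.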

3. File segmentation

With file segmentation, each file operation works on a small file, which keeps the operation lightweight and also increases the capacity for parallel processing.
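
One way to picture segmentation is a log split into small pieces, each identified by the offset of its first message, so that a read only has to touch one small segment. The sketch below is a toy in-memory model under that assumption. The `SegmentedLog` class and its `max_messages_per_segment` parameter are hypothetical and are not Kafka's implementation.

```python
import bisect


class SegmentedLog:
    """Toy log split into small segments, each identified by its base offset."""

    def __init__(self, max_messages_per_segment: int = 3):
        self.max_messages = max_messages_per_segment
        self.base_offsets = [0]   # sorted base offset of every segment
        self.segments = {0: []}   # base offset -> messages in that segment
        self.next_offset = 0

    def append(self, message: str) -> int:
        """Append to the active (last) segment, rolling over when it is full."""
        active_base = self.base_offsets[-1]
        if len(self.segments[active_base]) >= self.max_messages:
            active_base = self.next_offset
            self.base_offsets.append(active_base)
            self.segments[active_base] = []
        self.segments[active_base].append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int) -> str:
        """Find the segment that holds the offset, then index into it."""
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        # Binary search for the last segment whose base offset is <= offset.
        i = bisect.bisect_right(self.base_offsets, offset) - 1
        base = self.base_offsets[i]
        return self.segments[base][offset - base]
```

Because old segments are never written to again, they can be read, or deleted as a whole, independently of the active segment.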

4. Batch sending

Messages are flushed to the hard disk in batches according to a flush policy, which is controlled by the following two parameters in the server.properties file:
log.flush.interval.messages =
log.flush.interval.ms =
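
The effect of these two settings can be approximated with a small decision helper: flush once a given number of messages has accumulated, or once a given number of milliseconds has passed, whichever comes first. This is a simplified sketch, not the broker's actual scheduling logic. The `FlushPolicy` class, its method names and the injectable `clock` argument are hypothetical.

```python
import time


class FlushPolicy:
    """Decide when buffered messages should be flushed to disk."""

    def __init__(self, interval_messages: int, interval_ms: int, clock=time.monotonic):
        self.interval_messages = interval_messages
        self.interval_ms = interval_ms
        self._clock = clock          # returns seconds; injectable for testing
        self._pending = 0
        self._last_flush = clock()

    def record_append(self) -> bool:
        """Count one appended message; return True if a flush is now due."""
        self._pending += 1
        elapsed_ms = (self._clock() - self._last_flush) * 1000
        return self._pending >= self.interval_messages or elapsed_ms >= self.interval_ms

    def mark_flushed(self) -> None:
        """Reset the counters after the caller has flushed."""
        self._pending = 0
        self._last_flush = self._clock()
```

A writer would call `record_append()` after each message and, when it returns `True`, flush the file and call `mark_flushed()`. Larger thresholds mean fewer, bigger disk writes, at the price of more unflushed data at any moment.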

5. Data compression

Kafka also supports compressing batches of messages. The producer can compress a message batch in GZIP or Snappy format. Compression reduces the amount of data transferred and so eases the pressure on the network. The cost is extra CPU work, because the producer has to compress and the consumer has to decompress. When processing big data, however, the bottleneck is usually the network rather than the CPU, so the cost is well worth it.
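
The trade-off can be seen with Python's standard `gzip` module. The sketch below compresses a whole batch at once, which is where the gain comes from, since similar messages share a lot of repeated text. The `compress_batch` and `decompress_batch` helpers and the newline-delimited batch format are hypothetical, and the example assumes that messages contain no newlines.

```python
import gzip


def compress_batch(messages):
    """Producer side: join the messages and gzip the whole batch at once.

    Returns the uncompressed and compressed bytes so the sizes can be compared.
    """
    raw = "\n".join(messages).encode("utf-8")
    return raw, gzip.compress(raw)


def decompress_batch(blob):
    """Consumer side: undo the compression and split the batch back up."""
    return gzip.decompress(blob).decode("utf-8").split("\n")


def demo():
    messages = [f'{{"user": {i}, "event": "click"}}' for i in range(100)]
    raw, compressed = compress_batch(messages)
    return messages, len(raw), len(compressed), decompress_batch(compressed)
```

In `demo()` the compressed batch is smaller than the raw one and the consumer recovers the original messages. The CPU time spent in `gzip.compress` and `gzip.decompress` is the price paid for sending fewer bytes.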