Processing streaming events with Apache Kafka
The Streamer
Client Applications
Now I will leave the Kafka cluster itself and turn to the applications that either feed data into Kafka or tap into it: the producers and consumers. These client applications contain code written by Kafka users to insert messages into topics and read messages from them. Every component of the Kafka platform that is not a Kafka broker is basically a producer or a consumer – or both. Producers and consumers form the interfaces to the Kafka cluster.
Kafka Producers
The producer library's API is quite lightweight: A Java class named KafkaProducer connects the client to the cluster. The class takes a set of configuration parameters, including the addresses of some brokers in the cluster, a suitable security configuration, and other settings that influence the network behavior of the producer.
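What such a configuration might look like is sketched below. The broker addresses, the TLS setting, and the Avro serializer with its schema registry URL (assumed here for the Payment type) are placeholders for illustration; only the bootstrap servers and the two serializers are strictly required.

  import java.util.Properties;
  import org.apache.kafka.clients.producer.ProducerConfig;

  public class ProducerConfigSketch {
    static Properties producerProps() {
      final Properties props = new Properties();
      // Initial contact points; the client discovers the rest of the cluster from these
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
      // Encrypt traffic to the brokers (assumes TLS is set up on the cluster)
      props.put("security.protocol", "SSL");
      // Serializers for message keys and values; the Avro serializer and
      // the schema registry URL are assumptions for the Payment type
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringSerializer");
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
          "io.confluent.kafka.serializers.KafkaAvroSerializer");
      props.put("schema.registry.url", "http://localhost:8081");
      // Network behavior: wait until all in-sync replicas have confirmed a write
      props.put(ProducerConfig.ACKS_CONFIG, "all");
      return props;
    }
  }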
Transparently to the developer, the library manages connection pools, network buffers, waiting for the brokers to acknowledge messages, and retransmitting messages when necessary. It also handles a multitude of other details the application programmer does not need to worry about. The excerpt in Listing 1 shows the use of the KafkaProducer API to generate and send 10 payment messages.
Listing 1
KafkaProducer API
  [...]
  try (final KafkaProducer<String, Payment> producer = new KafkaProducer<String, Payment>(props)) {

    for (long i = 0; i < 10; i++) {
      final String orderId = "id" + Long.toString(i);
      final Payment payment = new Payment(orderId, 1000.00d);
      final ProducerRecord<String, Payment> record =
          new ProducerRecord<String, Payment>("transactions",
                                              payment.getId().toString(),
                                              payment);
      producer.send(record);
    }
  } catch (final Exception e) {
    // serialization and other client errors end up here
    e.printStackTrace();
  }
  [...]
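Listing 1 sends its records fire and forget. To make the broker confirmations that the library otherwise handles in the background visible, send() also accepts a callback. The following sketch would slot into the loop of Listing 1 in place of the plain send() call:

  // Sketch: asynchronous send with a delivery callback; the callback fires
  // once the broker has acknowledged the record or the retries are exhausted
  producer.send(record, (metadata, exception) -> {
    if (exception != null) {
      exception.printStackTrace();   // delivery finally failed
    } else {
      System.out.printf("stored in %s-%d at offset %d%n",
          metadata.topic(), metadata.partition(), metadata.offset());
    }
  });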
Kafka Consumers
Where there are producers, there are usually also consumers. Use of the KafkaConsumer API is similar in principle to that of the KafkaProducer API. The client connects to the cluster via the KafkaConsumer class, which again has configuration options that determine the address of the cluster, security settings, and other parameters. On the basis of this connection, the consumer then subscribes to one or more topics.
Kafka scales consumer groups more or less automatically. Just like KafkaProducer, KafkaConsumer manages connection pooling and the network protocol. However, the functionality on the consumer side goes far beyond mere network plumbing.
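A consumer configuration could look like the following sketch; the broker addresses, deserializers, and group name are again placeholders. The group ID is what drives the scaling just described: All consumers with the same group.id share the partitions of the subscribed topics among themselves.

  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerConfig;

  public class ConsumerConfigSketch {
    static Properties consumerProps() {
      final Properties props = new Properties();
      // Initial contact points into the cluster
      props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
      // Consumers with the same group.id split the partitions of a topic
      // among themselves; adding an instance rebalances the assignment
      props.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-processors");
      // Where to start reading when the group has no committed offset yet
      props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
      // Deserializers matching the producer's serializers (assumed Avro values)
      props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
          "org.apache.kafka.common.serialization.StringDeserializer");
      props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
          "io.confluent.kafka.serializers.KafkaAvroDeserializer");
      return props;
    }
  }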
When a consumer reads a message, it does not delete it, which is what distinguishes Kafka from traditional message queues. The message is still there, and any interested consumer can read it.
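One practical consequence is that a consumer can rewind and read the same messages again. The following sketch uses the transactions topic from Listing 1 and assumes partition 0 exists; it assigns the partition explicitly and jumps back to its beginning:

  import java.util.Collections;
  import org.apache.kafka.common.TopicPartition;

  // Sketch: nothing is deleted by reading, so a consumer can replay history.
  // Note: assign() replaces subscribe(); the two must not be mixed on one instance
  final TopicPartition tp = new TopicPartition("transactions", 0);
  consumer.assign(Collections.singletonList(tp));
  consumer.seekToBeginning(Collections.singletonList(tp));
  // subsequent poll() calls deliver the partition's messages from offset 0 again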
In fact, it is quite normal in Kafka for many consumers to access the same topic. This seemingly minor fact has a disproportionately large influence on the kinds of software architectures emerging around Kafka, because Kafka is suitable for more than just real-time data processing. In many cases, other systems also consume the data, including batch processes, file processing, request-response web services – representational state transfer (REST) or simple object access protocol (SOAP) – data warehouses, and machine learning infrastructures.
The excerpt in Listing 2 shows how the KafkaConsumer API consumes and processes the 10 payment messages from Listing 1 in an endless poll loop.
Listing 2
KafkaConsumer API
  [...]
  try (final KafkaConsumer<String, Payment> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList(TOPIC));

    while (true) {
      // poll blocks for up to 10ms waiting for new records
      ConsumerRecords<String, Payment> records = consumer.poll(Duration.ofMillis(10));
      for (ConsumerRecord<String, Payment> record : records) {
        String key = record.key();
        Payment value = record.value();
        System.out.printf("key = %s, value = %s%n", key, value);
      }
    }
  }
  [...]
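By default, the consumer periodically commits how far it has read (its offsets) in the background. If processed messages must not be lost on a crash, auto-commit can be disabled and the offsets committed manually once a batch has been handled. The following sketch builds on the surroundings of Listing 2; it assumes enable.auto.commit=false in the configuration, and process() stands for hypothetical application logic:

  // Sketch: manual offset management; assumes enable.auto.commit=false
  while (true) {
    ConsumerRecords<String, Payment> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, Payment> record : records) {
      process(record);        // hypothetical application logic
    }
    consumer.commitSync();    // only now is this batch marked as done
  }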