How to pull all historic data from a given Kafka topic?

A Memgraph stream starts consuming messages from the committed offset. Is there a way to get all data from the beginning of a topic, and if so, what is the right way to do it?

For example, the Kafka console consumer does this with the --from-beginning flag:
./bin/kafka-console-consumer.sh --topic "topic" --from-beginning --bootstrap-server "endpoint"

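I'm not sure how to do this in Memgraph yet, but keep in mind that Kafka itself can do this for you through the consumer's auto.offset.reset option: with it set to earliest, a consumer group that has no committed offset starts from the beginning of the topic. Read more here.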
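
To make the auto.offset.reset behaviour concrete, here is a rough sketch in plain Python of how a consumer picks its starting position. It is only a model of the rule, not Kafka's or Memgraph's actual code; the function name resolve_start_offset and its parameters are made up for illustration.

```python
from typing import Optional


def resolve_start_offset(
    committed: Optional[int],
    log_start: int,
    log_end: int,
    auto_offset_reset: str = "latest",
) -> int:
    """Model of where a consumer starts reading a partition.

    Hypothetical helper for illustration only; it mirrors the documented
    rule, it is not part of any Kafka client library.
    """
    # A valid committed offset always wins; the reset policy is ignored.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No committed offset (or it is out of range): apply the reset policy.
    if auto_offset_reset == "earliest":
        return log_start  # read the whole retained history
    if auto_offset_reset == "latest":
        return log_end  # only messages produced from now on
    # The "none" policy makes the consumer fail instead of guessing.
    raise ValueError("no valid committed offset and reset policy is 'none'")


# New consumer group, policy "earliest": starts at the beginning.
print(resolve_start_offset(None, log_start=0, log_end=100, auto_offset_reset="earliest"))
# Existing group with a committed offset: the policy has no effect.
print(resolve_start_offset(42, log_start=0, log_end=100, auto_offset_reset="earliest"))
```

The second call is the important one for this thread: once a group has committed an offset, changing auto.offset.reset alone will not replay the topic. You would need a new consumer group or an explicit offset reset.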


According to the documentation: Streams | Memgraph Docs

The default batch size is 1000? I think the only sane default here is 1. Can someone explain why it is 1000?

It doesn't wait until it has 1000 messages; it waits until it has 1000 messages or until batch_interval has elapsed, whichever comes first.
I don't see why you wouldn't prefer to process multiple messages at once within a specific interval. If messages arrive at a high rate, you process them in batches; if not, you process them one by one, with at most the delay set by the batch interval.
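
Here is a small simulation of the "size or interval, whichever comes first" rule, in case it helps. It is a simplified model, not Memgraph's implementation: the function batch_messages is hypothetical, the interval is counted from the first message in each batch, and the 100 ms interval used below is just an illustrative value.

```python
from typing import List, Tuple


def batch_messages(
    arrivals: List[Tuple[int, str]],
    batch_size: int,
    batch_interval_ms: int,
) -> List[List[str]]:
    """Group (arrival_time_ms, message) pairs into batches.

    A batch is closed when it holds batch_size messages or when
    batch_interval_ms has passed since its first message, whichever
    happens first. Arrivals must be sorted by time.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    window_start = 0
    for t, msg in arrivals:
        # The open batch timed out before this message arrived: flush it.
        if current and t - window_start >= batch_interval_ms:
            batches.append(current)
            current = []
        if not current:
            window_start = t  # this message opens a new batch
        current.append(msg)
        # The batch is full: flush it without waiting for the interval.
        if len(current) >= batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # flush whatever is left at the end
    return batches


# Slow producer (1 message per second): every message is its own batch.
slow = [(i * 1000, f"m{i}") for i in range(3)]
print(batch_messages(slow, batch_size=1000, batch_interval_ms=100))

# Burst of 2500 messages at once: split into batches of at most 1000.
burst = [(0, f"m{i}") for i in range(2500)]
print([len(b) for b in batch_messages(burst, batch_size=1000, batch_interval_ms=100)])
```

Under this model a large batch size never delays a slow producer by more than the batch interval; it only caps how many messages are handed over at once during a burst.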

OK, but is there an option to change the offset? Not only do messages arrive around 5 minutes after I start the consumer in Memgraph, there is also a delay of around 5 minutes between consecutive messages. I send messages into Kafka at 1 per second, and that's the same rate at which they arrive in a Kafka consumer and in a Python consumer. Something is definitely off, and it's not stated in the docs.