Distributed Streaming Model

Distributed Architecture Example

Infrastructure

Kafka Cluster

The Kafka cluster is the backbone of the distributed stream processing system. It hosts five topics that carry tasks and record their states:

  1. tasks: Collects incoming tasks from the Task Manager Frontend.
  2. success-tasks: Stores successfully processed tasks.
  3. failed-tasks: Tracks tasks that fail during processing.
  4. unknown-tasks: Captures tasks that neither succeed nor fail within the processing time limit and therefore require further analysis.
  5. distributed-tasks: Holds tasks that are distributed to external third-party systems.
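
One way to keep producers and consumers in agreement on these names is to define them once in code. The following is a minimal sketch, assuming Python; the Topic enum and ALL_TOPICS names are hypothetical, and only the topic names themselves come from the list above.

```python
from enum import Enum


class Topic(str, Enum):
    """Kafka topic names used by the pipeline (names taken from the list above)."""

    TASKS = "tasks"                    # incoming tasks from the frontend
    SUCCESS = "success-tasks"          # successfully processed tasks
    FAILED = "failed-tasks"            # tasks that failed during processing
    UNKNOWN = "unknown-tasks"          # tasks with no clear outcome yet
    DISTRIBUTED = "distributed-tasks"  # tasks handed to third-party systems


# All topic names in one list, e.g. for creating them at cluster setup time.
ALL_TOPICS = [topic.value for topic in Topic]
```

Because Topic also derives from str, a member such as Topic.TASKS compares equal to the plain string "tasks", so it can be passed wherever a topic name is expected.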

Network Storage

The Worker Pool writes the final output of completed tasks to this storage system.

Components

Task Manager Frontend

  • A simple HTTP endpoint that accepts incoming tasks via HTTP requests.
  • Domain example: taskmanagerfrontend-dev.ytamang.com
  • The Task Manager Frontend produces each received task as a message to the Kafka topic tasks, where it waits for processing.
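
The frontend flow described above (accept an HTTP request, publish to the tasks topic) can be sketched as follows. This is an illustration under stated assumptions, not the actual service: it uses only the Python standard library, and the publish callable, the task_id field, the 202 response, and port 8080 are all hypothetical. In a real deployment publish would wrap a Kafka producer client.

```python
import json
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

TASKS_TOPIC = "tasks"

# Hypothetical publisher signature: (topic, message bytes) -> None.
Publisher = Callable[[str, bytes], None]


def handle_task_submission(body: bytes, publish: Publisher) -> tuple[int, dict]:
    """Validate a JSON task, publish it to the tasks topic, return (status, reply)."""
    try:
        task = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(task, dict):
        return 400, {"error": "task must be a JSON object"}
    # Assumption: tasks carry an id; generate one if the client sent none.
    task.setdefault("task_id", str(uuid.uuid4()))
    publish(TASKS_TOPIC, json.dumps(task).encode("utf-8"))
    return 202, {"task_id": task["task_id"]}


def make_handler(publish: Publisher) -> type[BaseHTTPRequestHandler]:
    """Build a request handler class bound to the given publisher."""

    class TaskHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            status, reply = handle_task_submission(self.rfile.read(length), publish)
            payload = json.dumps(reply).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return TaskHandler


def serve(publish: Publisher, host: str = "0.0.0.0", port: int = 8080) -> None:
    """Start the HTTP endpoint; blocks until interrupted."""
    HTTPServer((host, port), make_handler(publish)).serve_forever()
```

Keeping the validation and publishing logic in handle_task_submission, separate from the HTTP plumbing, lets it be exercised without a running server or Kafka cluster, for example by passing a function that appends to a list as publish.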

Task Processor Worker Pool

  • A cluster of worker nodes, each running an application that consumes tasks from the tasks topic.
  • Task processing times vary between 1 second and 30 minutes.
  • If a task is completed within the predefined time limit, the worker node sends a message to the success-tasks topic.
  • If a task fails during processing, it is pushed to the failed-tasks topic.
  • If a task exceeds the processing time limit, it is moved to the unknown-tasks topic for further investigation.
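
The three outcomes above map onto a small routing decision. The sketch below is one possible way to express it, assuming Python and a thread-based time limit; process_and_route, run_task, and time_limit_s are hypothetical names, and the actual limit value is not specified in this document.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

SUCCESS_TOPIC = "success-tasks"
FAILED_TOPIC = "failed-tasks"
UNKNOWN_TOPIC = "unknown-tasks"


def process_and_route(run_task: Callable[[], object], time_limit_s: float) -> str:
    """Run one task and return the topic its result message should go to."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(run_task)
        done, _ = wait([future], timeout=time_limit_s)
        if not done:
            # Time limit exceeded: the outcome is not known yet.
            return UNKNOWN_TOPIC
        if future.exception() is not None:
            # The task raised an error while processing.
            return FAILED_TOPIC
        return SUCCESS_TOPIC
    finally:
        # Do not block on a task that is still running. A thread cannot be
        # force-killed, so a real worker would need its own cancellation.
        executor.shutdown(wait=False)
```

The function only decides the destination topic; publishing the message is left to the caller. Note that a timed-out task keeps running in its thread, which is one reason the unknown-tasks topic needs follow-up investigation rather than an automatic retry.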

Task Distributor

  • Distributes successfully processed tasks to third-party services or systems.
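
A rough sketch of the distributor loop follows. It rests on two assumptions that this document does not state outright: that the distributor reads its input from the success-tasks topic, and that it records each handed-off task in the distributed-tasks topic. The distribute function and its send_to_third_party and publish parameters are hypothetical stand-ins for a Kafka consumer, an external client, and a Kafka producer.

```python
import json
from typing import Callable, Iterable

DISTRIBUTED_TOPIC = "distributed-tasks"


def distribute(
    success_messages: Iterable[bytes],
    send_to_third_party: Callable[[dict], None],
    publish: Callable[[str, bytes], None],
) -> int:
    """Forward each successful task and record it; return how many were sent."""
    sent = 0
    for raw in success_messages:
        task = json.loads(raw)
        send_to_third_party(task)        # hypothetical external call
        publish(DISTRIBUTED_TOPIC, raw)  # record that the task was handed off
        sent += 1
    return sent
```

In this sketch an exception from send_to_third_party stops the loop before the task is recorded, so a task never appears in distributed-tasks unless the hand-off call returned normally.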