Distributed Streaming Model
Infrastructure
Kafka Cluster
The Kafka cluster serves as the backbone of the distributed stream processing system. It will manage multiple topics for handling tasks and their states:
- tasks: Collects incoming tasks from the Task Manager Frontend.
- success-tasks: Stores successfully processed tasks.
- failed-tasks: Tracks tasks that fail during processing.
- unknown-tasks: Captures tasks that neither succeed nor fail, requiring further analysis.
- distributed-tasks: Holds tasks that are distributed to external third-party systems.
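The document does not specify how these topics are provisioned; as one illustration only, they could be created programmatically with the kafka-python admin client. The broker address, partition count, and replication factor below are assumptions, not part of the original design.

```python
from kafka.admin import KafkaAdminClient, NewTopic

BOOTSTRAP_SERVERS = "localhost:9092"  # assumed broker address for a dev cluster

TOPICS = ["tasks", "success-tasks", "failed-tasks", "unknown-tasks", "distributed-tasks"]

def create_topics() -> None:
    """Create the five task-lifecycle topics if they do not already exist."""
    admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP_SERVERS)
    existing = set(admin.list_topics())
    new_topics = [
        NewTopic(name=name, num_partitions=3, replication_factor=1)  # assumed sizing
        for name in TOPICS
        if name not in existing
    ]
    if new_topics:
        admin.create_topics(new_topics)
    admin.close()

if __name__ == "__main__":
    create_topics()
```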
Network Storage
Shared network storage where the Worker Pool writes the final output of completed tasks.
Components
Task Manager Frontend
- A simple HTTP endpoint that accepts incoming tasks via HTTP requests.
- Domain example: taskmanagerfrontend-dev.ytamang.com
- Tasks received by the Task Manager are published as messages to the tasks topic for downstream processing (a minimal producer sketch follows this list).
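As a hedged sketch of this behavior, the frontend could be a small Flask service that accepts a JSON task over HTTP and publishes it to the tasks topic. The route path, payload shape, and broker address are assumptions for illustration.

```python
import json

from flask import Flask, jsonify, request
from kafka import KafkaProducer

app = Flask(__name__)

# Assumed broker address; in practice this points at the Kafka cluster.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.route("/tasks", methods=["POST"])  # assumed route path
def submit_task():
    """Accept an incoming task over HTTP and publish it to the tasks topic."""
    task = request.get_json(force=True)
    producer.send("tasks", value=task)
    producer.flush()
    return jsonify({"status": "accepted"}), 202

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```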
Task Processor Worker Pool
- A cluster of worker nodes, each running an application that consumes tasks from the tasks topic (a worker loop sketch follows this list).
- Task processing times vary between 1 second and 30 minutes.
- If a task completes within the predefined time limit, the worker node sends a message to the success-tasks topic.
- If a task fails during processing, it is pushed to the failed-tasks topic.
- If a task exceeds the processing time limit, it is moved to the unknown-tasks topic for further investigation.
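One way to realize this routing, sketched here as an assumption rather than the actual worker implementation, is a consumer loop that runs each task under a timeout and produces to the matching result topic. The 30-minute limit comes from the range above; the handler, group name, and broker address are placeholders.

```python
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError as TaskTimeout

from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP_SERVERS = "localhost:9092"   # assumed broker address
TIME_LIMIT_SECONDS = 30 * 60           # upper bound of the 1 s to 30 min range

consumer = KafkaConsumer(
    "tasks",
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id="task-processor-worker-pool",   # assumed consumer group name
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def process(task: dict) -> dict:
    """Placeholder for the real task logic; assumed to return the final output."""
    raise NotImplementedError

with ThreadPoolExecutor(max_workers=1) as pool:
    for record in consumer:
        task = record.value
        future = pool.submit(process, task)
        try:
            result = future.result(timeout=TIME_LIMIT_SECONDS)
            producer.send("success-tasks", value={"task": task, "result": result})
        except TaskTimeout:
            # The task exceeded the time limit: hand it off for investigation.
            producer.send("unknown-tasks", value=task)
        except Exception as exc:
            # The task failed during processing.
            producer.send("failed-tasks", value={"task": task, "error": str(exc)})
        producer.flush()
```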
Task Distributor
- This component is responsible for distributing successfully processed tasks to third-party services or systems.
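Tying this to the topics above, a hedged sketch of the distributor could consume success-tasks, forward each completed task to an external system, and record the hand-off on distributed-tasks. The third-party endpoint URL, payload shape, and group name are assumptions.

```python
import json

import requests
from kafka import KafkaConsumer, KafkaProducer

BOOTSTRAP_SERVERS = "localhost:9092"                      # assumed broker address
THIRD_PARTY_URL = "https://thirdparty.example.com/tasks"  # hypothetical endpoint

consumer = KafkaConsumer(
    "success-tasks",
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id="task-distributor",   # assumed consumer group name
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for record in consumer:
    completed = record.value
    # Forward the completed task to the external third-party system.
    response = requests.post(THIRD_PARTY_URL, json=completed, timeout=10)
    response.raise_for_status()
    # Record the hand-off on the distributed-tasks topic.
    producer.send("distributed-tasks", value=completed)
    producer.flush()
```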