The term "event" in computing is fairly overloaded and often described based on the context it is used in. I do not intend to give you guys some long complex explanation of events or try to force some description over another, but for the sake of this post and the information in it we can simply state that:
"An event is a record of a significant change at a given point in time."
So an event describes something that changed, and it is significant to us because our application(s) have to react to it. An event is always associated with time; even if in some cases we don't care about when it happened, it still has a temporal (time) relationship to other events.
Some events may only be significant to us if they are followed by some other event, or they may be significant only during a certain period of time. In some cases a single event is meaningless; however, if it recurs many times during a short period of time it may become significant.
To give an example based on the currently hard situation we are all in, a single case of reported flu symptoms in a patient may not be significant, or may be considered quite normal during flu season; however, hundreds of these events during a short period of time may indicate a possible pandemic.
Associating events which are related to each other is called event correlation. There are many approaches out there for correlating events. To provide some examples: rule-based correlation uses rule engines to find complex relationships between many events. History-based correlation is another approach, where machine learning and analytics are used to learn significant behaviors or patterns from past event occurrences. Domain-based correlation is yet another approach, correlating events based on their occurrences (time) and domain-specific event relationships (data).
There is a ton more information about events out there and I do not claim by any means to be an expert in this area, however it is important to have some basic understanding of events in order to move on and start talking about how and why they are important when modeling serverless workflows.
Recently someone asked in one of the workflow group meetings what makes the Serverless Workflow specification ...well..."serverless". The problem with answering this question is that you have to use the word "serverless" in the answer itself, to describe workflows that orchestrate "serverless" applications (event-driven applications by nature, deployed on cloud platforms).
And we are back to events :) As the applications we are trying to orchestrate are by nature event-driven and loosely-coupled (microservices), our workflows have to be able to describe how our applications should behave and operate when events occur.
Can we model our serverless workflows without events? Yes, of course, if our serverless applications do not rely on them; however, any serverless workflow model must be able to define events and have at least some basic means to describe how occurrences of events trigger certain business logic.
But is being able to describe events enough for a workflow specification? The answer is simply "NO!". They have to be described in a common, vendor-neutral, and portable way. Each cloud provider we deploy our apps on has its own set of services responsible for event processing. They have many names, but let's just call them "Event Hubs". These event hubs are responsible for identifying significant events that our workflows act upon, as well as for providing the events (their information/data) to the runtimes executing our workflows. Events that we need to make our orchestration decisions can come from many different sources (hubs) and in as many different formats. So how do we reach the goal of being able to describe events while staying vendor-neutral and portable?
Well, either we try to support every single event format there is, or we pick a single vendor-neutral and portable format...which one of these options would you choose? :)
The Serverless Workflow specification mandates that events be described using the
CloudEvents format. It does not mandate or enforce the format of these events at their originating sources, nor in the many different event hubs, but it does mandate that their data be converted to the format described in the CloudEvents specification in order to be consumed by serverless workflow instances.
The CloudEvents format is thankfully very simple and straightforward. It is described with
this JSON Schema. Here is an example of an event description using the CloudEvents format:
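Since the original example image is not reproduced here, a sketch of what such an event might look like follows; the type, source, id, and customerId values are made up for illustration:

```json
{
    "specversion": "1.0",
    "type": "org.store.order.created",
    "source": "https://store.example.com/orders",
    "id": "d3b2c1a0-1234-4abc-9def-567890abcdef",
    "time": "2020-04-01T10:00:00Z",
    "customerId": "customer-123",
    "datacontenttype": "application/json",
    "data": {
        "orderId": "order-456",
        "items": ["item-1", "item-2"]
    }
}
```

Note that customerId is not a standard CloudEvents attribute; it is an extension (context) attribute, which the CloudEvents JSON format places at the top level of the event alongside the required attributes.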
The type parameter identifies, well, the event type, and the source parameter provides the originating source (hub) which created the event. The CloudEvents format also includes context properties which can be used for event correlation; in this case we could use the customerId context property to associate all events of this type with the same customer. The event information/data is described with the "data" property.
The CloudEvents specification allows for many different data content types (even base64-encoded values). Since the Serverless Workflow specification's workflow data format is described with JSON, one limitation we must define is that in order for event data to be consumed by workflows it must be in JSON format. This does not mean that it cannot include things like encoded string values, but simply that in order to be consumed into the workflow data (we will get to this in much more detail in future posts), it has to be in JSON format.
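To illustrate this, here is a sketch (with hypothetical type, source, and data values) of an event whose data is valid JSON and thus consumable by a workflow, even though it contains a base64-encoded string value inside:

```json
{
    "specversion": "1.0",
    "type": "org.example.document.uploaded",
    "source": "/documents/service",
    "id": "evt-789",
    "datacontenttype": "application/json",
    "data": {
        "documentId": "doc-42",
        "contentBase64": "SGVsbG8gV29ybGQ="
    }
}
```

An event carrying raw binary data (the CloudEvents JSON format uses a data_base64 member for that case) would first need its data converted to JSON before a workflow instance could consume it.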
Oh, and if you are asking what it means for an event to be "consumed" by a workflow instance, it means that its information (data) can be used to make orchestration decisions. Often just the occurrence of an event is not enough to make orchestration decisions, and the event data has to be inspected and reasoned over in order to trigger different function executions.
So now that we are equipped with all this awesome knowledge about events (and provided you are not asleep yet) let's finally dig into event definitions in the Serverless Workflow specification.
Similar to our last post where we described
reusable function definitions, the specification allows you to model reusable event definitions. This is a list of orchestration events which are significant to the particular workflow model. Basically, we want to define in our workflow model which events can trigger orchestration decisions. Workflow states (much more about these in future posts) can then reference these event definitions in order to describe under which conditions workflow instances can be created, what actions to execute upon arrival of certain orchestration events, or what events to wait for in certain cases.
Here is an example event definition as defined by the Serverless Workflow specification:
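As the original example image is not reproduced here, below is a sketch of what such a definition might look like; the event names, types, and source are hypothetical, and each event correlates on a patientId context property:

```json
{
    "events": [
        {
            "name": "HighBodyTemperature",
            "type": "org.monitor.highBodyTemp",
            "source": "monitoringSource",
            "correlationToken": "patientId"
        },
        {
            "name": "HighBloodPressure",
            "type": "org.monitor.highBloodPressure",
            "source": "monitoringSource",
            "correlationToken": "patientId"
        },
        {
            "name": "HighRespirationRate",
            "type": "org.monitor.highRespirationRate",
            "source": "monitoringSource",
            "correlationToken": "patientId"
        }
    ]
}
```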
Here we describe three different events. Each event definition must have a unique name which then can be referenced by different workflow states (this will be shown in future posts).
Workflow implementations should use the type and source parameters to match events (in CloudEvents format) provided by event hubs.
The correlationToken parameter is optional and references a context property of the event. In this example it is assumed that the produced events have a "patientId" context property whose value is the unique patient id.
All events with the defined source and of the defined type which have a matching patientId value are considered to be correlated, and significant for workflow instances created from this particular workflow definition/model.
Serverless Workflow states such as Event or Callback states can reference these events in order to define what events (single or a combination of them) are needed to start workflow instances, perform certain actions, etc. Again, we will talk about all this in much more detail in future posts.
I hope this article gave you a good idea of how to define events within your serverless workflow models, and some of the reasoning on why. As always, thanks for reading!