Tuesday, March 24, 2020

Event definitions in Serverless Workflow

The term "event" in computing is fairly overloaded and often described based on the context it is used in. I do not intend to give you guys some long complex explanation of events or try to force some description over another, but for the sake of this post and the information in it we can simply state that:

"An event is a record of a significant change at a given point in time."

So an event describes something that changed, and it is significant to us because our application(s) have to react to it. An event is always associated with time, and even if we don't care about when it happened in some cases, it does have a temporal (time) relationship to other events. 

Some events may only be significant to us if they are followed by some other event, or only during a certain period of time. In some cases a single event is meaningless, but if it reoccurs many times during a short period of time it may become significant.
To give an example based on the difficult situation we are all currently in, a single case of reported flu symptoms in a patient may not be significant, or may be considered quite normal during the flu season; however, hundreds of these events during a short period of time may indicate a possible pandemic.
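To make the "many events in a short period" idea a bit more concrete, here is a minimal Python sketch of a sliding time window. The function name `find_bursts`, the one-day window and the threshold of five are assumptions picked purely for illustration; they do not come from any specification.

```python
from collections import deque
from datetime import datetime, timedelta


def find_bursts(timestamps, window, threshold):
    """Return the timestamps at which at least `threshold` events
    (including the current one) occurred within the last `window`."""
    recent = deque()
    alerts = []
    for ts in sorted(timestamps):
        recent.append(ts)
        # Forget events that have fallen out of the time window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(ts)
    return alerts


start = datetime(2020, 3, 1)
# One isolated report, then five reports within a few hours a week later.
reports = [start] + [start + timedelta(days=7, hours=h) for h in range(5)]
alerts = find_bursts(reports, window=timedelta(days=1), threshold=5)
print(alerts)
```

The isolated report never raises an alert on its own; only the fifth report inside the same one-day window does.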

Associating events which are related to each other is called event correlation. There are many approaches to correlating events. To provide some examples, rule-based correlation uses rule engines to find complex relationships between many events. History-based correlation is another approach, where machine learning and analytics are used to learn significant behaviors or patterns from past event occurrences. Domain-based correlation is yet another approach, which correlates events based on their occurrences (time) and domain-specific event relationships (data).

There is a ton more information about events out there and I do not claim by any means to be an expert in this area. However, it is important to have some basic understanding of events in order to move on and start talking about how and why they are important when modeling serverless workflows.

Recently someone asked in one of the workflow group meetings what makes the Serverless Workflow specification ...well..."serverless". The problem with answering this question is that you have to use the word "serverless" in the answer itself, to describe workflows that orchestrate "serverless" applications (event-driven applications by nature, deployed on cloud platforms). 
And we are back to events :) As the applications we are trying to orchestrate are by nature event-driven and loosely-coupled (microservices), our workflows have to be able to describe how our applications should behave and operate when events occur.
Can we model our serverless workflows without events? Yes of course, if our serverless applications are not relying on them, however any serverless workflow model must be able to define events and have at least some basic means to describe how occurrences of events trigger certain business logic.

But is being able to describe events enough for a workflow specification? The answer is simply "NO!". Events have to be described in a common, vendor-neutral, and portable way. Each cloud provider we deploy our apps on has its own set of services responsible for event processing. They have many names, but let's just call them "Event Hubs". These event hubs are responsible for identifying significant events that our workflows act upon, as well as for providing the events (their information/data) to the runtimes executing our workflows. The events we need in order to make our orchestration decisions can come from many different sources (hubs) and have just as many different formats. So how do we reach the goal of being able to describe events and stay vendor-neutral and portable?
Well, either we can try to support every single event format there is, or we pick a single vendor-neutral and portable format... which one of these options would you choose? :)

The Serverless Workflow specification mandates that events be described using the CloudEvents format. It does not mandate or enforce the format of these events at their originating sources, nor at the many different event hubs, but it does mandate that their data format be converted to the one described in the CloudEvents specification in order to be consumed by serverless workflow instances.

The CloudEvents format is thankfully very simple and straight-forward. It is described with this JSON Schema. Here is an example of an event description using the CloudEvents format:




The type parameter identifies, well, the event type, and the source parameter provides the originating source (hub) which created the event. The CloudEvents format also includes context properties which can be used for event correlation; in this case we could use the customerId context property to associate all events of this type with the same customer. The event information/data is described with the "data" property.
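As a rough sketch, an event with these properties could be written as the following Python dictionary. All values here (the type, source, ids and the data payload) are made up for illustration. The only parts taken from the CloudEvents 1.0 specification are the four required context attributes `specversion`, `id`, `type` and `source`. Note also that CloudEvents 1.0 restricts attribute names to lowercase letters and digits, so a strict producer would likely send `customerid` rather than `customerId`; the spelling used in this post is kept here.

```python
import json

# Hypothetical event; every value below is invented for illustration.
event = {
    "specversion": "1.0",
    "id": "a1b2c3",
    "type": "com.example.order.created",
    "source": "/example/orders",
    "customerId": "customer-123",           # context property used for correlation
    "datacontenttype": "application/json",
    "data": {"orderId": "order-42", "total": 99.5},
}

# Context attributes that CloudEvents 1.0 requires on every event.
REQUIRED = ("specversion", "id", "type", "source")


def missing_attributes(evt):
    """Return the required CloudEvents attributes that `evt` lacks."""
    return [name for name in REQUIRED if name not in evt]


print(missing_attributes(event))
print(json.dumps(event, indent=2))
```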

The CloudEvents specification allows for many different data content types (even base64-encoded values). Since the Serverless Workflow specification describes workflow data with JSON, one limitation we must define is that in order for event data to be consumed by workflows it must be in JSON format. This does not mean that it cannot include things like encoded string values, but simply that in order to be consumed into the workflow data (we will get to this in much more detail in future posts), it has to be in JSON format.
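A runtime might enforce this limitation with a small check like the one below. This is only a sketch under my own assumptions: the helper name `workflow_data` is hypothetical, and treating only JSON objects (not bare numbers or strings) as consumable is a choice made for this example, not something the specification states.

```python
import base64
import json


def workflow_data(event):
    """Return the event data as a JSON object (dict), or None if the
    data cannot be consumed as workflow data."""
    data = event.get("data")
    if isinstance(data, dict):
        return data
    if isinstance(data, str):
        # The data may arrive as JSON text that still has to be parsed.
        try:
            parsed = json.loads(data)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, dict) else None
    return None


# JSON data may still carry an encoded string value inside it.
photo = base64.b64encode(b"raw image bytes").decode("ascii")
print(workflow_data({"data": {"patientId": "p1", "photo": photo}}))
print(workflow_data({"data": "<reading>39.2</reading>"}))
```

The first event is accepted even though one of its values is base64 text, while the XML payload is rejected because it is not JSON.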

Oh, and if you are asking what it means for an event to be "consumed" by a workflow instance, well, it means that its information (data) can be used to make orchestration decisions. Often just the occurrence of an event is not enough to make orchestration decisions, and event data has to be inspected and reasoned over in order to trigger different function executions.
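Here is a tiny sketch of what "reasoning over event data" could mean in practice. The function names, the `temperature` field and the 38.0 threshold are all hypothetical and exist only to show the idea.

```python
def choose_function(event):
    """Pick the name of the function to invoke based on the event data."""
    temperature = event.get("data", {}).get("temperature")
    if temperature is None:
        # The event occurred, but on its own that tells us nothing.
        return None
    if temperature >= 38.0:
        return "ScheduleDoctorVisit"
    return "SendSelfCareTips"


print(choose_function({"type": "temperature", "data": {"temperature": 39.2}}))
print(choose_function({"type": "temperature", "data": {"temperature": 36.8}}))
print(choose_function({"type": "temperature", "data": {}}))
```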

So now that we are equipped with all this awesome knowledge about events (and provided you are not asleep yet) let's finally dig into event definitions in the Serverless Workflow specification.

Similar to our last post where we described reusable function definitions, the specification allows you to model reusable event definitions. These form a list of orchestration events which are significant for the particular workflow model. Basically we want to define in our workflow model which events can trigger orchestration decisions. Workflow states (much more about this in future posts) can then reference these event definitions in order to describe under which conditions workflow instances can be created, what actions to execute upon arrival of certain orchestration events, or what events to wait for in certain cases.

Here is an example event definition as defined by the Serverless Workflow specification:

Here we describe three different events. Each event definition must have a unique name which then can be referenced by different workflow states (this will be shown in future posts). 
Workflow implementations should use the type and source parameters to match events (in CloudEvents format) provided by event hubs. 

The correlationToken parameter is an optional parameter which references a context parameter of the event. In this example it is assumed that the produced events have a "patientId" context property which has a value of the unique patient id. 
All events of the defined type from the defined source which have a matching patientId value are considered to be correlated, and are considered significant for workflow instances created from this particular workflow definition/model.
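The matching and correlation rules above can be sketched in a few lines of Python. The three event definitions below are hypothetical stand-ins (their names, types and source are my own invention), using only the name, type, source and correlationToken parameters discussed in this post. How a real runtime matches and correlates events is up to the implementation.

```python
from collections import defaultdict

# Hypothetical event definitions, all correlated on "patientId".
event_definitions = [
    {"name": "HighBodyTemperature", "type": "org.example.temperature",
     "source": "monitoringSource", "correlationToken": "patientId"},
    {"name": "HighBloodPressure", "type": "org.example.bloodpressure",
     "source": "monitoringSource", "correlationToken": "patientId"},
    {"name": "HighRespirationRate", "type": "org.example.respiration",
     "source": "monitoringSource", "correlationToken": "patientId"},
]


def match_definition(event, definitions):
    """Find the definition whose type and source match the event."""
    for definition in definitions:
        if (event.get("type") == definition["type"]
                and event.get("source") == definition["source"]):
            return definition
    return None


def correlate(events, definitions):
    """Group the names of matched events by their correlation token value."""
    groups = defaultdict(list)
    for event in events:
        definition = match_definition(event, definitions)
        if definition is None:
            continue  # not significant for this workflow model
        token = definition["correlationToken"]
        if token in event:
            groups[event[token]].append(definition["name"])
    return dict(groups)


incoming = [
    {"type": "org.example.temperature", "source": "monitoringSource", "patientId": "p1"},
    {"type": "org.example.bloodpressure", "source": "monitoringSource", "patientId": "p1"},
    {"type": "org.example.temperature", "source": "monitoringSource", "patientId": "p2"},
    {"type": "org.example.unrelated", "source": "elsewhere", "patientId": "p1"},
]
correlated = correlate(incoming, event_definitions)
print(correlated)
```

The two events for patient "p1" end up in the same group, the event for "p2" in its own group, and the unrelated event is ignored because no definition matches its type and source.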

Serverless Workflow states such as Event or Callback states can reference these events in order to define what events (a single one or a combination) are needed to start workflow instances, perform certain actions, etc. Again, we will talk about all this in much more detail in future posts.

I hope this article gave you a good idea on how to define events within your serverless workflow models and some of the reasoning on why. As always thanks for reading!







Tuesday, March 17, 2020

Function definitions and how to use them

In the previous post we presented an overview of the Serverless Workflow model.
With this post we will look deeper into the reusable function definitions, and how they can be used in your serverless workflow definitions.

The term "function definition" as used within the serverless workflow model refers to defining how our loosely-coupled services that may be deployed onto several different cloud hosting providers can be invoked during workflow execution.

Many different cloud hosting companies exist today, offering bleeding-edge FaaS platforms onto which we can deploy our microservices (serverless functions). Some of the most notable ones include:

Each of these platforms may provide a unique set of ways to invoke the deployed functions which can include direct access to run your code, or many different event/queue-driven services which you can utilize to trigger executions based on different criteria (timers, events, messages, db updates etc).

Now let's take a step back and remember why we are interested in using serverless workflows to begin with, namely to introduce a clear separation of concerns. Our functions should have a strict focus on dealing with the actual business requirements. Workflows then take on the orchestration aspect, which includes defining invocations based on triggers (events), managing data between function invocations, and defining the overall orchestration control flow logic.

With that in mind then we can see that with the function definition in the serverless workflows model we are defining direct invocation of our serverless functions only. Invocations based on events (triggers) etc can be explicitly defined using workflow event definitions and different workflow states, which will be covered in depth in future posts. This allows you to clearly define the conditions under which our functions should be invoked as well as how they should be invoked in a vendor-neutral way. 

Let's take a look at an example on how direct function access can be done in AWS. They use Amazon Resource Names (ARNs) which uniquely identify AWS resources including our functions that may run on AWS Lambda.
An ARN is a string which can look like:

"arn:aws:lambda:us-east-1:123456789012:function:lambda-hello-world"

RESTful API invocations can also be accomplished with simple routing rules, for example you may define in your API gateway:
"GET /hello_world => arn:aws:lambda:us-east-1:123456789012:function:lambda-hello-world"

to expose function invocation via REST.

With Azure Functions, for example, you can define an HTTP endpoint to trigger function execution, in the form of a string:
"http://<APP_NAME>.azurewebsites.net/api/<FUNCTION_NAME>"

In most if not all cases function invocation can be defined with a string value which defines the unique access point (resource) for particular function invocation.
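To show that such a string really does carry everything needed to locate a function, here is a small Python sketch that splits the Lambda ARN from above into its colon-separated parts. The helper name `parse_arn` and the dictionary keys are my own labels for the seven fields, not official AWS terminology.

```python
def parse_arn(arn):
    """Split a Lambda function ARN into its colon-separated fields."""
    parts = arn.split(":", 6)
    if len(parts) != 7 or parts[0] != "arn":
        raise ValueError(f"not a function ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region",
            "account", "resource_type", "resource")
    return dict(zip(keys, parts))


arn = "arn:aws:lambda:us-east-1:123456789012:function:lambda-hello-world"
parsed = parse_arn(arn)
print(parsed["service"], parsed["region"], parsed["resource"])
```

A workflow definition does not need to do any of this itself; it only stores the string and leaves its interpretation to the runtime.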

Now let's finally take a look at function definitions in the Serverless Workflow model :)
The Serverless Workflow JSON Schema defines the top-level functions array as:


Which is an array of specific function definitions described in the schema as:

So finally let's take an example workflow definition where we have two function invocation definitions that can then be referenced (via their unique names) in workflow state actions:



Since functions can be invoked by many different states during workflow execution, this reusable "functions" array allows workflow states to reference the function invocation resource via the uniquely defined function name. We will get into details on workflow states and state actions in future posts, but for now here is an example state action which references a defined function and passes input parameters to it:


This action references our previously defined "HelloWorldFunction" function and passes a parameter to it which tells it what language to display the greeting in.

The function definition also includes a non-required "type" parameter. This parameter can be used to give runtime implementations more information about what the function's resource parameter defines. Some example values of the type parameter can be: "arn", "POST", "kafka-topic", etc. The type parameter does not influence workflow execution logic, it simply gives more information about the function invocation resource.
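Putting the pieces together, the following Python sketch shows how a runtime could look up a function by name when it meets an action. Treat the field names as assumptions: "name", "resource" and "type" follow the description in this post, while the "functionRef", "refName" and "parameters" keys on the action, the second function "GoodbyeWorldFunction" and the helper `resolve` are hypothetical and may not match the schema exactly.

```python
# Two reusable function definitions (the second one is invented).
functions = [
    {"name": "HelloWorldFunction",
     "resource": "arn:aws:lambda:us-east-1:123456789012:function:lambda-hello-world",
     "type": "arn"},
    {"name": "GoodbyeWorldFunction",
     "resource": "http://<APP_NAME>.azurewebsites.net/api/<FUNCTION_NAME>",
     "type": "POST"},
]

# A state action that references a function by its unique name.
action = {
    "functionRef": {
        "refName": "HelloWorldFunction",
        "parameters": {"language": "Spanish"},
    }
}


def resolve(action, functions):
    """Return the (resource, parameters) pair that an action points to."""
    by_name = {f["name"]: f for f in functions}
    if len(by_name) != len(functions):
        raise ValueError("function names must be unique")
    ref = action["functionRef"]
    function = by_name[ref["refName"]]
    # The optional "type" is informational only and is not needed here.
    return function["resource"], ref.get("parameters", {})


resource, parameters = resolve(action, functions)
print(resource, parameters)
```

Because states only carry the function name, changing where a function is deployed means editing a single entry in the functions array.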

Thanks for reading, and as always feel free to get your own ideas and expertise into the Serverless Workflow specification by contributing and getting involved.

-- Tihomir Surdilovic --

Friday, March 13, 2020

Overview of the Serverless Workflow Model

In this post we present a quick overview of the workflow model as defined in the Serverless Workflow specification.

Note that the specification is still a work in progress and is subject to change. You can reference the specification roadmap to see the projected release status.

Serverless Workflow uses JSON or YAML formats for defining workflow models. The entire workflow model is described using JSON Schema which you can find here.

The core workflow definition is very simple:

"Core workflow definition elements"

Each workflow model has a unique id. It can also have a version, a name and a description. 
The "startsAt" parameter defines the starting point of the workflow.

Next thing to look at is a list of reusable function definitions. Function definitions express how to invoke needed services during workflow execution. Since multiple workflow states may need to invoke the same services, defining them on the workflow top-level allows us to describe their  invocation information only once. We will go into function definition specifics in future posts.

Events play a big role in serverless orchestration. After all, we are focusing on orchestrating event-based applications running in the cloud. 

Let's take a step back and define what an event is. An event is a data record expressing an occurrence and its context (information).  Events are things that happen that we want to act upon. "A file was uploaded", or "Email was received", or "Application was submitted", or "Purchase was made", all can be considered important events that our workflow must act upon. Events are often described as "triggers". They can trigger workflow execution and/or actions (function calls for example).

Events must have a format which describes them. Currently many different event formats exist, which makes it virtually impossible to have a single event description that would work across multiple cloud providers.
To solve this problem, the Serverless Workflow specification mandates that events conform to the CloudEvents format. CloudEvents is a specification for describing event data in common formats to provide interoperability across services, platforms and systems.

Going back to the workflow model, the "events" array includes reusable event definitions. Their definitions can be referenced by different workflow states. 
We will go much deeper into event definitions and how they are used in future posts.

Next, and probably the most important part of the core workflow definition, are the workflow states. States are the building blocks of our workflows. Each state defines a certain control flow logic block, and combined they define the overall control flow logic of our applications or orchestration. States can reference the defined functions and events.
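As a rough sketch, the core elements described so far can be pictured as the Python dictionary below, together with a couple of sanity checks. Everything inside it is hypothetical: the values are placeholders, the contents of the state are left out, and the checks (unique names, "startsAt" naming an existing state) are my own reading of the text above rather than rules quoted from the schema.

```python
workflow = {
    "id": "helloworld",
    "version": "1.0",
    "name": "Hello World Workflow",
    "description": "Greets the caller",
    "startsAt": "Greet",
    "functions": [{"name": "HelloWorldFunction", "resource": "example-resource"}],
    "events": [{"name": "GreetingRequested", "type": "org.example.greeting",
                "source": "exampleSource"}],
    "states": [{"name": "Greet"}],
}


def check_references(wf):
    """Return a list of problems found in the workflow skeleton."""
    problems = []
    for section in ("functions", "events", "states"):
        names = [item["name"] for item in wf.get(section, [])]
        if len(names) != len(set(names)):
            problems.append(f"duplicate names in {section}")
    state_names = {state["name"] for state in wf.get("states", [])}
    if wf.get("startsAt") not in state_names:
        problems.append("startsAt does not name a defined state")
    return problems


print(check_references(workflow))
```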

Serverless Workflow specification currently defines nine states:

"Workflow States"


We will have dedicated posts for each of the states in the near future so stay tuned. If you can't wait, feel free to dive into the entire specification and check out the many examples available currently.


I hope this quick intro has given you a good look into the core serverless workflow model.
Again, we are looking for community contributors for the specification, so feel free to get involved!








Thursday, March 12, 2020

Introducing the Serverless Workflow Specification

"Serverless Workflow is a vendor-neutral specification 
for defining the model of workflows
responsible for orchestrating event-driven serverless applications."


It is apparent that workflows have become a key component in serverless application development, and this is good news for many reasons. 

For one, workflows have been around for years in many different shapes and forms. The term "workflow" is used interchangeably today with the term "business process", a set of repeatable activities that need to be carried out to accomplish some sort of business/organizational goal.

Separation of concerns is one of the core benefits of using workflows - the ability to cleanly divide responsibilities in your applications with the overall goal to create well-organized systems.

Another core benefit is encompassed in the word "orchestration" which basically means coordination and management of services and events. This is specifically important in cloud environments where our applications are composed of many loosely-coupled, distributed, event-triggered services deployed across multiple clouds. This architecture almost demands orchestration, or the need to tie and coordinate all these services and events together to describe clear business-oriented tasks and goals.

Many of the major cloud providers have adopted workflows as first-class citizens in their service offerings, with the major ones being:



As well as many many others, most notable ones including Fn Flow, StackStorm Orquesta, Netflix Conductor, Fission Workflows, etc.
It feels as if there is a new serverless workflow offering coming out every month now, and the numbers keep growing.

Even though this growth and popularity is very encouraging, it creates some big issues for companies trying to adopt workflows in their serverless architectures, with the major one being complete vendor lock-in.

Once you commit to a workflow solution offered by a cloud provider there is "no way out".
To explain why this is the case, we have to look at the two pieces which allow our workflows to become "executable" in any environment:


  • Workflow model - the definition of what the workflow does. This is typically described in a markup language format such as XML, JSON, YAML, etc
  • Workflow runtime - the runtime engine which interprets the workflow model into an executable application.

Current serverless workflow offerings define proprietary implementations for both the workflow model and runtime. This makes it impossible to port your workflows from one vendor to another, and you end up locked in to a single vendor.

To get out of this, there is not necessarily a need for a vendor-neutral workflow runtime implementation; quite frankly, this is not possible. However, vendor neutrality is possible to achieve on the workflow model level, which is the core goal of the Serverless Workflow Specification we are introducing in this post.

Serverless Workflow is a specification being worked on by the CNCF Serverless Working Group and hosted by the Cloud Native Computing Foundation (CNCF).

The main goals of the specification include:


  • To facilitate Serverless Workflow portability across different vendor platforms
  • To be completely vendor neutral
  • To support both stateless and stateful Serverless Workflow orchestration
  • To define a light-weight and powerful Serverless Workflow model

The Serverless Workflow specification is completely open-source and operates under the Apache License version 2.0.

It is currently still in its infancy, but has an active community consisting of companies and individuals that have shown interest in moving it forward.

I am personally heavily involved in this specification and think that adoption of this specification will enhance the adoption of workflows in serverless applications.

I would like to encourage all readers to contribute to this specification to help it grow.

We will address specifics of the Serverless Workflow specification in further posts. For now I just wanted to introduce it to the public and raise interest. Looking forward to hearing your thoughts and comments!

- Tihomir Surdilovic -