Monday, October 12, 2020

[prog.c++] The follow-up for "What's new in rotor v0.09" article

There is a new article, "What's new in rotor v0.09", that tells several things about yet another C++ actor framework named "rotor". For me, it's a good example of how different implementations of the Actor Model can be. It's also a good reason to write a few words about SObjectizer's features and to show how the same things were addressed there a long time ago.

Before I start, it's necessary to note that some of the decisions described below are philosophical rather than technical. Even so, decisions taken from a philosophical standpoint had a significant impact on the technical aspects of SObjectizer's internals and API, and on applications built on top of SObjectizer.

Messages can be lost

One of the decisions that was made at the very beginning of SObjectizer's life is that messages can be lost. There could be different reasons for that.

The target agent can be destroyed while the message is in flight.

Or the target agent can be overloaded, and the message can be thrown out by its overload defense.

Or the target agent can ignore some messages in its current state (another cornerstone decision for SObjectizer was that all agents are finite-state machines).

Anyway, regardless of the actual reason for the message loss, the fact is that the message sender doesn't know if the message sent was received or not. It leads to interesting consequences.

Decoupling of entities in a program

Message-passing in SObjectizer has a fire-and-forget nature. Because a message can be lost anyway, there is no need to maintain a strong relationship between agents in SObjectizer. A sender just sends messages and doesn't depend on the presence of a receiver. It means that you can recreate a receiver at any moment and that will not disturb the sender at all.

There is also support for 1-to-many message delivery. It means that it's possible to hide a group of receivers behind a message box. It also means that broadcasting a message to several receivers is already a part of SObjectizer.
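The idea of hiding receivers behind a message box can be sketched in plain standard C++. This is only an illustration, not SObjectizer's real API: `toy_mbox`, `subscribe` and `send` are hypothetical names, and delivery here is a direct synchronous call instead of a queued message.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message box (not SObjectizer's mbox type).
class toy_mbox {
public:
    using handler = std::function<void(const std::string&)>;

    // A receiver registers its interest; the sender never sees this list.
    void subscribe(handler h) {
        assert(h != nullptr); // an empty handler makes no sense here
        subscribers_.push_back(std::move(h));
    }

    // Fire-and-forget send. The returned count exists only so that the
    // sketch can be inspected; a real sender would not get it.
    std::size_t send(const std::string& msg) const {
        for (const auto& h : subscribers_) {
            h(msg);
        }
        return subscribers_.size();
    }

private:
    std::vector<handler> subscribers_;
};
```

Sending to a `toy_mbox` without subscribers is not an error: the message just disappears. Adding a second subscriber turns the same `send` call into a broadcast, and the sender's code stays unchanged.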

Application reliability

At first glance, the ability to lose messages has an obvious negative impact on an application's reliability. Some may even think that it is very hard to write reliable code if an interaction between two agents can be broken so easily.

But the reality is not so dramatic. At least in my experience.

There are usually some messages that can be safely ignored. For example, an agent periodically distributes its current state. A single instance of such messages can be lost and that won't have an impact because there will be another message with fresh information inside.

More often messages are used as commands for performing some actions. Or as requests to gather some information. Loss of such messages isn't appropriate.

There is a very simple and almost bulletproof solution for such cases: the usage of timers and resending of messages if there is no acknowledgment.

It is a very simple approach, but it's hard to imagine how practical and useful it is.
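A minimal sketch of that resend loop in plain standard C++ follows. It is not SObjectizer code: `send_with_retry`, `retry_result` and the `try_deliver` callback are hypothetical names, and one loop iteration stands for "send the message, then wait until the resend timer fires".

```cpp
#include <cassert>
#include <functional>

// Outcome of a hypothetical request/acknowledgment exchange.
struct retry_result {
    bool acknowledged; // true if an acknowledgment was received
    int attempts;      // how many times the message was sent
};

// try_deliver models one send plus the wait for the resend timer:
// it returns true if the acknowledgment arrived before the timer fired.
inline retry_result send_with_retry(const std::function<bool()>& try_deliver,
                                    int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_deliver()) {
            return {true, attempt};
        }
        // The timer fired without an acknowledgment: resend on the next pass.
    }
    return {false, max_attempts};
}
```

With a channel that loses the first two messages, `send_with_retry` reports success on the third attempt. With a dead receiver it gives up after `max_attempts` and the caller knows for sure that something is wrong.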

Very rarely, you will have messages that simply can't be lost. The loss of such a message means that something went totally wrong and there is no way to recover. But this is a big topic that can't be discussed in the current blog post.

As a summary, I can say that using timers and resending messages makes applications even more reliable than one might expect. So the ability to lose messages leads to a paradoxical consequence: applications based on async message-passing can be more robust and reliable.

Messages are not lost very often in practice

My experience shows that messages are rarely lost in real life.

First of all, we should distinguish the loss of a message from the disregarding of the message by the receiver. Agents in SObjectizer are finite-state machines, and it's normal for an agent to handle message M in one state and dismiss it in another state. This is a part of the agent's business logic, and such ignoring is not considered a loss.
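The difference can be shown with a tiny hand-written state machine. This is a sketch in plain standard C++ under my own naming (`toy_worker`, `on_start_job`, `on_job_done` are hypothetical); real SObjectizer agents express states and subscriptions differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical agent with two states; all names are illustrative only.
class toy_worker {
public:
    enum class state { idle, busy };

    // Returns true if the message was handled, false if it was ignored
    // because the current state has no handler for it.
    bool on_start_job(const std::string& job) {
        if (state_ != state::idle) {
            return false; // dismissed by business logic, not "lost"
        }
        jobs_.push_back(job);
        state_ = state::busy;
        return true;
    }

    bool on_job_done() {
        if (state_ != state::busy) {
            return false;
        }
        state_ = state::idle;
        return true;
    }

    state current() const { return state_; }
    const std::vector<std::string>& accepted() const { return jobs_; }

private:
    state state_ = state::idle;
    std::vector<std::string> jobs_;
};

// Usage: the second request arrives while the agent is busy and is ignored.
inline std::size_t demo_worker() {
    toy_worker w;
    const bool first = w.on_start_job("a");
    const bool second = w.on_start_job("b");
    assert(first && !second);
    return w.accepted().size();
}
```

The second `on_start_job` reached the agent; the agent decided not to react to it. That is a design decision of the agent, and the delivery mechanism has nothing to do with it.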

One real case of loss is overload defense. For example, agent B has a limited bandwidth and uses some kind of defense against being overloaded by incoming messages. When agent A sends a request M to the overloaded agent B, M will be thrown out.
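One possible form of such a defense is a bounded queue that throws out new messages when it is full. The sketch below uses plain standard C++ and hypothetical names (`bounded_inbox`, `push`, `pop`); "drop the newest message" is just one policy I picked for illustration, not a description of what SObjectizer does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical inbox of agent B with a fixed capacity.
class bounded_inbox {
public:
    explicit bounded_inbox(std::size_t limit) : limit_(limit) {
        assert(limit > 0);
    }

    // Returns false when the overload defense throws the message out.
    bool push(std::string msg) {
        if (queue_.size() >= limit_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(std::move(msg));
        return true;
    }

    // Removes and returns the oldest message; the inbox must not be empty.
    std::string pop() {
        assert(!queue_.empty());
        std::string msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t limit_;
    std::size_t dropped_ = 0;
    std::deque<std::string> queue_;
};
```

Note that `push` returns a flag only to make the sketch observable. In a fire-and-forget model agent A would not see that flag, so from A's point of view M just never gets an answer.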

But the most widespread case is the absence of the destination of the message. For example, agent A sends a request M to agent B, but agent A ends its lifetime just before agent B receives M. In that case B will send the response to a non-existent destination and the response will be lost.

The trick is that in most applications there is no such problem as the absence of the destination for a message. It is because agents in SObjectizer are grouped into cooperations. And members of one cooperation can't work without each other.

Cooperations of agents

Another decision made very early in SObjectizer's history is that agents should work in groups.

We analyzed our experience with the predecessor of SObjectizer, and it became obvious that one of the problems with initializing an agent-based application was the creation of interacting agents in the right order.

Just suppose that there should be agents A and B that have to send messages to each other. We create those agents separately: agent A first, then agent B. How will agent A know that agent B is already here?

The way we didn't want to go was the usage of some additional notification for A and B: we create A, then create B, then send a separate 'group-is-ready' message to A and B. This makes the implementation of A and B harder. It also makes the procedure of creating A and B more complex, because if A is already created and the creation of B fails, we have to destroy A.

To avoid that, we came to the idea of cooperations. A cooperation (or just coop) is a group of agents that should work together. Agents are introduced into SObjectizer only within coops. It means that if agents A and B have to work with each other, we just place them into one coop. If the coop is successfully registered, then A and B are both working. If the registration of the coop fails, none of the coop's agents is present.
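The all-or-nothing behavior can be modeled in a few lines of plain standard C++. This is a sketch only: `toy_env`, `toy_agent` and `register_coop` are hypothetical names, and a "failure" is modeled as an exception thrown while an agent is being created.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct toy_agent {
    std::string name;
};

// Hypothetical environment that accepts agents only in groups.
class toy_env {
public:
    using factory = std::function<std::unique_ptr<toy_agent>()>;

    // Either every agent of the coop becomes visible, or none does.
    bool register_coop(const std::vector<factory>& factories) {
        std::vector<std::unique_ptr<toy_agent>> staged;
        try {
            for (const auto& make : factories) {
                staged.push_back(make());
                assert(staged.back() != nullptr);
            }
        } catch (const std::exception&) {
            return false; // staged agents die here, nothing was published
        }
        for (auto& agent : staged) {
            agents_.push_back(std::move(agent));
        }
        return true;
    }

    std::size_t agent_count() const { return agents_.size(); }

private:
    std::vector<std::unique_ptr<toy_agent>> agents_;
};

// Usage: the creation of B fails, so A must not stay registered either.
inline std::size_t demo_failed_registration() {
    toy_env env;
    const bool ok = env.register_coop({
        [] { return std::make_unique<toy_agent>(toy_agent{"A"}); },
        []() -> std::unique_ptr<toy_agent> {
            throw std::runtime_error("B failed");
        },
    });
    assert(!ok);
    return env.agent_count();
}
```

The code that creates A and B doesn't have to clean up A by hand when B fails. The group either appears as a whole or doesn't appear at all.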

That is a very simple idea. And I can say that it simplifies working with agents in SObjectizer a lot. Some users even tell us that coops are one of the SObjectizer features they love the most.

The presence of coops in SObjectizer is one of the reasons why SObjectizer doesn't have Erlang-like supervisor trees.

Hierarchies of agents

The Erlang programming language is one of the iconic implementations of the Actor Model. And one of the most influential parts of Erlang is supervisor trees (or hierarchies of Erlang processes). SObjectizer implements the Actor Model too, but doesn't have anything like Erlang's supervisors. Why?

I think Erlang's supervisor trees solve two tasks:

The first one is the implementation of the let-it-crash (or fail-fast) principle. The cornerstone of Erlang's ideology is the presence of failures. Every process inside the application can crash, and there is a need to cope with that. This is where supervisors come in. A supervisor handles the crashes of any of its children. If a supervisor crashes, then its crash will be handled by the supervisor's supervisor, and so on.

The second one is the division of processes into various groups, each of which works on a separate part of the business logic (e.g. some processes perform networking stuff, some handle application data, some work with databases and so on). Each group is represented as a separate process hierarchy.

So let's look at how those tasks are addressed in SObjectizer.

The applicability of the let-it-crash principle in C++

Erlang is a safe managed language with its own virtual machine that implements process isolation. As a consequence, if one of the Erlang processes crashes, that crash doesn't have an impact on other Erlang processes in the same VM. That is why supervisor trees work so perfectly in Erlang. And that is why the let-it-crash ideology can be the cornerstone of Erlang.

But the situation is completely different in C++.

C++ isn't a safe or managed language. Agents in SObjectizer are just C++ objects that all live in the same address space. It means that if there is some unexpected failure in one agent then this failure can corrupt any other object in the same process.

So in C++, we can't just restart a failed agent, because we don't know how much damage was done.

Because of that I strongly believe that supervisor trees for reliability purposes are just useless in C++.

Groups of agents for performing various tasks

As I said earlier coops are the way of grouping agents that work together. So a coop itself can already be seen as a very simple supervisor that controls several agents that were placed into the coop.

But experience showed us that several coops on the same hierarchy level are not a perfect solution. Sometimes agents for performing some action can be created only some time after the start of an application. Sometimes those agents need to be sure that their creators are still here.

We solved this issue by adding a parent-child relationship between coops in SObjectizer.

It is possible to create a coop that will be a child for another coop. And SObjectizer guarantees two important things:

  • a child coop will be deregistered before the deregistration of its parent coop;
  • if a coop is being deregistered, all its child coops (including children of children) will be automatically deregistered too.

Such behavior allows us to rely on another assumption: an agent from a child coop can be sure that an agent from the parent coop is still here. It also simplifies the interaction between agents and minimizes the cases where messages are lost.
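Both guarantees can be modeled as a depth-first walk over a tree of coop names. The sketch below is plain standard C++ with hypothetical names (`toy_coop_tree`, `register_coop`, `deregister_coop`); it shows only the ordering and says nothing about how SObjectizer implements deregistration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry of coops with parent-child links.
class toy_coop_tree {
public:
    // An empty parent means a top-level coop; otherwise the parent must exist.
    void register_coop(const std::string& name, const std::string& parent = "") {
        assert(!name.empty() && nodes_.count(name) == 0);
        assert(parent.empty() || nodes_.count(parent) == 1);
        nodes_[name].parent = parent;
        if (!parent.empty()) {
            nodes_[parent].children.push_back(name);
        }
    }

    // Deregisters the coop with all its descendants and returns the order
    // of deregistration: every child goes before its parent.
    std::vector<std::string> deregister_coop(const std::string& name) {
        std::vector<std::string> order;
        remove_subtree(name, order);
        return order;
    }

private:
    struct node {
        std::string parent;
        std::vector<std::string> children;
    };

    void remove_subtree(const std::string& name, std::vector<std::string>& order) {
        const auto it = nodes_.find(name);
        if (it == nodes_.end()) {
            return; // unknown or already deregistered
        }
        const node copy = it->second; // the map is modified below
        for (const auto& child : copy.children) {
            remove_subtree(child, order);
        }
        const auto parent_it = nodes_.find(copy.parent);
        if (parent_it != nodes_.end()) {
            auto& siblings = parent_it->second.children;
            siblings.erase(std::remove(siblings.begin(), siblings.end(), name),
                           siblings.end());
        }
        nodes_.erase(name);
        order.push_back(name);
    }

    std::map<std::string, node> nodes_;
};
```

For a chain "parent", "child", "grandchild", deregistering "parent" produces the order grandchild, child, parent. So at the moment when an agent from a child coop finishes, its parent coop is still registered.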

So it seems that coops with parent-child relationships solve the task of grouping various agents for performing different business-logic actions just like Erlang's supervisor trees.

Instead of conclusion

I wrote this post because it was interesting for me to see how different the approaches in the SObjectizer and rotor frameworks are.

I hope this post will help to understand why SObjectizer looks the way it does. And maybe it will help someone to make a choice between different C++ actor frameworks.

PS. There is a fair amount of text here and few illustrations. If someone finds that this makes the post (or some parts of it) difficult to understand, please let me know. I'll add some.
