вторник, 10 марта 2020 г.

[prog.c++.bicycle] I'm working on the first draft of experimental type-safe router for RESTinio

The express-like router is available in RESTinio for several years. I think it is a very easy to use tool, that is familiar to many developers. But it has some congenital defects I wanted to repair for a long time.

The congenital defects of express-like router

The propensity to errors

The first defect is that express-like router is error-prone and isn't friendly for repeated cases. Let's look at one of the examples from RESTinio, express_router:

auto server_handler( book_collection_t & book_collection )
{
   auto router = std::make_unique< router_t >();
   auto handler = std::make_shared< books_handler_t >( std::ref(book_collection) );

   auto by = [&]( auto method ) {
      using namespace std::placeholders;
      return std::bind( method, handler, _1, _2 );
   };

   router->http_get( "/", by( &books_handler_t::on_books_list ) );

   router->http_get( R"(/:booknum(\d+))", by( &books_handler_t::on_book_get ) );

   router->http_get( "/author/:author", by( &books_handler_t::on_author_get ) );

   router->http_post( "/", by( &books_handler_t::on_new_book ) );

   router->http_put( R"(/:booknum(\d+))", by( &books_handler_t::on_book_update ) );

   router->http_delete( R"(/:booknum(\d+))", by( &books_handler_t::on_book_delete ) );

   return router;
}

We have to manually specify a format for "booknum" parameter (it's a decimal number). And we are doing it wrong. Ok, not completely wrong, but not quite right: we do not limit the number of digits, so a user can specify more digits that can be stored into std::size_t.

We have also to repeat the mask for the "booknum" pattern. That can be avoided, of course, that way, for example:

auto server_handler( book_collection_t & book_collection )
{
   auto router = std::make_unique< router_t >();
   auto handler = std::make_shared< books_handler_t >( std::ref(book_collection) );

   auto by = [&]( auto method ) {
      using namespace std::placeholders;
      return std::bind( method, handler, _1, _2 );
   };

   const auto booknum = restinio::string_view_t{ R"(/:booknum(\d+))" };

   router->http_get( "/", by( &books_handler_t::on_books_list ) );

   router->http_get( booknum, by( &books_handler_t::on_book_get ) );

   router->http_get( booknum, by( &books_handler_t::on_author_get ) );

   router->http_post( "/", by( &books_handler_t::on_new_book ) );

   router->http_put( booknum, by( &books_handler_t::on_book_update ) );

   router->http_delete( booknum, by( &books_handler_t::on_book_delete ) );

   return router;
}

But that can be someway less readable in cases when the repeated fragment occurs in different places of the path. An example can be seen in restinio-long-output-sample:

auto make_router(asio_ns::io_context & ctx) {
   auto router = std::make_unique<router_t>();

   router->http_get("/", [&ctx](auto req, auto) {
         ...
      });

   router->http_get(
      R"(/:value(\d+):multiplier([MmKkBb]?))",
      [&ctx](auto req, auto params) { ... });

   router->http_get(
      R"(/:value(\d+):multiplier([MmKkBb]?)/:count(\d+))",
      [&ctx](auto req, auto params) { ... });

   return router;
}

In that case it's necessary to use something more complex like:

auto make_router(asio_ns::io_context & ctx) {
   auto router = std::make_unique<router_t>();

   router->http_get("/", [&ctx](auto req, auto) {
         ...
      });

   const char * chunk_size_pattern = R"(:value(\d+):multiplier([MmKkBb]?))";
   router->http_get(
      fmt::format(R"(/{})", chunk_size_pattern),
      [&ctx](auto req, auto params) { ... });

   router->http_get(
      fmt::format(R"(/{}/:count(\d+))", chunk_size_pattern),
      [&ctx](auto req, auto params) { ... });

   return router;
}

The absence of type-safety

Let's look at the standard express_router example again:

auto on_book_get(
   const restinio::request_handle_t& req, rr::route_params_t params );

This the declaration of a handler for a GET request. There are no requirements for a list of parameters for that method and for types of those parameters. We can't guess which parameters are required for that method until we examine the method's code.

This is not good because we can make a mistake during the definition of a route:

router->http_get( R"(/:book)", by( &books_handler_t::on_book_get ) );

And this error will be detected only at the run time.

A type-safe router for a rescue

What I wanted to have in RESTinio for a long time is a type-safe router that allows me to avoid the errors described above. Or, at least, catch them at compile-time.

Now I'm working on an experimental draft of such a router and there are the first results I want to discuss.

The roots of a new router

Some time ago a mini-framework with the name "easy_parser" was developed inside RESTinio. That framework implements somewhat similar to PEG recursive-descent parser. The easy_parser is used by RESTinio's helpers for working with HTTP-fields.

For example, let's assume that we have a rule:

flag = 'A' | 'B' | 'C-' | 'C' ['+']

This rule allows values like "A", "B", "C", "C+" or "C-". It can be expressed in easy_parser that way:

using namespace restinio::easy_parser;
alternatives(
   symbol('A'),
   symbol('B'),
   sequence(symbol('C'), symbol('-')),
   sequence(symbol('C'), maybe(symbol('+')))
);

This syntax is not as concise as Boost.Spirit or PEGTL DSL, but it works.

The simple example shown above is just a part of the story. Just parsing is not a goal of easy_parser. The goal is to produce some value during the parsing.

Suppose we want to have a value of that enumeration after the parsing:

enum class flag_type { a, b, c_minus, c_plus };

A parser for that rule that produces a value of type "flag_type" will look like:

using namespace restinio::easy_parser;
auto parse = produce<flag_type>(
   alternatives(
      symbol_producer('A') >> just(flag_type::a) >> as_result(),
      symbol_producer('B') >> just(flag_type::b) >> as_result(),
      sequence(
         symbol_producer('C') >> just(flag_type::c_minus) >> as_result(),
         symbol('-')),
      sequence(
         symbol_producer('C') >> just(flag_type::c_plus) >> as_result(),
         maybe(symbol('+')))
   )
);

It means that if symbol 'A' is extracted from the input stream it should be replaced by flag_type::a value and that value should be used as the result of the whole parsing. The similar thing is for the 'B' symbol. The rules for 'C' and the subsequent symbol are slightly harder, but the principle is the same.

The DSL above is translated to something like that:

auto parse = [](input_t & from) -> expected_t<flag_type, parse_error_t> {
   flag_type result;
   const auto ch = from.getch();
   if(ch.eof()) return make_unexpected(parse_error_t::unexpected_eof);
   if(ch.val() == 'A') { result = flag_type::a; goto success; }
   if(ch.val() == 'B') { result = flag_type::b; goto success; }
   if(ch.val() == 'C') {
      result = flag_type::c_minus;
      const auto next = from.getch();
      if(!next.eof() && next.val() == '-') { goto success; }
      from.putback();
   }
   if(ch.val() == 'C') {
      result = flag_type::c_plus;
      const auto next = from.getch();
      if(!next.eof() && next.val() == '+') { goto success; }
      from.putback();
      goto success;
   }
   return make_unexpected(parse_error_t::pattern_not_found);
success:
   return result;
};

Unfortunately, easy_parser is not a simple topic and it isn't described yet in the documentation. And the description of easy_parser can take up too much space :(

A new experimental type-safe router in RESTinio is based on easy_parser. That's why it's called easy_parser_router.

The basic principle of the easy_parser_router

The basic principle of the easy_parser_router is to produce an object of some type during the parsing of the path and to call a handler passing the produced object to it.

It means that a user has to define route handlers that way:

namespace epr = restinio::router::easy_parser_router;

router->add_handler(restinio::http_method_get(),
   epr::produce<first_type>(....),
   [](const restinio::request_handle_t & req, first_type & params) {...});

router->add_handler(restinio::http_method_get(),
   epr::produce<second_type>(...),
   [](const restinio::request_handle_t & req, second_type & params) {...});

... // and so on.

The easy_parser_router stores all route parsers and request handlers. When a new request arrives the easy_parser_router tries all stored parsers until one of them returns a successful parsing result. In that case, the corresponding request handler is invoked and the reference to the parsed value is passed to the handler.

A simple example: the express_router example rewritten

Let's see how the express_router example can be rewritten with easy_parser_router. The full source code can be seen here.

We need some short names for convenience:

namespace rr = restinio::router;
namespace epr = rr::easy_parser_router;

using router_t = rr::easy_parser_router_t;

using book_number_t = std::uint32_t;
using author_name_t = std::string;

Type "book_number_t" defines the type of the parameter for the case when a book is identified by its number. Type "author_name_t" defines the type for the case when the name of the author is used.

So now the methods of "book_handler_t" class will have signatures that tell what parameter is expected:

class books_handler_t
{
public :
   ...

   auto on_books_list(
      const restinio::request_handle_t& req,
      const epr::root_t & ) const;

   auto on_book_get(
      const restinio::request_handle_t& req,
      const book_number_t booknum );

   auto on_author_get(
      const restinio::request_handle_t& req,
      const author_name_t & author );

   auto on_new_book(
      const restinio::request_handle_t& req,
      const epr::root_t & )

   auto on_book_update(
      const restinio::request_handle_t& req,
      const book_number_t booknum );

   auto on_book_delete(
      const restinio::request_handle_t& req,
      const book_number_t booknum );

And the definition of routes now looks a bit differently:

auto server_handler( book_collection_t & book_collection )
{
   auto router = std::make_unique< router_t >();
   auto handler = std::make_shared< books_handler_t >( std::ref(book_collection) );

   auto book_num = epr::produce< book_number_t >(
         epr::slash(),
         epr::non_negative_decimal_number_producer< book_number_t >()
               >> epr::as_result()
      );

   auto by = [&]( auto method ) {
      using namespace std::placeholders;
      return std::bind( method, handler, _1, _2 );
   };

   router->add_handler(
         restinio::http_method_get(),
         epr::root(), by( &books_handler_t::on_books_list ) );

   router->add_handler(
         restinio::http_method_get(),
         book_num, by( &books_handler_t::on_book_get ) );

   router->add_handler(
         restinio::http_method_put(),
         book_num, by( &books_handler_t::on_book_update ) );

   router->add_handler(
         restinio::http_method_delete(),
         book_num, by( &books_handler_t::on_book_delete ) );

   router->add_handler(
         restinio::http_method_post(),
         epr::root(), by( &books_handler_t::on_new_book ) );

   router->add_handler(
         restinio::http_method_get(),
         epr::produce< author_name_t >(
               epr::slash(),
               epr::exact("author"),
               epr::slash(),
               epr::path_fragment() >> epr::unescape() >> epr::as_result() ),
         by( &books_handler_t::on_author_get ) );

   router->non_matched_request_handler( ... );

   return router;
}

We can see the local variable "book_num" that holds a parser for a route with book number. That variable is used several times for the definition of similar routes.

The most complex definition is a route for getting all books of a specific author:

epr::produce< author_name_t >(
   epr::slash(),
   epr::exact("author"),
   epr::slash(),
   epr::path_fragment() >> epr::unescape() >> epr::as_result() ),

It corresponds to the following PEG rule:

path = '/' 'a' 'u' 't' 'h' 'o' 'r' '/' (!'/')+

where "epr::path_fragments()" implements "a non-empty sequence of symbols that are not slash".

It's also needed to mention that value produced by "path_fragment" is not only stored as the result of parsing procedure, it's also transformed from percent-escaped representation:

epr::path_fragment() >> epr::unescape() >> ...

What is good in that example is that it is impossible to pass a parameter of type "book_number_t" to a request-handler that accepts "author_name_t" or vice versa.

A more complex case: restinio-long-output-sample rewritten

Now it's time to review a more complex code from restinio-long-output-sample. In it's express-router based version a helper function for transforming ":value(\d+):multiplier([MmKkBb]?)" pattern to a number is defined:

std::size_t extract_chunk_size(const restinio::router::route_params_t & params) {
   const auto multiplier = [](const auto sv) noexcept -> std::size_t {
      if(sv.empty() || "B" == sv || "b" == sv) return 1u;
      else if("K" == sv || "k" == sv) return 1024u;
      else return 1024u*1024u;
   };

   return restinio::cast_to<std::size_t>(params["value"]) *
         multiplier(params["multiplier"]);
}

This function was called from request handlers:

router->http_get(
   R"(/:value(\d+):multiplier([MmKkBb]?))",
   [&ctx](auto req, auto params)
   {
      const auto chunk_size = extract_chunk_size(params);

      if(0u != chunk_size) {
         request_processor(ctx, chunk_size, 10000u, std::move(req));
         return restinio::request_accepted();
      }
      else
         return restinio::request_rejected();
   });

With easy_parser/easy_parser_router this pattern can be parsed that way:

struct chunk_size {
   std::size_t count_;
   std::size_t multiplier_;
};

constexpr std::size_t one_byte = 1u;
constexpr std::size_t one_kib = 1024u * one_byte;
constexpr std::size_t one_mib = 1024u * one_kib;

auto parse = epr::produce<chunk_size>(
   epr::non_negative_decimal_number_producer() >> &chunk_size::count_,
   epr::maybe(
      epr::produce<std::size_t>(
         epr::alternatives(
            epr::caseless_symbol_producer('b') >> just(one_byte) >> as_result(),
            epr::caseless_symbol_producer('k') >> just(one_kib) >> as_result(),
            epr::caseless_symbol_producer('m') >> just(one_mib) >> as_result(),
         )
      ) >> &chunk_size::multiplier_
   )
);

This code parses the following PEG rule:

chunk_size = number [ 'b'|'B' |  'k'|'K' | 'm'|'M' ]

and produces an instance of type "chunk_size" with two integers inside.

So we can write two paths in the following form:

handler->add_handler(restinio::http_method_get(),
   epr::produce<chunk_size>(epr::slash(), parse >> as_result()),
   [](const restinio::request_handler_t & req, chunk_size & params) {...});

struct chunk_size_and_count {
   chunk_size size_;
   std::size_t count_;
};
handler->add_handler(restinio::http_method_get(),
   epr::produce<chunk_size_and_count>(
      epr::slash(),
      parse >> &chunk_size_and_count::size_,
      epr::slash(),
      epr::non_negative_decimal_number_producer<std::size_t>() >> &chunk_size_and_count::count_),
   [](const restinio::request_handler_t & req, chunk_size_and_count & params) {...});

But we can also use the fact that rules in PEG can contain optional parts. So instead of the rule with two alternatives:

params = '/' chunk_size '/' number
       | '/' chunk_size

we can use a simpler rule:

params = '/' chunk_size [ '/' number ]

And because of that we can define a single path for the router:

struct distribution_params
{
   std::size_t size_value_{1u};
   std::size_t size_multiplier_{1u};

   std::size_t count_{10000u};

   std::size_t chunk_size() const noexcept
   {
      return size_value_ * size_multiplier_;
   }

   std::size_t count() const noexcept
   {
      return count_;
   }
};

constexpr std::size_t one_byte = 1u;
constexpr std::size_t one_kib = 1024u * one_byte;
constexpr std::size_t one_mib = 1024u * one_kib;

router->add_handler(restinio::http_method_get(),
         epr::produce<distribution_params>(
            epr::slash(),
            epr::non_negative_decimal_number_producer<std::size_t>()
               >> &distribution_params::size_value_,
            epr::maybe(
               epr::produce<std::size_t>(
                  epr::alternatives(
                     epr::caseless_symbol_producer('b')
                        >> epr::just(one_byte) >> epr::as_result(),
                     epr::caseless_symbol_producer('k')
                        >> epr::just(one_kib) >> epr::as_result(),
                     epr::caseless_symbol_producer('m')
                        >> epr::just(one_mib) >> epr::as_result()
                  )
               ) >> &distribution_params::size_multiplier_
            ),
            epr::maybe(
               epr::slash(),
               epr::non_negative_decimal_number_producer<std::size_t>()
                  >> &distribution_params::count_
            )
         ),
         [&ctx](auto req, const auto & params) {
      if(0u != params.chunk_size()) {
         request_processor(
               ctx,
               params.chunk_size(),
               params.count(),
               std::move(req));
         return restinio::request_accepted();
      }
      else
         return restinio::request_rejected();
   });

See also Upd.2 at the end of the post.

Instead of the conclusion

I've received the first results of an experiment with the creation of type-safe router with strict compile-time checking. The results look promising. The experiment has proven that such type-safe router can be done.

However, the current way of specifying the parsers looks weird and, maybe, scaring. I hope the DSL for parsers can be improved, but don't know if it is really possible. And, if so, how much improvements can I make here. So the main problem on the way to add easy_parser_router to the RESTinio is the complexity of easy_parser itself.

Another obstacle is the miss of the benchmarks for the new router. The first draft is just implemented and I've no time to take some measures. It's obvious that actual numbers will be a function of the number of paths in the router and the complexity of those paths. I hope that even current implementation should show dozens of thousands of paths per second, but it isn't measured yet. Upd. See "Updates" section below.

It would be great to hear opinions from existing or potential RESTinio's users: do you like that direction? Do you want to have a type-safe alternative to express-like routers in RESTinio? Does the easy_parser DSL really scare you?

Please let me know.

And it would be great if you share this post somehow and somewhere. The more eyes see it the more constructive feedback we can get. I hope.

Updates

Upd.1.

I've made a simple benchmark that repeats our simple hardcoded router used for benchmarks described in this section of RESTinio's documentation. When it is run with one worker thread the wrk shows ~53K req/sec on my notebook. The hardcoded cmp_router_bench in the same setup shows ~55K req/sec, the express_router_bench (with std::regex) shows ~48K req/sec. The wrk is run with the following parameters:

wrk --latency -t 4 -c 256 -d 10 -s cmp_routes.lua http://localhost:8080

So it looks that easy_parser_router can be quite competitive. At least in some cases.

Upd.2.

After some refactoring of RESTinio's easy_parser and easy_parser_router the corresponding fragment from restinio-long-output-sample looks this way now:

struct distribution_params
{
   std::size_t chunk_size_{100u*1024u};
   std::size_t count_{10000u};
};

auto make_router(asio_ns::io_context & ctx) {
   auto router = std::make_unique<router_t>();

   router->add_handler(restinio::http_method_get(),
      epr::root(),
      [&ctx](const auto & req, auto)
      {
         distribution_params params; // Use default values.
         request_processor(ctx, params.chunk_size_, params.count_, std::move(req));
         return restinio::request_accepted();
      });

   struct chunk_size { std::uint32_t c_{1u}, m_{1u}; };

   router->add_handler(restinio::http_method_get(),
      epr::produce<distribution_params>(
         epr::slash(),
         epr::produce<chunk_size>(
            epr::non_negative_decimal_number_p<std::uint32_t>()
               >> &chunk_size::c_,
            epr::maybe(
               epr::produce<std::uint32_t>(
                  epr::alternatives(
                     epr::caseless_symbol_p('b') >> epr::just_result(1u),
                     epr::caseless_symbol_p('k') >> epr::just_result(1024u),
                     epr::caseless_symbol_p('m') >> epr::just_result(1024u * 1024u)
                  )
               ) >> &chunk_size::m_
            )
         ) >> epr::convert(
               [](chunk_size cs) { return std::size_t{cs.c_} * cs.m_; })
            >> &distribution_params::chunk_size_,
         epr::maybe(
            epr::slash(),
            epr::non_negative_decimal_number_p<std::size_t>()
               >> &distribution_params::count_
         )
      ),
      [&ctx](auto req, const auto & params)
      {
         if(0u != params.chunk_size_) {
            request_processor(
                  ctx,
                  params.chunk_size_,
                  params.count_,
                  std::move(req));
            return restinio::request_accepted();
         }
         else
            return restinio::request_rejected();
      });

   return router;
}

Комментариев нет: