Even though the first stable version of Rust was released in 2015, its ecosystem still has gaps when it comes to common tasks. One of them is background processing.
In software engineering, background processing is a common approach to several problems:
- Carrying out periodic tasks, such as delivering notifications or updating cached values.
- Deferring expensive work so your application stays responsive while calculations run in the background.
Most programming languages have go-to background processing frameworks/libraries. For example:
- Ruby - sidekiq. It uses Redis as a job queue.
- Python - dramatiq. It uses RabbitMQ as a job queue.
- Elixir - oban. It uses a Postgres DB as a job queue.
Async programming (async/await) can be used for background processing, but it has several major disadvantages when used directly:
- It gives no control over the number of tasks executing at any given time, so a flood of spawned tasks can overload the threads they are started on.
- It provides no monitoring, which is useful for investigating a system and finding bottlenecks.
- Tasks are not persistent, so all enqueued tasks are lost on every application restart.
To address these shortcomings, we implemented async processing in the fang library.
Threaded Fang
Fang is a background processing library for Rust. The first version of Fang was released exactly one year ago. Its key features were:
- Each worker is started in a separate thread
- A Postgres table is used as the task queue
This implementation was written for a specific use case: the el monitorro bot. It has stood the test of time: every minute the bot processes a growing number of feeds (currently more than 3000), and some users host it on their own infrastructure.
You can find out more about the threaded processing in fang in this blog post.
Async Fang
Async provides significantly reduced CPU and memory overhead, especially for workloads with many IO-bound tasks, such as servers and databases. All else being equal, you can have orders of magnitude more tasks than OS threads, because an async runtime uses a small number of (expensive) threads to handle a large number of (cheap) tasks.
For some lightweight background tasks, it’s cheaper to run them on the same thread using async instead of starting one thread per worker. That’s why we implemented this kind of processing in fang. Its key features:
- Each worker is started as a tokio task
- If any worker fails during task execution, it’s restarted
- Tasks are saved to a Postgres database. Instead of diesel, tokio-postgres is used to interact with the database; the threaded processing uses the diesel ORM, which blocks the thread.
- The implementation is based on traits so it’s easy to implement additional backends (redis, in-memory) to store tasks.
Usage
The usage is straightforward:
- Define a serializable task by adding serde derives to a task struct.
- Implement the AsyncRunnable trait so fang can run the task.
- Start workers.
- Enqueue tasks.
Let’s go over each step.
Define a task
```rust
use fang::serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(crate = "fang::serde")]
pub struct MyTask {
    pub number: u16,
}

impl MyTask {
    pub fn new(number: u16) -> Self {
        Self { number }
    }
}
```
Fang re-exports serde, so it's not required to add it to the Cargo.toml file.
Implement the AsyncRunnable trait
```rust
use fang::async_trait;
use fang::asynk::async_queue::AsyncQueueable;
use fang::typetag;
use fang::AsyncRunnable;
use fang::Error;
use std::time::Duration;

#[async_trait]
#[typetag::serde]
impl AsyncRunnable for MyTask {
    async fn run(&self, queue: &mut dyn AsyncQueueable) -> Result<(), Error> {
        // enqueue a follow-up task with the next number
        let new_task = MyTask::new(self.number + 1);
        queue
            .insert_task(&new_task as &dyn AsyncRunnable)
            .await
            .unwrap();

        log::info!("the current number is {}", self.number);
        tokio::time::sleep(Duration::from_secs(3)).await;
        Ok(())
    }
}
```
- Fang uses the typetag library to serialize trait objects and save them to the queue.
- The async-trait crate is used to implement async traits.
Init queue
```rust
use fang::asynk::async_queue::AsyncQueue;
use fang::NoTls;

let max_pool_size: u32 = 2;

let mut queue = AsyncQueue::builder()
    .uri("postgres://postgres:postgres@localhost/fang")
    .max_pool_size(max_pool_size)
    .duplicated_tasks(true)
    .build();

// establish the connection pool before using the queue
queue.connect(NoTls).await.unwrap();
```
Start workers
```rust
use fang::asynk::async_queue::AsyncQueue;
use fang::asynk::async_worker_pool::AsyncWorkerPool;
use fang::NoTls;

let mut pool: AsyncWorkerPool<AsyncQueue<NoTls>> = AsyncWorkerPool::builder()
    .number_of_workers(10_u32)
    .queue(queue.clone())
    .build();

pool.start().await;
```
Insert tasks
```rust
let task = MyTask::new(0);

queue
    .insert_task(&task as &dyn AsyncRunnable)
    .await
    .unwrap();
```
Pitfalls
Async processing is suitable for lightweight tasks. For heavier tasks, it's advised to use one of the following approaches:
- start a separate tokio runtime to run fang workers
- use the threaded processing feature implemented in fang instead of the async processing
Future directions
There are a couple of features planned for fang:
- Retries with different backoff modes
- Additional backends (in-memory, redis)
- Graceful shutdown for async workers (for the threaded processing this feature is implemented)
- Cron jobs
Conclusion
The project is available on GitHub.
The async feature and this post were written in collaboration between Ayrat Badykov (github) and Pepe Márquez Romero (github).