Introduction

yaws io_uring abstracts the underlying low level Linux io_uring library.

We aim too provide safer semantics towards managing all the moving parts given these have complex lifetimes and ownership, typically shared between the kernel, userspace and the user.

high-level

Bearer

UringBearer (io-uring-bearer) is the carrier holding all of the io_uring associated generated instances of types.

OpCode

OpCode and OpCode (io-uring-opcode) are the contracts between bearer and individual opcode implementations.

In addition it provides the required extension traits for implementing the individual opcodes.

Ownership

Owner (io-uring-owner) provides the ownership semantics between kernel, userspace and the user.

Filehandles

RegisteredFd and FdKind (io-uring-fd) provides the types representing filehandles within io_uring.

Buffers

UringBuffer (io-uring-buffer) will provide the types representing buffers within io_uring.

Note: As of time writing buffers are still within the bearer but this will change.

Bearer

The crate io-uring-bearer consists of the below main types:

TypeDescription
UringBearerThe main carrier type holding all the instaces of Completion, Registered Filehandles & Buffers
UringBearerErrorThe error type for UringBearer
BearerCapacityKindDescriptor used to describe the boundaries of capacities required
CompletionThe main completion type bringing together all the possible competions

Associated Built-In Types

The below will be migrated into separate crates, implementing the io-uring-opcode trait later, similar to EpollCtl.

Until then, io-uring-bearer still holds some of the required holding types:

Holding TypeDescription
BuffersRecHolds the actual allocation for the Buffers that either owned by the Kernel or Userspace.
FutexRecfutex2(2) -like, Used for FutexWait

In addition io-uring-bearer still holds some of the individual OpCode Pending / Completion slab types:

OpCode TypeDescription
AcceptRecaccept4(2), used for Accept and AcceptMulti
FutexWaitRecRepresents FutexWait
RecvRecRepresents Recv
RecvMultiRecRepresents RecvMulti
SendZcRecRepresents SendZc

Capacity

UringBearer requires bounded capacities described to it.

Leaving things unbounded and without any capacity planning, it would be easy to create opportunities for denial of service, crashes etc. resulting from unbounded capacities at runtime.

All consumers must describe the required capacities through BearerCapacityKind type.

Given that the capacity is described and used at runtime context,

You should test your intended runtime with the described capacity to ensure that the given environment meets it where ever it's deployed in.

Anything that hits the capacity will typically see SlabbableError AtCapacity(setting) variant.

The capacity values are only used when the UringBearer is constructed and are not changeable within it's lifetime after.

See the BearerCapacityKind for an example how to use it along the capacity crate.

Slabbable

See blog which describes the Slabbable trait in general.

Normal Vec etc. would implicitly allocate without warning and invalidate the underlying addresses.

Given that memory addresses need to be kept stable for many things submitted to the io_uring queue,

UringBearer uses Slabbable to back the underlying pending or completed instances of Completion.

Any top-level binary can configure the desired Slabbable implementation through SelectedSlab.

By default the crate currently uses nohash-hasher through HashBrown through slabbable-hash.

Completions

We exploit the fact that Kernel doesn't make any opinion what the userdata should contain for the pending completions by giving each submission a rotating u64 identifier which can then be mapped into the completed type upon completion.

This avoids using any from_raw_parts -like re-construction when we can refer through rotating u64 serial through storage.

We could also use std::mem::forget but we can also model this through the trait giving the option to the user what kind of storage they would like.

Main reason we like to track the items is we that can have some level of ownership / pointer provenance, manage easily any associated data (e.g. regarding the ownership) and more importantly we can construct a gated accessor / barrier to it whilst when for example the kernel has only immutable reference to the said data meaning we can also hold immutable references whilst doing so where as simply forgetting the submitted data.

Not only this but sometimes you might have an arena of buffers (e.g. group of buffers) that are provided to kernel and you must handle the ownership status of window / slice into the part of group of continuous buffers that have been provided back essentially fragmenting the type in some scenarios.

In short, the trait gives us the flexibility and we are able to test that all the implementations uphold the guarantees whilst leaving us easy way to benchmark any possible scenarios through the varying implementation in one go.

OpCode

All io_uring operations or ops are described through their opcodes.

The below opcodes are currently implemented:

OpCodeLinux Kerneltrait / built-inDescription
Acceptbuilt-inaccept4(2) (single-shot)
EpollCtltrait implEpoll Control
FuteXWaitbuilt-inFutex Wait
ProvideBuffersbuilt-inRegister Buffers with Kernel for faster I/O
Recvbuilt-inReceive (single-shot)
RecvMultibuilt-inReceive (multi-shot)
SendZcbuilt-inSend (Zero Copy)

All the in-Bearer built-in OpCodes will be moved to implement the associated trait in the future.

All the builti-in & OpCode + OpCompletion trait impls are mapped into Completion enum.

To push a bearer-aware submission, use the push_* or push_op_typed methods via the associated UringBearer.

Accept

Accept abstracts the underlying io_uring::opcode::Accept.

This Single-shot Accept provides the source socket address and port within the completion.

If you either don't need the source address and / or port, consider using the AcceptMulti instead.

Note: This is being migrated to implement the OpCode + OpCompletion traits similar to EpollCtl.

Submission

You can currently push single-shot Accept unsafely depending on whether the underlying TcpListener RawFd is IPv4 or IPv6:

To use it safely, user must ensure that the underlying socket is either exclusively IPv4 or IPv6 given the returned source address and structure layout is different depending on which one it is.

If someone needs UNIX sockets, please feel free to send a PR.

Completion

Accept(AcceptRec) will show up as normal through the handler API through UringBearer.

Lifetime (Manual handling)

SubmissionRecordStatus::Forget is safe given the pending Completion was Single-shot.

If the completed record is retained, this will result in memoryleak.

EpollCtl

EpollCtl abstracts the underlying io_uring::opcode::EpollCtl.

You can use io_uring to batch the control calls of Epoll.

For now the separate io-uring-epoll crate also provides a convenient syscall abstraction into epoll_wait which is not yet available through the io_uring interaface over a regular syscall as of now.

If you have millions of sockets that constantly change status, it is helpful to be able to batch the control calls.

See the examples from the io-uring-epoll repository.

Note: io-uring-epoll 0.1 crate is different to 0.2 which implements the new OpCode/Completion traits.

Construct

Submission

Completion

EpollCtl(impl OpExtEpollCtl) will show up as normal through the handler API through UringBearer.

Lifetime (Manual handling)

You should only Forget the underlying EpollCtl when the RawFd is removed from monitored filehandles list.

Failing to Retain the underlying EpollCtl before removing it will result in undefined behaviour.

FutexWait

FutexWait abstracts the underlying io_uring::opcode::FutexWait.

This can be used, among other use-cases, to combine epoll_wait or other events waiting on another thread with io_uring completions given atomics can be used to generate completion events through FutexWait.

Note: This is being migrated to implement the OpCode + OpCompletion traits similar to EpollCtl.

Construct

You can create the underlying indexed AtomicU32 which will be owned by the UringBearer through:

Alternatively and unsafely you can provide your own AtomicU32 to UringBearer through:

Submission

You can submit a FutexWait through UringBearer::add_futex_wait.

Completion

FutexWait(FutexWaitRec) will show up as normal through the handler API through UringBearer.

Lifetime (Manual handling)

SubmissionRecordStatus::Forget is safe given the pending Completion was Single-shot.

If the completed record is retained, this will result in memoryleak.

ProvideBuffers

ProvideBuffers abstracts the underlying io_uring::opcode::ProvideBuffers.

To do faster I/O it is essential to register (or map) any userspace buffers with the kernel so the kernel can spend less time mapping the userspace buffers between the Recv / Send calls.

Note: This is being migrated to implement the OpCode + OpCompletion traits similar to EpollCtl.

Construct

See the main section about Buffers how to manage these.

Submission

Use the UringBearer::provide_buffers to both submit and associate any created Buffers with kernel-mapped identifiers.

We may later have API to make this more easier to use but for now the user must keep track of the kernel-mapped identifiers.

Completion

ProvideBuffers(ProvideBuffersRec) is provided normally through the handler API via UringBearer.

Lifetime (Manual handling)

The actual buffers are separate from the registering it so the submission can be forgotten after completion safely.

Recv

Single-shot Recv abstracts the underlying io_uring::opcode::Recv.

Note: This is being migrated to implement the OpCode + OpCompletion traits similar to EpollCtl.

Construct

Recv requires a registered filehandle and previously created indexed buffer.

Submission

Use UringBearer::add_recv to submit a Single-shot Recv to kernel.

Completion

Recv(RecvRec) is provided normally through the handler API via UringBearer.

Lifetime (Manual handling)

SubmissionRecordStatus::Forget is safe given the pending Completion was Single-shot.

RecvMulti

RecvMulti abstracts the underlying [io_uring::opcode::RecvMulti](https://docs.rs/io-uring/latest/io_uring/opcode/struct.Recv Multi.html).

Note: This is being migrated to implement the OpCode + OpCompletion traits similar to EpollCtl.

Construct

RecvMulti requires both a registered filehandle and previously registered / kernel-mapped buffer/s with the referred group.

Submission

Use UringBearer::add_recv_multi to submit a RecvMulti to kernel.

Completion

RecvMulti(RecvMultiRec) is provided normally through the handler API via UringBearer.

See Buffers on how to deal with individual "selected buffers" within registered "grouped" Buffers.

Lifetime (Manual handling)

It would be undefined behaviour if the submission record is invalidated before the Multi-shot submission is either confirmed cancelled or timed out.

SendZc

SendZc abstracts the underlying io_uring::opcode::SendZc.

Note: This is being migrated to implement the OpCode + OpCompletion traits similar to EpollCtl.

Construct

Currently SendZc requires the associated indexing (through submission) into:

Submission

Currently submit using UringBearer::add_send_singlebuf.

Completion

SendZc(SendZcRec) is provided normally through the handler API via UringBearer.

Lifetime (Manual handling)

Extensions

Every OpCode should add:

Ownership

Filehandles

Buffers