Buffering by block in Rust

2025-01-23

I’m taking a competitive programming course this semester, and the platform we’re using (Kattis) apparently prioritizes fast I/O, to the point where some right answers will be rejected with a Time Limit Exceeded (TLE).

I wanted to see if there was a way to squeeze out just a bit more read/write performance in Rust, and was surprised to learn that stdout is line-buffered by default. This means that an innocent-looking code sample like the following writes to the console far slower than it could:

use std::time::Instant;

fn main() {
    let time = Instant::now();

    for i in 0..500000 {
        println!("{}", i);
    }

    let duration = time.elapsed();
    println!("Time elapsed: {:?}", duration);
}

When you run it, you’ll get an output like the following:

(...)
499997
499998
499999
Time elapsed: 17.5983885s

So why does this happen? Every time you call the println!() macro, it emits a newline character (\n) to stdout, and every time stdout sees a newline character it flushes its buffer. To flush the buffer, it executes a write syscall that hands the contents of the buffer to the console, or to a file if the output is redirected.

The problem is, every time you execute a syscall there is some overhead because the processor has to change from user to kernel mode and shift around registers and so on. And we’re doing this 500k times! So the overhead stacks up.
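
To make the "one write per line" behaviour visible without tracing syscalls, here is a minimal sketch that puts std's LineWriter in front of a fake sink. RecordingSink and line_buffered_write_sizes are hypothetical names made up for this example, and each call to the sink's write() merely stands in for a real syscall, so treat it as an illustration rather than a measurement.

```rust
use std::io::{self, LineWriter, Write};

/// Hypothetical stand-in for the OS: records the size of every `write` call it receives.
struct RecordingSink {
    write_sizes: Vec<usize>,
}

impl Write for RecordingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Pretend the whole chunk was written in one go.
        self.write_sizes.push(buf.len());
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` numbered lines through a line-buffered writer and
/// returns the size of each write that reached the sink.
fn line_buffered_write_sizes(n: u32) -> Vec<usize> {
    let mut out = LineWriter::new(RecordingSink { write_sizes: Vec::new() });
    for i in 0..n {
        writeln!(out, "{}", i).unwrap();
    }
    out.flush().unwrap();
    out.get_ref().write_sizes.clone()
}

fn main() {
    let sizes = line_buffered_write_sizes(500_000);
    // Report on stderr so the summary does not mix with regular output.
    eprintln!(
        "{} writes for {} bytes in total",
        sizes.len(),
        sizes.iter().sum::<usize>()
    );
}
```

With line buffering, the number of writes should be at least the number of lines, each carrying only a handful of bytes.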

The solution is to stick a BufWriter in front of stdout, which might seem weird and the exact opposite of what we’re trying to achieve (we wanted less buffering, right?), until you realize that BufWriter holds the contents you are trying to write in a separate, larger buffer (8 KiB by default at the time of writing). Whenever that buffer fills up, BufWriter hands its contents to stdout in one big chunk. When we call .flush() on BufWriter, it dumps whatever is left into stdout and calls .flush() on it as well. Together, this effectively makes stdout block-buffered.

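I was initially skeptical of this approach and thought stdout would just end up calling the write syscall whenever it saw a newline character being passed through from BufWriter’s buffer, but the performance improvements suggest otherwise. As far as I can tell, when stdout receives a big chunk containing many newlines, it writes everything up to the last newline in one go rather than once per line.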
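
The same doubt can be checked with a small experiment. The sketch below stacks a BufWriter on top of a LineWriter, which is my assumption of how BufWriter::new(stdout.lock()) is layered, and counts how many writes reach the bottom. CountingSink and stacked_write_calls are hypothetical names for this example only.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that counts `write` calls; each call stands in for one syscall.
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` numbered lines through BufWriter -> LineWriter -> sink
/// and returns how many writes actually reached the sink.
fn stacked_write_calls(n: u32) -> usize {
    // The LineWriter plays the role of line-buffered stdout.
    let line_buffered = LineWriter::new(CountingSink { calls: 0 });
    let mut out = BufWriter::new(line_buffered);
    for i in 0..n {
        writeln!(out, "{}", i).unwrap();
    }
    out.flush().unwrap();
    // Peel off both layers to read the counter.
    out.get_ref().get_ref().calls
}

fn main() {
    let lines = 500_000;
    eprintln!(
        "{} lines reached the sink in {} writes",
        lines,
        stacked_write_calls(lines)
    );
}
```

If the newlines inside each chunk triggered separate writes, the count would be close to the number of lines. On a typical platform it should instead be a small multiple of the number of buffer-sized chunks.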

The modified code:

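use std::io::{self, BufWriter, Write};
use std::time::Instant;

fn main() {
    let time = Instant::now();
    // Lock stdout once up front instead of on every write.
    let output = io::stdout().lock();
    // BufWriter collects output in its own buffer and passes it on in large chunks.
    let mut buffer = BufWriter::new(output);

    for i in 0..500000 {
        writeln!(buffer, "{}", i).unwrap();
    }
    // Push out whatever is still sitting in the buffer before stopping the clock.
    buffer.flush().unwrap();

    let duration = time.elapsed();
    println!("Time elapsed: {:?}", duration);
}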

And the results in case you thought I was lying:

(...)
499997
499998
499999
Time elapsed: 3.7357123s

So why doesn’t Rust just let us toggle between line-buffered and block-buffered for stdout? Well, there is an issue for exactly this on Rust’s GitHub issue tracker, but the associated pull request was closed back in 2022, and as a result the problem has been stuck like that for the past 3 years. (If you count the original issue, that makes it nearly 6 years!)

So until this becomes part of the standard library, the workaround above seems to be the best crate-free way of getting faster I/O in Rust. Of course, you could also try to get the raw, unbuffered stdout using workarounds such as this one by @WieeRd on GitHub, or do what ripgrep does with their conditional buffering.
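
For the conditional-buffering idea, here is a rough crate-free sketch. It is not ripgrep's actual code, only the general idea as I understand it: keep line buffering when a person is watching a terminal, and switch to a big block buffer when the output goes to a file or pipe. The Buffering enum, pick_buffering, stdout_writer, and the 64 KiB capacity are all assumptions made for this example, and std::io::IsTerminal needs a reasonably recent toolchain (Rust 1.70 or later).

```rust
use std::io::{self, BufWriter, IsTerminal, Write};

/// Hypothetical policy enum for this sketch; not a std or ripgrep type.
#[derive(Debug, PartialEq)]
enum Buffering {
    Line,
    Block,
}

/// Line buffering for interactive use, block buffering for files and pipes.
fn pick_buffering(is_terminal: bool) -> Buffering {
    if is_terminal {
        Buffering::Line
    } else {
        Buffering::Block
    }
}

/// Returns a stdout writer that matches the chosen policy.
fn stdout_writer() -> Box<dyn Write> {
    let stdout = io::stdout();
    match pick_buffering(stdout.is_terminal()) {
        // The plain stdout lock keeps std's usual line buffering.
        Buffering::Line => Box::new(stdout.lock()),
        // 64 KiB is an arbitrary capacity picked for illustration.
        Buffering::Block => Box::new(BufWriter::with_capacity(64 * 1024, stdout.lock())),
    }
}

fn main() -> io::Result<()> {
    let mut out = stdout_writer();
    for i in 0..500_000 {
        writeln!(out, "{}", i)?;
    }
    // Flush explicitly so write errors are not silently dropped.
    out.flush()
}
```

A judge that captures stdout through a pipe would get the block-buffered path, while running the program by hand in a terminal still shows output line by line.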

If only Kattis allowed crates.