I’m taking a competitive programming course this semester, and the platform we’re using (Kattis) apparently prioritizes fast I/O, to the point where some right answers will be rejected with a Time Limit Exceeded (TLE).
I wanted to see if there was a way to squeeze out just a bit more read/write performance in Rust, and was surprised to learn that stdout is line-buffered by default. This means that an innocent-looking code sample like the following writes to the console far more slowly than it could:
use std::time::Instant;

fn main() {
    let time = Instant::now();
    for i in 0..500000 {
        // Each println! ends in '\n', which makes stdout flush.
        println!("{}", i);
    }
    let duration = time.elapsed();
    println!("Time elapsed: {:?}", duration);
}
When you run it, you’ll get an output like the following:
(...)
499997
499998
499999
Time elapsed: 17.5983885s
So why does this happen? Every time you call the println!() macro, it emits a newline character (\n) to stdout, and every time stdout sees a newline character it flushes its buffer. To flush the buffer, it executes a syscall that writes the contents of the buffer to the console, or to a file if stdout has been redirected.
The problem is, every time you execute a syscall there is some overhead, because the processor has to switch from user mode to kernel mode and back, saving and restoring registers along the way. And we’re doing this 500k times! So the overhead stacks up: the run above works out to roughly 35 µs per line.
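To make the flush-per-newline behaviour visible without timing anything, here is a minimal sketch that swaps the console for a counter. CountingSink and line_buffered_writes are hypothetical names made up for this example, and std::io::LineWriter stands in for stdout's own line buffering (as far as I can tell it is the same wrapper stdout uses internally, but treat that as an assumption).

```rust
use std::io::{self, LineWriter, Write};

/// Hypothetical stand-in for the console: it only records how often it is
/// written to and how many bytes it receives.
#[derive(Default)]
struct CountingSink {
    write_calls: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Each call here corresponds to one write syscall on a real stdout.
        self.write_calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `lines` numbered lines through a line-buffered writer and returns
/// (number of underlying write calls, total bytes written).
fn line_buffered_writes(lines: usize) -> (usize, usize) {
    let mut sink = CountingSink::default();
    {
        let mut out = LineWriter::new(&mut sink);
        for i in 0..lines {
            writeln!(out, "{}", i).unwrap();
        }
        out.flush().unwrap();
    } // `out` is dropped here, releasing the borrow on `sink`
    (sink.write_calls, sink.bytes)
}

fn main() {
    let (calls, bytes) = line_buffered_writes(1000);
    println!("1000 lines -> {} write calls, {} bytes", calls, bytes);
}
```

The number of write calls should come out at least as large as the number of lines, which is exactly the cost described above.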
The solution is to stick a BufWriter in front of stdout, which might seem weird at first (stdout is already buffered, so why add another buffer?), until you realize that BufWriter holds the contents you are trying to write in a separate buffer of its own. That buffer does not care about newlines: its contents are only handed to stdout when it fills up (8 KiB by default in current versions of the standard library) or when we call .flush(). Calling .flush() on BufWriter dumps whatever is left into stdout and calls .flush() on it as well. The net effect is that stdout becomes block-buffered.
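I was initially skeptical of this approach and thought stdout would just end up calling the write syscall whenever it saw a newline character being passed through from BufWriter’s buffer, but the performance improvements suggest otherwise. The reason appears to be that stdout’s line buffering looks for the last newline in each chunk it receives and writes everything up to that point in one go, so a large chunk from BufWriter costs one or two writes instead of one per line.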
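One way to check this without trusting timings is to count the writes that actually reach the bottom of the stack. The sketch below is an illustration under assumptions: WriteCounter and stacked_writes are hypothetical names, and std::io::LineWriter is used as a stand-in for stdout's line buffering rather than the real stdout.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that counts how many times it is written to.
struct WriteCounter {
    calls: usize,
}

impl Write for WriteCounter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `lines` numbered lines through BufWriter -> LineWriter -> sink
/// and returns how many write calls reached the sink.
fn stacked_writes(lines: usize) -> usize {
    let mut sink = WriteCounter { calls: 0 };
    {
        // LineWriter plays the role of line-buffered stdout.
        let line_buffered = LineWriter::new(&mut sink);
        // BufWriter sits in front of it, as in the workaround.
        let mut out = BufWriter::new(line_buffered);
        for i in 0..lines {
            writeln!(out, "{}", i).unwrap();
        }
        out.flush().unwrap();
    }
    sink.calls
}

fn main() {
    let lines = 10_000;
    println!("{} lines -> {} write calls", lines, stacked_writes(lines));
}
```

If the newlines passing through still triggered a write each, the count would be at least the number of lines. In practice it should be a small fraction of that, since each chunk handed over by BufWriter is written out in bulk.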
The modified code:
use std::io::{self, BufWriter, Write};
use std::time::Instant;

fn main() {
    let time = Instant::now();
    // Lock stdout once and put a block buffer in front of it.
    let output = io::stdout().lock();
    let mut buffer = BufWriter::new(output);
    for i in 0..500000 {
        // Lands in BufWriter's buffer; no flush per line.
        writeln!(buffer, "{}", i).unwrap();
    }
    // Push whatever is still buffered out before stopping the clock.
    buffer.flush().unwrap();
    let duration = time.elapsed();
    println!("Time elapsed: {:?}", duration);
}
And the results in case you thought I was lying:
(...)
499997
499998
499999
Time elapsed: 3.7357123s
So why doesn’t Rust just let us toggle between line-buffered and block-buffered for stdout? Well, the issue exists on their GitHub issue tracker, but the associated pull request was closed back in 2022, and as a result the problem has been stuck like that for the past 3 years. (If you count the original issue, that makes it nearly 6 years!)
So until this becomes part of the standard library, the workaround above seems to be the best crate-free way of getting faster I/O in Rust. Of course, you could also try to get the raw, unbuffered stdout using workarounds such as this one by @WieeRd on GitHub, or do what ripgrep does with their conditional buffering.
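For the conditional-buffering idea, a crate-free approximation might look like the sketch below. It is written in the spirit of ripgrep's documented behaviour (line buffering when stdout is a terminal, block buffering otherwise) and is not ripgrep's actual code. The helper writer_for is a hypothetical name, and IsTerminal needs Rust 1.70 or newer.

```rust
use std::io::{self, BufWriter, IsTerminal, Write};

/// Hypothetical helper: pass `out` through untouched when a person is
/// watching (stdout already flushes on every newline), otherwise put a
/// block buffer in front of it.
fn writer_for<'a, W: Write + 'a>(out: W, interactive: bool) -> Box<dyn Write + 'a> {
    if interactive {
        Box::new(out)
    } else {
        Box::new(BufWriter::new(out))
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // True for a console, false when piped or redirected to a file.
    let interactive = stdout.is_terminal();
    let mut out = writer_for(stdout.lock(), interactive);
    for i in 0..5 {
        writeln!(out, "{}", i)?;
    }
    // Needed in the block-buffered case so nothing is left behind.
    out.flush()
}
```

On a judge like Kattis the output is presumably not a terminal, so this would take the BufWriter branch, while running it by hand in a console keeps the line-by-line behaviour.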
If only Kattis allowed crates.