Wednesday, April 25, 2012

How to get good performance when writing a list of integers from 1 to 10 million to a file?

question



I want a program that will write a sequence like,



1
...
10000000


to a file. What's the simplest code one can write and still get decent performance? My intuition is that there is some lack-of-buffering problem. My C code runs at 100 MB/s, whereas, for reference, the Linux command line utility dd runs at 3 GB/s (I originally wrote 9 GB/s; sorry for the imprecision, see comments. I'm more interested in the big-picture orders of magnitude, though).



One would think this would be a solved problem by now, i.e. any modern compiler would make it trivial to write such programs so that they perform reasonably well ...



C code



#include <stdio.h>

int main(int argc, char **argv) {
    int len = 10000000;
    for (int a = 1; a <= len; a++) {
        printf("%d\n", a);
    }
    return 0;
}


I'm compiling with clang -O3. A performance skeleton which calls putchar('\n') 8 times gets comparable performance.
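One way to probe the buffering intuition is to take printf out of the loop entirely: format each number by hand into a large buffer and hand the buffer to fwrite only when it is nearly full. The sketch below does that. The names format_line and write_sequence and the 64 KiB buffer size are my own assumptions for illustration, and I have not measured how much (if at all) this helps on any particular machine.

```c
#include <stdio.h>

/* Hypothetical helper: write the decimal digits of v followed by '\n'
   at out, and return the number of bytes written (digits + 1). */
size_t format_line(char *out, unsigned v) {
    char tmp[16];
    size_t n = 0;
    do {                                /* digits come out least significant first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    size_t len = n;
    while (n > 0) {
        *out++ = tmp[--n];              /* copy them back in the right order */
    }
    *out = '\n';
    return len + 1;
}

/* Hypothetical helper: write 1..len, one number per line, to fp,
   going through a 64 KiB buffer (an arbitrary, assumed size).
   Returns 0 on success and -1 if fwrite reports a short write. */
int write_sequence(FILE *fp, unsigned len) {
    static char buf[1 << 16];
    size_t used = 0;
    for (unsigned a = 1; a <= len; a++) {
        if (sizeof buf - used < 16) {   /* not enough room for one more line */
            if (fwrite(buf, 1, used, fp) != used) return -1;
            used = 0;
        }
        used += format_line(buf + used, a);
    }
    if (fwrite(buf, 1, used, fp) != used) return -1;
    return 0;
}
```

Calling write_sequence(stdout, 10000000) from main should produce the same output as the printf loop above. If the two versions run at similar speed, the bottleneck is probably not buffering or formatting in user code.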



Haskell code



A naive Haskell implementation runs at 13 MiB/sec, compiling with ghc -O2 -optc-O3 -optc-ffast-math -fllvm -fforce-recomp -funbox-strict-fields. (I haven't recompiled my libraries with -fllvm; perhaps I need to do that.) Code:



import Control.Monad
-- forM_ discards the results instead of collecting a list of 10 million ()
main = forM_ [1..10000000 :: Int] $ \j -> putStrLn (show j)


My best stab with Haskell is barely any better, at 17 MiB/sec. The problem is I can't find a good way to convert Vectors into ByteStrings (perhaps there's a solution using iteratees?).



import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector, Unbox, (!))

writeVector :: (Unbox a, Show a) => Vector a -> IO ()
writeVector v = V.mapM_ (putStrLn . show) v

-- V.generate passes indices 0..n-1, so add 1 to get 1..10000000
main = writeVector (V.generate 10000000 (+1))


It seems that writing ByteStrings is fast, as demonstrated by this code, which writes a roughly equivalent number of characters:



import qualified Data.ByteString.Char8 as B
main = B.putStrLn (B.replicate 76000000 '\n')


This gets 1.3 GB/s, which isn't as fast as dd, but obviously much better.
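To turn a wall-clock time into a throughput figure, it helps to know exactly how many bytes the 1 to 10 million sequence occupies. A minimal sketch of that arithmetic, counting the numbers in each digit-length range and adding one byte per line for the newline (sequence_bytes is a hypothetical helper name, and the code assumes n is small enough that lo * 10 does not overflow):

```c
/* Hypothetical helper: number of bytes needed to write 1..n, one number
   per line, i.e. the decimal digits of each number plus one '\n'. */
unsigned long long sequence_bytes(unsigned long long n) {
    unsigned long long total = 0;
    unsigned long long lo = 1;      /* smallest number with `digits` digits */
    unsigned digits = 1;
    while (lo <= n) {
        unsigned long long hi = lo * 10 - 1;    /* largest number with `digits` digits */
        if (hi > n) hi = n;
        total += (hi - lo + 1) * (digits + 1);  /* +1 for the newline */
        lo *= 10;
        digits++;
    }
    return total;
}
```

For n = 10000000 this gives 78888897 bytes, which is close to the 76000000 characters used in the ByteString test above, so the two measurements are comparable.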




