camp update
Ian Lynagh
igloo at earth.li
Wed Nov 19 15:18:07 EST 2008
Hi all,
I haven't written about camp in some time, and a lot has happened, so I
figure I should send an e-mail. So, here's the first edition of the
"Camp Irregular News", if you will :-)
===== Mailing list
Camp now has a mailing list. I'll probably continue to send things of
more general interest to the darcs list, but camp-specific stuff will
generally go to the camp list. For details, please see
http://projects.haskell.org/camp/contact
===== Bug tracker
But the main reason that camp has acquired a mailing list is that camp
also now has a bug tracker:
http://trac.haskell.org/camp/
and I wanted somewhere for the ticket change messages to go. For now,
this is really just a TODO list, with the major missing pieces listed.
===== Development
And some real work, too. At and around the sprint, I:
* Implemented "chunky" hunks, which mean that we don't need to break
a file up into lines and then join it back together again when
applying hunk patches
* Implemented primitive interactive patch selection. It's nothing fancy,
but it makes it easier to work with than the all-or-nothing record
that camp had before
* General improvements, e.g. there is now a repository type, rather than
just misusing FilePath
* Worked out how to get pkg-config, libcurl and Cabal to play nicely on
Windows/MSYS/mingw
* Made a darcs2camp tool
* Implemented the "get" command
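As a rough illustration of the "chunky" hunk idea, here is a minimal
sketch in Python (chosen for brevity; this is not camp's code). It
assumes a hunk records a byte offset plus the old and new byte chunks,
so it can be applied by slicing the file contents directly, without
splitting into lines and joining again. The names Hunk, apply_hunk and
invert, and the offset-based representation, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    """A hypothetical "chunky" hunk: replace `old` with `new` at `offset`."""
    offset: int   # byte offset where the change starts (assumed field)
    old: bytes    # chunk expected to be present at that offset
    new: bytes    # chunk to put in its place


def apply_hunk(content: bytes, hunk: Hunk) -> bytes:
    """Apply a hunk to raw file contents without splitting into lines."""
    end = hunk.offset + len(hunk.old)
    if hunk.offset > len(content) or content[hunk.offset:end] != hunk.old:
        raise ValueError("hunk does not match the file contents")
    # Splice the new chunk in between the untouched prefix and suffix.
    return content[:hunk.offset] + hunk.new + content[end:]


def invert(hunk: Hunk) -> Hunk:
    """The inverse hunk swaps the old and new chunks."""
    return Hunk(hunk.offset, hunk.new, hunk.old)
```

For example, apply_hunk(b"one\ntwo\nthree\n", Hunk(4, b"two\n", b"2\n"))
replaces the second line while treating the file as one chunk of bytes,
and applying invert of the same hunk restores the original.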
===== darcs2camp
darcs2camp is currently fiddly to build, as it needs to be linked
against some of darcs's sources. In the near future it will either use
libdarcs, or I'll fork a copy of darcs and wibble it until it just
builds darcs2camp.
Due to working on each primitive patch separately, darcs2camp isn't the
fastest beast in the world; on the 19766 megapatch (359470 primitive
patches) GHC repo it takes me 1 hour 47 mins to convert from darcs to
camp format. Then again, the original git conversion took 3 days, so it
could be worse! And it shows a patches-converted count to keep you
entertained.
The disk usage for darcs's patches directory is
* disk usage: 115M
* actual number of bytes: 49M
* actual number of bytes when uncompressed: 204M
Meanwhile, camp's patch file weighs in at 214M (which is both the
actual number of bytes and the disk usage, as it's all in one file).
There are a number of things going on here:
* camp currently doesn't store any meta-data, so once it does the real
figure will be a little more than 214M.
* currently, if we store the primitive patch "name-3" inside the patch
"name" then we store the string "name-3" even though we don't have to.
* We could easily compress individual patches. Presumably if we did this
with gzip then we'd get down to about 50M.
* With a little work we could compress clumps of patches. However,
gzipping the whole file only gets us down to 46M, so there is little
to be gained there. bzip2ing the whole file gets us down to 38M.
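One way to arrive at a figure like "about 50M" is to assume that
per-patch gzip would give camp roughly the same ratio that darcs gets on
its own patches (49M compressed from 204M uncompressed). The sketch
below, in Python, just does that arithmetic on the sizes quoted above;
the function name and the assumption that the ratio carries over are
illustrative, not measurements.

```python
def estimate_compressed_size(size_mb, ref_compressed_mb, ref_uncompressed_mb):
    """Scale size_mb by the compression ratio seen on a reference data set."""
    ratio = ref_compressed_mb / ref_uncompressed_mb
    return size_mb * ratio


# Sizes in megabytes, taken from the figures quoted above.
DARCS_COMPRESSED = 49
DARCS_UNCOMPRESSED = 204
CAMP_PATCH_FILE = 214

# Assumes camp's patches compress about as well as darcs's do.
per_patch_gzip_estimate = estimate_compressed_size(
    CAMP_PATCH_FILE, DARCS_COMPRESSED, DARCS_UNCOMPRESSED
)
print(f"estimated size with per-patch gzip: {per_patch_gzip_estimate:.1f}M")

# Whole-file results reported above, as fractions of the original size.
for name, size in [("gzip, whole file", 46), ("bzip2, whole file", 38)]:
    print(f"{name}: {size / CAMP_PATCH_FILE:.2f} of original")
```

Under that assumption the estimate comes out at roughly 51M, only a few
megabytes above the 46M whole-file gzip result.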
===== "get"ing repos
And that means we can do timings etc for large repos easily.
Some timings for get and the ghc repo:
* With darcs 1.0.9rc1, get takes around 5.5 seconds. However, I believe
it's copying the pristine directory rather than actually applying the
patches, which isn't safe if you can't lock the repo. For comparison,
"darcs check" takes 1 minute 45 seconds, and that does essentially the
same work that "get" is supposed to
* With darcs 2.1.0, get takes 1 minute 29 seconds (and looks like it's
behaving safely)
* With camp, get takes 1 minute 37 seconds
I haven't looked at optimising get with camp yet, but one thing that
should definitely make a big difference is batching up multiple changes
to a single file. It is common to get a megapatch which contains a
sequence of n patches which each change a hunk in the same file. When applying
such a megapatch, camp currently reads and writes the whole file n
times, which obviously isn't optimal! IIRC that made a significant
difference when we added it to darcs, and I expect it will for camp too.
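The batching idea can be sketched as follows, in Python for brevity.
This is not camp's or darcs's implementation: the in-memory
CountingDisk, the (path, old, new) change tuples and the
replace-first-occurrence rule are all simplifying assumptions, there
only to show the difference in file I/O between the two strategies.

```python
class CountingDisk:
    """In-memory stand-in for the file system that counts reads and writes."""

    def __init__(self, files):
        self.files = dict(files)
        self.reads = 0
        self.writes = 0

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.writes += 1
        self.files[path] = data


def apply_change(content, old, new):
    """Simplified hunk application: replace the first occurrence of old."""
    if old not in content:
        raise ValueError("change does not apply")
    return content.replace(old, new, 1)


def apply_one_at_a_time(disk, changes):
    """Read and write the whole file once per change (n reads, n writes)."""
    for path, old, new in changes:
        disk.write(path, apply_change(disk.read(path), old, new))


def apply_batched(disk, changes):
    """Read each file once, apply its changes in order, write it once."""
    pending = {}  # path -> contents being edited in memory
    for path, old, new in changes:
        if path not in pending:
            pending[path] = disk.read(path)
        pending[path] = apply_change(pending[path], old, new)
    for path, content in pending.items():
        disk.write(path, content)
```

With n changes to one file, apply_one_at_a_time does n reads and n
writes, while apply_batched does one of each and ends with the same
contents. A real implementation would also need to flush pending
contents before any change that renames or removes a file.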
camp is also cheating slightly, as it doesn't do a syntactic-validity
check of the patches it is given before applying them. This means that
it'll fail less prettily than it ought to. However, I'm not sure if
darcs also cheats, and I don't expect that it will make much difference
to the time taken anyway.
Camp's space usage while "get"ing is currently higher than it should be
because of
http://hackage.haskell.org/trac/ghc/ticket/2762
so I can't get good figures for that at the moment.
===== What next?
The above is mostly development stuff, mainly due to being at the
sprint. I plan to focus more on theory stuff next. As you may have seen
on the darcs list, I've started thinking about conflict marking, and I
also have some patch theory proofs in my head that I need to get written
down in the paper.
Thanks
Ian