Ticket Change Details
Overview

Artifact ID: a4c3ed78a947b4c531251a5c502e544df95c74f4
Ticket: 6c5471445173a17fffa6aaf867f2f7d3da8f8151
fossil clone is slow
User & Date: drh 2010-10-08 14:31:42
Changes

  1. Appended to comment:
    
    
    drh added on 2010-10-08 14:31:42:
    A very impressive repository - thanks for sharing it.
    
    pkgsrc.fossil is about 10x or 100x larger than any repository we have dealt
    with before.  SQLite is really the biggest repository that Fossil deals with
    on a regular basis.  SQLite has 10.3 years of history compared to 13.1 years
    of history with pkgsrc.  But there have been an average of just 2.2 checkins
    per day to SQLite for a total of 8446 checkins, whereas pkgsrc has had on
    average almost 40 checkins per day for a total of 190281 checkins.  
    Fossil likes to store the original content of the latest checkin as
    full-text and then store prior versions as deltas from the latest.  (Most
    other VCSes do the same - RCS in particular.)  In SQLite, the longest delta
    chain is therefore 8446 deep - which is deep but not unmanageable.  With
    pkgsrc, the longest chain is a whopping 190281 deep - 22x deeper.  It
    makes me wonder if we shouldn't trick fossil into storing a full-text
    version of each file after some fixed number of deltas - for example
    store full text after every 500 or 1000 deltas.  That will make the
    repository larger, but it will also make a 100x performance improvement
    when trying to access files that are very deep in the delta stack.
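
    The periodic full-text idea above can be sketched with a toy model.
    This is only an illustration, not Fossil's storage code: the function
    name deltas_to_apply and the snapshot_interval parameter are
    hypothetical, and the model assumes version 0 is the latest checkin
    (stored as full text) with each older version stored as a delta from
    the next newer one.

```python
from typing import Optional


def deltas_to_apply(depth: int, snapshot_interval: Optional[int] = None) -> int:
    """Count the deltas applied to rebuild the version `depth` steps back.

    Hypothetical model (not Fossil's actual code):
      * version 0 is the latest checkin and is stored as full text;
      * version d is stored as a delta from version d - 1;
      * if snapshot_interval is given, every version whose depth is a
        multiple of it is additionally stored as full text.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if snapshot_interval is None:
        # Walk the whole chain from the latest version.
        return depth
    if snapshot_interval <= 0:
        raise ValueError("snapshot_interval must be positive")
    # Start from the nearest full-text snapshot at or above this depth.
    return depth % snapshot_interval
```

    Under this model the deepest pkgsrc version needs 190281 delta
    applications without snapshots, and no version needs more than 999
    with a snapshot every 1000 deltas.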
    
    SQLite has 1057 separate files.  We host a few other repositories with
    more than this, but never more than a couple thousand.  pkgsrc, on the
    other hand, has over 100,000 separate files.  Many of the file browsing
    links scan the entire list of files.  With only a thousand or so, this
    is not a problem.  But with 100x as many, those pages are slow.  We might
    need to reconsider the way some of the file browsing pages are rendered.
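
    One possible direction, sketched under assumptions: if the file list
    is kept sorted, a directory page can binary-search to the first path
    under that directory and stop at the first path outside it, instead
    of scanning every file.  The function list_directory and the sample
    paths are hypothetical and do not reflect how Fossil renders its
    pages.

```python
from bisect import bisect_left
from typing import List


def list_directory(sorted_paths: List[str], directory: str) -> List[str]:
    """Return the immediate children of `directory`.

    `sorted_paths` must be sorted lexicographically. All paths sharing a
    prefix are contiguous in that order, so only the matching range is
    visited rather than the whole list.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    index = bisect_left(sorted_paths, prefix)  # first path >= prefix
    children: List[str] = []
    while index < len(sorted_paths) and sorted_paths[index].startswith(prefix):
        # First path component below the directory: a file or subdirectory.
        child = sorted_paths[index][len(prefix):].split("/", 1)[0]
        if child and (not children or children[-1] != child):
            children.append(child)
        index += 1
    return children
```

    The cost is proportional to the number of files below the requested
    directory plus a logarithmic search, so a small directory stays cheap
    even when the repository holds 100,000 files.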
    
    The huge number of files also makes for large manifests.  I looked at
    one of the more recent manifests in pkgsrc and it was 60936 lines long
    and contained nearly 5MB of text.  There are about 340MB of content in
    a checkout.  Fossil does lots of consistency checks for each check-in,
    which involves computing MD5 checksums over the entire checkout, twice.
    Running MD5 over 720MB takes some time.  A simple checkin on my 
    4-year-old desktop used 18 seconds of CPU time and 55 seconds of
    real-time.  (Note that a clone of the repository was running in parallel
    during that performance test.)  Checkins
    would be very fast if we turned off the error checks.  Maybe that
    needs to be an option.
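
    A rough sketch of what such an option could look like follows.  It
    is an assumption-laden illustration, not Fossil's checksum code: the
    function checkout_md5 and its verify flag are hypothetical, and the
    digest here simply covers the file contents in sorted path order
    rather than whatever format Fossil actually hashes.

```python
import hashlib
from pathlib import Path
from typing import Optional


def checkout_md5(root: str, verify: bool = True) -> Optional[str]:
    """Return one MD5 hex digest over every file under `root`.

    With verify=False the (hypothetical) consistency check is skipped
    and None is returned without reading any file.
    """
    if not verify:
        return None
    digest = hashlib.md5()
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    for path in files:
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large files are never held in memory whole.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

    Every byte of the checkout is read on each pass, so the cost grows
    with checkout size; skipping the pass trades that time for the loss
    of the error check.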
    
    My clone (via localhost) has made substantial progress in 33 minutes
    of CPU time.  But it seems to be stuck now.