Overview
| Artifact ID: | a4c3ed78a947b4c531251a5c502e544df95c74f4 |
|---|---|
| Ticket: | 6c5471445173a17fffa6aaf867f2f7d3da8f8151 fossil clone is slow |
| User & Date: | drh 2010-10-08 14:31:42 |
Changes
- Appended to comment:
drh added on 2010-10-08 14:31:42:
A very impressive repository; thanks for sharing it. pkgsrc.fossil is about 10x to 100x larger than any repository we have dealt with before. SQLite is the biggest repository that Fossil deals with on a regular basis. SQLite has 10.3 years of history compared to 13.1 years for pkgsrc. But SQLite has averaged just 2.2 checkins per day, for a total of 8446 checkins, whereas pkgsrc has averaged almost 40 checkins per day, for a total of 190281 checkins.
Fossil likes to store the original content of the latest checkin as full text and then store prior versions as deltas from the latest. (Most other VCSes do the same, RCS in particular.) In SQLite, the longest delta chain is therefore 8446 deep, which is deep but not unmanageable. With pkgsrc, the longest chain is a whopping 190281 deep, 22x deeper. It makes me wonder if we shouldn't trick Fossil into storing a full-text version of each file after some fixed number of deltas, for example after every 500 or 1000 deltas. That will make the repository larger, but it will also give a 100x performance improvement when accessing files that are very deep in the delta stack.
SQLite has 1057 separate files. We host a few other repositories with more than this, but never more than a couple thousand. pkgsrc, on the other hand, has over 100,000 separate files. Many of the file-browsing pages scan the entire list of files. With only a thousand or so files this is not a problem, but with 100x as many, those pages are slow. We might need to reconsider the way some of the file-browsing pages are rendered.
The huge number of files also makes for large manifests. I looked at one of the more recent manifests in pkgsrc: it was 60936 lines long and contained nearly 5MB of text. There are about 340MB of content in a checkout.
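The idea of storing full text after every N deltas can be illustrated with a toy model. This is a minimal Python sketch, not Fossil's storage code: it assumes a single file whose versions are numbered from oldest (0) to newest, with the newest stored as full text and each older version stored as a delta against the next newer one. The hypothetical `max_chain` parameter forces a full-text copy every `max_chain` versions back from the newest, and the hypothetical `deltas_to_apply` counts how many deltas must be applied to rebuild a given version. The 190281 figure is the worst case described above, treating every checkin as a new version of the same file.

```python
"""Toy model of reverse-delta storage with an optional cap on chain depth.

Assumptions (illustrative only, not Fossil's implementation): versions are
numbered 0 (oldest) to n-1 (newest); the newest is stored as full text and
every older version is a delta against the next newer version, unless the
cap forces a full-text copy.
"""
from typing import Optional


def storage_plan(n_versions: int, max_chain: Optional[int]) -> list[str]:
    """Return "full" or "delta" for each version index."""
    plan = []
    for i in range(n_versions):
        distance_from_newest = n_versions - 1 - i
        if distance_from_newest == 0:
            # The latest version is always kept as full text.
            plan.append("full")
        elif max_chain is not None and distance_from_newest % max_chain == 0:
            # Hypothetical cap: drop in a full-text copy every max_chain versions.
            plan.append("full")
        else:
            plan.append("delta")
    return plan


def deltas_to_apply(plan: list[str], i: int) -> int:
    """Count delta applications needed to rebuild version i.

    Walk toward newer versions until a full-text copy is found; each step
    taken is one delta that must be applied on the way back down.
    """
    steps = 0
    j = i
    while plan[j] != "full":
        j += 1
        steps += 1
    return steps


if __name__ == "__main__":
    n = 190281
    uncapped = storage_plan(n, None)
    capped = storage_plan(n, 1000)
    print(deltas_to_apply(uncapped, 0))  # 190280
    print(deltas_to_apply(capped, 0))    # 280
    print(capped.count("full"))          # 191
```

With the cap, the worst-case chain is `max_chain - 1` deltas, at the cost of storing roughly one extra full-text copy per `max_chain` versions, which is the size-versus-speed tradeoff the comment describes.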
Fossil does lots of consistency checks for each check-in, which involve computing MD5 checksums over the entire checkout, twice. Running MD5 over 720MB takes some time. A simple checkin on my 4-year-old desktop used 18 seconds of CPU time and 55 seconds of real time. (A clone of the repository was running in parallel during this test.) Checkins would be very fast if we turned off the error checks; maybe that needs to be an option. My clone (via localhost) has made substantial progress in 33 minutes of CPU time, but it seems to be stuck now.
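To see why these checks scale with checkout size, here is a minimal Python sketch of a checkout-wide MD5. It is an illustration, not Fossil's actual checksum code: the hypothetical `checkout_md5` function hashes each file's relative path, size, and content in sorted path order, so the work is proportional to the total bytes in the checkout. Running it twice, as the consistency checks described above do, means reading and hashing the whole checkout twice. For simplicity it does not skip Fossil's own control files, which a real implementation would need to do.

```python
import hashlib
import os


def checkout_md5(root: str, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Hash every file under root in sorted path order (illustrative only).

    Returns (hex digest, total bytes of file content hashed). Each file
    contributes "<relative path> <size>\\n" followed by its content, so
    both renames and content changes alter the digest.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Normalize separators so the digest is the same on every OS.
            paths.append(os.path.relpath(full, root).replace(os.sep, "/"))
    paths.sort()

    md5 = hashlib.md5()
    total = 0
    for rel in paths:
        full = os.path.join(root, rel)
        size = os.path.getsize(full)
        md5.update(f"{rel} {size}\n".encode("utf-8"))
        # Stream the file in chunks so memory use stays flat on large files.
        with open(full, "rb") as f:
            while chunk := f.read(chunk_size):
                md5.update(chunk)
        total += size
    return md5.hexdigest(), total
```

Streaming in fixed-size chunks keeps memory use flat, but it cannot reduce the number of bytes hashed; only skipping the check, as the comment suggests making optional, avoids that cost.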