View Ticket
Ticket UUID: 2a1e8e3c4b0b39e08fdde0d24d9fb35fbc66d39a
Title: content_get can recurse too deep
Status: Fixed
Type: Code_Defect
Severity: Severe
Priority:
Subsystem:
Resolution: Fixed
Last Modified: 2010-10-05 03:29:45
Version Found In: c492eab395
Description & Comments:
I'm trying to convert an existing, fairly large CVS repository into fossil. The progress bar from rebuild_db had reached 100% and fossil was then spending several minutes without giving any indication of what it was doing. After attaching gdb, I saw
#2556 0x000000000040d048 in content_get (rid=11398, pBlob=<value optimized out>) at content_.c:256

in the bt output. So it seems like it is creating a stack frame for roughly every CVS revision of a file. That screams that it should be done differently, without depending on huge stacks.
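
To make the shape of the problem concrete, here is a minimal sketch of how recursive delta reconstruction ends up with one stack frame per revision. The types and names are hypothetical, not Fossil's actual content_get(); the "delta" here is simply appended text:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical sketch, not Fossil's actual content_get(): each artifact
    ** is stored either whole or as a delta against a source artifact.  Here
    ** the "delta" is simply text appended to the source, like a ChangeLog
    ** that grows with every commit. */
    typedef struct Artifact Artifact;
    struct Artifact {
      Artifact *pSrc;     /* Delta source, or NULL if stored whole */
      const char *zText;  /* Whole content, or the text this delta appends */
    };

    /* Recursive reconstruction: one stack frame per link in the delta chain,
    ** so a file with ~3000 delta-stored revisions needs ~3000 nested frames. */
    char *reconstruct(const Artifact *p){
      char *zSrc, *zOut;
      if( p->pSrc==0 ) return strdup(p->zText);
      zSrc = reconstruct(p->pSrc);
      zOut = malloc(strlen(zSrc) + strlen(p->zText) + 1);
      strcpy(zOut, zSrc);
      strcat(zOut, p->zText);
      free(zSrc);
      return zOut;
    }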


anonymous added on 2010-10-03 12:52:15:
This issue results in stack overflows with the default 2MB stack limit while cloning a medium-sized (40MB after vacuum) repository. It also seems to be responsible for the 6h spent in rebuild_db() after the equivalent of fossil reconstruct.


anonymous claiming to be Joerg Sonnenberger added on 2010-10-03 17:52:58:
Bumping the content cache from 50 entries to 2500 brings the rebuild time for this repository from 6h to 2min or so. It doesn't fix the deep stack nesting.
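
A rough sketch of why the cache size matters so much for this repository (a hypothetical fixed-size cache keyed by rid, not Fossil's actual implementation): when the cache holds far fewer entries than the delta chain is long, nearly every lookup misses and the chain is re-walked from the start, so total rebuild work grows roughly quadratically with chain length.

    #include <stdlib.h>

    /* Hypothetical fixed-size cache of reconstructed content, keyed by
    ** record id (rid), with FIFO eviction for simplicity. */
    #define CACHE_SIZE 2500            /* the value under discussion; was 50 */

    typedef struct CacheEntry {
      int rid;                         /* cached record id, 0 means empty slot */
      char *zContent;                  /* fully reconstructed content */
    } CacheEntry;

    static CacheEntry aCache[CACHE_SIZE];
    static int iNext = 0;              /* next slot to overwrite */

    const char *cache_lookup(int rid){
      int i;
      for(i=0; i<CACHE_SIZE; i++){
        if( aCache[i].rid==rid ) return aCache[i].zContent;
      }
      return 0;                        /* miss: caller re-walks the delta chain */
    }

    void cache_insert(int rid, char *zContent){
      free(aCache[iNext].zContent);    /* evict whatever occupied the slot */
      aCache[iNext].rid = rid;
      aCache[iNext].zContent = zContent;
      iNext = (iNext+1) % CACHE_SIZE;
    }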


drh added on 2010-10-03 18:02:19:
I don't understand your issue with the stack size. Yes, content_get() is deeply recursive, but each stack frame is only 52 bytes. What workstation these days can't handle 100K or more recursions of a 52-byte stack frame?

Content is stored as a sequence of deltas. So the extraction algorithm is inherently recursive. We could switch to using a loop and store the recursion information in memory obtained from the heap. But what does that really accomplish, other than making the code more obtuse?
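
As an illustration of that loop-plus-heap alternative, reusing the hypothetical Artifact type from the sketch above (again, not Fossil's code): collect the chain into a heap-allocated array first, then apply the deltas from the base artifact forward, so C stack usage stays constant no matter how long the chain is.

    #include <stdlib.h>
    #include <string.h>

    /* Iterative variant of the reconstruct() sketch above. */
    char *reconstruct_iter(const Artifact *p){
      int i, n = 0, cap = 16;
      char *z;
      const Artifact **a = malloc(cap*sizeof(*a));
      for(; p; p = p->pSrc){            /* pass 1: collect chain, newest to oldest */
        if( n==cap ){ cap *= 2; a = realloc(a, cap*sizeof(*a)); }
        a[n++] = p;
      }
      z = strdup(a[n-1]->zText);        /* base artifact, stored whole */
      for(i=n-2; i>=0; i--){            /* pass 2: apply deltas forward */
        char *zNew = malloc(strlen(z) + strlen(a[i]->zText) + 1);
        strcpy(zNew, z);
        strcat(zNew, a[i]->zText);
        free(z);
        z = zNew;
      }
      free(a);
      return z;
    }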

We've not had performance issues with rebuild before. This suggests that your repository has a different structure than what we have seen in the past. Can we clone a copy of your repository for further study, so that we can get a better grip on the source of your performance problem?


drh added on 2010-10-03 18:32:30:
Changing the cache size from 50 to 2500 makes no measurable difference in the rebuild time for the 10.35-year, 8400 check-in, 46MB repository used by SQLite. This is further evidence that your repository structure is different from what we are used to and that we will need to see your repository in order to figure out what is slowing things down for you.


drh added on 2010-10-03 19:16:34:
Perhaps I have misunderstood the nature of the problem reported above. Are you trying to run "fossil rebuild" or "fossil reconstruct"? I've been assuming "fossil rebuild", but now I'm thinking I misunderstood.

There is an inefficiency in "fossil reconstruct", I think. And I think I have a simple fix. But I need to fix some inefficiencies in "fossil deconstruct" first, so that I can construct a large set of files from which to test "fossil reconstruct". I'll try to post a patch soon....


anonymous added on 2010-10-04 10:00:21:
I'm seeing this as part of the rebuild step of reconstruct and clone. I haven't tried "fossil rebuild". With "fossil clone", the stack nesting is deep enough to hit the stack limit. The backtrace starts with a few thousand frames of content_get and ends with a few thousand of:

#14478 0x000000000040d6a0 in after_dephantomize (rid=25789, linkFlag=1) at content_.c:361

I can't share this repository, but the problematic part is a single file that has ~3000 commits, incrementally extending it. Think of a ChangeLog if you want.

I think the best approach here is to avoid deep recursion by limiting the length of a delta chain. What do you think about storing the start and depth of the delta chain with each delta and introducing an on-disk content cache to cut the chain when it reaches a specific limit? Phantoms could be processed in a forward pass without recursion by using a small todo bag.
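
A minimal sketch of the chain-capping idea; the limit and the field names are illustrative assumptions, not anything Fossil currently stores:

    /* Each stored artifact records how deep its delta chain is; when a new
    ** delta would push the chain past the limit, the content is stored whole
    ** instead, so readers never walk more than MAX_CHAIN_DEPTH links whether
    ** they recurse or loop. */
    #define MAX_CHAIN_DEPTH 100    /* illustrative limit, not a Fossil setting */

    typedef struct Stored {
      int nDepth;            /* 0 = stored whole, else 1 + depth of the source */
      /* ... delta payload or whole content ... */
    } Stored;

    /* Decide how deep the new artifact's chain would be; a return of 0 means
    ** "store the full content and start a fresh chain". */
    int choose_depth(const Stored *pSrc){   /* pSrc may be NULL */
      int nDepth = pSrc ? pSrc->nDepth + 1 : 0;
      if( nDepth > MAX_CHAIN_DEPTH ) nDepth = 0;
      return nDepth;
    }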