Ticket UUID: 2a1e8e3c4b0b39e08fdde0d24d9fb35fbc66d39a
Title: content_get can recurse too deep
Status: Fixed
Type: Code_Defect
Severity: Severe
Priority:
Subsystem:
Resolution: Fixed
Last Modified: 2010-10-05 03:29:45
Version Found In: c492eab395

Description & Comments:
I'm trying to convert an existing, fairly large CVS repository into fossil. The progress bar from rebuild_db had reached 100%, and fossil then spent several minutes without giving any indication of what it was doing. After attaching gdb, I saw

    #2556 0x000000000040d048 in content_get (rid=11398, pBlob=<value optimized out>) at content.c:256

in the bt output. So it seems a stack frame is created for roughly every CVS revision of a file. That screams for an approach that does not depend on huge stacks.

anonymous added on 2010-10-03 12:52:15:

anonymous claiming to be Joerg Sonnenberger added on 2010-10-03 17:52:58:

drh added on 2010-10-03 18:02:19:
Content is stored as a sequence of deltas, so the extraction algorithm is inherently recursive. We could switch to using a loop and store the recursion information in memory obtained from the heap, but what does that really accomplish, other than making the code more obtuse? We have not had performance issues with rebuild before, which suggests that your repository has a different structure than what we have seen in the past. Can we clone a copy of your repository for further study, so that we can get a better grip on the source of your performance problem?

drh added on 2010-10-03 18:32:30:

drh added on 2010-10-03 19:16:34:
There is an inefficiency in "fossil reconstruct", I think, and I think I have a simple fix. But I need to fix some inefficiencies in "fossil deconstruct" first, so that I can construct a large set of files with which to test "fossil reconstruct". I'll try to post a patch soon.

anonymous added on 2010-10-04 10:00:21:

    #14478 0x000000000040d6a0 in after_dephantomize (rid=25789, linkFlag=1) at content.c:361

I can't share this repository, but the problematic part is a single file with ~3000 commits, each incrementally extending it. Think of a ChangeLog if you want. I think the best approach here is to avoid deep recursion by limiting the length of a delta chain. What do you think about storing the start and depth of the delta chain with each delta and introducing an on-disk content cache so that the chain can be cut when it reaches a specific limit? Phantoms could be processed in a forward pass, without recursion, by using a small todo bag.
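drh's suggestion of "using a loop and storing the recursion information in memory obtained from the heap" can be illustrated with a rough sketch. This is not the actual content.c code: the Blob struct and the helpers is_delta(), delta_source(), load_blob() and apply_delta() are hypothetical stand-ins for whatever the real storage layer provides. The delta chain is first collected into a heap-allocated array and then replayed in a loop, so the C stack depth stays constant no matter how long the chain is.

    #include <stdlib.h>

    typedef struct Blob { char *aData; int nUsed; } Blob;   /* simplified buffer */

    /* Hypothetical storage helpers (not the real content.c routines): */
    int  is_delta(int rid);                    /* is rid stored as a delta?       */
    int  delta_source(int rid);                /* id of the record it deltas from */
    void load_blob(int rid, Blob *p);          /* read the raw record content     */
    void apply_delta(Blob *pBase, Blob *pD);   /* rebuild pBase by applying pD    */

    /* Reconstruct record rid with a loop and a heap-allocated chain, so the
    ** stack depth stays constant regardless of how long the delta chain is. */
    int content_get_iterative(int rid, Blob *pOut){
      int *chain = 0;              /* heap-allocated stand-in for the call stack */
      int nChain = 0, nAlloc = 0;
      int id = rid;

      while( is_delta(id) ){                   /* walk back to the full text */
        if( nChain>=nAlloc ){
          int *pNew;
          nAlloc = nAlloc ? nAlloc*2 : 64;
          pNew = realloc(chain, nAlloc*sizeof(int));
          if( pNew==0 ){ free(chain); return 0; }
          chain = pNew;
        }
        chain[nChain++] = id;
        id = delta_source(id);
      }
      load_blob(id, pOut);                     /* start from the full text         */
      while( nChain>0 ){                       /* apply deltas back out toward rid */
        Blob delta;
        load_blob(chain[--nChain], &delta);
        apply_delta(pOut, &delta);
        free(delta.aData);                     /* sketch: release the delta buffer */
      }
      free(chain);
      return 1;
    }

The trade-off drh mentions is real: the loop version needs an explicit array and its own error handling, where the recursive version gets both for free from the call stack.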
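The delta-chain limit proposed in the last comment could, with the same caveat, look roughly like the sketch below; DELTA_CHAIN_LIMIT, chain_depth(), store_delta() and store_full_text() are invented names, not Fossil APIs, and the on-disk content cache is omitted. Writing a full text whenever the source record's chain is already at the limit guarantees that no read ever has to walk more than DELTA_CHAIN_LIMIT deltas, whether the walk is recursive or iterative.

    /* Sketch: cap the delta-chain length at write time (hypothetical names). */
    #define DELTA_CHAIN_LIMIT 50              /* illustrative limit only */

    int  chain_depth(int rid);                /* depth stored alongside each delta */
    void store_delta(int rid, int srcid, const char *z, int n);
    void store_full_text(int rid, const char *z, int n);

    void store_version(int rid, int srcid, const char *z, int n){
      if( srcid==0 || chain_depth(srcid)>=DELTA_CHAIN_LIMIT ){
        store_full_text(rid, z, n);           /* cut the chain: depth resets to 0 */
      }else{
        store_delta(rid, srcid, z, n);        /* new depth = chain_depth(srcid)+1 */
      }
    }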
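The "small todo bag" for phantom processing amounts to a worklist loop: instead of after_dephantomize() recursing into every record that becomes available, newly resolved ids are pushed into a bag and drained iteratively. A minimal sketch, assuming hypothetical helpers process_one() and queue_newly_resolved() in place of the real dephantomize logic:

    #include <stdlib.h>

    /* Minimal growable "todo bag" of record ids (processing order is irrelevant). */
    typedef struct Bag { int *a; int n; int nAlloc; } Bag;

    static void bag_insert(Bag *p, int rid){
      if( p->n>=p->nAlloc ){
        int *pNew;
        p->nAlloc = p->nAlloc ? p->nAlloc*2 : 32;
        pNew = realloc(p->a, p->nAlloc*sizeof(int));
        if( pNew==0 ) abort();                /* sketch: no graceful OOM handling */
        p->a = pNew;
      }
      p->a[p->n++] = rid;
    }

    /* Hypothetical helpers standing in for the real logic: process_one() turns a
    ** phantom into real content; queue_newly_resolved() adds the ids of records
    ** that processing rid has just made available.                              */
    void process_one(int rid, int linkFlag);
    void queue_newly_resolved(int rid, Bag *pTodo);

    /* Drain phantoms iteratively: constant stack depth no matter how long the
    ** dependency chain is.                                                     */
    void dephantomize_all(int rid, int linkFlag){
      Bag todo = {0, 0, 0};
      bag_insert(&todo, rid);
      while( todo.n>0 ){
        int id = todo.a[--todo.n];            /* take any pending id            */
        process_one(id, linkFlag);            /* resolve it                     */
        queue_newly_resolved(id, &todo);      /* enqueue whatever it unblocked  */
      }
      free(todo.a);
    }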