Skip to content

Allow subtypes of RevCommit and RevWalk to use information stored in the commit graph #250

@flori-schwa

Description

@flori-schwa

Description

It should be possible for various sub-types of RevCommit and RevWalk to access and use the information stored in the commit graph. This is mostly relevant for EGit's SWTWalk and SWTCommit. These types cannot currently use the information (see additional context for why) from the commit graph, meaning that for example:

  • An incremental implementation of the topological sorting akin to git cannot use commit graph generation numbers to produce commits incrementally. The entire history must be processed first
  • The Changed Path Bloom filter cannot be used and a TreeWalk is performed for every commit in TreeRevFilter.include()

both leading to a bad performance of EGit's History View

Motivation

Dicussion #243 describes the performance problems we are currently facing in regards to EGits History View when viewing the changes of the currently selected resource. I am currently working on improving the performance of JGit's topological sorting by allowing an incremental implementation similar to git itself (here).

During Development I noticed, that even though I generated a commit graph (git commit-graph write --changed-paths) and enabled commit graphs in the git config (core.commitGraph), my implementation of git's topological sorting algorithm (which uses generation numbers obtained from the commit graph) only ever saw Constants.COMMIT_GENERATION_UNKNOWN for getGeneration() (since the data from the commit graph is never parsed for SWTCommits).
Additionally the Changed Path Bloom filter was never used and a Tree Walk was performed for every commit.

Alternatives considered

One possibility is replacing the default implementation in RevCommit with concrete lookups in the commit graph, this has the following downsides:

  • A new CommitData object would be instantiated for every invocation of getGeneration() and getChangedPathFilter()
  • Even if commit graph data is available, any RevCommit that is not a RevCommitCG would still be parsed from the commit object instead of the commit graph (See RevCommit.parseCanonical() and RevCommitCG.parseInGraph())

My workaround for the incremental topological sorting algorithm is to query the RevWalk.commitGraph() if RevCommit.getGeneration() does not return a usable value: https://github.com/vi-eclipse/jgit/blob/06b19880b694b20a7e76f388345a80fc64243359/org.eclipse.jgit/src/org/eclipse/jgit/revwalk/TopoSortPendingGenerator.java#L287-L305
This approach is not possible for ChangedPathTreeFilter.shouldTreeWalk(), since it cannot access the package-private RevWalk.commitGraph()

Additional context

RevCommit currently offers two APIs that provide information stored in the Commit Graph:

  • getGeneration() for determining the Generation Number of a commit
  • getChangedPathFilter() for retrieving a bloom filter for the modified paths

RevCommit itself always returns Constants.COMMIT_GENERATION_UNKNOWN for getGeneration() and null for getChangedPathFilter(), these Methods are only overridden in RevCommitCG. RevCommitCG instances are created by a RevWalk when a position in the commit graph can be found for the desired commit.

private RevCommit createCommit(AnyObjectId id, int graphPos) {
if (graphPos >= 0) {
return new RevCommitCG(id, graphPos);
}
return new RevCommit(id);
}

However, this only applies when strictly using RevWalk, when using a subtype of RevWalk (i.e. GitHistoryWalk), none of the information present in the commit graph is used, even though it could be used to optimize both topological sorting and filtering by Changed paths.
Subtypes of RevCommit, which are located in a different package (such as PlotCommit), cannot currently override and implement getGeneration() and getChangedPathFilter() because:

  • RevCommit.getGeneration() has package-private visibility
  • Generation Numbers (and other information from the CommitData) for RevCommitCG are parsed by overriding the RevCommit.parseCanonical() method, which also has package-private visibility
  • Even if subtypes of RevCommit override and implement getChangedPathFilter(), they cannot retrieve the Commit Graph via the RevWalk.commitGraph() method, since it has package-private visibility

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions