March 20, 2020
Xingyu Xie
Tsinghua University
After building mappings between isomorphic subtrees, "recover" mappings between similar but not isomorphic subtrees.
Here comes the question: how do we find recovery mappings?
A simple edit script between two ordered labeled trees consists of three kinds of edit actions:
Best recovery mapping <=> Best edit script
How to characterize best edit script?
Assume that every kind of action has a cost:
The edit script with minimum total cost is taken to be the best edit script.
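The cost model can be sketched as follows. This is only an illustration: it assumes the three action kinds are relabel, delete, and insert (the usual choice for ordered labeled trees) and that each costs 1. The class names, fields, and the `COST` table are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


# Hypothetical action types; the slides do not name them.
@dataclass(frozen=True)
class Relabel:
    node: str
    new_label: str


@dataclass(frozen=True)
class Delete:
    node: str


@dataclass(frozen=True)
class Insert:
    label: str
    parent: str
    position: int


# Assumed unit costs; any non-negative per-kind costs would work the same way.
COST = {Relabel: 1, Delete: 1, Insert: 1}


def script_cost(script):
    """Total cost of an edit script: the sum of the costs of its actions."""
    return sum(COST[type(action)] for action in script)


script = [Delete("c"), Insert("c", parent="f", position=0)]
print(script_cost(script))
```

Under this model, comparing two candidate scripts reduces to comparing their `script_cost` values.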
Finding the minimum-cost edit script between two ordered labeled trees (the tree edit distance) is a well-studied problem.
algorithm | time complexity | characteristics |
---|---|---|
ZS[3] | $O(n^4)$ worst case, $O(n^2 \log^2 n)$ for balanced trees | efficient only when the tree is almost balanced |
Demaine[4] | $O(n^3)$ | worst-case optimal; usually runs into its worst case |
RTED[2] | $O(n^3)$ | never worse than ZS or Demaine |
Question: the minimum cost of editing one tree into another
Let's think from the perspective of dynamic programming!
Consider the roots of the trees...
The question becomes the minimum edit cost between forests, which looks much harder...
Renumber the nodes in post-order (DFS) ordering:
Forest => Interval
subtree => interval
root => the last element of interval
subtree without root => interval
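The correspondence above can be made concrete with a small sketch. It assumes a tree is written as a nested tuple `(label, [children])`; the function name `postorder_intervals` is made up for this example. For every node `i` in post-order, the subtree rooted at `i` occupies the interval `[left[i], i]`, and the subtree without its root occupies `[left[i], i - 1]`.

```python
def postorder_intervals(tree):
    """Number the nodes in post-order.

    Returns (labels, left): labels[i] is the label of the i-th node in
    post-order, and left[i] is the index of the leftmost leaf below node i,
    so the subtree rooted at i is exactly the interval [left[i], i].
    """
    labels, left = [], []

    def visit(node):
        label, children = node
        first = None
        for child in children:
            start = visit(child)
            if first is None:
                first = start  # leftmost leaf of the first child
        index = len(labels)  # post-order index of this node
        labels.append(label)
        left.append(index if first is None else first)
        return left[index]

    visit(tree)
    return labels, left


# a(b(c, d), e): post-order is c, d, b, e, a
tree = ("a", [("b", [("c", []), ("d", [])]), ("e", [])])
labels, left = postorder_intervals(tree)
print(labels)  # ['c', 'd', 'b', 'e', 'a']
print(left)    # [0, 1, 0, 3, 0]
```

Here the subtree rooted at `b` is the interval `[0, 2]`, and its root is the last element of that interval.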
Question: the minimum edit script from one interval into another.
Let's consider the root of the rightmost subtree:
All three cases can be handled appropriately.
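One standard way to write the three cases is the forest recurrence below. The notation is an assumption made for this sketch: $F$ and $G$ are the two forests (intervals), $v$ and $w$ are the roots of their rightmost trees $T_v$ and $T_w$, and $c_{\mathrm{del}}$, $c_{\mathrm{ins}}$, $c_{\mathrm{ren}}$ are the costs of the three actions.

```latex
\begin{aligned}
d(\emptyset, \emptyset) &= 0, \\
d(F, \emptyset) &= d(F - v, \emptyset) + c_{\mathrm{del}}(v), \\
d(\emptyset, G) &= d(\emptyset, G - w) + c_{\mathrm{ins}}(w), \\
d(F, G) &= \min
\begin{cases}
d(F - v,\, G) + c_{\mathrm{del}}(v) & \text{(delete } v\text{)} \\
d(F,\, G - w) + c_{\mathrm{ins}}(w) & \text{(insert } w\text{)} \\
d(F - T_v,\, G - T_w) + d(T_v - v,\, T_w - w) + c_{\mathrm{ren}}(v, w) & \text{(map } v \text{ to } w\text{)}
\end{cases}
\end{aligned}
```

Every forest on the right-hand side is again an interval in post-order, which is what makes the interval formulation work.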
Time complexity and space complexity are both $O(n^4)$: each tree has $O(n^2)$ intervals, and every pair of intervals is a state.
Space: when processing in post-order, only the edit distances between pairs of subtrees and the table for the subtree pair currently being computed need to be memoized.
Time: for a balanced tree, the sum of the sizes of the red subtrees is $O(n \log n)$, so the total time is reduced to $O(n^2 \log^2 n)$.
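The two optimizations can be sketched in Python in the style of the Zhang-Shasha algorithm [3]. This is a minimal sketch, not a reference implementation: it assumes trees are nested tuples `(label, [children])`, unit costs by default, and the helper names (`postorder`, `keyroots`, `tree_edit_distance`) are chosen for this example. Only the subtree-distance table `td` persists; the forest table `fd` is rebuilt for each pair of key roots.

```python
def postorder(tree):
    """Return (labels, left) in post-order; left[i] is the leftmost leaf of node i."""
    labels, left = [], []

    def visit(node):
        label, children = node
        first = None
        for child in children:
            start = visit(child)
            if first is None:
                first = start
        labels.append(label)
        left.append(len(left) if first is None else first)
        return left[-1]

    visit(tree)
    return labels, left


def keyroots(left):
    """Highest node for each distinct leftmost leaf: the root and nodes with a left sibling."""
    last = {}
    for i, l in enumerate(left):
        last[l] = i
    return sorted(last.values())


def tree_edit_distance(t1, t2, insert_cost=1, delete_cost=1, rename_cost=1):
    l1, left1 = postorder(t1)
    l2, left2 = postorder(t2)
    # td[u][v]: distance between the subtrees rooted at u and v (kept for the whole run)
    td = [[0] * len(l2) for _ in l1]
    for i in keyroots(left1):
        for j in keyroots(left2):
            li, lj = left1[i], left2[j]
            rows, cols = i - li + 2, j - lj + 2
            # fd[x][y]: distance between the first x nodes of interval [li, i]
            # and the first y nodes of interval [lj, j] (temporary table)
            fd = [[0] * cols for _ in range(rows)]
            for x in range(1, rows):
                fd[x][0] = fd[x - 1][0] + delete_cost
            for y in range(1, cols):
                fd[0][y] = fd[0][y - 1] + insert_cost
            for x in range(1, rows):
                for y in range(1, cols):
                    u, v = li + x - 1, lj + y - 1
                    delete = fd[x - 1][y] + delete_cost
                    insert = fd[x][y - 1] + insert_cost
                    if left1[u] == li and left2[v] == lj:
                        # Both prefixes are whole subtrees: map root to root.
                        rename = 0 if l1[u] == l2[v] else rename_cost
                        fd[x][y] = min(delete, insert, fd[x - 1][y - 1] + rename)
                        td[u][v] = fd[x][y]
                    else:
                        # Reuse the memoized distance of the rightmost subtrees.
                        p, q = left1[u] - li, left2[v] - lj
                        fd[x][y] = min(delete, insert, fd[p][q] + td[u][v])
    return td[-1][-1]


t1 = ("f", [("d", [("a", []), ("c", [("b", [])])]), ("e", [])])
t2 = ("f", [("c", [("d", [("a", []), ("b", [])])]), ("e", [])])
print(tree_edit_distance(t1, t2))
```

For the two example trees, one delete (`c` in the first tree) and one insert (`c` above `d`) are enough, and no single action suffices.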
Question: which subtree's root should be removed in the recursion?
ZS: the rightmost subtree
Demaine: the largest subtree
RTED: the optimal subtree
Actual implementation:
Possible reasons:
[1] J.-R. Falleri, F. Morandat, X. Blanc, M. Martinez, and M. Monperrus. Fine-grained and accurate source code differencing. Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ser. ASE ’14. ACM, 2014.
[2] M. Pawlik and N. Augsten. RTED: a robust algorithm for the tree edit distance. PVLDB, 5(4):334–345, 2011.
[3] K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput., 18(6):1245–1262, 1989.