pip install deepdiff
github.com/seperman/deepdiff
Sep Dehpour
zepworks.com
sep at zepworks.com
github.com/seperman
PyCon - June 1st 2016
[
{
"_id": "574ddc8215220308d959b480",
"index": 0,
"guid": "b750dcce-f8ee-45e8-bbb6-32371ce13dd2",
"isActive": true,
"balance": "$1,135.27",
"picture": "http://placehold.it/32x32",
"thatguy": {
"age": 25,
"eyeColor": "brown",
"name": "Figueroa Kemp",
"gender": "male",
"friends": [
{
"id": 0,
"name": "Mendez Foley",
"thatguy": {
"age": 23,
"gender": "male",
"friends": [
{
"thatguy": {
"age": 32,
"eyeColor": "blue",
"name": "Albert Vaughn",
"gender": "male",
"friends": [
{
"id": 0,
"name": "Harry"
},
{
"id": 1,
"name": "Joe"
}
]
},
"name": "Robinson Fischer"
}
]
}
},
{
"id": 1,
"name": "Alyce Simpson",
"thatguy": {
"age": 40,
"gender": "female",
"friends": [
{
"thatguy": {
"age": 21,
"eyeColor": "blue",
"name": "Hobbs Galloway",
"gender": "male",
"friends": [
{
"id": 0,
"name": "Miranda Hartman"
},
{
"id": 1,
"name": "Rowland Peck"
}
]
},
"name": "Dena Mccall"
}
]
}
}
]
},
"company": "CALLFLEX",
"email": "denamccall@callflex.com",
"phone": "+1 (813) 507-3202",
"address": "821 Wolf Place, Bagtown, Nebraska, 2245",
"about": "In ex velit voluptate aute velit",
"registered": "2015-03-22T02:45:12 +07:00",
"latitude": 55.78885,
"longitude": 37.013185
}
]

[
{
"_id": "574ddc8215220308d959b480",
"index": 0,
"guid": "b750dcce-f8ee-45e8-bbb6-32371ce13dd2",
"isActive": true,
"balance": "$1,135.27",
"picture": "http://placehold.it/32x32",
"thatguy": {
"age": 25,
"eyeColor": "brown",
"name": "Figueroa Kemp",
"gender": "male",
"friends": [
{
"id": 0,
"name": "Mendez Foley",
"thatguy": {
"age": 23,
"gender": "male",
"friends": [
{
"thatguy": {
"age": 32,
"eyeColor": "blue",
"name": "Albert Vaughn",
"gender": "male",
"friends": [
{
"id": 0,
"name": "Harry"
},
{
"id": 1,
"name": "John"
}
]
},
"name": "Robinson Fischer"
}
]
}
},
{
"id": 1,
"name": "Alyce Simpson",
"thatguy": {
"age": 40,
"gender": "female",
"friends": [
{
"thatguy": {
"age": 21,
"eyeColor": "blue",
"name": "Hobbs Galloway",
"gender": "male",
"friends": [
{
"id": 0,
"name": "Miranda Hartman"
},
{
"id": 1,
"name": "Rowland Peck"
}
]
},
"name": "Dena Mccall"
}
]
}
}
]
},
"company": "CALLFLEX",
"email": "denamccall@callflex.com",
"phone": "+1 (813) 507-3202",
"address": "821 Wolf Place, Bagtown, Nebraska, 2245",
"about": "In ex velit voluptate aute velit",
"registered": "2015-03-22T02:45:12 +07:00",
"latitude": 55.78885,
"longitude": 37.013185
}
]

{ 'values_changed': { "root[0]['thatguy']['friends'][0]['thatguy']['friends'][0]['thatguy']['friends'][1]['name']": {
    'oldvalue': 'Joe',
    'newvalue': 'John'
}}}
Objectives
[0, 1, 2] vs. [2, 0, 1]

1. Text Sequences
2. Numerics
3. Sets
5. Mappings
6. Other Iterables (List, Generator, Deque, Tuple, Custom Iterables)
7. User Defined Objects
>>> import difflib
>>> t1="""
... Hello World!
... """.splitlines()
>>> t2="""
... Hello World!
... It is ice-cream time.
... """.splitlines()
>>> g = difflib.unified_diff(t1, t2, lineterm='')
>>> print('\n'.join(list(g)))
---
+++
@@ -1,2 +1,3 @@
Hello World!
+It is ice-cream time.

3. Sets
>>> t1 = {1,2,3}
>>> t2 = {3,4,5}
>>> items_added = t2 - t1
>>> items_removed = t1 - t2
>>> items_added
set([4, 5])
>>> items_removed
set([1, 2])

5. Mappings
Dict, OrderedDict, Defaultdict

{
    'common1': {
        ...
    },
    'common2': {
        ...
    },
}

{
    'common1': {
        ...
    },
    'common2': {
        ...
    },
    'added': {
        ...
    }
}

t1_keys = set(t1.keys())
t2_keys = set(t2.keys())
same_keys = t2_keys.intersection(t1_keys)
added = t2_keys - same_keys
removed = t1_keys - same_keys

And then recursively check the values of same_keys.
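The key-set arithmetic above only handles one level. A minimal sketch of the recursive step, assuming plain nested dicts; `diff_dicts` is a hypothetical helper (not DeepDiff's API) that reports paths in the same `root[...]` style as the DeepDiff output at the start of the talk:

```python
def diff_dicts(t1, t2, path="root"):
    """Recursively diff two dicts (hypothetical helper, not DeepDiff's API)."""
    result = {"added": [], "removed": [], "values_changed": {}}
    t1_keys = set(t1.keys())
    t2_keys = set(t2.keys())
    same_keys = t2_keys.intersection(t1_keys)

    # Keys present on only one side; sorted by repr for a stable order
    for key in sorted(t2_keys - same_keys, key=repr):
        result["added"].append("{}[{!r}]".format(path, key))
    for key in sorted(t1_keys - same_keys, key=repr):
        result["removed"].append("{}[{!r}]".format(path, key))

    # Keys on both sides: recurse into nested dicts, compare everything else
    for key in sorted(same_keys, key=repr):
        child_path = "{}[{!r}]".format(path, key)
        v1, v2 = t1[key], t2[key]
        if isinstance(v1, dict) and isinstance(v2, dict):
            child = diff_dicts(v1, v2, child_path)
            result["added"].extend(child["added"])
            result["removed"].extend(child["removed"])
            result["values_changed"].update(child["values_changed"])
        elif v1 != v2:
            result["values_changed"][child_path] = {"oldvalue": v1, "newvalue": v2}
    return result


t1 = {"a": 1, "b": {"c": 2, "d": 3}}
t2 = {"a": 1, "b": {"c": 5}, "e": 4}
print(diff_dicts(t1, t2))
```

Here `root['e']` is reported as added, `root['b']['d']` as removed, and `root['b']['c']` as changed from 2 to 5. Non-dict values are compared with `!=`, so lists inside the dicts are not diffed item by item in this sketch.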
6. Other Iterables (List, Generator, Deque, Tuple, Custom Iterables)
Consider Order

>>> t1 = [1, 2, 3]
>>> t2 = [1, 2, 5]

>>> t1 = [1, 2, 3]
>>> t2 = [1, 2, 5, 6]

>>> from itertools import zip_longest  # Python 2: izip_longest
>>> t1 = [1, 2, 3]
>>> t2 = [1, 2, 5, 6]
>>>
>>> class NotFound(object):
...     "Fill value for zip_longest"
...     def __repr__(self):
...         return "NotFound"
...
>>> notfound = NotFound()
>>>
>>> list(zip_longest(t1, t2, fillvalue=notfound))
[(1, 1), (2, 2), (3, 5), (NotFound, 6)]

>>> removed, added, modified = [], [], []
>>> for (x, y) in zip_longest(t1, t2, fillvalue=notfound):
...     if x != y:
...         if y is notfound:
...             removed.append(x)
...         elif x is notfound:
...             added.append(y)
...         else:
...             modified.append("{} -> {}".format(x, y))
...
>>> print(removed)
[]
>>> print(added)
[6]
>>> print(modified)
['3 -> 5']
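The same steps wrapped into a function, as a sketch assuming Python 3 (`itertools.zip_longest`); `diff_ordered` and `_NOT_FOUND` are hypothetical names. A bare `object()` plays the role of `NotFound`, so `None` or any other real item in the lists is never mistaken for padding:

```python
from itertools import zip_longest  # izip_longest on Python 2

# A private sentinel: unlike None, it can never be a real list item
_NOT_FOUND = object()


def diff_ordered(t1, t2):
    """Compare two sequences position by position (hypothetical helper)."""
    removed, added, modified = [], [], []
    for x, y in zip_longest(t1, t2, fillvalue=_NOT_FOUND):
        if y is _NOT_FOUND:
            removed.append(x)    # t1 is longer: extra items were removed
        elif x is _NOT_FOUND:
            added.append(y)      # t2 is longer: extra items were added
        elif x != y:
            modified.append("{} -> {}".format(x, y))
    return removed, added, modified


print(diff_ordered([1, 2, 3], [1, 2, 5, 6]))  # ([], [6], ['3 -> 5'])
print(diff_ordered([1, 2, 3], [1]))           # ([2, 3], [], [])
```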
Ignore Order
Diff Iterable > Ignore order > convert to set

>>> t1=[1,2]
>>> t2=[1,3,4]
>>> t1set=set(t1)
>>> t2set=set(t2)
>>> t1set-t2set
{2}
>>> t2set-t1set
{3, 4}

but ...

>>> t1=[1, 2, {3:3}]
>>> t2=[1]
>>> t1set = set(t1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
A set object is an unordered collection of distinct hashable objects.
Hashable vs. Unhashable
Mutable vs. Immutable
Mutable vs. Immutable
>>> a=[1,2]
>>> id(a)
400304246
>>> a.append(3)
>>> id(a)
400304246
>>> b=(1,2)
>>> id(b)
399960722
>>> b += (3,)
>>> id(b)
400670561
Hashable
Unhashable vs. Mutable
Hashable that is Mutable
>>> class A:
...     aa=1
...
>>> hash(A)
2857987
>>> A.aa=2
>>> hash(A)
2857987
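Checking the type is not enough to know whether something is hashable. A small sketch using only the standard library: a tuple passes the `collections.abc.Hashable` check because `tuple` defines `__hash__`, yet `hash()` still fails when the tuple contains a list. Calling `hash()` inside `try`/`except TypeError` is the reliable test (`is_hashable` is a hypothetical helper name):

```python
from collections.abc import Hashable


def is_hashable(obj):
    """Return True only if hash(obj) actually succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


t = (1, [2, 3])
print(isinstance(t, Hashable))  # True: tuple defines __hash__
print(is_hashable(t))           # False: the inner list is unhashable
print(is_hashable((1, 2)))      # True
print(is_hashable({3: 3}))      # False
```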
List conversion to a set fails when any item is unhashable.
Now what?

Diff Iterable > Ignore order > sort

>>> t1=[{1:1}, {3:3}, {4:4}]
>>> t2=[{3:3}, {1:1}, {4:4}, {5:5}]
>>> t1=[{1:1}, {3:3}, {4:4}]
>>> t2=[{1:1}, {3:3}, {4:4}, {5:5}]

Py2
>>> t1=[{1:1}, {3:3}, {4:4}]
>>> t2=[{3:3}, {1:1}, {4:4}]
>>> t1.sort()
>>> t1
[{1: 1}, {3: 3}, {4: 4}]
>>> t2.sort()
>>> t2
[{1: 1}, {3: 3}, {4: 4}]
>>> [(a, b) for a, b in zip(t1,t2) if a != b]
[]
Py3
>>> t1=[{1:1}, {3:3}, {4:4}]
>>> t2=[{3:3}, {1:1}, {4:4}]
>>> t1.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: dict() < dict()
Sort key
>>> students = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(students, key=lambda s: s[2])
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

What should we use as the sort key to order a list of dictionaries?
Sort key: hash of dictionary contents
Py2 & 3
>>> from json import dumps
>>> t1=[{1:1}, {3:3}, {4:4}]
>>> t2=[{3:3}, {1:1}, {4:4}]
>>> t1.sort(key=lambda x: hash(dumps(x)))
>>> t2.sort(key=lambda x: hash(dumps(x)))
>>> t1
[{1: 1}, {3: 3}, {4: 4}]
>>> t2
[{1: 1}, {3: 3}, {4: 4}]
>>> [(a, b) for a, b in zip(t1,t2) if a != b]
[]
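One caveat with `hash(dumps(x))`, assuming Python 3.7+ where dicts keep insertion order: two equal dictionaries can serialize to different JSON text and therefore get different sort keys. A sketch that canonicalizes with `json.dumps(..., sort_keys=True)`; it sorts on the JSON string itself rather than its hash, and assumes all items are JSON serializable and the keys inside each dict are mutually comparable (`canonical_key` is a hypothetical name):

```python
from json import dumps

d1 = {1: 1, 2: 2}
d2 = {2: 2, 1: 1}  # equal contents, different insertion order

print(d1 == d2)                # True
print(dumps(d1) == dumps(d2))  # False on Python 3.7+: key order follows insertion
print(dumps(d1, sort_keys=True) == dumps(d2, sort_keys=True))  # True


def canonical_key(item):
    """Sort key: canonical JSON text with dict keys sorted."""
    return dumps(item, sort_keys=True)


t1 = [{1: 1, 2: 2}, {3: 3}]
t2 = [{3: 3}, {2: 2, 1: 1}]
t1.sort(key=canonical_key)
t2.sort(key=canonical_key)
print([(a, b) for a, b in zip(t1, t2) if a != b])  # []
```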
Iterables with different lengths
>>> import json
>>>
>>> t1=[10, {1:1}, {3:3}, {4:4}]
>>> t1.sort(key=lambda x: hash(json.dumps(x)))
>>>
>>> t2=[{3:3}, {1:1}, {4:4}]
>>> t2.sort(key=lambda x: hash(json.dumps(x)))
>>> t1
[{1: 1}, {3: 3}, {4: 4}, 10]
>>> t2
[{1: 1}, {3: 3}, {4: 4}]
>>> t1=[10, "a", {1:1}, {3:3}, {4:4}]
>>> t1.sort(key=lambda x: hash(dumps(x)))
>>> t1
['a', {1: 1}, {3: 3}, {4: 4}, 10]
>>> t2
[{1: 1}, {3: 3}, {4: 4}]
...
['a -> {1: 1}', '{1: 1} -> {3: 3}',
'{3: 3} -> {4: 4}']

Diff Iterable > Ignore order > hashtable

Put items in a dictionary of
{item_hash: item}
>>> t1 = [10, "a", {1:1}, {3:3}, {4:4}]
>>> t2 = [{3:3}, {1:1}, {4:4}, "b"]
>>> def create_hashtable(t):
...     hashes = {}
...     for item in t:
...         try:
...             item_hash = hash(item)
...         except TypeError:
...             try:
...                 item_hash = hash(json.dumps(item))
...             except:
...                 pass  # For presentation purposes
...             else:
...                 hashes[item_hash] = item
...         else:
...             hashes[item_hash] = item
...     return hashes
...
>>> t1 = [10, "a", {1:1}, {3:3}, {4:4}]
>>> t2 = [{3:3}, {1:1}, {4:4}, "b"]
>>> h1 = create_hashtable(t1)
>>> h2 = create_hashtable(t2)
>>>
>>> items_added = [h2[i] for i in h2 if i not in h1]
>>> items_removed = [h1[i] for i in h1 if i not in h2]
>>>
>>> items_added
['b']
>>> items_removed
['a', 10]
What if the object is not JSON serializable?
What if the JSON-serialized versions of two different objects are the same?
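Both questions have concrete answers with the standard `json` module. A short sketch: a `set` cannot be serialized at all, while `{1: 1}` and `{'1': 1}` are different dicts that produce the same JSON text, because JSON object keys are always strings:

```python
import json

# 1) Not JSON serializable: json.dumps raises TypeError for a set
try:
    json.dumps({1, 2})
    serializable = True
except TypeError:
    serializable = False
print(serializable)  # False

# 2) Two different dicts with the same JSON text
a = {1: 1}
b = {"1": 1}
print(a == b)                          # False
print(json.dumps(a), json.dumps(b))    # {"1": 1} {"1": 1}
print(json.dumps(a) == json.dumps(b))  # True
```

In `create_hashtable`, `a` and `b` would therefore get the same key, and the later item would silently replace the earlier one.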
>>> from pickle import dumps
>>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10},
...      'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
>>> dumps(t)
"((dp0\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\
nI10\nsS'Hello World'\np1\n(I1\nI2\nI3\nI4\nI5\
ntp2\n(lp3\nI1\naI2\naI3\naI4\naI5\natp4\n."
>>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10},
...        'Hello World', (1, 2, 3, 4, 5),
...        [1, 2, 3, 4, 5]))
"((dp0\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\
nI10\nsS'Hello World'\np1\n(I1\nI2\nI3\nI4\nI5\
ntp2\n(lp3\nI1\naI2\naI3\naI4\naI5\natp4\n."
What about cPickle? It is faster than pickle!
>>> from cPickle import dumps
>>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10},
...      'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
>>> dumps(t)
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\n
I10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\n
tp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
>>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10},
...        'Hello World', (1, 2, 3, 4, 5),
...        [1, 2, 3, 4, 5]))
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\n
I10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2
\nI1\naI2\naI3\naI4\naI5\natp3\n."
cPickle's output depends on whether the object is also referenced
elsewhere, so the same data can serialize differently!
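A related effect is easy to reproduce with Python 3's `pickle` (this sketch is not from the talk): the pickler memoizes objects it has already written, so the bytes depend on object identity and not only on value. Two equal lists can therefore pickle differently, which makes pickled bytes a fragile basis for a hash:

```python
import pickle

x = [1]
shared = [x, x]        # one list object referenced twice
separate = [[1], [1]]  # two distinct lists with the same value

print(shared == separate)                              # True
print(pickle.dumps(shared) == pickle.dumps(separate))  # False: memo reference vs. full copy
print(pickle.loads(pickle.dumps(shared)) == separate)  # True: same value after loading
```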
What did we learn from diffing iterables?
- The difference between unhashable and mutable
- Sets can only contain hashable objects
- Creating a hash for a dictionary
- Custom sorting with a key function
- Converting a sequence into a hashtable
- Pickling
7. User Defined Objects
__dict__
>>> class CL:
...     attr1 = 0
...     def __init__(self, thing):
...         self.thing = thing
...
>>> obj1 = CL(1)
>>> obj2 = CL(2)
>>> obj2.attr1 = 10
>>> obj1.__dict__
{'thing': 1}  # Notice that attr1 is not here
>>> obj2.__dict__
{'attr1': 10, 'thing': 2}

__slots__
>>> class ClassA(object):
...     __slots__ = ['x', 'y']
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> t1 = ClassA(1, 1)
>>> t2 = ClassA(1, 2)
>>>
>>> t1.new = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'ClassA' object has no attribute 'new'

>>> t1 = {i: getattr(t1, i) for i in t1.__slots__}
>>> t2 = {i: getattr(t2, i) for i in t2.__slots__}
>>> t1
{'x': 1, 'y': 1}
>>> t2
{'x': 1, 'y': 2}

Loops

>>> a=[1,2]
>>> a.append(a)
>>> a
[1, 2, [...]]
>>> b={1:1, 2:2}
>>> b[3]=b
>>> b
{1: 1, 2: 2, 3: {...}}
>>> class LoopTest(object):
...     def __init__(self, a):
...         self.loop = self
...         self.a = a
...
>>> t1 = LoopTest(1)
>>> t2 = LoopTest(2)
>>> t1
<__main__.LoopTest object at 0x02B9A910>
>>> t1.__dict__
{'a': 1, 'loop': <__main__.LoopTest object at 0x02B9A910>}
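Why loops matter for a diff tool: naive recursion never bottoms out on self-referencing structures. A minimal sketch, assuming CPython 3: comparing two separate self-referencing lists with `==` keeps recursing until the interpreter raises `RecursionError`:

```python
a = [1, 2]
a.append(a)  # a now contains itself: [1, 2, [...]]
b = [1, 2]
b.append(b)  # same shape, but a different object

try:
    outcome = "equal" if a == b else "different"
except RecursionError:
    # a[2] == b[2] compares a with b again, forever
    outcome = "RecursionError"
print(outcome)  # RecursionError
```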
Detect Loop with ID
A --> B --> C --> A
11 --> 23 --> 2 --> 11
def diff_common_children_of_dictionary(t1, t2,
                                       t_keys_intersect, parents_ids):
    for item_key in t_keys_intersect:
        t1_child = t1[item_key]
        t2_child = t2[item_key]
        item_id = id(t1_child)
        # An id that is already on the path from the root means a loop
        if parents_ids and item_id in parents_ids:
            print("Warning, a loop is detected.")
            continue
        parents_added = set(parents_ids or ())
        parents_added.add(item_id)
        parents_added = frozenset(parents_added)
        # diff() is the main recursive diff function (not shown here)
        diff(t1_child, t2_child, parents_ids=parents_added)

What did we learn about diffing custom objects?
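A minimal sketch that combines the lessons of this section, under stated assumptions: attributes are read from `__dict__` or `__slots__`, only dicts, custom objects and plain values are handled, and `get_attrs` / `diff_objects` are hypothetical names, not DeepDiff's API. As in the talk's code, the set of parent ids is passed down and a child already on the current path is skipped:

```python
def get_attrs(obj):
    """Return an object's attributes as a dict, from __dict__ or __slots__."""
    if hasattr(obj, "__dict__"):
        return dict(obj.__dict__)  # class-level attributes are not included
    # Only the class's own __slots__; parent classes' slots are ignored here
    slots = getattr(type(obj), "__slots__", ())
    if isinstance(slots, str):  # __slots__ = 'x' is allowed too
        slots = (slots,)
    # Skip slots that were never assigned
    return {name: getattr(obj, name) for name in slots if hasattr(obj, name)}


def diff_objects(t1, t2, path="root", parents_ids=frozenset(), changes=None):
    """Collect {path: {'oldvalue': ..., 'newvalue': ...}} for changed values."""
    if changes is None:
        changes = {}
    if id(t1) in parents_ids:
        return changes  # loop detected: t1 is already on the current path
    if type(t1) is not type(t2):
        changes[path] = {"oldvalue": t1, "newvalue": t2}
        return changes
    if isinstance(t1, dict):
        children1, children2, fmt = t1, t2, "{}[{!r}]"
    elif hasattr(t1, "__dict__") or hasattr(type(t1), "__slots__"):
        children1, children2, fmt = get_attrs(t1), get_attrs(t2), "{}.{}"
    else:
        # Plain value: compare directly
        if t1 != t2:
            changes[path] = {"oldvalue": t1, "newvalue": t2}
        return changes
    parents_added = parents_ids | {id(t1)}
    for key in children1.keys() & children2.keys():
        diff_objects(children1[key], children2[key], fmt.format(path, key),
                     parents_added, changes)
    return changes


class LoopTest(object):
    def __init__(self, a):
        self.loop = self
        self.a = a


print(diff_objects(LoopTest(1), LoopTest(2)))
# {'root.a': {'oldvalue': 1, 'newvalue': 2}}
```

Here the `loop` attribute points back at the object itself, so it is skipped and only `root.a` is reported.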
Zepworks.com
sep at zepworks.com
https://github.com/seperman/deepdiff
http://zepworks.com/blog/diff-it-to-digg-it
pip install deepdiff