AutoDocish: Automated-ish Dataset Documentation

Elizabeth Wickes    @elliewix

University of Illinois at Urbana Champaign

My hats

  • Data Curator -- UIUC
  • Co-organizer -- Py-CU
  • Python instructor

Data Curation Specialist in the Research Data Service, located in the University Library of the University of Illinois at Urbana-Champaign.  Also an adjunct instructor with the School of Information Sciences (iSchool) at Illinois.

(Now aren't you glad I didn't say)

The Problem:

  • Documentation is usually left to the researcher,
  • who has higher-priority research to work on.
  • Many think that documentation needs to be extensive.
  • So documentation isn't always done.

What does great documentation look like?

What can great documentation look like?

This isn't about open science

  • Well-documented, well-preserved datasets and code are useful first to you and your team.
  • You should be able to reuse your own data as part of future work.
  • Documentation is almost always necessary for reuse -- even for Future You.
  • Enough information,
  • about the project, methods, and materials
  • such that the information is maintainable over time,
  • in an accessible format,
  • and valuable for those who need it.

Minimum viable documentation

tl;dr

  • Something is better than nothing
  • It doesn't need to be a dissertation
  • Seriously, just write something
  • Seriously.

What are the basic pieces of documentation?

Codebook

  • "What do the data values mean?"
  • Describes what, if any, coded values mean within the data file.

 

Data dictionary

  • "What does this data file contain?"
  • Describes the individual questions or measurements contained within a data file.

readme.txt

  • May contain a mix of both, and other important contextual info.

Sometimes it can do all the things

ICPSR

  • Variable name: human-readable name
  • Descriptive information
  • Descriptive statistics and codes

http://doi.org/10.3886/ICPSR36498.v1

Sometimes it does a little less, but you can still get along

 http://doi.org/10.3886/E17507V2

Sometimes they are machine readable

| Column name | Corresponding survey question | Codes |
|---|---|---|
| Q1 | Please select the option that best describes your status. | 1="I am a US citizen or permanent resident"; 2= "I am an international student" |
| Q2 | Please select your age | 1="<18"; 2="18-25"; 3="26-30"; 4="31-35"; 5="36-40"; 6="41-45"; 7="45+" |
| Q3 | What is your gender? | 1="Male"; 2="Female", 3="Other", 4="I do not wish to respond" |

 http://doi.org/10.3886/E43668V1
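Because the Codes column above follows a regular `code="label"` pattern, a few lines of Python can turn it into a lookup table and decode raw values. This is an illustrative sketch only: `parse_codes` and `decode` are hypothetical helpers, and the regular expression assumes every code is an integer followed by a double-quoted label. That holds for the three rows shown, including Q3, where some separators are commas rather than semicolons.

```python
from __future__ import annotations

import re

# Hypothetical pattern: an integer code, "=", then a double-quoted label.
# Matching the pairs directly means the separator (";" or ",") does not matter.
CODE_PATTERN = re.compile(r'(\d+)\s*=\s*"([^"]*)"')


def parse_codes(codes: str) -> dict[int, str]:
    """Turn '1="Male"; 2="Female"' into {1: 'Male', 2: 'Female'}."""
    return {int(code): label for code, label in CODE_PATTERN.findall(codes)}


def decode(values: list, codes: str) -> list:
    """Replace coded values with their labels; unknown codes pass through unchanged."""
    mapping = parse_codes(codes)
    return [mapping.get(value, value) for value in values]


q3_codes = '1="Male"; 2="Female", 3="Other", 4="I do not wish to respond"'
print(parse_codes(q3_codes))     # {1: 'Male', 2: 'Female', 3: 'Other', 4: 'I do not wish to respond'}
print(decode([2, 4, 1], q3_codes))  # ['Female', 'I do not wish to respond', 'Male']
```

This is the payoff of a machine-readable codebook: the same few lines work for every row, with no hand-copying of labels.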

Or sometimes there's nothing...

http://www.doi.org/¯\_(ツ)_/¯

What are the core elements?

  • A little bit of data profiling
  • A little bit of data cleaning
  • A little bit of human narrative

Perfect > Something > Nothing

Easy to improve something...

Hard to make something from nothing...

  • Automate what you can
  • Focus on the elements only you can answer
  • Move on with your life

AutoDocish

Automate what you can

  • Data dictionaries report the headers
  • Codebooks report unique values
  • Descriptive statistics are...statistics

 

These are things a computer can do.
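A minimal sketch of those three automatable pieces, using only the standard library `csv` module. `profile_columns` and its `max_unique` cutoff are hypothetical names, not the tool's own code; the cutoff of 10 mirrors the "More than 10 unique values" note in the example profile, and a column is treated as numeric only if every value parses as a float.

```python
import csv
import io


def profile_columns(csv_text: str, max_unique: int = 10) -> dict:
    """Report headers, unique values, and min/max for each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []          # data dictionary: the column names
    columns = {name: [] for name in headers}
    for row in reader:
        for name in headers:
            columns[name].append(row[name] or "")  # short rows yield None; store ""

    profile = {}
    for name, values in columns.items():
        unique = sorted(set(values))
        entry = {"unique_count": len(unique)}
        if len(unique) <= max_unique:          # codebook: list only short value sets
            entry["unique_values"] = unique
        try:                                   # statistics: only if every value is numeric
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = []
        if numbers:
            entry["min"] = min(numbers)
            entry["max"] = max(numbers)
        profile[name] = entry
    return profile


sample = "Gender,Age\nM,34\nF,27\nM,51\n"
result = profile_columns(sample)
print(result)
```

Nothing here requires knowing what the columns mean; that part stays with the human.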

What's left for the human?

  • Context
  • Methods
  • Pesky human sentences

Let's take a look

So you've got a CSV...

$ python data_profile.py source output missing_code

1. source: the data source, either a folder or a single file
2. output: the name of the folder the profile files are written to
3. missing_code: the code used for missing data; presumes an empty string if not provided
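The same three-argument interface could be declared with `argparse`. This is a hypothetical reconstruction for illustration, not the actual source of `data_profile.py`; making `missing_code` an optional positional with an empty-string default is an assumption based on the description above.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI matching: data_profile.py source output [missing_code]."""
    parser = argparse.ArgumentParser(description="Write a data profile for CSV files.")
    parser.add_argument("source", help="data file, or folder of data files, to profile")
    parser.add_argument("output", help="folder the profile files will be written to")
    parser.add_argument(
        "missing_code",
        nargs="?",          # optional positional argument
        default="",         # empty string when not provided
        help="value that marks missing data (defaults to the empty string)",
    )
    return parser.parse_args(argv)


vagrants = parse_args(["vagrants.csv", "vagrants/", "[missing]"])
fakes = parse_args(["fakedata", "fakes"])
print(vagrants.missing_code, repr(fakes.missing_code))  # [missing] ''
```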

Outputs

  • Inside the output folder:
    • One profile per data file
    • A single JSON file with all profile data

Example output folder: out/, holding the .json file and the .md profiles.
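A sketch of how those two kinds of output could be written, assuming each profile is already a Python dict keyed by data file name. `write_profiles` and the `profiles.json` file name are hypothetical, and the Markdown here is deliberately bare rather than a copy of the real profile layout.

```python
import json
import tempfile
from pathlib import Path


def write_profiles(profiles: dict, out_dir: str, json_name: str = "profiles.json") -> list:
    """Write one .md file per data file plus a single JSON file holding every profile.

    `profiles` maps a data file name to a dict of profile values (hypothetical layout).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for data_file, profile in profiles.items():
        lines = [f"Data Profile for {data_file}", ""]
        lines += [f"* {key}: {value}" for key, value in profile.items()]
        md_path = out / f"{Path(data_file).stem}.md"   # e.g. fakedata/0.csv -> 0.md
        md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(md_path)
    json_path = out / json_name
    json_path.write_text(json.dumps(profiles, indent=2), encoding="utf-8")
    written.append(json_path)
    return written


demo_dir = tempfile.mkdtemp()
written = write_profiles(
    {"fakedata/0.csv": {"columns": 10, "rows": 108}, "fakedata/1.csv": {"columns": 10, "rows": 20}},
    demo_dir,
)
print([p.name for p in written])  # ['0.md', '1.md', 'profiles.json']
```

Keeping the JSON alongside the Markdown means the human-readable and machine-readable versions come from the same data.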

$ python data_profile.py vagrants.csv vagrants/ [missing]
Generating profile for: vagrants.csv
vagrants/ already exists. Will OVERWRITE.
Profiles written into vagrants/
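The overwrite warning suggests a simple check before any profiles are written. A minimal sketch, with `prepare_output` as a hypothetical helper that mirrors the "already exists. Will OVERWRITE." and "created" messages the tool prints:

```python
import os
import tempfile
from pathlib import Path


def prepare_output(out_dir: str) -> str:
    """Create the output folder if needed; warn when existing profiles will be replaced."""
    out = Path(out_dir)
    if out.exists():
        message = f"{out_dir} already exists. Will OVERWRITE."
    else:
        out.mkdir(parents=True)
        message = f"{out_dir} created"
    print(message)
    return message


target = os.path.join(tempfile.mkdtemp(), "vagrants/")
first = prepare_output(target)    # the folder is new, so it is created
second = prepare_output(target)   # second run: warns about overwriting
```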

Example data from:

Crymble, Adam et al. (2015). Vagrant Lives: 14,789 Vagrants Processed by Middlesex County, 1777-1786 (version 1.1). Zenodo. 10.5281/zenodo.31026

Single file usage

$ python data_profile.py fakedata fakes ''
Generating profiles for 1000 files
fakes/ created
Profiles written into fakes/

Folder usage
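Accepting either a file or a folder as the source can be handled with `pathlib`. A hedged sketch: `find_data_files` and `announce` are hypothetical, and restricting a folder's contents to `.csv` files is an assumption, not documented behavior of the tool.

```python
import tempfile
from pathlib import Path


def find_data_files(source: str, suffix: str = ".csv") -> list:
    """Return [source] for a single file, or the matching files inside a folder."""
    path = Path(source)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Assumption: only files with the given suffix count as data files.
        return sorted(p for p in path.iterdir() if p.is_file() and p.suffix == suffix)
    raise FileNotFoundError(f"No such file or folder: {source}")


def announce(files: list) -> str:
    """Build the progress message for one file or for many."""
    if len(files) == 1:
        return f"Generating profile for: {files[0].name}"
    return f"Generating profiles for {len(files)} files"


demo = Path(tempfile.mkdtemp())
for i in range(3):
    (demo / f"{i}.csv").write_text("a,b\n1,2\n", encoding="utf-8")
(demo / "notes.txt").write_text("not a data file", encoding="utf-8")

folder_files = find_data_files(str(demo))
single_file = find_data_files(str(demo / "0.csv"))
print(announce(folder_files))  # Generating profiles for 3 files
print(announce(single_file))   # Generating profile for: 0.csv
```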

File level information

yfuncu,lhnpgel,dejgqsnl,ttzsbzzztt,fswewrmgbl,lqwtwpo,wnlat,jfgmzi,tqpvwxqsvk,kozisucyqc
862,186,435,435,27,535,581,200,699,507
200,133,8,934,864,319,177,382,151,477
476,193,411,559,890,385,749,483,343,452
298,853,590,375,669,603,885,340,262,909
872,514,398,870,718,180,730,872,219,559
76,621,104,380,139,611,549,825,902,595
544,511,684,990,443,730,185,440,21,360
778,91,389,852,273,811,676,793,302,842
416,912,359,393,376,948,451,944,526,430
238,363,210,119,922,972,491,37,876,907
731,801,91,55,810,799,315,719,163,88
632,200,71,33,942,588,407,250,889,517
582,765,7,356,548,586,859,831,409,967
584,535,897,468,531,618,888,280,945,959
657,745,223,355,690,345,412,872,336,35
601,175,656,551,816,816,94,660,546,145
488,268,593,878,247,127,306,950,452,202
383,201,413,717,147,864,354,134,678,719
507,501,476,927,726,942,462,798,368,127
114,890,700,369,19,8,861,322,377,866
143,967,180,268,307,285,882,456,914,963
795,742,665,952,297,46,268,578,495,909
34,581,735,560,880,70,714,932,191,253
642,330,471,807,127,314,515,135,160,733
548,353,438,897,717,371,760,577,818,325
1,878,199,173,855,249,107,320,867,923
711,997,589,610,61,537,359,903,261,881
242,329,318,625,816,60,294,433,349,817
818,649,956,433,20,336,543,88,365,315
271,3,187,638,872,571,704,336,58,355
692,707,873,965,628,911,701,624,810,621
696,374,114,863,852,925,731,131,517,349
318,925,263,400,660,117,241,815,424,928
870,239,687,906,587,738,38,902,707,227
798,7,525,409,630,637,655,793,908,707
696,765,977,889,329,483,971,567,526,165
692,960,116,990,399,125,993,902,692,809
762,365,211,76,187,671,772,677,173,871
575,552,711,690,303,909,14,361,501,633
345,727,248,923,816,152,694,37,301,931
736,371,416,731,255,154,216,219,863,292
112,121,52,859,535,773,292,654,160,697
601,533,204,396,707,595,615,85,973,487
153,742,584,571,0,102,329,140,259,457
661,42,709,881,870,359,164,910,733,609
862,399,669,363,201,206,913,133,597,385
405,981,296,616,75,916,110,646,772,294
594,255,628,968,346,153,876,532,243,876
902,464,334,601,607,266,79,212,507,400
686,980,822,174,887,885,907,561,493,789
74,493,565,403,500,579,36,455,236,919
850,673,257,930,39,906,262,721,53,312
397,400,589,750,567,746,217,8,866,834
96,495,719,551,701,145,554,654,850,416
801,646,777,351,324,867,23,105,603,838
812,966,258,66,647,511,132,605,501,456
494,929,546,11,498,283,778,554,218,615
457,905,807,738,105,842,417,945,564,512
492,396,508,663,957,533,366,788,863,965
528,648,170,747,310,51,384,862,106,608
582,203,365,861,746,724,451,945,344,167
638,732,103,755,420,107,291,871,112,18
378,611,31,489,595,165,628,827,652,729
418,42,311,435,554,337,142,110,629,40
264,529,806,660,83,450,385,365,868,899
877,250,263,844,928,424,332,17,955,268
61,581,338,92,864,298,809,871,608,558
869,462,505,78,599,326,61,838,472,316
772,180,159,630,749,444,798,634,416,959
927,371,355,819,110,991,0,578,635,413
626,885,543,260,46,293,725,779,182,727
343,33,931,871,374,764,384,907,814,950
236,952,473,596,990,796,886,853,528,556
173,950,696,990,153,809,9,442,70,29
761,358,75,618,530,939,845,382,582,662
996,285,394,283,994,475,727,82,229,512
381,621,164,898,518,546,402,147,815,316
557,566,713,812,451,104,159,303,100,160
622,21,29,976,711,585,310,153,947,883
270,56,249,63,568,229,620,797,577,840
22,542,448,579,810,987,572,743,912,285
420,643,978,438,562,24,68,220,759,135
844,534,493,88,312,632,47,23,275,126
272,47,70,568,738,82,765,52,709,271
101,594,371,21,667,318,458,763,963,325
162,335,783,607,825,225,504,53,966,248
874,327,765,144,724,489,219,264,959,196
749,715,938,992,80,126,372,841,804,779
168,241,528,787,877,843,538,981,972,949
730,387,175,382,621,707,282,651,801,325
604,876,139,115,612,757,970,7,51,119
333,275,186,386,875,232,773,489,898,530
882,26,129,911,185,29,172,22,181,231
816,997,598,785,914,95,80,934,670,662
848,789,384,509,828,867,639,730,744,538
41,917,521,191,492,141,207,822,603,162
931,202,345,103,463,17,849,14,26,769
802,356,450,506,131,208,377,416,312,988
207,269,527,921,613,478,397,49,839,433
157,748,700,92,767,296,277,507,801,647
95,906,103,147,299,247,926,3,41,942
188,98,582,395,347,663,428,342,713,553
285,247,290,727,530,758,48,658,446,165
398,807,877,387,3,150,674,63,629,636
939,461,559,695,66,67,652,918,108,393
306,454,327,291,675,99,777,416,89,614
553,200,507,98,302,324,434,65,849,130
901,564,654,266,207,830,833,354,735,80

Given a numerical CSV file
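The file-level header of a profile needs only the column count, the row count (excluding the header row), the missing code, and a timestamp. A sketch under those assumptions, with `file_level_info` as a hypothetical helper; the `%Y-%b-%d %H:%M:%S` format reproduces the style of the "Generated on" line, and `%b` gives English month abbreviations under Python's default C locale.

```python
import csv
import io
from datetime import datetime


def file_level_info(name: str, csv_text: str, missing: str = "", now: datetime = None) -> str:
    """Render the file-level header: timestamp, column count, row count, missing code."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]        # the header row is not counted as data
    now = now or datetime.now()
    shown_missing = missing if missing else "(empty string)"
    return "\n".join([
        f"Data Profile for {name}",
        "",
        f"Generated on: {now.strftime('%Y-%b-%d %H:%M:%S')}",
        "",
        f"Number of columns: {len(header)}",
        f"Number of rows: {len(data)}",
        f"Using missing value of: {shown_missing}",
    ])


sample = "yfuncu,lhnpgel,dejgqsnl\n862,186,435\n200,133,8\n"
info = file_level_info("fakedata/0.csv", sample, now=datetime(2016, 8, 11, 20, 46, 47))
print(info)
```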

File level information

Data Profile for fakedata/0.csv

Generated on: 2016-Aug-11 20:46:47


Number of columns: 10
Number of rows: 108
Using missing value of: (empty string)

Column info: numerical

**yfuncu**
--------
* Description of column:
* Collection methods:
* Description of data values and units:
* Reason for missing values: 

* percent_digit: 100%
* percent_missing: 0%
* min_digit: 1.0
* missing: 0
* unique_value_content: Not reported (More than 10 unique values)
* unique_values: 103 (this includes missing values)
* max_digit: 996.0

Each column block has three parts: the column name, the questions to fill out, and the descriptive statistics.
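A sketch of how one column block could be assembled: blank questions for the human, then the computed statistics. `column_profile` and `column_markdown` are hypothetical; the key names copy those in the example profile, but their definitions are assumptions. For instance, `percent_digit` is computed here as the share of all values (missing ones included in the denominator) that parse as numbers.

```python
QUESTIONS = [
    "Description of column:",
    "Collection methods:",
    "Description of data values and units:",
    "Reason for missing values:",
]


def is_number(value: str) -> bool:
    """Assumed meaning of a 'digit' value: the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def column_profile(values: list, missing: str = "", max_listed: int = 10) -> dict:
    """Compute the statistics shown under a column heading."""
    total = max(len(values), 1)                      # avoid dividing by zero
    n_missing = sum(1 for v in values if v == missing)
    numbers = [float(v) for v in values if v != missing and is_number(v)]
    unique = set(values)                             # includes the missing code
    stats = {
        "percent_digit": f"{round(100 * len(numbers) / total)}%",
        "percent_missing": f"{round(100 * n_missing / total)}%",
        "missing": n_missing,
        "unique_values": f"{len(unique)} (this includes missing values)",
    }
    if numbers:
        stats["min_digit"] = min(numbers)
        stats["max_digit"] = max(numbers)
    if len(unique) <= max_listed:
        stats["unique_value_content"] = ", ".join(sorted(unique))
    else:
        stats["unique_value_content"] = f"Not reported (More than {max_listed} unique values)"
    return stats


def column_markdown(name: str, values: list, missing: str = "") -> str:
    """Bold column name, blank questions for a human, then the computed statistics."""
    lines = [f"**{name}**", "-" * (len(name) + 4)]
    lines += [f"* {question}" for question in QUESTIONS]
    lines.append("")
    lines += [f"* {key}: {value}" for key, value in column_profile(values, missing).items()]
    return "\n".join(lines)


print(column_markdown("yfuncu", ["862", "200", "476", "298"]))
```

The questions are left empty on purpose: they are the part only the researcher can answer.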

File level information

Vagrant ID Number,Given Names,Surname,Gender of Lead Vagrant,Relationship to Lead Vagrant,Number of People in Group,Person Type,Vagrant Category,Session Start Day,Session Start Month,Session Start Year,Session End Day,Session End Month,Session End Year,Session # (out of 8 annually),URL of Primary Source,Magistrate Name,Taken From,Conveyed To,Georeference (Taken From),Georeference (Conveyed To),Settlement (Micro Level),Settlement (Area Level),Settlement County,Settlement Country,Settlement Georeference (Micro Level),Settlement Georeference (Area Level),Settlement Georeference (County),Settlement Georeference (Country)
6625.1.1,Mitchell,Bruce,M,[lead vagrant],1,Solo Male,City Vagrant,12,9,1784,14,10,1784,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50787/LMSMPS507870004.jpg,John Hart,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Pomona,[n/a],Zetland,Scotland,59;-3.25,[n/a],60.33333;-1.33333,56;-4
6720.1.1,John,Drivee,M,[lead vagrant],1,Solo Male,City Vagrant,12,9,1784,14,10,1784,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50787/LMSMPS507870007.jpg,John Hart,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Pomona,[n/a],Zetland,Scotland,59;-3.25,[n/a],60.33333;-1.33333,56;-4
8352.1.1,Peter,Smith,M,[lead vagrant],1,Solo Male,City Vagrant,5,5,1785,24,6,1785,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50797/LMSMPS507970077.jpg,P LeMesurier,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Orkney,Scotland,[unknown],[n/a],59;-3,56;-4
5750.1.1,Thomas,Herry,M,[lead vagrant],1,Solo Male,City Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810081.jpg,Richard Clark,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Isle of Sanday,[n/a],Orkney,Scotland,59.2581;-2.5683,[n/a],59;-3,56;-4
5265.1.1,James,Guttery,M,[lead vagrant],1,Solo Male,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780024.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Orkney,Scotland,[unknown],[n/a],59;-3,56;-4
5851.1.1,Robert,Ogilby,M,[lead vagrant],1,Solo Male,City Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810084.jpg,John Hart,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Bow Brig,[n/a],Orkney,Scotland,[unknown],[n/a],59;-3,56;-4
460.1.1,Laurence,Least,M,[lead vagrant],1,Solo Male,Middlesex Vagrant,17,9,1778,22,10,1778,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50684/LMSMPS506840156.jpg,John Staples,Clerkenwell,Cheshunt,51.524444;-0.107222,51.699888;-0.028486,Wisedale,[n/a],Orkney,Scotland,[unknown],[n/a],59;-3,56;-4
1964.1.1,Betty,Bruce,F,[lead vagrant],1,Single Female,City Vagrant,11,1,1781,22,2,1781,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50737/LMSMPS507370008.jpg,Evan Pugh,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Caithness,Scotland,[unknown],[n/a],58.416667;-3.5,56;-4
8929.1.1,John,McFarland,M,[lead vagrant],1,Solo Male,Middlesex Vagrant,23,6,1785,8,9,1785,5,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50800/LMSMPS508000177.jpg,William Hyde,Clerkenwell,Cheshunt,51.524444;-0.107222,51.699888;-0.028486,[unknown],[n/a],Caithness,Scotland,[unknown],[n/a],58.416667;-3.5,56;-4
1914.2.2,[Child],Scarlet,[unknown],[Child],2,Dependent,City Vagrant,7,12,1780,11,1,1781,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50736/LMSMPS507360008.jpg,Watkins Lewes,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Caithness,Scotland,[unknown],[n/a],58.416667;-3.5,56;-4
1914.1.2,Isabella,Scarlet,F,[lead vagrant],2,Group Leader,City Vagrant,7,12,1780,11,1,1781,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50736/LMSMPS507360008.jpg,Watkins Lewes,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Caithness,Scotland,[unknown],[n/a],58.416667;-3.5,56;-4
9315.1.1,Christiana,Gray,F,[lead vagrant],1,Single Female,Westminster Vagrant,8,9,1785,13,10,1785,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50801/LMSMPS508010130.jpg,William Hyde,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,[unknown],[n/a],Sutherland,Scotland,[unknown],[n/a],58.25;-4.5,56;-4
4208.1.1,Donald,Ross,M,[lead vagrant],1,Solo Male,Westminster Vagrant,20,2,1783,24,4,1783,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50766/LMSMPS507660022.jpg,Charles Triquet,Tothillfields,Ridge,51.496111;-0.139722,51.683333;-0.233333,[unknown],[n/a],Ross-shire,Scotland,[unknown],[n/a],57.66667;-5,56;-4
10228.2.2,[Child],McKenzie,[unknown],[Child],2,Dependent,Westminster Vagrant,8,12,1785,5,1,1786,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50806/LMSMPS508060092.jpg,Michael Downes,Tothillfields,Ridge,51.496111;-0.139722,51.683333;-0.233333,[unknown],[n/a],Ross-shire,Scotland,[unknown],[n/a],57.66667;-5,56;-4
10228.1.2,Mary,McKenzie,F,[lead vagrant],2,Group Leader,Westminster Vagrant,8,12,1785,5,1,1786,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50806/LMSMPS508060092.jpg,Michael Downes,Tothillfields,Ridge,51.496111;-0.139722,51.683333;-0.233333,[unknown],[n/a],Ross-shire,Scotland,[unknown],[n/a],57.66667;-5,56;-4
2769.2.2,[Wife],Frazier,F,[Wife],2,Dependent,City Vagrant,15,10,1781,3,12,1781,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50747/LMSMPS507470011.jpg,John Hart,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Ross-shire,Scotland,[unknown],[n/a],57.66667;-5,56;-4
2769.1.2,David,Frazier,M,[lead vagrant],2,Group Leader,City Vagrant,15,10,1781,3,12,1781,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50747/LMSMPS507470011.jpg,John Hart,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Ross-shire,Scotland,[unknown],[n/a],57.66667;-5,56;-4
2052.1.1,Charles,Nishie,M,[lead vagrant],1,Solo Male,City Vagrant,11,1,1781,22,2,1781,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50737/LMSMPS507370011.jpg,Richard Clark,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Nairn,[n/a],Nairnshire,Scotland,57.58094;-3.87973,[n/a],57.5;-3.83333,56;-4
4586.2.2,[Wife],Gordon,F,[Wife],2,Dependent,City Vagrant,4,9,1783,23,10,1783,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50773/LMSMPS507730016.jpg,James Kettleby,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Banffshire,Scotland,[unknown],[n/a],57.5;-3.08333,56;-4
4586.1.2,James,Gordon,M,[lead vagrant],2,Group Leader,City Vagrant,4,9,1783,23,10,1783,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50773/LMSMPS507730016.jpg,James Kettleby,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Banffshire,Scotland,[unknown],[n/a],57.5;-3.08333,56;-4
881.1.1,Alexander,McFerson,M,[lead vagrant],1,Solo Male,City Vagrant,16,7,1778,17,9,1778,5,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50687/LMSMPS506870374.jpg,James Esdaile,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Inveran,[n/a],Banffshire,Scotland,57.93333;-4.4,[n/a],57.5;-3.08333,56;-4
1074.1.1,Mary,Beager,F,[lead vagrant],1,Single Female,Middlesex Vagrant,13,1,1780,24,2,1780,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50722/LMSMPS507220054.jpg,William Palmer,Clerkenwell,Cheshunt,51.524444;-0.107222,51.699888;-0.028486,Drumblate,[n/a],Banffshire,Scotland,[unknown],[n/a],57.5;-3.08333,56;-4
4091.1.1,Robert,Innis,M,[lead vagrant],1,Solo Male,Westminster Vagrant,20,2,1783,24,4,1783,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50766/LMSMPS507660018.jpg,Sampson Wright,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Elgin,[n/a],Morayshire,Scotland,57.65;-3.33333,[n/a],57.41667;-3.25,56;-4
468.2.2,[Child],Middleston,[unknown],[Child],2,Dependent,City Vagrant,17,9,1778,22,10,1778,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50684/LMSMPS506840156.jpg,Thomas Wright,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Medlick,Aberdeen,Aberdeenshire,Scotland,57.14369;-2.09814,57.14369;-2.09814,57.16667;-2.66667,56;-4
468.1.2,Elizabeth,Middleston,F,[lead vagrant],2,Group Leader,City Vagrant,17,9,1778,22,10,1778,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50684/LMSMPS506840156.jpg,Thomas Wright,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Medlick,Aberdeen,Aberdeenshire,Scotland,57.14369;-2.09814,57.14369;-2.09814,57.16667;-2.66667,56;-4
6473.1.1,John,Miller,M,[lead vagrant],1,Solo Male,City Vagrant,20,5,1784,1,7,1784,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50784/LMSMPS507840087.jpg,John Bates,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
2299.2.2,[Child],McKenzie,[unknown],[Child],2,Dependent,Westminster Vagrant,9,7,1781,10,9,1781,5,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50744/LMSMPS507440013.jpg,George Stubbs,Tothillfields,Ridge,51.496111;-0.139722,51.683333;-0.233333,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
2299.1.2,Margaret,McKenzie,F,[lead vagrant],2,Group Leader,Westminster Vagrant,9,7,1781,10,9,1781,5,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50744/LMSMPS507440013.jpg,George Stubbs,Tothillfields,Ridge,51.496111;-0.139722,51.683333;-0.233333,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
89.1.1,Mary,Thompson,F,[lead vagrant],1,Single Female,Middlesex Vagrant,4,12,1777,15,1,1778,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50677/LMSMPS506770121.jpg,John Staples,Clerkenwell,Ridge,51.524444;-0.107222,51.683333;-0.233333,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
7149.1.1,Margaret,Grant,F,[lead vagrant],1,Single Female,City Vagrant,2,12,1784,6,1,1785,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50792/LMSMPS507920080.jpg,Richard Clark,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
1182.2.3,[Child],Anderson,[unknown],[Child],3,Dependent,City Vagrant,24,2,1780,6,4,1780,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50724/LMSMPS507240058.jpg,Francis Hugonin,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
1182.3.3,[Child],Anderson,[unknown],[Child],3,Dependent,City Vagrant,24,2,1780,6,4,1780,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50724/LMSMPS507240058.jpg,Francis Hugonin,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
1182.1.3,Elizabeth,Anderson,F,[lead vagrant],3,Group Leader,City Vagrant,24,2,1780,6,4,1780,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50724/LMSMPS507240058.jpg,Francis Hugonin,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
1473.1.1,Robert,Mason,M,[lead vagrant],1,Solo Male,City Vagrant,14,9,1780,19,10,1780,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50730/LMSMPS507300050.jpg,B Kennett,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
4781.1.1,John,Young,M,[lead vagrant],1,Solo Male,City Vagrant,4,9,1783,23,10,1783,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50773/LMSMPS507730023.jpg,John Hart,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
2959.1.1,Margaret,Scott,F,[lead vagrant],1,Single Female,City Vagrant,15,10,1781,3,12,1781,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50747/LMSMPS507470017.jpg,William Plomer,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3900.2.3,[Child],Steward,[unknown],[Child],3,Dependent,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630017.jpg,Richard Turner,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3900.3.3,[Child],Steward,[unknown],[Child],3,Dependent,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630017.jpg,Richard Turner,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3900.1.3,Mary,Steward,F,[lead vagrant],3,Group Leader,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630017.jpg,Richard Turner,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3027.1.1,Peter,Forbes,M,[lead vagrant],1,Solo Male,City Vagrant,3,12,1781,7,1,1782,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50750/LMSMPS507500009.jpg,James Kettleby,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
1710.2.2,[Wife],Robertson,F,[Wife],2,Dependent,City Vagrant,19,10,1780,7,12,1780,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50732/LMSMPS507320058.jpg,Edward Hulse,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Longmay,Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
1710.1.2,Alexander,Robertson,M,[lead vagrant],2,Group Leader,City Vagrant,19,10,1780,7,12,1780,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50732/LMSMPS507320058.jpg,Edward Hulse,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Longmay,Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
7443.1.1,John,Dickle,M,[lead vagrant],1,Solo Male,City Vagrant,6,1,1785,17,2,1785,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50793/LMSMPS507930095.jpg,Richard Clark,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
8288.1.1,Andrew,McLorn,M,[lead vagrant],1,Solo Male,Westminster Vagrant,5,5,1785,24,6,1785,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50797/LMSMPS507970075.jpg,James Fielding,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
341.1.1,Catherine,Nicholas,F,[lead vagrant],1,Single Female,City Vagrant,4,6,1778,16,7,1778,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50682/LMSMPS506820201.jpg,James Esdaile,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
9274.1.1,Alexander,Davidson,M,[lead vagrant],1,Solo Male,Westminster Vagrant,8,9,1785,13,10,1785,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50801/LMSMPS508010129.jpg,Robert Taylor,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
663.2.2,[Child],Mackenzie,[unknown],[Child],2,Dependent,Passing Vagrant,22,10,1778,10,12,1778,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50687/LMSMPS506870212.jpg,John Bullock,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
663.1.2,Isabella,Mackenzie,F,[lead vagrant],2,Group Leader,Passing Vagrant,22,10,1778,10,12,1778,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50687/LMSMPS506870212.jpg,John Bullock,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
10170.1.1,George,Ferguson,M,[lead vagrant],1,Solo Male,Passing Vagrant,8,12,1785,5,1,1786,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50806/LMSMPS508060090.jpg,William Parker,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,[unknown],Aberdeen,Aberdeenshire,Scotland,[unknown],57.14369;-2.09814,57.16667;-2.66667,56;-4
5227.2.2,[Wife],Donovan,F,[Wife],2,Dependent,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780023.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5227.1.2,John,Donovan,M,[lead vagrant],2,Group Leader,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780023.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5248.1.1,John,Forbes,M,[lead vagrant],1,Solo Male,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780023.jpg,John Hart,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5385.1.1,George,Medlar,M,[lead vagrant],1,Solo Male,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780028.jpg,Robert Peckham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5489.3.3,[Child],Sclitt,[unknown],[Child],3,Dependent,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,John Bates,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5489.2.3,[Wife],Sclitt,[unknown],[Wife],3,Dependent,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,John Bates,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5489.1.3,James,Sclitt,M,[lead vagrant],3,Group Leader,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,John Bates,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3937.1.1,James,Alderdue,M,[lead vagrant],1,Solo Male,City Vagrant,20,2,1783,24,4,1783,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50766/LMSMPS507660013.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3998.2.2,[Wife],Cummins,F,[Wife],2,Dependent,City Vagrant,20,2,1783,24,4,1783,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50766/LMSMPS507660015.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3998.1.2,John,Cummins,M,[lead vagrant],2,Group Leader,City Vagrant,20,2,1783,24,4,1783,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50766/LMSMPS507660015.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3478.1.1,William,Henderson,M,[lead vagrant],1,Solo Male,City Vagrant,13,5,1782,1,7,1782,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50756/LMSMPS507560011.jpg,William Plomer,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3514.1.1,John,MacPheason,M,[lead vagrant],1,Solo Male,City Vagrant,13,5,1782,1,7,1782,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50756/LMSMPS507560013.jpg,Evan Pugh,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3784.2.2,[Child],Grant,[unknown],[Child],2,Dependent,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630009.jpg,John Boydell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
3784.1.2,Margaret,Grant,F,[lead vagrant],2,Group Leader,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630009.jpg,John Boydell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
7298.1.1,Margaret,Smith,F,[lead vagrant],1,Single Female,Middlesex Vagrant,2,12,1784,6,1,1785,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50792/LMSMPS507920085.jpg,John Staples,Clerkenwell,Cheshunt,51.524444;-0.107222,51.699888;-0.028486,[unknown],[n/a],Aberdeenshire,Scotland,[unknown],[n/a],57.16667;-2.66667,56;-4
5798.1.1,James,McPharson,M,[lead vagrant],1,Solo Male,City Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810083.jpg,Richard Pickham,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
3319.1.1,Catherine,Butler,F,[lead vagrant],1,Single Female,City Vagrant,8,4,1782,13,5,1782,3,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50754/LMSMPS507540008.jpg,John Burcombe,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
2309.1.1,Duncan,McArthy,M,[lead vagrant],1,Solo Male,City Vagrant,9,7,1781,10,9,1781,5,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50744/LMSMPS507440014.jpg,Thomas Wright,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
3619.2.3,[Child],Frazier,[unknown],[Child],3,Dependent,City Vagrant,9,9,1782,14,10,1782,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50759/LMSMPS507590019.jpg,Evan Pugh,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
3619.3.3,[Child],Frazier,[unknown],[Child],3,Dependent,City Vagrant,9,9,1782,14,10,1782,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50759/LMSMPS507590019.jpg,Evan Pugh,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
3619.1.3,Jane,Frazier,F,[lead vagrant],3,Group Leader,City Vagrant,9,9,1782,14,10,1782,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50759/LMSMPS507590019.jpg,Evan Pugh,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
3665.2.2,[Child],Mackinnon,[unknown],[Child],2,Dependent,City Vagrant,9,9,1782,14,10,1782,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50759/LMSMPS507590021.jpg,John Levy,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
3665.1.2,Sarah,Mackinnon,F,[lead vagrant],2,Group Leader,City Vagrant,9,9,1782,14,10,1782,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50759/LMSMPS507590021.jpg,John Levy,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
5033.1.1,Alexander,Robinson,M,[lead vagrant],1,Solo Male,City Vagrant,4,12,1783,8,1,1784,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50777/LMSMPS507770275.jpg,Robert Peckham,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
2587.3.5,[Child],McDonald,[unknown],[Child],5,Dependent,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450014.jpg,John Russell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Croy,[n/a],Inverness-shire,Scotland,57.51667;-4.03333,[n/a],57.08333;-4.66667,56;-4
2587.4.5,[Child],McDonald,[unknown],[Child],5,Dependent,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450014.jpg,John Russell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Croy,[n/a],Inverness-shire,Scotland,57.51667;-4.03333,[n/a],57.08333;-4.66667,56;-4
2587.5.5,[Child],McDonald,[unknown],[Child],5,Dependent,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450014.jpg,John Russell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Croy,[n/a],Inverness-shire,Scotland,57.51667;-4.03333,[n/a],57.08333;-4.66667,56;-4
2587.2.5,[Wife],McDonald,F,[Wife],5,Dependent,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450014.jpg,John Russell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Croy,[n/a],Inverness-shire,Scotland,57.51667;-4.03333,[n/a],57.08333;-4.66667,56;-4
2587.1.5,Alexander,McDonald,M,[lead vagrant],5,Group Leader,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450014.jpg,John Russell,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Croy,[n/a],Inverness-shire,Scotland,57.51667;-4.03333,[n/a],57.08333;-4.66667,56;-4
9868.1.1,John,McCoy,M,[lead vagrant],1,Solo Male,City Vagrant,13,10,1785,8,12,1785,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50803/LMSMPS508030142.jpg,William Mason,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Inverness,[n/a],Inverness-shire,Scotland,57.47908;-4.22398,[n/a],57.08333;-4.66667,56;-4
5159.2.2,[Wife],Cramond,F,[Wife],2,Dependent,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780021.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
5159.1.2,Robert,Cramond,M,[lead vagrant],2,Group Leader,City Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780021.jpg,Nathaniel Newnham,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
5840.1.1,George,McLeod,M,[lead vagrant],1,Solo Male,City Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810084.jpg,Robert Shank,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
517.1.1,Jane,Stewart,F,[lead vagrant],1,Single Female,Westminster Vagrant,17,9,1778,22,10,1778,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50684/LMSMPS506840158.jpg,William Barrett,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Inverness,[n/a],Inverness-shire,Scotland,57.4717;-4.2254,[n/a],57.08333;-4.66667,56;-4
575.1.1,Eleanor,Clarke,F,[lead vagrant],1,Single Female,Westminster Vagrant,22,10,1778,10,12,1778,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50687/LMSMPS506870210.jpg,Jonah Durden,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Flask,[n/a],Inverness-shire,Scotland,[unknown],[n/a],57.08333;-4.66667,56;-4
1840.1.1,Jane,Grant,F,[lead vagrant],1,Single Female,City Vagrant,7,12,1780,11,1,1781,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50736/LMSMPS507360005.jpg,Watkins Lewes,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Montrose,[n/a],Angus,Scotland,56.7;-2.45,[n/a],56.66667;-2.91667,56;-4
1518.1.1,John,Brown,M,[lead vagrant],1,Solo Male,Middlesex Vagrant,19,10,1780,7,12,1780,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50732/LMSMPS507320051.jpg,John Staples,Clerkenwell,Ridge,51.524444;-0.107222,51.683333;-0.233333,Glamis,[n/a],Angus,Scotland,56.60858;-3.00332,[n/a],56.66667;-2.91667,56;-4
6362.3.4,[Child],Colville,[unknown],[Child],4,Dependent,City Vagrant,20,5,1784,1,7,1784,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50784/LMSMPS507840084.jpg,John Hart,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
6362.4.4,[Child],Colville,[unknown],[Child],4,Dependent,City Vagrant,20,5,1784,1,7,1784,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50784/LMSMPS507840084.jpg,John Hart,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
6362.2.4,[Wife],Colville,F,[Wife],4,Dependent,City Vagrant,20,5,1784,1,7,1784,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50784/LMSMPS507840084.jpg,John Hart,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
6362.1.4,Peter,Colville,M,[lead vagrant],4,Group Leader,City Vagrant,20,5,1784,1,7,1784,4,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50784/LMSMPS507840084.jpg,John Hart,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
3746.2.3,[Child],Colden,[unknown],[Child],3,Dependent,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630008.jpg,Richard Turner,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
3746.3.3,[Child],Colden,[unknown],[Child],3,Dependent,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630008.jpg,Richard Turner,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
3746.1.3,Eleanor,Colden,F,[lead vagrant],3,Group Leader,City Vagrant,28,11,1782,9,1,1783,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50763/LMSMPS507630008.jpg,Richard Turner,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
1516.1.1,John,Bennett,M,[lead vagrant],1,Solo Male,City Vagrant,19,10,1780,7,12,1780,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50732/LMSMPS507320051.jpg,B Kennett,House,Ridge,51.5321;-0.1066,51.683333;-0.233333,[unknown],[n/a],Angus,Scotland,[unknown],[n/a],56.66667;-2.91667,56;-4
7248.1.1,John,Neal,M,[lead vagrant],1,Solo Male,City Vagrant,2,12,1784,6,1,1785,8,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50792/LMSMPS507920083.jpg,Richard Clark,House,Ran from House,51.5321;-0.1066,[runner],[unknown],[n/a],Angus,Scotland,[unknown],[n/a],56.66667;-2.91667,56;-4
5682.1.1,James,Emery,M,[lead vagrant],1,Solo Male,City Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810079.jpg,Richard Clark,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Montrose,[n/a],Angus,Scotland,56.7;-2.45,[n/a],56.66667;-2.91667,56;-4
694.1.1,Elizabeth,Rathay,F,[lead vagrant],1,Single Female,City Vagrant,22,10,1778,10,12,1778,7,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50687/LMSMPS506870213.jpg,James Fielding,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.5;-2.96667,[n/a],56.66667;-2.91667,56;-4
5501.3.4,[Child],Taylor,[unknown],[Child],4,Dependent,Passing Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,Thomas Davies,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5501.4.4,[Child],Taylor,[unknown],[Child],4,Dependent,Passing Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,Thomas Davies,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5501.2.4,[Wife],Taylor,F,[Wife],4,Dependent,Passing Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,Thomas Davies,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5501.1.4,Robert,Taylor,M,[lead vagrant],4,Group Leader,Passing Vagrant,8,1,1784,19,2,1784,1,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50778/LMSMPS507780032.jpg,Thomas Davies,Colnbrook,Cheshunt,51.4835;-0.5221,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5819.2.4,[Child],Maxwell,[unknown],[Child],4,Dependent,Westminster Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810083.jpg,Robert Abington,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5819.3.4,[Child],Maxwell,[unknown],[Child],4,Dependent,Westminster Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810083.jpg,Robert Abington,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5819.4.4,[Child],Maxwell,[unknown],[Child],4,Dependent,Westminster Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810083.jpg,Robert Abington,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
5819.1.4,Anna Maria,Maxwell,F,[lead vagrant],4,Group Leader,Westminster Vagrant,19,2,1784,15,4,1784,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50781/LMSMPS507810083.jpg,Robert Abington,Tothillfields,Cheshunt,51.496111;-0.139722,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
10731.1.1,David,Hunter,M,[lead vagrant],1,Solo Male,Middlesex Vagrant,16,2,1786,21,4,1786,2,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50809/LMSMPS508090220.jpg,Joseph Beate,Clerkenwell,Cheshunt,51.524444;-0.107222,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
2543.2.3,[Child],Jackstone,[unknown],[Child],3,Dependent,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450012.jpg,Watkins Lewes,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4
2543.3.3,[Child],Jackstone,[unknown],[Child],3,Dependent,City Vagrant,10,9,1781,15,10,1781,6,http://hri.shef.ac.uk/san/pl/SM/PS/LMSMPS50745/LMSMPS507450012.jpg,Watkins Lewes,House,Cheshunt,51.5321;-0.1066,51.699888;-0.028486,Dundee,[n/a],Angus,Scotland,56.46667;-2.91667,[n/a],56.66667;-2.91667,56;-4

Given a file of text codes

Column info: text

**Gender of Lead Vagrant**
------------------------
* Description of column:
* Collection methods:
* Description of data values and units:
* Reason for missing values: 

* percent_digit: 0%
* percent_missing: 0%
* min_digit: no digits
* missing: 0
* unique_value_content: The values are:
    * [unknown]
    * M
    * F

* unique_values: 3 (this includes missing values)
* max_digit: no digits

Column name


Questions to fill out

Descriptive statistics
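A minimal Python 3 sketch of how the descriptive statistics above might be computed for a single column. The helper name `profile_column`, the default missing code `[missing]`, and the reading of `min_digit`/`max_digit` as the smallest and largest numeric values are assumptions for illustration. This is not the tool's actual code.

```python
def profile_column(values, missing_code="[missing]", max_listed=10):
    """Hypothetical helper (not the tool's code): summarize one column of raw CSV strings."""
    total = len(values)
    # Empty cells and cells equal to the dataset's missing code count as missing.
    missing = sum(1 for v in values if v.strip() in ("", missing_code))
    # Collect the cells that parse as numbers.
    numbers = []
    for v in values:
        try:
            numbers.append(float(v))
        except ValueError:
            pass

    def pct(count):
        return "{:.0%}".format(count / total) if total else "0%"

    uniques = sorted(set(values))
    info = {
        "missing": missing,
        "percent_missing": pct(missing),
        "percent_digit": pct(len(numbers)),
        # Assumption: min/max digit means the smallest/largest numeric value.
        "min_digit": min(numbers) if numbers else "no digits",
        "max_digit": max(numbers) if numbers else "no digits",
        "unique_values": "{} (this includes missing values)".format(len(uniques)),
    }
    # Only list the distinct values when there are few enough to read.
    if len(uniques) <= max_listed:
        info["unique_value_content"] = "The values are:\n" + "".join(
            "\t* {}\n".format(u) for u in uniques)
    else:
        info["unique_value_content"] = "{} distinct values; too many to list.".format(len(uniques))
    return info
```

For a gender column like `["M", "F", "[unknown]", "F"]`, this reports `"3 (this includes missing values)"` and lists the three codes, matching the shape of the profile above.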

{
    "MiddlesexVagrants1777-1786v1.1.csv": {
        "columns": {
            "Gender of Lead Vagrant": {
                "percent_digit": "0%", 
                "percent_missing": "0%", 
                "min_digit": "no digits", 
                "missing": 0, 
                "unique_value_content": 
                        "The values are:\n\t* [unknown]\n\t* M\n\t* F\n", 
                "unique_values": "3 (this includes missing values)", 
                "max_digit": "no digits"
            },
            [more columns...]
        },
        "csv_basic": {
            "num_rows": 14789, 
            "missing": "[missing]", 
            "num_columns": 29
        }, 
        "file_metadata": {
            "last_access": "2016-08-11 21:02:30", 
            "size": 4641076, 
            "last_modified": "2016-04-24 17:31:01", 
            "filename": "MiddlesexVagrants1777-1786v1.1.csv"
        }
    },
    [more files...]
}
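The `file_metadata` and `csv_basic` entries can be gathered with nothing more than `os.stat` and the `csv` module. In this Python 3 sketch, `file_metadata` and `csv_basic` are hypothetical stand-ins for the tool's own `basic_stats` and `review_csv`, and `num_rows` is assumed to exclude the header row.

```python
import csv
import datetime
import os


def file_metadata(path):
    """Hypothetical stand-in for basic_stats: filesystem facts about one file."""
    st = os.stat(path)
    fmt = "%Y-%m-%d %H:%M:%S"
    return {
        "filename": os.path.basename(path),
        "size": st.st_size,  # in bytes
        "last_modified": datetime.datetime.fromtimestamp(st.st_mtime).strftime(fmt),
        "last_access": datetime.datetime.fromtimestamp(st.st_atime).strftime(fmt),
    }


def csv_basic(path, missing="[missing]"):
    """Hypothetical helper: count data rows and columns, assuming row 1 is a header."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader)
        num_rows = sum(1 for _ in reader)  # header row not counted
    return {"num_rows": num_rows, "num_columns": len(header), "missing": missing}
```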

So what's going on in the code?

for f in files:
    if f.endswith('.csv'):
        finfo = basic_stats(f)
        headers = get_headers(f)
        csvinfo = review_csv(f, mode='rU', missing=missingcode)
        all_file_data[f] = {'file_metadata': finfo,
                            'csv_basic': csvinfo['csv_basic'],
                            'columns': csvinfo['cols']}
        make_md(f, all_file_data[f], headers, target)

For each file

if it is a CSV

get some info on it

organize that info and write profile file
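The final step, writing the profile file, could look roughly like this: a hypothetical `column_to_md` helper that renders one column's statistics as the Markdown block shown earlier, including the blank prompts a researcher still needs to answer. It is a sketch of the idea, not the tool's `make_md`.

```python
def column_to_md(name, stats):
    """Hypothetical helper (not the tool's make_md): one column as a Markdown block."""
    title = "**{}**".format(name)
    lines = [
        title,
        "-" * len(title),  # setext underline; its length is cosmetic
        # Prompts left for a human to fill in.
        "* Description of column:",
        "* Collection methods:",
        "* Description of data values and units:",
        "* Reason for missing values:",
        "",
    ]
    for key, value in stats.items():
        # Tabs from the JSON form become 4-space indents so nested bullets render.
        text = str(value).replace("\t", "    ")
        lines.append("* {}: {}".format(key, text))
    return "\n".join(lines) + "\n"
```

Writing each column's block into one `.md` file per CSV then gives a readme-style profile next to the data.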

BYOPF

"bring your own profiling functions"

for f in files:
    if f.endswith('¯\_(ツ)_/¯'):
        finfo = ຈلຈ(f)
        headers = ಠДಠ(f)
        csvinfo = ಥДಥ(f, mode='rU', missing=missingcode)
        all_file_data[f] = {'file_metadata': finfo,
                            'csv_basic': csvinfo['csv_basic'],
                            'columns': csvinfo['cols']}
        make_md(f, all_file_data[f], headers, target)

(valid python3 code ^_^)
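One way to make the profiling functions truly pluggable is to pass them in as a dictionary of callables, so adding a statistic means adding one entry. `profile_columns` and `my_profilers` are hypothetical names; this sketches the idea and is not how the tool is structured.

```python
def profile_columns(header, rows, profilers):
    """Hypothetical helper: apply every user-supplied profiler to every column.

    profilers maps a statistic name to a callable that takes a list of strings.
    """
    columns = {name: [] for name in header}
    for row in rows:
        # zip() silently drops extra cells in rows longer than the header.
        for name, value in zip(header, row):
            columns[name].append(value)
    return {
        name: {stat: func(values) for stat, func in profilers.items()}
        for name, values in columns.items()
    }


# Bring your own: any function of a column's values can be plugged in.
my_profilers = {
    "unique_values": lambda vals: len(set(vals)),
    "missing": lambda vals: sum(1 for v in vals if v in ("", "[missing]")),
}
```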

pip not required

  • Uses only modules from the Python 2.7 standard library
import os
from os.path import isfile, join
import csv
import datetime
import glob
import sys
import json
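Those modules cover the whole pipeline. As an illustration, this hypothetical `write_profile_json` finds CSVs with `glob` and saves the combined results with `json`, producing a file shaped like the JSON shown earlier. `profile_file` stands in for whatever per-file profiling you bring, and the output name `profile.json` is an assumption (Python 3 sketch).

```python
import glob
import json
import os


def write_profile_json(directory, profile_file, target="profile.json"):
    """Hypothetical driver: profile every CSV in a directory and save one JSON file.

    profile_file is any callable that returns a dict for one CSV path.
    """
    all_file_data = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        all_file_data[os.path.basename(path)] = profile_file(path)
    out_path = os.path.join(directory, target)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(all_file_data, fh, indent=4, sort_keys=True)
    return out_path
```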

Future directions

Just a proof of concept

  • "Easier to start with something than nothing" applies to code as well
  • CSVs are common and easy to parse, so they are the lowest-hanging fruit
  • Needs more work on data types, more granular control, etc.

Features I'd like to add:

  • Turn this into a web tool that can be locally launched
  • Better auto detection for data types
  • More statistics
  • Prettier outputs
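Better type detection could start as simply as trying progressively looser parsers on every non-missing value. This `guess_type` sketch is an assumption, not a planned design: it checks integer, then decimal, then ISO `YYYY-MM-DD` dates, and falls back to text. Placeholder codes such as `[unknown]` are treated as values unless you pass them in `missing`.

```python
from datetime import datetime


def guess_type(values, missing=("", "[missing]")):
    """Hypothetical helper: guess a column type from its non-missing string values."""
    present = [v.strip() for v in values if v.strip() not in missing]
    if not present:
        return "empty"

    def all_parse(parse):
        # True only if every present value survives the parser.
        for v in present:
            try:
                parse(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumes ISO dates; day/month/year split across columns stays numeric.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "text"
```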

Questions?

https://github.com/elliewix/data-profile-tool

@elliewix

Autodocish

By Elizabeth W.

Slide deck for PyData Chicago 2016 presentation on August 27, under construction until then.
