Foreign instructor has a PhD, but no patronymic, how do we address him?!?
If needed
Churikova, Ekaterina; Arzyamova, Dasha; Kanter, Daria; Egorova, Anastasia; Dergunova, Ekaterina; Skubko, Anfisa; Fomenkova, Anastasia; Tambasov, Eugene; Nogay, Anastasia; Somkova, Daria; Sherman, Elina; Skopintseva, Valentina; Papishvili, Anastasia; Smagina, Elizaveta; Komareeva, Tatiana; Pavelko, Ekaterina; Melianova, Kate; Naryan, Svetlana; Shubenicheva, Liya; Lebanova, Anastasia; Uchaneyshuili, Iya; Kovalenko, Olga; Klimeshova, Julia; Remizova, Yuliana; Tkachuk, Dmitry; Kudryavtseva, Maria; Nyagina, Maria; Novazilova, Ekaterina; Chukina, Nina; Lyalina, Nadya; Bugaeva, Anastasija; Lukina, Anastasia
"[T]he ability to eliminate alternative explanations of the dependent variable" Neuman (2007:212)
"[T]he ability to generalize experimental findings to events and settings outside the experiment itself" Neuman (2007:216)
Was it about cats?
What do you think it was about?
Industry folks call it "A/B Testing"
The broad class of units that are covered in a hypothesis. All the units to which the findings of a specific study might be generalized. (Neuman 2007)
"The name for the large group of many cases from which a researcher draws a sample and which is usually stated in theoretical terms." (Neuman 2007)
A list of cases in a population, or the best approximation of it. (Neuman 2007)
A smaller set of cases a researcher selects from a larger pool and generalizes to the population. (Neuman 2007)
The number of sampled cases divided by the size of the population they represent
A characteristic of the population, typically estimated with statistics
The difference between the measured parameter in a sample and the population parameter
As the number of random samples on a measurement increases, their average approaches the population parameter
An interval that a researcher claims, with a given degree of certainty, includes the population parameter
"A distribution created by drawing many random samples from the same population" (Neuman 2007)
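The definitions above (sampling ratio, parameter, sampling error, confidence interval, sampling distribution) can be tried out in a few lines. This is a minimal sketch, not part of the original lesson: the population is simulated, and its size, the sample size of 100, and the function name `sampling_distribution` are assumptions made for illustration.

```python
import random
import statistics

def sampling_distribution(population, sample_size, n_samples, seed=0):
    """Means of n_samples simple random samples drawn without replacement."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Hypothetical population: 10,000 values from a skewed distribution.
_rng = random.Random(42)
population = [_rng.expovariate(1 / 50) for _ in range(10_000)]
parameter = statistics.mean(population)        # the population parameter

means = sampling_distribution(population, sample_size=100, n_samples=2_000)
sampling_ratio = 100 / len(population)         # sampled cases / population size
sampling_error = means[0] - parameter          # error of one particular sample
grand_mean = statistics.mean(means)            # average of many sample means

# Rough 95% confidence interval from a single sample: mean +/- 1.96 * SE.
sample = random.Random(1).sample(population, 100)
se = statistics.stdev(sample) / len(sample) ** 0.5
ci = (statistics.mean(sample) - 1.96 * se, statistics.mean(sample) + 1.96 * se)

print(f"parameter={parameter:.2f}, mean of sample means={grand_mean:.2f}")
print(f"one sample's error={sampling_error:.2f}, 95% CI={ci}")
```

Any single sample can miss the parameter by several units, while the average of the 2,000 sample means sits very close to it.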
Is there a known probability of a case being selected?
Nonprobability Samples
Haphazard/Accidental/Convenience Sampling
Photo courtesy of Anneli Salo
Haphazard/Accidental/Convenience Sampling
What are the limitations to this sampling method?
When should this sampling method be used for substantive knowledge?
NEVER!
EVER!
EVER!
Quota Sampling
Photo courtesy of BrianZim
Quota Sampling
Steps
What are the problems with this sampling method?
Purposive/Judgmental Sampling
When is it appropriate?
Continues until data or research exhaustion
Purposive/Judgmental Sampling
Outliers: The Story of Success is a non-fiction book written by Malcolm Gladwell….. In Outliers, Gladwell examines the factors that contribute to high levels of success. To support his thesis, he examines the causes of why the majority of Canadian ice hockey players are born in the first few months of the calendar year, how Microsoft co-founder Bill Gates achieved his extreme wealth, how The Beatles became one of the most successful musical acts in human history.... Throughout the publication, Gladwell repeatedly mentions the "10,000-Hour Rule".... (Wikipedia)
Purposive/Judgmental Sampling
Variant: Sequential Sampling
Continues until no new information or sample diversity attained
Purposive/Judgmental Sampling
Variant: Deviant Case
("extreme" case)
To be discussed during our lessons on case studies.
Snowball Sampling
("network," "chain referral," or "reputational" sampling)
Snowball Sampling
Steps
Snowball Sampling
Simulating the process
Snowball Sampling
Which social phenomena is this method good for studying?
Who are we more likely to reach in this population?
Who are we least likely to reach in this population?
Types
Simple Random Sampling
Steps
Photo courtesy of saschapohflepp
Systematic Sampling
Steps
How would this method compare to random sampling?
How could a cyclical sampling frame affect your results?
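One way to explore the cyclical-frame question is to simulate it. The sketch below assumes a hypothetical frame of 1,000 apartments listed ten per floor, where every tenth listing is a corner unit; the frame, the field names, and the function `systematic_sample` are illustrative and not from the lesson.

```python
import random

def systematic_sample(frame, k, seed=0):
    """Pick a random start within the first k cases, then take every k-th case."""
    start = random.Random(seed).randrange(k)
    return frame[start::k]

# Hypothetical cyclical frame: every 10th listing is a corner apartment.
frame = [{"id": i, "corner": i % 10 == 9} for i in range(1000)]

def share_corner(sample):
    return sum(case["corner"] for case in sample) / len(sample)

aligned = systematic_sample(frame, k=10)   # interval matches the cycle
offset = systematic_sample(frame, k=7)     # interval does not match the cycle

print(share_corner(frame), share_corner(aligned), share_corner(offset))
```

When the sampling interval equals the cycle length, the sample contains either only corner units or none at all, even though they make up 10% of the frame. An interval that does not line up with the cycle recovers roughly the right share.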
Stratified Sampling
Steps
Main benefits
Stratified Sampling
Consider our example "population"
(i.e., students in our class)
How could we construct a stratified sample?
Cluster Sampling
(aka, "multistage sampling")
Steps
Cluster Sampling
Consider our example "population"
(i.e., students in our class)
How could we construct a cluster sample?
Cluster Sampling
Advantages
Disadvantages
Tradeoff on cluster numbers and cluster size
Who carries more money on hand?
Photo courtesy of Takkk
Is there a sampling frame?
Random digit dialing as cluster sampling
Words of caution
What do weights do?
Why are weights sometimes needed?
On which criteria should respondents be weighted?
What are hidden populations?
Capture-Recapture
Respondent-Driven Sampling
Scale-up Methods
Photo courtesy of Oldmaison
Photo courtesy of Todd Huffman
Photo courtesy of Orangeadnan
Photo courtesy of AdamCohn
Photo courtesy of maxintosh
Photo courtesy of kargaltsev
Photo courtesy of T-Hino
Lack a Sampling Frame
Characteristics
Commonalities: They're not Weberian Bureaucracies.
Capture-Recapture
Photo courtesy of Mickey Samuni-Blank
Capture-Recapture
Two Capture Sweeps
N = M * C / R
R / M = C / N
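The estimator N = M * C / R can be checked with a small function. This is a sketch under the usual reading of the symbols (M cases marked in the first sweep, C cases caught in the second sweep, R of those already marked); the counts in the example are made up.

```python
def capture_recapture(marked, captured, recaptured):
    """Estimate population size N = M * C / R from two capture sweeps."""
    if recaptured == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return marked * captured / recaptured

# Hypothetical sweeps: 100 marked, 80 caught later, 20 of them already marked.
M, C, R = 100, 80, 20
N = capture_recapture(M, C, R)
print(N)              # 400.0
print(R / M, C / N)   # both ratios equal 0.2
```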
How do we ethically "capture" and "mark" humans?
Scale-up Methods
Basic points:
Respondent Driven Sampling
(Heckathorn and Jeffri 2001)
Address Problems of Chain Referrals
Photo courtesy of Roland zh
Steps
Picture courtesy of Bill Ebbesen
The following lesson relies upon and draws heavily from Gerring (2007)
Case connotes a spatially delimited phenomenon (a unit) observed at a single point in time or over some period of time. Gerring (2007:19)
A case study may be understood as the intensive study of a single case where the purpose of that study is -- at least in part -- to shed light on a larger class of cases (a population). Gerring (2007:20)
At the point where the emphasis of a study shifts from the individual case to a sample of cases, we shall say that a study is cross-case. Gerring (2007:20)
An observation is the most basic element of any empirical endeavor. Gerring (2007:20)
Typically, "N" refers to the number of observations
Population > Sample > Case ≥ Observation
Neither inherently qualitative nor quantitative.
There are certain affinities, though.
Population typically difficult to discern.
A single observation may be understood as containing several dimensions, each of which may be measured ...as a variable. Gerring (2007:20)
Y
X
Data Frames and Matrices
Typically done in a spreadsheet
Research must examine variation across cases or units
Dimensions of variation
Research Goals
Empirical Considerations
Griffin (1993:1110) AJS
Tendencies
Practical Reasons
Let's focus our strategy to this concept for an example.
We'll pretend we're going to investigate the causes in rich institutional detail.
Representation
Hypothesis testing
Income Inequality Example
Values range rather than distribution
Hypothesis testing
Hypothesis generation
African Income Inequality by Former European Occupier (circa 1914)
Outliers
Representative only relative to larger sample of cases
Hypothesis generation
Case of Extreme Inequality: Seychelles, 65.77, empirical maximum Gini index
Outlier due to nonconforming relationship
Identify alternative relationships
Hypothesis generation
Example: Macedonia has a very high Gini index for a European state and especially a former socialist republic (44.2). Why is it the exception?
Case least likely to exhibit relationship
Representativeness questionable
Hypothesis testing
Select cases based on covariational patterns (Combinations)
Interests
Hypothesis testing
Joined Soviet Union/CIS agreement (protocol) ratified
Select very similar cases with different outcomes
Cases should have only one independent difference
That difference is the key variable
Maybe representative
Hypothesis testing and generating
See "Movements and Memory: The Making of the Stonewall Myth," to be discussed later.
Select very different cases with similar outcomes
Cases should have only one independent commonality
That commonality is the key variable
Maybe representative
Hypothesis testing and generating
E.g., Why do Iraq and Serbia have very comparable levels of income inequality? (29.54 and 29.65, respectively)
Let's create some simple data!
Without looking at your neighbor's responses, write down
No questions.
Now write it on the board, exactly as you wrote it on paper.
What are examples of each form of data type?
Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.
- Donald Rumsfeld, 2002
Much of the material from this lesson draws from
Krippendorff (2004) Content Analysis: An Introduction to Its Methodology
How to Collect Data from Texts
What is a "text?"
What sort of projects is this method good for?
Advantages
What are some disadvantages?
Steps
Provide an example of a text.
Krippendorff (2004:83)
Which unit will you be recording?
Even though large tracts of Europe and many old and famous states have fallen or may fall into the grip of the Gestapo and all the odious apparatus of Nazi rule, we shall not flag or fail. We shall go on to the end, we shall fight in France, we shall fight on the seas and oceans, we shall fight with growing confidence and growing strength in the air, we shall defend our Island whatever the cost may be, we shall fight on the beaches, we shall fight on the landing grounds, we shall fight in the fields and in the streets, we shall fight in the hills; we shall never surrender.
Which samples could these examples represent?
Which limitations would these samples face?
How could we sample within these examples?
Let's use a panel as our unit. What could we code?
E.g.,
What proportion of panels portray violence?
What proportion of panels with violence display violence directed against one or more Nazis?
What proportion of nouns in Churchill's speech were first person plural?
What would be the context for these two examples?
Which sociological topics do they speak to?
Suggested Application
Images of text are not machine-readable
Software required to convert text images
(Always check for quality.)
Latoya Ammons, from Gary Indiana, and her three children claimed to have been possessed by evil spirits....Recently, the priest who dealt with the actual exorcisms of this family, Rev. Michael Maginot has signed with Evergreen Media Holdings to make his account of the story into a movie.
Latoya Ammons [person], from Gary Indiana [location], and her three children claimed to have been possessed by evil spirits....Recently, the priest who dealt with the actual exorcisms of this family, Rev. Michael Maginot [person] has signed with Evergreen Media Holdings [organization] to make his account of the story into a movie.
Source: Exorcism in Gary Indiana by Wikinews
What are some possible uses for sociologists?
Which relationships do you think could be extracted?
What is the emotional state of the author?
<Words associated with happiness> :D
<Words associated with unhappiness> :'(
Positive values suggest happiness
Negative values suggest unhappiness
Remember when I asked you to describe yourself?
I consider myself to be a very talented person in many different fields. I am a perfectionist and aim to be someone people admire and look up to.
Sentiment = 3
I am cheerful, active, and talkative; love group projects, but sometimes I get shy and depressed...
Sentiment = 1.25
Let's test a hypothesis!
Partnered students have a happier sentiment when describing themselves than single students.
Can a computer detect...
Photo by nosound
Application
Jetpac
...and it's often free or cheap.
Organized by subject
Offers description and reference to contemporary data
Publication of the Statistical Abstract of the United States stopped in 2012 due to budget cuts
Read the friggin' bibliography!
Andrews (2001:91)
Andrews (2001:92, 94-5)
McAdam (1983:739)
Marshakova, Irina V. 1981. Scientometrics 3, 1: 13-26.
Marshakova, Irina V. 1973. Scientific and Technical Information Serial of VINITI 6: 3-8
How would you convert a time series like McAdam (1983) into data?
Software
Their contact information is provided for a reason!
Reasons they say "no":
Reasons they say "yes":
Image courtesy of Nicknilov
What types of organizations are these?
What types of information do they release?
What are the intended purposes for the data?
Image courtesy of Brion VIBBER
...1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000...
...2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011....
Quarter 1, Quarter 2, Quarter 3, Quarter 4
Module I, Module II, Module III, Module IV
January, February, March, April, May, June, July, August, September, October, November, December
Answer: Aggregate units
"Mashups"
Characteristics of the data behind the next diagram:
Where did the data come from?
The following lesson relies upon and draws heavily from Neuman (2007), Chapter 12.
How did we measure gender equity?
Linda Lark (Dell, no CCA) relative to Nurse Betsy Crane (Charlton, CCA)
Weak support CCA less gender equitable hypothesis
What was the case selection strategy?
What was the sampling strategy employed?
What are the suggested historical implications for gender socialization?
What are the other limitations to this exercise?
Purpose
Difficulties
(Milligan, JD. [1979] History and Theory 18:2:177-96.)
How does your research fit into the existent literature?
Tell a compelling story for readers.
Uses
Downsides
Examples?
Northrop, John Worrell. 1904. Chronicles from the diary of a war prisoner in Andersonville and other military prisons of the South in 1864. Wichita, KS. p. 66.
Uses
Downsides
Files and statistical documents produced by organizations.
Refer to the previous lesson.
Individuals recounting their past experiences.
Uses
Downsides
Uses
Downsides
Units of Comparison
Is the comparison appropriate?
Types of equivalence
Time 1, colored by study group
Time 2, colored by study group
Time 3, colored by study group
(Freeman 2004:3, 5)
A finite set or sets of actors and the relation or relations defined on them
(Wasserman and Faust 1994:20)
Actors are social entities
Actors do not necessarily have the ability to act
Actors (typically) are all of the same type
Formal terms for actors
Examples?
Actors may also have attributes
(e.g., age, sex, ethnicity)
Social ties link pairs of actors
Relations collect a specific set of ties among group members
Related formal terms
Relations may also have attributes
Degree
Number of edges incident upon a node
Density
Proportion of observed edges in a network
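Degree and density are simple enough to compute by hand, but a short sketch makes the definitions concrete. The edge list is the one given for Luke Cage's component in the exercise near the end of this deck; the function names are illustrative, and the density formula assumes an undirected graph without loops or multiple edges.

```python
from collections import defaultdict

def degrees(edges):
    """Number of edges incident upon each node of an undirected graph."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def density(n_nodes, n_edges):
    """Observed edges divided by the n * (n - 1) / 2 possible edges."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)

edges = [("Cage", "Fist"), ("Fist", "Wing"), ("Wing", "Knight"), ("Fist", "Knight")]
deg = degrees(edges)
d = density(len(deg), len(edges))
print(deg)
print(round(d, 3))   # 0.667
```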
A set of nodes and edges within a graph
Undirected
Directed
You should attend funerals, because if you don't go to people's funerals, they won't go to yours.
Directed
A walk is a sequence of nodes and lines, starting and ending with nodes, in which each node is incident with the lines following and preceding it in the sequence.
-Wasserman and Faust (1994:105)
A walk such that every edge traversed is unique
(yet not necessarily every node)
A trail such that every vertex traversed is distinct
There could be zero, one, or multiple walks, trails, and paths between any two vertices!
Problem: Walk must cross every bridge only once
Euler (1735) proved there is no solution for the walk
Pairwise
Path length: Number of edges traversed between two nodes
Geodesic: Shortest path between two nodes
Geodesic distance: Length of the shortest path between two nodes
Graph and Subgraph
Average path length
Mean geodesic distance
Diameter: Longest geodesic distance
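Geodesic distances in an unweighted graph can be found with a breadth-first search. The following is a minimal sketch on a hypothetical four-node graph (nodes A to D are invented for illustration); it assumes the graph is undirected and connected.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def geodesic_distances(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def mean_geodesic_and_diameter(adj):
    """Mean geodesic distance and diameter of a connected graph."""
    lengths = []
    for source in adj:
        dist = geodesic_distances(adj, source)
        if len(dist) < len(adj):
            raise ValueError("graph is not connected")
        lengths.extend(d for node, d in dist.items() if node != source)
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical graph: A hangs off B, and B, C, D form a triangle.
adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
mean_distance, diameter = mean_geodesic_and_diameter(adj)
print(round(mean_distance, 3), diameter)   # 1.333 2
```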
A walk "that begins and ends at the same node" and has "at least three nodes in which all lines are distinct, and all nodes except the beginning and ending node are distinct."
Wasserman and Faust (1994:107-8)
Cycles have a length
If a path exists between each pair of vertices in a graph, then the graph is connected
A component is a maximally connected subgraph
An isolate is the smallest possible component: a single vertex without any ties to other vertices in the graph
A bridge is an edge that, if removed, creates more components
A cutpoint is a node that, if removed, creates more components
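The definitions of components, bridges, and cutpoints translate directly into a brute-force check: remove one edge or node at a time and count the components again. This sketch uses a hypothetical five-node graph (A to E, with E as an isolate); it favors clarity over speed and is only meant for small networks.

```python
def components(adj):
    """Connected components of an undirected graph given as adjacency sets."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        found.append(comp)
    return found

def cutpoints(adj):
    """Nodes whose removal increases the number of components."""
    base = len(components(adj))
    result = set()
    for node in adj:
        trimmed = {n: nbrs - {node} for n, nbrs in adj.items() if n != node}
        if len(components(trimmed)) > base:
            result.add(node)
    return result

def bridges(adj, edges):
    """Edges whose removal increases the number of components."""
    base = len(components(adj))
    result = []
    for u, v in edges:
        trimmed = {n: set(nbrs) for n, nbrs in adj.items()}
        trimmed[u].discard(v)
        trimmed[v].discard(u)
        if len(components(trimmed)) > base:
            result.append((u, v))
    return result

# Hypothetical graph: A hangs off B; B, C, D form a triangle; E is an isolate.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
adj = {n: set() for n in "ABCDE"}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(len(components(adj)), cutpoints(adj), bridges(adj, edges))
```

Here B is the only cutpoint and A-B the only bridge, because the triangle among B, C, and D survives the loss of any one of its edges.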
Centrality: Nodal measurement
Who are the most important actors in a network?
Centralization: Graph measurement
How much difference in "importance" is there between actors within a network?
Generally, compares the observed network's centralization against the theoretical maximum
(Freeman 1979)
How many geodesics go through a node (or edge)?
Variations
Edge weighted
Edge betweenness
Proximity, Scale Long Paths, and Cutoff
Endpoints
Random walk
Q: What is closeness?
A: The inverse of farness!
Q: What is farness?
If connected, the sum of a node's geodesic distances to all other nodes
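Farness and closeness follow directly from geodesic distances. Below is a minimal sketch for a connected, undirected, unweighted graph; the four-node example and the function names are hypothetical, and the variations (unconnected graphs, edge weights, random walks) are not covered.

```python
from collections import deque

def geodesic_distances(adj, source):
    """Shortest path length from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def farness(adj, node):
    """Sum of a node's geodesic distances to all other nodes."""
    dist = geodesic_distances(adj, node)
    if len(dist) < len(adj):
        raise ValueError("graph is not connected; farness is undefined here")
    return sum(dist.values())

def closeness(adj, node):
    """Closeness is the inverse of farness."""
    return 1 / farness(adj, node)

# Hypothetical graph: A hangs off B, and B, C, D form a triangle.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
scores = {node: closeness(adj, node) for node in adj}
print(scores)
```

B reaches every other node in one step, so it has the lowest farness (3) and the highest closeness (1/3).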
Variations
Unconnected graphs
Edge weighted
Random walk
the forces holding the individuals within the groupings in which they are
- Moreno and Jennings (1937:137)
Cohesive groups tend to
A maximally complete subgroup - Luce and Perry (1949)
~In other words~
Everyone has a tie to everyone else in the subgroup (complete)
The subgroup is not contained within any larger complete subgroup (maximal)
Critique: Too stingy!
Can you identify the 3-cliques?
Check it out, there has been one stable 4-clique throughout the three time points!
Cohesive "seedbeds" nested within a network
Minimum #ties (k) each member of a subgroup has to other subgroup members
Directed graphs may measure k-cores through
Alvarez-Hamelin et al. (2006); Seidman (1983)
("Assortativity")
Birds of a feather flock together
Categorical vs. continuous variables
Sources?
Which relationships?
Feld's Foci
Forms of homophily
Intervening considerations
E-I Index
One (of many) measurements
EI = ( E - I ) / ( E + I )
E = #Ties between subunits
I = #Ties within subunits
Range: [-1, 1]
Lower values: More homophily
Higher values: Less homophily
Krackhardt (2003) The Journal of Applied Behavioral Science
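The E-I index formula above is easy to compute from an edge list and a group membership lookup. This is a sketch on invented data: the four actors, their groups, and the function name `ei_index` are assumptions for illustration.

```python
def ei_index(edges, group):
    """EI = (E - I) / (E + I), with E external and I internal tie counts."""
    if not edges:
        raise ValueError("the index is undefined for a network without ties")
    external = sum(1 for u, v in edges if group[u] != group[v])
    internal = len(edges) - external
    return (external - internal) / (external + internal)

# Hypothetical network: two subunits, x and y.
group = {"a": "x", "b": "x", "c": "y", "d": "y"}
edges = [("a", "b"), ("c", "d"), ("a", "c")]

ei = ei_index(edges, group)
print(round(ei, 3))   # -0.333: more ties within subunits than between them
```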
The spread of a behavior or attribute
Requirements
Relationship to previous adopter increases a receiving node's propensity to adopt
Considerations
How do ties form?
"For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him." (Matthew 25:29)
P(X=x) ~ x^(-alpha)
Nodes are of degree greater than or equal to x
P(X=x) is the probability of observing a node with degree x or greater
alpha is the scalar
(Barabási and Albert 1999)
Focus upon positions or "roles," not actors
Comprised of
Potential hypotheses
The following examples from Wasserman and Faust (1994:423)
Cohesive Subgroups
Center-Periphery
Centralized
Hierarchy
Transitivity
Which hypotheses do we have regarding the blocked structure of our class?
Time 1, Reduced Block Model, Vertices Scaled by Number of Students in each Block
Time 2, Reduced Block Model, Vertices Scaled by Number of Students in each Block
Time 3, Reduced Block Model, Vertices Scaled by Number of Students in each Block
Watts and Strogatz (1998)
Properties
Inspired by Milgram's "small world" experiment
Watts and Strogatz (1998)
Reality somewhere between
Picture courtesy of Arpad Horvath
Key questions as time proceeds:
References
(STERGM)
Observed t1
Observed t2
Observed t3
Simulated t2
Simulated t3
Simulated t4
Daredevil's degree centrality?
4
Minimum component size?
2
Edge list of Luke Cage's component?
(Cage, Fist), (Fist, Wing), (Wing, Knight), (Fist, Knight)
Density of Luke Cage's component?
4 / (4 * 3 / 2) = 0.667
Three triads with brokerage?
Three triads with transitivity?
Path from Foggy Nelson to Gabe Jones?
What is its geodesic distance?
How many cycles in the Hulk's component?
8
Which cutpoint, if removed, would produce the two largest components?
Which cutpoint, if removed, would produce the greatest number of components?
Describe the Asgardian subgraph in network terms.
Produce a research design with the following elements
Ideas?
FiveThirtyEight.com Predicts the 2012 US Presidential Election
FiveThirtyEight.com
Ideas?
What are some typical data problems?
(Assuming the operationalization matches the conceptualization.)
Observed = 0.14 (orange)
95% Bootstrapped Confidence Interval = [-0.36, 0.62]
What does it mean if a correlation confidence interval includes both positive and negative values?
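A bootstrapped confidence interval like the one reported above can be produced by resampling cases with replacement and recomputing the correlation each time. The sketch below uses the percentile method on simulated data; the data, the seed, and the function names are assumptions, and it does not reproduce the interval from the slide.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / (ssx * ssy) ** 0.5

def bootstrap_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Hypothetical small sample with a weak underlying relationship.
rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(15)]
ys = [0.1 * x + rng.gauss(0, 1) for x in xs]

observed = pearson(xs, ys)
lower, upper = bootstrap_ci(xs, ys)
includes_zero = lower < 0 < upper
print(observed, (lower, upper), includes_zero)
```

If the interval spans zero, the data are compatible with a positive, a negative, or no relationship, so the sign of the observed correlation should not be over-interpreted.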
Why worry about it?
Techniques
How does respondent error enter a dataset?
How does interviewer error enter a dataset?
Practically all data contain some error
Are your findings robust against it?
A way to find out
What are some forms of respondent error in our peer communication networks?
I propose the following peer selection error
Let's focus on transitivity
0% Rewired Edges
5% Rewired Edges
10% Rewired Edges
25% Rewired Edges
Is transitivity on our network at time one robust against random selection error?
What if your measurement results from structural properties?
Constant parameters
Varying parameters
One thousand simulations for each time point
Time 1, Observed
Time 1, Simulated
Time 2, Observed
Time 2, Simulated
Time 3, Observed
Time 3, Simulated
Interpretation?
Does the model adequately produce the outcome of interest?
Linear Regression Equation
y = B * x + e
It expresses a relationship that can be simulated!
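To illustrate the point that a regression equation can be simulated, the sketch below generates data from y = B * x + e and then recovers B. The true slope of 2, the noise level, and the sample size are arbitrary choices; because the equation has no intercept, the estimator used is least squares through the origin.

```python
import random

def simulate(b, n, noise_sd, seed=0):
    """Generate n cases from y = b * x + e, with e drawn from a normal distribution."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [b * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def slope_through_origin(xs, ys):
    """Least-squares estimate of B in y = B * x + e (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs, ys = simulate(b=2.0, n=5000, noise_sd=1.0)
b_hat = slope_through_origin(xs, ys)
print(round(b_hat, 2))
```

Changing `noise_sd` or `n` shows how error and sample size affect how closely the estimate tracks the value used to generate the data.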
Exercises in Theory
Portions of the next few slides draw from Macy and Willer (2002)
Four Common Assumptions
Simplicity and generality are key to good models.
Why is a particular model less theoretically useful?
Two major questions
Common explanatory factors
Quality models should...
Alexei Vazquez. 2003. "Growing Network with Local Rules: Preferential Attachment, Clustering Hierarchy, and Degree Correlations." Physical Review E 67, 056104
Network Growth
Vazquez (2003)
Parameters
Properties
Vazquez (2003)
We're going to add one slight modification for added realism...
Randomly rewire edges with probability p.
Implications:
Mark Granovetter. 1978. "Threshold Models of Collective Behavior." American Journal of Sociology 83:6:1420-43.
Outcome of interest: binary decisions
Examples
Mark Granovetter. 1978. "Threshold Models of Collective Behavior." American Journal of Sociology 83:6:1420-43.
Two main ideas
Mark Granovetter. 1978. "Threshold Models of Collective Behavior." American Journal of Sociology 83:6:1420-43.
F(threshold_i, x_it) = Decision of i to engage at time t
F(threshold_i, x_it) = x_it > threshold_i
x_it = Engaged_it / (Engaged_it + Unengaged_it)
Engaged_it = EngagedPeers_it * (PeerEffect - 1) + AllEngaged_t
Unengaged_it = UnengagedPeers_it * (PeerEffect - 1) + AllUnengaged_t
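The decision rule above can be written out as a small function. This is a sketch, not the code the lesson refers to. It assumes the population-wide counts already include the actor's peers, so multiplying peers by (PeerEffect - 1) gives each peer a total weight of PeerEffect. The counts and thresholds in the example are hypothetical.

```python
def engagement_share(engaged_peers, unengaged_peers,
                     all_engaged, all_unengaged, peer_effect):
    """Weighted share of engaged actors as perceived by actor i at time t (x_it)."""
    engaged = engaged_peers * (peer_effect - 1) + all_engaged
    unengaged = unengaged_peers * (peer_effect - 1) + all_unengaged
    return engaged / (engaged + unengaged)

def decides_to_engage(threshold, x):
    """Actor i engages when the perceived share exceeds the actor's threshold."""
    return x > threshold

# Hypothetical situation: 10 of 100 actors are engaged; i has 3 engaged peers
# and 1 unengaged peer, and a threshold of 0.15.
x_no_peer_weight = engagement_share(3, 1, 10, 90, peer_effect=1)
x_peer_weight = engagement_share(3, 1, 10, 90, peer_effect=5)

print(x_no_peer_weight, decides_to_engage(0.15, x_no_peer_weight))
print(x_peer_weight, decides_to_engage(0.15, x_peer_weight))
```

With a peer effect of 1, the actor sees the population share (0.10) and stays out. With a peer effect of 5, the mostly engaged peer group raises the perceived share above the threshold.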
Given these theories on network tie formation and decision-making, why do some collective action episodes escalate more quickly than others?
Code available here:
Steps:
Parameters
Findings
Implications: Positive density (or transitivity) effect
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.795      1.069   4.487 1.97e-05 ***
u              7.434      1.991   3.733 0.000317 ***
Multiple R-squared: 0.1245, Adjusted R-squared: 0.1156
Parameters
Findings
Implications: Transitivity has no effect aside from density
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.855      1.113   7.057 2.45e-10 ***
p             -2.234      1.846  -1.210    0.229
Multiple R-squared: 0.01472, Adjusted R-squared: 0.004666
Parameters
Findings
Implications: Transitivity has no effect aside from density
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    6.594      1.294   5.094 1.73e-06 ***
u              3.880      1.651   2.350   0.0208 *
p             -2.790      1.750  -1.595   0.1141
Multiple R-squared: 0.07746, Adjusted R-squared: 0.05844
Parameters
Findings
Implications: Peer effect does not affect growth rate.
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.2278     1.5214   5.408  4.5e-07 ***
Peer Effect  -0.4018     0.4621  -0.869    0.387
Multiple R-squared: 0.007653, Adjusted R-squared: -0.002473
How would you improve upon this model?
"Carte figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813" by Charles Joseph Minard
Image courtesy of Schutz
Grammar tells you the structure of a language. Graphics also have a similar structure.
Wickham. 2010. J Comp & Graphical Stats, p. 5-6
Wickham. 2010. J Comp & Graphical Stats, p. 7
"Facet" = "face"
Same plot as earlier, but with two facets
Wickham. 2010. J Comp & Graphical Stats, p. 8
Graphics typically have layers
One layer placed on top of another
Parts of a plot
What are some of the layers here?
What are some of the aesthetic mappings in the layers?
Which geometric objects did Minard use? Are they zero, one, two, or three dimensional?
Was the data transformed statistically? If so, how?
Which examples of scaling did Minard use?
Which coordinate systems did Minard use?
How many facets did Minard use?
Image courtesy of XKCD.
Image courtesy of Eric Fischer
McAdam (1983:739)
Tufte (2000:37)
Power of explanatory variable, time and space, could only be descriptive.
Who is doing it, why, and policy implications are absent.
E.g., scatter plots and bubble plots
Zeeman 1976:67 reproduced in Tufte 2001:50
Showing an Historical Path
Tufte (2000:48)
Appropriate occasions for a pie chart:
Absolutely never.
Appropriate occasions for a 3D pie chart:
Only if you want to convey less information than a 2D pie chart. (Never)
[T]he only worse design than a pie chart is several of them
-Tufte
C looks big, but the angle is the smallest.
B and D have the same angle, yet the 3D perspective makes D larger.
Effect size shown in graphic / Effect size in data
Effect size = |(second value - first value) / first value|
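The ratio above, which Tufte calls the lie factor, takes two lines to compute. The numbers below are chosen for illustration: a data value rising from 18 to 27.5, drawn with lines 0.6 and 5.3 units long.

```python
def effect_size(first, second):
    """Relative change: |(second value - first value) / first value|."""
    return abs((second - first) / first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect size shown in the graphic divided by effect size in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)

# Illustrative numbers: the data grow by about 53%, the drawn lines by about 783%.
factor = lie_factor(0.6, 5.3, 18, 27.5)
print(round(factor, 1))   # 14.8
```

A value of 1 means the graphic shows the change in the data faithfully. Values well above 1 mean it exaggerates the change.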
Keep the "ink" that represents data
Reduce the ink that doesn't
represent information about the data
introduce new information about the data
Graphics should represent the substance of the data and nothing else
Quality academic writing should eliminate all unnecessary words. Likewise, quality graphics should eliminate unnecessary markings.
Edit your graphics like you would edit a sentence.
The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies — to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk.
Tufte 1983, p. 107
Image courtesy of "self." (Not me!)
Image courtesy of Fibonacci.
Image courtesy of World24.
Tufte (2000:117)
What could it be useful for?
Mobilize every graphical element, perhaps several times over, to show the data.
-Tufte 2001:139
Data density of a graphic = #data frame entries / graphic area
(Tufte 2001:162)
Subplots within plots can often be helpful.
For non-data-ink, less is more.
For data-ink, less is a bore.
-Tufte 2001:175
Do you agree or disagree?
Often good advice, though there's a danger of overplotting data and showing too many variables at once.
Image courtesy of Steve Jurvetson
Image courtesy of Christopher
Wilkinson et al. 2005:260 from Cleveland 1985
As evaluated without context
Healy and Moody 2014:121
(coordinates)
Image courtesy of Kami888
Look at all the white and grey, uninformative space!
The world's 71% ocean water in this graphic cannot tell us about diplomacy!
The location of countries isn't all that interesting, either.
We can do better sociology, да?
Maybe it's about the economy, a change in a country's GDP per capita, and its level of democracy?
Axes are on a log scale (featuring small histograms).
Upper half are growing economies
How could this plot be improved?
What can we infer from this plot?
Things I don't like about it
What does this plot clarify?
How can it be edited further?
Which conclusions can we reach?
Can we improve upon it?
Add a time dimension
Redundant labeling.
Personal Favorites
Vector vs. Raster Image Format
Methods Guiding Theory
Every Day Methodological Proliferation
Abbott. 1988. “Transcending Generalized Linear Reality.” Sociological Theory.
Issues
X(t) = X(t - 1) * B + U
y = X * b + u
Abbott. 1988. “Transcending Generalized Linear Reality.” Sociological Theory.
y = X * b + u
X(t) = X(t - 1) * B + U
Better Reality Models
Savage and Burrows. 2007. “The Coming Crisis of Empirical Sociology.” Sociology.
Savage and Burrows. 2009. “Some Further Reflections on the Coming Crisis of Empirical Sociology.” Sociology.
Sociologists no longer have a monopoly on social data.
Who collects the most data? How?
Savage and Burrows. 2007. “The Coming Crisis of Empirical Sociology.” Sociology.
Savage and Burrows. 2009. “Some Further Reflections on the Coming Crisis of Empirical Sociology.” Sociology.
Survey Problems
Savage and Burrows. 2007. “The Coming Crisis of Empirical Sociology.” Sociology.
Savage and Burrows. 2009. “Some Further Reflections on the Coming Crisis of Empirical Sociology.” Sociology.
In-Depth Interview Problems