Researching Big Data in Indian Governance

 

A Case Study by

The Centre for Internet & Society

Bangalore, India

Elonnai Hickok . Amber Sinha . Vanya Rakesh . Scott Mason . Vipul Kharbanda . Sunil Abraham

 

Research Objectives

  • Identify methodologies for researching Big Data - a nascent topic in India.
  • Examine the differences, if any, between 'data' driven governance and 'Big Data' driven governance in a Quantified Society.
  • Identify which aspects of quantification affect policy and rights.
  • Explore the reality and potential of Big Data in governance in India. 
  • Identify the potential harms and benefits of Big Data in India.
  • Provide recommendations for anticipatory regulation.   

Scope

The use of Big Data in Governance in India is still emerging. The stage at which 'Big Data' is relevant differs across schemes. For some, 'Big Data' is a tool that has been recognized as being important. For others, 'Big Data' analytics and techniques are a means to operationalize a scheme. Still others have not explicitly cited 'Big Data' but are structured in such a way that the use or generation of 'Big Data' is a possibility.

This case study focuses on the following: 

  • The Aadhaar scheme: Implemented since 2010, a 12-digit identity number based on biometrics that can be used to authenticate individuals for the delivery of benefits.
  • Digital India: In the process of implementation, a governance campaign comprising 9 pillars to transform India's economy and empower the citizen. Within these 9 pillars, this case study reviews 32 schemes.
  • Smart Cities: In the process of conceptualization, a five-year project to develop 100 smart cities across India to drive economic growth and improve quality of life.

Research Questions

  • How is Big Data being used in Governance in India? 
  • What bodies/companies are driving the use of Big Data in Governance in India? 
  • What are the assumptions about the use of Big Data in Governance in India? 
  • How does Big Data pertain to a scheme? What inferences are drawn from the analysis, and what policy pivots are driven by those inferences? Can these inferences be clearly distinguished as data driven or Big Data driven? Does this distinction matter?
  • What has been the public dialogue around a scheme in the context of Big Data, rights, and governance?
  • How are India's data protection standards impacted by Big Data?
  • Are other laws or policies, besides privacy, impacted by Big Data?
  • Broadly, what types of 'legal hurdles' could Big Data pose?

Research Methodology

  • Literature review 
  • Positive and negative mapping of data flows in each scheme
  • Mapping of scheme objectives and promises
  • Cradle to grave analysis of each scheme
  • Analysis of media reports, government notes, press releases, legislation, conference inputs, contracts, tenders, terms of service, and policy 
  • Interviews with experts and site visits 
  • Review of government websites' privacy policies
  • Mapping of domestic and foreign companies and technologies involved 
  • Right to Information requests

Adam Thierer on Permissionless Innovation

http://techliberation.com/2013/03/04/who-really-believes-in-permissionless-innovation/

Qualifying Big Data for the Case Study 

What exactly constitutes 'Big Data' is a contested question. Thus, this case study reviews schemes that are:

  • Self Identified: Scheme policy documents describe the use of Big Data analytics and techniques. 

  • Publicly Identified: Described in publicly available third party sources as a scheme using Big Data or as Big Data being a critical component of the scheme. 

  • CIS Assessed: Schemes that indicate the use or generation of Big Data through aspects of the data flow and that will enable a quantified society.

Defining Big Data

3 Vs

  • Volume

  • Velocity

  • Variety

Defining Big Data

5 Vs

  • Volume

  • Velocity

  • Variety

  • Veracity

  • Value

Defining Big Data

7 parameters

  • Volume

  • Velocity

  • Variety

  • Exhaustiveness

  • Granularity

  • Interoperability

  • Scalability

Defining Big Data

Big Data as technology

  • Any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

  • Open source technologies such as Hadoop, and NoSQL approaches to storing and manipulating data.

Defining Big Data

Big Data as data distinctions

  • Transactions, interactions and observations
  • Process-mediated data, human-sourced information and machine-sourced data

The Importance of Data Flow 

Mapping out the flow of data in each scheme is important in understanding:

  • Whether Big Data is, or potentially could be, generated and used.
  • Where 'data gaps' could be in the research, i.e., which areas are opaque or not accessible.
  • The data flow processes and their potential benefits or harms.
  • The impact that each aspect of data flow could have on rights or governance. 

Data Matrix

Mapping three aspects of each scheme

  • The data flow in each scheme
  • The actors involved
  • The dimensions of each scheme

Data Matrix

Data Flow and Actors Involved

  • Kind of data
  • Nature of Consent
  • Collection
  • Storage
  • Security
  • Analysis
  • Sharing with other agencies
  • Deletion

Data Matrix

Scheme Dimensions

  • Scope of each scheme
  • Objectives
  • What is being quantified?
  • Technology in use
  • Legislation
  • Ministry/Departments
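
As an illustration only, one row of the data matrix could be recorded as a small data structure. This is a minimal sketch and not the format used in the case study: the class names, field names, and the `data_gaps` helper are hypothetical, and each cell is assumed to be free text, left empty when nothing is publicly known.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DataFlow:
    # One field per data flow stage in the matrix; None means "not publicly known".
    kind_of_data: Optional[str] = None
    nature_of_consent: Optional[str] = None
    collection: Optional[str] = None
    storage: Optional[str] = None
    security: Optional[str] = None
    analysis: Optional[str] = None
    sharing_with_other_agencies: Optional[str] = None
    deletion: Optional[str] = None


@dataclass
class SchemeDimensions:
    # One field per scheme dimension in the matrix.
    scope: Optional[str] = None
    objectives: Optional[str] = None
    what_is_quantified: Optional[str] = None
    technology_in_use: Optional[str] = None
    legislation: Optional[str] = None
    ministry_departments: Optional[str] = None


@dataclass
class SchemeEntry:
    # Hypothetical container for one scheme's row in the matrix.
    name: str
    data_flow: DataFlow = field(default_factory=DataFlow)
    dimensions: SchemeDimensions = field(default_factory=SchemeDimensions)

    def data_gaps(self) -> list[str]:
        """Return the names of matrix cells that are still empty."""
        gaps = []
        for section in (self.data_flow, self.dimensions):
            for f in fields(section):
                if getattr(section, f.name) is None:
                    gaps.append(f.name)
        return gaps
```

Listing the empty cells of an entry, for example `SchemeEntry("Aadhaar", DataFlow(kind_of_data="biometrics")).data_gaps()`, gives a rough view of where a scheme is opaque or where research is still needed.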

 From Data to Big Data 

 

  • What is publicly available about the data flow of a scheme? What is not? Does this impact rights? 
  • Is it clear whether Big Data is being generated or used?
  • What aspects of the data flow within a scheme drive or impact the inferences drawn?
  • What aspects of the data flow within a scheme have the potential to impact rights? Which rights could be affected? 

Thank you!