Researching Big Data in Indian Governance

 

A Case Study by

The Centre for Internet & Society

Bangalore, India

 

 

 

 

Research Objectives

  • Explore methodologies for researching Big Data, a nascent topic in India.
  • Explore the reality and potential of Big Data in governance in India.
  • Explore the potential harms and benefits of Big Data in India and provide recommendations for anticipatory regulation.

 

 

Scope

The use of Big Data in governance in India is still emerging, and the stage at which 'Big Data' is relevant differs from scheme to scheme. For some, 'Big Data' is a tool that has been recognized as important. For others, 'Big Data' analytics and techniques are a means to operationalize a scheme. Still other schemes have not explicitly cited 'Big Data' but are structured in such a way that the use or generation of 'Big Data' is a possibility.

Based on either the potential use or generation of Big Data, or a public statement on its use or generation, the case study focuses on the following:

  • The Aadhaar scheme: Implemented since 2010, a 12-digit identity number based on biometrics that can be used to authenticate individuals for delivery under the Public Distribution System (PDS).
  • Digital India: In the process of implementation, a governance campaign comprising nine pillars to transform India's economy and empower the citizen.
  • Smart Cities: In the process of conceptualization, a five-year project to develop 100 smart cities across India to drive economic growth and improve quality of life.

Research Questions

  • What are the objectives/promises of a scheme?
  • Are there assumptions made within a scheme?  
  • How does Big Data pertain to a scheme? Is it generated or used or both? 
  • What is the data flow within a scheme? Specifically, where and how is information collected? What is the mechanism for consent? How is information shared or disclosed?
  • What has been the public dialogue around a scheme? 
  • What legislation and policy apply to the scheme?
  • Does the government department host a privacy policy for the scheme? 
  • Are private companies involved in implementing the scheme? If yes, are these foreign or domestic? If yes, is there a clear data policy for these organizations? 
  • What issues does algorithmic decision making raise within the scheme?
  • How is data used and reused?

Research Methodology

  • Literature review 
  • Analysis of media reports, government notes, press releases, legislation, conference inputs, contracts, tenders, and policy 
  • Interviews with experts and site visits 
  • Review of government websites
  • Right to Information Requests 

Data Flow 

Qualifying Big Data for the Case Study 

Self Identified: Scheme policy documents describe the use of Big Data analytics and techniques. 

Publicly Identified: Publicly available third-party sources describe the scheme as using Big Data, or describe Big Data as a critical component of the scheme.

Potentially Identified: The scheme's consent mechanism, infrastructure, size of population serviced, and sharing of data point to a potential for Big Data. More generally, this covers schemes that will enable a quantified society.
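
The three qualifying categories above can be recorded per scheme in a simple structure. The following is a minimal sketch, not part of the case study itself: the class, field, and function names are hypothetical, and it assumes that a scheme may fall into more than one category at once.

```python
from dataclasses import dataclass
from enum import Enum


class Qualification(Enum):
    SELF_IDENTIFIED = "Self Identified"
    PUBLICLY_IDENTIFIED = "Publicly Identified"
    POTENTIALLY_IDENTIFIED = "Potentially Identified"


@dataclass
class SchemeEvidence:
    # All field names are hypothetical, chosen only for this illustration.
    policy_docs_describe_big_data: bool = False    # scheme's own policy documents
    third_party_sources_describe_big_data: bool = False  # public third-party sources
    structural_indicators: tuple = ()  # e.g. consent mechanism, infrastructure


def qualify(evidence: SchemeEvidence) -> list:
    """Return every category the evidence supports (assumed non-exclusive)."""
    categories = []
    if evidence.policy_docs_describe_big_data:
        categories.append(Qualification.SELF_IDENTIFIED)
    if evidence.third_party_sources_describe_big_data:
        categories.append(Qualification.PUBLICLY_IDENTIFIED)
    if evidence.structural_indicators:
        categories.append(Qualification.POTENTIALLY_IDENTIFIED)
    return categories
```

An empty list would mean the scheme does not qualify for the case study under any of the three criteria.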

Data Flow Research 

Towards this, the case study seeks to identify the following in the context of Big Data and governance in India:

  • Access and consent

  • Generation and analysis

  • Potential and present uses and reuse

  • Promises and assumptions

  • Policy implications

  • Public perception

  • Potential impact on citizens, society, and governance

  • Potential regulatory interventions and solutions

The Importance of Data Flow 

Mapping out the flow of data in each scheme is important in understanding:

  • Whether Big Data is being, or could potentially be, generated and used.
  • Where 'data gaps' could be in the research, i.e., which areas are opaque or not accessible.
  • Which data flow processes are good and which are poor or inadequate, and what their potential benefits or harms are.
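
One way to keep track of such a mapping, and of the 'data gaps' in it, is a per-scheme record with one field for each stage of the data flow. This is only an illustrative sketch: the class and function names are hypothetical, the fields mirror the data flow headings used in this presentation, and the example values are placeholders rather than findings.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataFlowRecord:
    # One field per data flow stage; None means nothing could be established.
    scheme: str
    data_sources: Optional[str] = None
    consent: Optional[str] = None
    collection: Optional[str] = None
    ownership_and_liability: Optional[str] = None
    type_of_data: Optional[str] = None
    storage: Optional[str] = None
    analysis: Optional[str] = None
    sharing: Optional[str] = None
    use: Optional[str] = None
    deletion: Optional[str] = None


def data_gaps(record: DataFlowRecord) -> list:
    """List the stages for which no information has been recorded."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Placeholder values for a hypothetical scheme, not research findings.
record = DataFlowRecord(scheme="Example scheme", consent="Generic", collection="Mandatory")
print(data_gaps(record))
```

Stages returned by `data_gaps` would be candidates for follow-up through interviews or Right to Information requests.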

Data sources

  • Where is data being collected from? 

Consent

The consent mechanism can indicate whether indiscriminate sharing and re-purposing of the data might happen, while a lack of consent or minimal consent can also be an indicator of inadequate policy.

The form of consent taken in different schemes varies and can include:

  • Explicit: Consent is taken for each collection and/or use.
  • Implicit: Consent is understood to be given when entering a space or engaging with a service.
  • Generic: Consent is taken for initial collection, but consent for future uses is not taken.

Collection

The way the data is collected could have a bearing on the size of the data that is collected and on how it can be analysed, shared, and used. Whether the provision of data is mandatory, voluntary, or somewhere in between (quasi) also raises questions of citizens' rights and agency in the use and re-use of the data.

  • Proactive
  • Reactive
  • Ongoing
  • Mandatory
  • Voluntary
  • Quasi

Data Ownership and liability 

Data ownership is important in identifying forms of redress available to the individual and the liability of those collecting and using the data. 

  • Big Data can complicate the issue of data ownership as data changes hands and new insights are derived.
  • In governance, the involvement of public-private partnerships can complicate the question of data ownership and liability.
  • In a context like India, where data protection standards do not clearly extend to public bodies, questions of ownership and liability are important in understanding the rights of individuals in relation to their data. 

Type of data

The type of data collected and the source of that data are important in understanding the potential implications for individuals' rights, including privacy, as the data is used and re-used.

  • Data or metadata
  • Quantitative or Qualitative
  • Primary or secondary
  • Direct or indirect

Veracity of the Data 

Storage

Aspects of the storage of data can impact citizens' privacy:

  • Duration 
  • Format 
  • Security 

Analysis

Analysis can impact privacy, discrimination, and marginalization:

  • Method used to reach a conclusion
  • Data used to reach a conclusion
  • Use of that conclusion

Sharing 

The way in which data is shared and retrieved can result in convergence:

  • Seeding  
  • Merging 
  • One time disclosure 

Use

  • Limited use
  • Re-purposed 

Deletion

  • By the individual
  • By the department 
  • Completely deleted  
  • Partially deleted 

Data updation

  • Frequency 
  • Source and veracity of data 

Pivots for Governance

Size of information collected 

Policy and Big Data 

Potential Research Methods 

Thank you! 

Researching Big Data in Indian Governance

By Elonnai Hickok