Big Data in Indian Governance: Preliminary Findings 

 

A Case Study by

The Centre for Internet & Society

Bangalore, India

 

 

 

 

Preliminary Findings through

  • Literature review
  • Review of news items
  • Review of policy documents
  • Conference participation
  • Review of relevant legislation

Areas of focus in the Literature Review

Benefits of Big Data 

  • Decision-Making

  • Efficiency and Productivity

  • R & D and Innovation

  • Personalisation

  • Transparency

Impact of Big Data on Consent 

For many, the principle of consent has become unworkable in an age of pervasive data collection. Specifically, the literature identifies the following problems with consent:

  • Cognition
  • Opt-in/out
  • Counter-productive
  • Structural problem

 

 

Impact on Privacy

  • Re-identification

  • Collection Limitation and Data Minimization

  • Purpose Limitation

  • Access and Correction

  • Notice

  • Opt In-Out

  • “Chilling-Effects”

  • Dignitary Harms 

 Impact on Digital Divides 

  • Anti-Competitive

  • Research and Journalism – Big Data can create inequalities in access to data for researchers and journalists, leading to an inability to replicate or verify findings

  • Global Inequality

 

 Impact on Security 

  • Data Dispersion

  • Honey Pot

  • Size of the data  

Impact on individuals

  • Injudicious or discriminatory outcomes of algorithmic decisions

  • Lack of Transparency around the configuration of algorithms contributes to discriminatory outcomes 

Epistemological and Methodological Implications

  • Obfuscation – Key insights can become obscured by the sheer quantity of correlations and relationships identified

  • “Apophenia” – a phenomenon whereby analysts interpret patterns where none exist, ‘simply because enormous quantities of data can offer connections that radiate in all directions’ (Crawford)

  • From Causality to Correlation

  • “End of Theory”/The Data does NOT speak for itself

 

Big Data and Governance in India: Overview 

 

Big Data in Indian Governance

Big Data is still in a nascent phase in India. Ways in which Big Data is being envisioned and used in governance include:

  • Informing policy and decisions:

    • Ex: mygov.in is a crowdsourcing platform that leverages big data technologies to gather data from sources such as social media to inform policy decisions.

  • Improving delivery of government services:

    • Ex: The Department of Electronics and Information Technology has published an IoT policy recognizing the importance of Big Data in the delivery of government services.

  • Operationalizing schemes: 
    • Ex: The Aadhaar scheme leverages Big Data technologies to process and handle the size of data that is collected.  

Bodies Driving Big Data 

  • The Department of Science and Technology: Houses a 'Big Data Division' that is focusing on building domestic capacity and skill and driving the use of Big Data in government schemes.  
  • The Department of Electronics and Information Technology: Refers to the potential and importance of 'Big Data' and has sponsored conferences focused on Big Data. 
  • Industry body NASSCOM: Has issued a report on the importance and potential of Big Data in India and is in the process of setting up 20 "Centres of Excellence" for building analytics capacity. 
  • The UIDAI: The Unique Identification Authority of India is leveraging big data technologies within the scheme. According to technical documents published by the UIDAI, Big Data “refers to data that is many orders of magnitude larger than traditional data. This size and nature of data makes the traditional database methodologies and technologies obsolete.”
     

Digital India Overview

  • Digital India is the Government of India’s flagship programme to enhance the interoperability of government services and departments and to transform India into a digital knowledge economy.

  • The programme promises transformation by focusing on digital literacy, resources and collaborative digital platforms in Indian languages.

  • The goal is to ensure that citizens have opportunities for economic security and a better living.

  • The initiative will focus on e-Governance policy initiatives by using ICT to offer solutions, digitally empower citizens and provide government benefits and services transparently.

  • The nine pillars of Digital India involve the use of technology in areas such as broadband connectivity, electronic delivery of services through the e-Kranti initiative, and encouragement of an open data platform.

  • Other programmes include Digital Locker, the eSign framework to enable the use of digital signatures with the help of the Aadhaar card, and the Digitize India Platform to digitize people's records for quick delivery of services.

Aadhaar Overview

  • The UID project was conceived by the Planning Commission under the UIDAI (established in the year 2009).

  • The objective of the scheme has been for the Unique Identification Authority of India to issue a 12-digit unique identification number (the Aadhaar number), which can be authenticated and verified online.

  • The purpose of a unique identification for each resident of India is to enable efficient and transparent delivery of government welfare services, and to serve as a tool for monitoring government schemes.
  • It was conceptualized as a platform to facilitate identification, avoid fake identities, and deliver government benefits based on the demographic and biometric data available with the Authority.
  • The Aadhaar number forms a crucial part of the vision for the Digital India programme.

  • Privacy concerns have clouded the project because personal biometric information is collected at a huge scale without any legislative backing.

100 Smart City Overview

  • In light of the shift towards urban transformation due to a massive influx of migrants from villages in India, the Indian Government envisioned building 100 smart cities across the country.

  • Initially, the Mission aims to cover 100 cities across the country (shortlisted on the basis of a Smart Cities Proposal prepared by every city) over a duration of five years (FY 2015-16 to FY 2019-20).

  • In August this year, 98 smart cities across India were unveiled for this project.

  • The Mission will be operated as a Centrally Sponsored Scheme (CSS) and the Central Government proposes to give financial support to the Mission to the extent of Rs. 48,000 crore over five years (on average, Rs. 100 crore per city per year).

  • The Smart Cities Mission aims to drive economic growth in the country, make cities livable and inclusive, and improve the quality of life.

  • Big data and analytics will play a predominant role in this transformation by way of cloud, mobile and other social technologies.
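
The funding figures quoted in the list above can be checked with simple arithmetic. The sketch below uses only the numbers stated there (Rs. 48,000 crore, 100 cities, five years); the variable names are illustrative. It suggests that the "Rs. 100 crore per city per year" figure is a rounded average rather than an exact division of the total.

```python
# Figures taken from the Smart Cities overview above.
TOTAL_CRORE = 48_000   # proposed central support, in Rs. crore
CITIES = 100           # cities the Mission initially aims to cover
YEARS = 5              # FY 2015-16 to FY 2019-20

# Exact average implied by the stated total.
per_city_per_year = TOTAL_CRORE / (CITIES * YEARS)
print(per_city_per_year)  # 96.0

# Total implied by the rounded "Rs. 100 crore per city per year" figure.
implied_total = 100 * CITIES * YEARS
print(implied_total)  # 50000
```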

Overview of Promises and Objectives  

  • 18 Efficient service delivery
  • 16 Accessibility
  • 13 Integration and data consolidation
  • 11 Automation and Monitoring
  • 8 Transparency and Accountability
  • 7 Interoperability and common standards
  • 2 Political and social empowerment 
  • 2 Reduction of fraud 
  • 2 Data driven decision making   
  • 2 Conclusiveness
  • 1 Digital Security 
  • 1 Universal Identity  
  • 1 Financial inclusion  
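
The slide does not say what the numbers in this list count. Assuming they are counts of how often each promise or objective appeared in the material reviewed (an assumption, not something stated in the source), a short tally shows how concentrated the promises are in the top few categories.

```python
# Counts copied from the list above; their unit is an assumption
# (occurrences of each promise across the material reviewed).
promises = {
    "Efficient service delivery": 18,
    "Accessibility": 16,
    "Integration and data consolidation": 13,
    "Automation and Monitoring": 11,
    "Transparency and Accountability": 8,
    "Interoperability and common standards": 7,
    "Political and social empowerment": 2,
    "Reduction of fraud": 2,
    "Data driven decision making": 2,
    "Conclusiveness": 2,
    "Digital Security": 1,
    "Universal Identity": 1,
    "Financial inclusion": 1,
}

total = sum(promises.values())

# Share of all counted promises held by the three most frequent categories.
top_three = sorted(promises.items(), key=lambda kv: kv[1], reverse=True)[:3]
top_three_share = sum(count for _, count in top_three) / total

print(total)                      # 84
print(f"{top_three_share:.0%}")   # 56%
```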

Overview of Assumptions 

Assumptions that we have seen about big data include: 

  • The data ecosystem in a scheme is accurate and thus the results are accurate   
  • Data driven decisions will allow for targeted and accurate implementation
  • Data driven decisions will save money 
  • Data driven decisions will enable optimal choices for departments and individuals

Overview of potential challenges to Big Data in Indian Governance

 

  • Veracity of the data
  • Fluidity of the data 
  • Lack of digitization of existing data 
  • Lack of IT infrastructure for some areas of the country
  • Lack of interoperable standards 
  • Lack of a harmonized privacy legislation that is applicable to public and private bodies resulting in a wide range of data practices.

Data Flow Examples 

Data sources: Aadhaar 

 

  • Data is collected from individuals at the point of enrollment.
  • Transaction data is collected each time the number is used to authenticate.
  • The UID potentially has access to data in databases that the UID number is seeded with. The UID leverages data from third party sources such as the census.  

Consent: Aadhaar

The consent taken at the time of enrollment allows the UIDAI to share the information provided.

Upon enrolling for an Aadhaar number, individuals have the option of consenting to the UIDAI sharing information in three instances:

  • “I have no objection to the UIDAI sharing information provided by me to the UIDAI with agencies engaged in delivery of welfare services.”

  • “I want the UIDAI to facilitate opening of a new Bank/Post Office Account linked to my Aadhaar Number. I have no objection to sharing my information for this purpose”

  • “I have no objection to linking my present bank account provided here to my Aadhaar number”

Sharing: Aadhaar seeding 

  • In the UID scheme, data points within databases of service providers and banks are being organized via individual Aadhaar numbers through a process known as ‘seeding’. 

  • The process of 'seeding' Aadhaar is meant to facilitate extraction, consolidation, normalization and matching of data so it can be queried by Aadhaar number.
  • The seeding process itself can be done through manual/organic processes or algorithmic/inorganic processes.
  • To facilitate the seeding process, the UIDAI has developed in-house software known as Ginger. Service providers that adopt the Aadhaar number must move their existing databases onto the Ginger platform, which then organizes the present and incoming data in the database by individual Aadhaar numbers.
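
To make the idea of algorithmic seeding concrete, the following is a minimal sketch of normalizing, matching and re-indexing records by an identifier. It is not the Ginger software and does not reflect the UIDAI's actual matching rules: the field names, the sample identifiers, the `normalize` and `seed` helpers, and the choice to match on name plus date of birth are all hypothetical.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def seed(records, reference):
    """Attach an identifier to each service-provider record by matching on
    (normalized name, date of birth), then group the records by identifier.

    Returns (records grouped by identifier, records that could not be matched).
    """
    lookup = {(normalize(r["name"]), r["dob"]): r["uid"] for r in reference}
    seeded = defaultdict(list)
    unmatched = []
    for rec in records:
        uid = lookup.get((normalize(rec["name"]), rec["dob"]))
        if uid is None:
            unmatched.append(rec)   # would need manual ("organic") seeding
        else:
            seeded[uid].append(rec)
    return dict(seeded), unmatched


# Hypothetical reference data and service-provider records.
reference = [
    {"uid": "0000-0000-0001", "name": "Asha Rao", "dob": "1980-01-15"},
    {"uid": "0000-0000-0002", "name": "Ravi Kumar", "dob": "1975-06-30"},
]
records = [
    {"name": "  asha   RAO", "dob": "1980-01-15", "scheme": "pension"},
    {"name": "Ravi Kumar", "dob": "1975-06-30", "scheme": "ration"},
    {"name": "R. Kumar", "dob": "1975-06-30", "scheme": "lpg"},
]

seeded, unmatched = seed(records, reference)
print(sorted(seeded))   # identifiers that now index the provider's records
print(len(unmatched))   # records the automatic match could not resolve
```

Once records are grouped this way, data held by different providers can be queried and joined through the same key, which is what raises the convergence and privacy questions discussed elsewhere in this deck.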

Collection: Aadhaar 

  • Proactive 
  • Officially voluntary, but mandatory by default 

Data Ownership and liability 


Type of Data: Aadhaar 


Veracity of Data 


Storage 


Analysis 



  • In the context of the government linking data, such “relating” can be useful - enabling the government to build a holistic and more accurate view of the data and to develop data informed policies.
  • Yet, allowing for disparate data points to be merged and linked to each other raises questions about privacy and civil liberties - as well as more intrinsic questions about purpose, access,  consent and choice.

Size of information collected 

  • Aadhaar: As of 30 October 2015, the UIDAI had generated more than 92.68 crore Aadhaar numbers
  • Digital India **
  • Smart Cities **

Examples of Initial Gaps in the data flow 

  • Collection
    • Metadata: For example, it is unclear how Aadhaar generates and uses metadata from transactions
    • Proof of concept studies: For example, Aadhaar does not clearly distinguish between data collected in a pilot and data collected during actual implementation
    • Secondary sources: For example, Aadhaar states that it uses secondary sources, but it is unclear whether, and from which, secondary sources data is collected
  • Ownership
    • Public vs. private: In public private partnerships, it is unclear who owns, has access to, and is liable for the data collected.
  • Storage:
    • Format of data: The format in which data is stored is unclear
    • Who stores the data: 5 out of 34 schemes publicly disclose with whom data is stored.

Policy and Big Data 

Digital India Policy Ecosystem 

Amber to condense into numbers the different policy involved in the schemes

Schemes and Privacy Policies 

  • Digital India: 17 out of 34 initiatives have clear privacy policies on their websites. Some policies provide that information collected can be divulged to any governmental organisations or law enforcement agencies.
  • Smart Cities: Smart Cities policy documents do not refer to the need to develop privacy policies. 
  • Aadhaar: The UID has organizational privacy policies and recommendations of privacy standards for enrolling agencies.  The court has established that no biometric data can be shared except on receipt of a court order. 


43A and Big Data 

Areas in which India's current data protection standards would not be adequate in a 'big data' scenario include:

  • Scope 
  • Definition of PI and SPI 
  • Consent 
  • Notice of collection 
  • Access and correction 
  • Security 
  • Data Breach 
  • Opt in and out 
  • Disclosure of Information 
  • Privacy Policy 
  • Remedy 

Big Data and Potential Legal Hurdles 

There are potential legal hurdles with the collection and use of different types of digital data. For example:

  • s.69 of the IT Act and access to GPS data for smart city traffic management
  • Anti-competition law? 
  • Health Regulations
  • Financial Regulations? 

Public Dialogue 

Smart Cities 

  • The timeline for the implementation of the smart city initiative is too fast for what it seeks to achieve

  • In the smart city scheme, technology is being relied upon to 'smooth over' city level problems.

  • The Smart City initiative assumes that the technology is neutral, and the reality of urban data politics is not being considered

  • The Smart City initiative raises questions of socio-spatial consequences

  • The smart city initiative has not considered the need for interoperable standards 

  • There is a lack of inter-departmental and organizational cooperation, which is needed

  • Smart cities risk exclusion and marginalization

  • Smart Cities are an example of a western practice being imposed in the Indian context

  • Smart Cities represents top down application of technology 

  • Smart Cities bring together open data and big data 

Aadhaar 

  • Aadhaar can enable function creep and convergence 
  • Aadhaar could be used to profile or surveil individuals 

Digital India 


Initial Observations

  • Transparency of data flow is critical: For citizens to understand how and in what way their data is being used within a scheme, beyond individual notice of collection through a privacy policy, a comprehensive data flow available to the citizen is critical.  

  • Public dialogue is rights based: The initial public dialogue in India appears to have raised concerns of privacy, surveillance, convergence, marginalization, discrimination, and equality that could come out of these projects - but has not raised concerns of anti-competitive practices.

  • Lack of legal framework for use and re-use: Data for governance purposes is not always being collected, used and re-used within a legal framework.

  • Data is being equated with the truth and services are creating project specific ecosystems of 'truth': For example, the UIDAI has set up a web enabled Analytics portal which functions as a common data source and serves as a 'single source of truth for the organization'.

  • Big Data in governance requires public private partnerships. This complicates issues of liability and data ownership and creates a 'black box' around the data practices of both the government and private companies.

Initial Observations 

  • New schemes that leverage ICT to deliver government services are replacing schemes backed by rights based legislation. 
  • Characteristics associated with Big Data - such as convergence - enable schemes to expand in scope: For example, there is a question about the scope of the UIDAI’s mandate and the role that seeding plays in fulfilling it, i.e. is the number a mechanism for authenticating identity, or is it meant to authenticate eligibility, for which seeding is actually necessary?
  • The vision of the UID represents a new venture in governance for India – not only re-imagining and re-vamping how identity is issued and managed (moving from a disaggregated system of multiple identifiers and identity databases to a universal identifier) but establishing a new process - one that is data driven – for issuing a universal identity and serving as a road map for other data based governance initiatives in India.  

  • The lack of a harmonized privacy legislation results in ad hoc standards developed partially by jurisprudence and not clearly adhered to or enforced.

Questions and policy windows we are still pursuing 


Potential Research Methods 


Thank you! 

Big Data in Indian Governance. Preliminary Findings

By Elonnai Hickok
