Big Data in Indian Governance: Preliminary Findings

A Case Study by

The Centre for Internet & Society

Bangalore, India





Elonnai Hickok . Amber Sinha . Vanya Rakesh . Scott Mason . Vipul Kharbanda . Sunil Abraham

Images by Pooja Saxena


  • Literature review
  • Review of news items
  • Review of policy documents 
  • Conference participation
  • Review of relevant legislation
  • Partial mapping of data flow 

Bodies Driving Big Data 

  • The Department of Science and Technology: Houses a 'Big Data Division' that is focusing on building domestic capacity/skill and driving the use of Big Data in government schemes.  
  • The Department of Electronics and Information Technology: The Department has published a draft IoT Policy which references the importance of Big Data. The Department has sponsored conferences focused on Big Data.  
  • NASSCOM: An industry body that has issued a report on Big Data in India and is setting up 20 "Centres of Excellence" for building analytics capacity.

Big Data in Indian Governance

Big Data is still at a nascent stage in India. Ways in which Big Data is being envisioned and used in governance include:

  • Informing policy and decisions: Ex - is a crowd sourcing platform for citizen opinion and input that leverages Big Data technologies to inform policy decisions. 

  • Improving delivery of government services: Ex - Aadhaar and Digital India seek to improve the delivery of government services through means such as reducing fraud and making delivery more efficient.

Overview of Assumptions 

Assumptions about Big Data found in policy documents, news items, and conference presentations include:

  • The data ecosystem in a scheme is accurate and thus the results are accurate.
  • Data-driven decisions will allow for targeted and accurate implementation.
  • Data-driven decisions will enable optimal choices for departments and individuals.
  • There will be no bad faith actions by any actor and no unintended consequences.
  • The algorithms used for data mining and decision-making are neutral and will not lead to discrimination.

Overview of potential challenges of using Big Data in India 

  • Veracity of data and collection errors 
  • Fluidity of data 
  • Incomplete data 
  • Lack of digitization of existing data 
  • Lack of IT infrastructure for some areas of the country
  • Lack of interoperable standards 
  • Absence of a harmonized privacy legislation that is applicable to public and private bodies
  • No semantic uniformity between government departments
  • Technology challenges related to Indic languages
  • Lack of machine readability of data 
  • Lack of training and education of people (including government officials) in rural areas to use technology services that will collect data

Big Data and Governance in India: Overview 

Digital India Overview

  • Digital India is the Government of India’s flagship programme to enhance inter-operability of government services and departments and to transform India into a digital knowledge economy.

  • The initiative focuses on e-Governance policy initiatives that use ICT to offer solutions, digitally empower citizens, and provide government benefits and services transparently.

  • The Nine pillars of Digital India involve the use of technology in areas like Broadband connectivity, electronic delivery of services by way of e-Kranti initiative, encouraging Open data platform, etc. Other programmes include Digital Locker, eSign, and 

  • Each scheme in Digital India has its own corresponding policy and/or legislation. The 33 schemes are backed by 46 different policies and pieces of legislation.
  • 23 additional schemes have been announced, but we do not yet have sufficient information to include them in the study.

Digital India and Big Data

  • Third party sources have referenced the use of Big Data in Digital India schemes such as
  • Broadly, Digital India seeks to establish schemes that can contribute to a quantified society.
  • Aspects of the data flow in various schemes could potentially enable the use and generation of Big Data. 
  • The size of data varies across schemes. Most initiatives also plan to digitize and leverage the information collected by analog means in past decades. For instance, under IncomeTaxIndia, over 12 crore PAN numbers are registered and 5.7 crore passport holders are registered under Passport Seva.

Digital India: Data Flow 

  • Data Sources & Access: In process of researching
  • Consent: All 33 schemes take consent, but the form and comprehensiveness of this consent varies. 
  • Collection & Type of data: Sensitive personal information (SPI) and personal information (PI) are collected by most schemes directly from the individual.
  • Storage: All schemes are silent on how long and in what format data is stored.
  • Analysis and Use: Only 2 schemes state that data may be re-used. 
  • Sharing: 20 out of 33 initiatives have clear privacy policies on their websites. Some policies provide that information collected can be divulged to any governmental organisations or law enforcement agencies. 
  • Security: In process of researching
  • Deletion: All schemes are silent on deletion.
  • Data ownership: 22 schemes specify that ownership of the data is with the individual. 7 schemes specify that ownership of the data is with the government.


Data Matrix

Mapping three aspects of each scheme

  • The data flow in each scheme
  • The actors involved
  • The dimensions of each scheme

Data Flow and Actors Involved

  • Kind of data
  • Nature of Consent
  • Collection
  • Storage
  • Security
  • Analysis
  • Sharing with other agencies
  • Deletion

Scheme Dimensions

  • Scope of each scheme
  • Objectives
  • What is being quantified?
  • Technology in use
  • Legislation
  • Ministry/Departments

Digital India Public Dialogue - Benefits

  • Cloud computing will be essential in growing India's economy
  • Cloud computing will enable access to digital government resources and will serve as a platform for rapid digital transformation
  • Digital India is a part of a trend in governance which focuses on creating platforms and large datasets based on the collection and analysis of large amounts of data. 

Digital India Public Dialogue - Harms


  • There is a gap between the objectives of Digital India and the actual solutions that will meet those objectives.
  • The modernization of government data centres is being carried out by commercial entities, particularly foreign ones.

  • The lack of interoperability between data sets will be a barrier to the success of Digital India; addressing it should be a national priority.

Aadhaar Overview

  • The UID project was conceived under the Planning Commission and is implemented by the UIDAI, which was established in 2009.

  • The objective of the scheme is for the Unique Identification Authority of India (UIDAI) to issue residents a 12-digit unique identification number (the Aadhaar number), which can be authenticated and verified online.

  • The purpose of uniquely identifying each resident in India is the subsequent use of that identity to deliver government welfare services efficiently and transparently, and to monitor government schemes.
  • It has been conceptualized as a platform to facilitate identification, prevent fake identities, and deliver government benefits based on the demographic and biometric data held by the Authority.
  • The Aadhaar number forms a crucial part of the vision for the Digital India programme.

  • Draft legislation has been proposed but not passed. The UIDAI has organizational policies, MOUs, and recommendatory policies in place.

Aadhaar and Big Data

  • Big Data technologies, such as Hadoop, are being employed by the UIDAI to operationalize the scheme.
  • Because the UID seeks to be a universal identifier for transactions, the use of the Aadhaar number can generate 'Big Data'.
  • The UIDAI has set up a web-enabled Analytics portal which functions as a common data source and serves as a 'single source of truth for the organization'.
  • The UIDAI has provided a definition of 'Big Data'. According to technical documents published by the UIDAI, Big Data “refers to data that is many orders of magnitude larger than traditional data. This size and nature of data makes the traditional database methodologies and technologies obsolete.”
  • As of 30 October 2015, the UIDAI had generated more than 92.68 crore Aadhaar numbers.

Aadhaar Data Flow 

  • Data Sources & Access: The UID seeks to be a universal identifier and collect inputs from every resident in India. 
  • Consent: The form of consent taken in Aadhaar can enable broad sharing of collected data and potential re-use. 
  • Collection and Type of data: PI and SPI are collected directly from individuals at the time of enrollment, and metadata is collected from each transaction.
  • Storage: Data is stored in a centralized database (CIDR). It is unclear if this database stores the metadata of each transaction. It is unclear if and how private organizations involved in the scheme store data.
  • Analysis: Data generated by the use of the Aadhaar number could be used for purposes beyond authenticating individuals. 
  • Sharing: The seeding of Aadhaar numbers into service delivery databases could enable convergence and re-use. During authentication, only a yes or no response is shared. The courts clarified that data can only be shared on receipt of a court order. 
  • Security: Data is encrypted during transmission and storage. 
  • Deletion: Unclear.
  • Data Ownership: The Aadhaar scheme relies on private organizations for implementing many aspects of the scheme. It is unclear who owns the data that is collected or processed using technologies developed by private organizations.

Aadhaar Public Dialogue - Benefits  

  • Aadhaar can transform the way the government carries out its duties
  • Systematic reforms can be completed via Aadhaar
  • Aadhaar is a good example of applying Big Data technology to address problems associated with identity systems
  • Aadhaar can act as a single identifier across a range of services

Aadhaar Public Dialogue - Harms  

  • Aadhaar could be used to profile or surveil individuals 
  • Aadhaar will only add another form of identity to India's many identifiers
  • Aadhaar can enable function creep and convergence
  • There are questions about the ownership of the data being collected, and a lack of clear information about the public-private partnership
  • The involvement of foreign firms in the development and implementation of the scheme raises questions of national security 
  • UIDAI may face operational challenges and issues with the accuracy of data 



100 Smart Cities Overview

In light of the shift towards urban transformation driven by the massive influx of migrants from villages, the Indian Government envisioned building 100 smart cities across the country over a span of five years.

  • In August 2015, 98 cities across India were announced for the project.

  • The Mission will be operated as a Centrally Sponsored Scheme (CSS) and the Central Government proposes to give financial support to the Mission to the extent of Rs. 48,000 crores over five years (on an average Rs. 100 crore per city per year).

  • The Smart Cities Mission aims to drive economic growth in the country, make cities livable and inclusive, and improve the quality of life.

  • No specific enabling legislation or policy has been proposed but a draft concept note and a mission document have been published. 
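
The funding figures above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the Rs. 48,000 crore is spread evenly across cities and years, which the slide does not state.

```python
# Figures taken from the overview above; even distribution is an assumption.
TOTAL_SUPPORT_CRORE = 48_000   # proposed central support over the Mission
MISSION_YEARS = 5
PLANNED_CITIES = 100           # cities envisioned
ANNOUNCED_CITIES = 98          # cities announced so far


def support_per_city_per_year(total_crore, years, cities):
    """Average support per city per year, in Rs. crore (hypothetical helper)."""
    return total_crore / (years * cities)


planned_average = support_per_city_per_year(
    TOTAL_SUPPORT_CRORE, MISSION_YEARS, PLANNED_CITIES)
announced_average = support_per_city_per_year(
    TOTAL_SUPPORT_CRORE, MISSION_YEARS, ANNOUNCED_CITIES)

print(f"Average over 100 cities: Rs. {planned_average:.2f} crore per city per year")
print(f"Average over 98 cities:  Rs. {announced_average:.2f} crore per city per year")
```

Under that assumption the average comes to a little under Rs. 100 crore per city per year, so the quoted figure reads as a rounded approximation.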

Smart Cities and Big Data 

  • Data and analytics will play a predominant role in this transformation by way of cloud, mobile, and social technologies, which can contribute to a quantified society.
  • The draft concept note and the mission statement and guidelines for smart cities published by the Ministry of Urban Development do not explicitly reference Big Data or privacy.

Smart Cities Public Dialogue - Benefits

  • In the Smart City scheme, technology is being relied upon to 'smooth over' city level problems.

  • Smart Cities bring together open data and Big Data.

  • Smart government by enabling e-governance, improved models for future development, better decision-making, efficient service delivery, and making the government more transparent, participatory and accountable.

  • Smart people by creating a more informed citizenry and fostering creativity, inclusivity, empowerment and participation.

Smart Cities Public Dialogue - Harms

  • The timeline for the implementation of the Smart City initiative is too short for what it seeks to achieve

  • The Smart City initiative assumes that technology is neutral and does not consider the realities of urban data politics

  • The Smart City initiative raises questions of socio-spatial consequences

  • The Smart City initiative has not considered the need for interoperable standards 

  • There is a lack of the inter-departmental and organizational cooperation that the initiative needs

  • Smart Cities risk exclusion and marginalization

  • Smart Cities are an example of a western practice being imposed in the Indian context

  • Smart Cities represent a top-down application of technology 

Overview of Objectives 

  • 19 Efficient service delivery
  • 16 Accessibility
  • 19 Integration and data consolidation
  • 13 Automation and Monitoring
  • 8 Transparency and Accountability
  • 6 Interoperability and common standards
  • 2 Political and social empowerment 
  • 2 Reduction of fraud 
  • 5 Data driven decision making   
  • 2 Conclusiveness
  • 1 Digital Security 
  • 1 Universal Identity  
  • 1 Financial inclusion  

Stated Objectives

Schemes & Objectives

Policy and Big Data 

Schemes and Privacy Policies

43A and Big Data

Areas in which India's current data protection standards could be insufficient in a 'Big Data' scenario include:

  • Scope: Limited to bodies corporate only
  • Definition of PI and SPI: Limited to a defined list
  • Consent: Required to be in writing  
  • Notice of collection: Not practical when multiple businesses are involved
  • Access and correction: Difficult to implement in the context of Big Data 
  • Purpose Limitation: Overlooks the potential re-use of information 
  • Security: Does not address device security 
  • Data Breach: Does not address large scale data breaches 
  • Opt in and out: Not feasible in the context of Big Data
  • Disclosure of Information: Does not address sharing through networked devices and does not address sharing of anonymized or aggregated data. 
  • Privacy Policy: Requires only a single overarching policy on the website and does not 'follow data'  
  • Remedy: Hinges on whether reasonable security practices were followed 

Big Data and Potential Legal Hurdles in India

There are potential legal hurdles in the collection and use of different types of digital data. For example:

  • IT Act 
  • Health Regulations
  • Financial Regulations

Literature Review 

Benefits of Big Data 


  • Decision-Making - Big Data is providing governments and businesses with unprecedented opportunities to create new insights and solutions, to become more responsive to new opportunities, and to act quickly, in some cases preemptively, to deal with emerging threats.

  • Efficiency and Productivity - By providing the information and analysis needed for organisations to better manage and coordinate their operations, Big Data can help to reduce waste, leading to the better utilization of scarce resources and a more productive workforce. (Kshetri, 2014)

  • R & D and Innovation - Big Data can help businesses to understand how others perceive their products or to identify customer demand, and to adapt their marketing or the design of their products accordingly. (Tucker and Wellford, 2014)

  • Personalisation - By enabling companies to generate in-depth profiles of their customers, Big Data allows businesses to quickly and cost-effectively adapt their services to better meet customer demands. (Tucker and Wellford, 2014)

  • Transparency - Advances in data analytics can give consumers and citizens the knowledge to hold governments and businesses to account, as well as make more informed choices about the products and services they use. (Brown, Chui, and Manyika, 2011)


Impact of Big Data on Privacy

  • Re-identification – Big Data potentially allows for the re-identification of anonymized user data by cross-referencing multiple datasets. (Tene and Polonetsky, 2013)

  • Collection Limitation and Data Minimization – The proliferation of internet-enabled devices, as well as Big Data’s inherent need to collect as much data as possible, is making these principles of privacy obsolete. (Barocas and Selbst, 2015)

  • Purpose Limitation – Big Data increasingly requires data to be processed several times for a variety of different purposes undermining this principle of privacy. (Article 29 Data Protection Working Party, 2014)

  • Access and Correction – The real-time generation and analysis of Big Data is challenging the principles of user access and correction. (Article 29 Data Protection Working Party, 2014)

  • Notice – Because Big Data practices rely on vast amounts of data from numerous sources and on the re-use of that data, the principle of notice is changing. (Tene and Polonetsky, 2013)

  • Opt In-Out – The proliferation of internet-enabled devices, their integration into the built environment, and the real-time nature of data collection and analysis mean that opting out of data collection is becoming more difficult. (Oxford Internet Institute, 2015)

  • Chilling Effects – The normalization of large-scale data collection risks producing a widespread perception of ubiquitous surveillance, thereby generating so-called ‘chilling effects’ on users’ behavior and free speech. (Marthews and Tucker, 2015)

  • Dignitary Harms – The automated nature of Big Data analytics has the potential to reveal personal or sensitive insights about users.
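
The re-identification risk listed above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not a description of any real scheme: every record, field name, and the `reidentify` helper below are invented for illustration. It joins an "anonymized" dataset to a named dataset on shared quasi-identifiers and reports the records that match exactly one person.

```python
from collections import defaultdict

# All data below is invented for this sketch.
# "Anonymized" dataset: names removed, quasi-identifiers retained.
health_records = [
    {"pin_code": "560001", "birth_year": 1980, "gender": "F", "diagnosis": "diabetes"},
    {"pin_code": "560001", "birth_year": 1975, "gender": "M", "diagnosis": "asthma"},
    {"pin_code": "110001", "birth_year": 1980, "gender": "F", "diagnosis": "hypertension"},
]

# Second dataset that carries names alongside the same quasi-identifiers.
public_register = [
    {"name": "Person A", "pin_code": "560001", "birth_year": 1980, "gender": "F"},
    {"name": "Person B", "pin_code": "560001", "birth_year": 1975, "gender": "M"},
    {"name": "Person C", "pin_code": "110001", "birth_year": 1980, "gender": "F"},
    {"name": "Person D", "pin_code": "110001", "birth_year": 1980, "gender": "F"},
]

QUASI_IDENTIFIERS = ("pin_code", "birth_year", "gender")


def quasi_key(record):
    """Tuple of quasi-identifier values used to link the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, identified):
    """Return {name: diagnosis} for anonymized records matching exactly one person."""
    names_by_key = defaultdict(list)
    for person in identified:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for record in anonymized:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches[candidates[0]] = record["diagnosis"]
    return matches


reidentified = reidentify(health_records, public_register)
print(reidentified)
```

In this toy data, two of the three "anonymized" records are linked to a single named person. The third is shared by two people and stays ambiguous, which is the intuition behind grouping records so that each combination of quasi-identifiers covers several individuals.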


Impact of Big Data on Consent 

For many, the principle of consent has become unworkable in an age of pervasive data collection. The literature identifies the following problems with consent:


  • Cognition

    • Failure to read/access terms of use policies (inaccessible, click-through etc.)

    • Failure to understand terms of use policies (illiteracy, complexity of legal terminology etc.)

    • Failure to fully anticipate or comprehend the potential long-term consequences of providing consent.

  • Opt-in/out

    • Binary nature of consent

    • Effectiveness of opt-out?

  • Structural Problems

    • Scale (data minimization, collection limitation)

    • Aggregation

    • Purpose Limitation

  • Counter-productive?

Impact on Digital Divides

  • Anti-Competitive – The inevitable inequalities in access to user data between start-ups and large, well-established companies risk leading to a reduction in competition. (Newman, 2014)

  • Research – Big Data can create inequalities in access to data for researchers and journalists, leading to an inability to replicate experiments or verify findings. (Boyd and Crawford, 2012)

  • Global Inequality – Lower levels of connectivity, poor information infrastructure, under-investment in information technologies, and a lack of skills make it far more difficult for the developing world to fully reap the rewards of Big Data, thereby potentially deepening global economic inequality.


Impact on Security

  • Data Dispersion – The duplication and dispersion of data across many different data repositories in order to optimize query processing makes it more difficult for organizations to locate and secure all items of confidential information.

  • Honey Pot – The larger the quantity of confidential information companies store in their databases, the more attractive those databases appear to potential hackers.



Impact on individuals or groups


  • Injudicious or discriminatory outcomes - Faults in the programming of Big Data algorithms or discriminatory assessment criteria can have potentially discriminatory effects, reinforcing existing social inequalities. (Robinson and Yu, 2014)

  • Lack of Transparency - Given their importance, algorithms are closely guarded by companies and often classified as trade secrets, meaning there is very little transparency or accountability regarding how Big Data algorithms are programmed or what criteria are used to determine outcomes. (Barocas and Selbst, 2015)


Epistemological and Methodological Implications 


  • Obfuscation – The sheer quantity of correlations and insights identifiable within data sets can sometimes risk obscuring key insights. (Boyd and Crawford, 2012)

  • Apophenia – A phenomenon whereby analysts see patterns where none exist, ‘simply because enormous quantities of data can offer connections that radiate in all directions’. (Boyd and Crawford, 2012)

  • From Causality to Correlation – Big Data’s emphasis on correlative analysis risks leading to an abandonment of the pursuit of causal knowledge in favour of shallow descriptive accounts of scientific phenomena. (Boyd and Crawford, 2012)

  • ‘End of Theory’? / The data does not speak for itself – Suggestions that ‘the data speaks for itself’ neglect domain-specific knowledge, leading to interpretations which fail to embed the results within wider scientific debates or knowledge. (Kitchin, 2015)

  • ‘N=all’ – Whilst Big Data may seem to be exhaustive in its scope, it can be considered to be so only in relation to the particular ontological and methodological framework chosen by the researcher. No data set, however large, can fully account for all information relevant to a given phenomenon, in particular unquantifiable and undatafiable variables.

  • ‘Correlation is enough’ – For many, the use of Big Data analytics signals a worrying transition from deductive to inductive reasoning. Although Big Data can demonstrate interesting correlations, these patterns alone are not enough to provide an explanatory account of the phenomena.

  • ‘The data speaks for itself’ – Despite claims that Big Data can be interpreted by anyone and that all correlations are inherently meaningful, without domain experts to contextualise the results the predictive and explanatory utility of Big Data is limited, and can sometimes lead to spurious conclusions.
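
The apophenia and ‘correlation is enough’ concerns above can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the function names and parameters are hypothetical, and the data is pure random noise. It shows that the strongest chance correlation with a target series cannot shrink, and typically grows, as more unrelated variables are searched.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_variables, n_observations, seed=0):
    """Largest |correlation| between a random target and unrelated random variables."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_observations)]
    strongest = 0.0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_observations)]
        strongest = max(strongest, abs(pearson(target, noise)))
    return strongest


# Same seed: the 5 variables searched first are a subset of the 500.
few_variables = strongest_chance_correlation(n_variables=5, n_observations=10)
many_variables = strongest_chance_correlation(n_variables=500, n_observations=10)

print(f"Strongest correlation among 5 noise variables:   {few_variables:.2f}")
print(f"Strongest correlation among 500 noise variables: {many_variables:.2f}")
```

None of the variables has any real relationship to the target, so any strong correlation found here arises by chance alone. This is why a pattern discovered by searching a large dataset needs domain knowledge or independent validation before it is treated as meaningful.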

Initial Observations

  • Beyond privacy policies, transparency of data flow is critical.

  • For some schemes there is a lack of a legal framework for the collection and use of data.

  • Broadly, data is being equated with truth and seen as the solution.

  • Public-private partnerships complicate issues of liability and data ownership.

  • Public-private partnerships create a 'black box' around the data practices of both the government and private companies.

  • Because India's data protection standards do not apply to the public sector, transparency of the public private relationship is critical. 

  • Some schemes that leverage ICT to deliver government services are replacing or superseding schemes backed by rights based legislation. 
  • The way in which consent is taken and data is shared is enabling governance schemes to expand in scope.

  • All initiatives under Digital India are silent on the issue of re-use of the data collected. It is, however, clear from the objectives of a few initiatives that data is intended to be shared and re-used.
  • In the absence of laws on data minimization and purpose limitation, this gap allows data to be used indiscriminately for any purpose.
  • Of the 34 initiatives, 31 are engaged in data collection. Of these 31 initiatives, 22 have been implemented fully or partially. None of these 22 initiatives provides a mechanism for individuals to delete their personal records.
  • Of the 34 initiatives covered, 12 allow for some updating of the data collected. In most cases, this facility is provided only where it is required for monitoring and governance.

Research methods, questions, and policy windows to be pursued 

  • Creating comprehensive 'dataflows' for each scheme 
  • Positive and negative mapping of how a citizen is impacted if they do or do not engage with a scheme. 
  • Big Data use by private players and how it could influence public policy.  
  • Outreach to relevant departments 
  • Interviews with experts    




  • Tene, O., & Polonetsky, J. Big Data for All: Privacy and User Control in the Age of Analytics, 11 Nw. J. Tech. & Intell. Prop. 239 (2013)
  • Tucker, Darren S., & Wellford, Hill B., Big Mistakes Regarding Big Data, Antitrust Source, American Bar Association, (2014). Available at SSRN:

  • Usha Ramanathan. Decoding the Aadhaar judgment: No more seeding, not till the privacy issues is settled by the court. The Indian Express. August 12th 2015. Available at:

  • UIDAI. Approach Document for Aadhaar Seeding in Service Delivery Databases. Version 1.0. Available at:

  • UIDAI. Standard Protocol Covering the Approach & Process for Seeding Aadhaar Numbers in Service Delivery Databases. Available at:



  • Amy Liu and Robert Puentes, Delivering on the Promise of India’s Smart Cities, January 2015, Available at :

  • 2014 revision of the World Urbanization Prospects, United Nations, Department of Economic and Social Affairs, July 2014, Available at :

  • Nico Tillie and Roland van der Heijden, Rotterdam's Smart City Planner: Using Local and Global Data to Drive Performance, March 2015, Available at :

  • IBM Smarter Cities,

  • UN Data Revolution Report,





  • Report by NASSCOM and Accenture, Integrated ICT and Geospatial Technologies Framework for 100 Smart Cities Mission, The report is available at the following link:

  • Anant Maringanti, Partha Mukhopadhyay, Data, Urbanisation and the City, Economic & Political Weekly, May 30, 2015, Vol. L, No. 22, Available at :,_Partha_Mukhopadhyay.pdf

  • Partha Mukhopadhyay, The un-smart city,

  • Economic Times, Modi government announces 98 smart cities; UP gets maximum number at 13, Aug 27, 2015, Available at :

  • Rex Dong, Building smarter cities with data, September 18, 2015, Available at :

  • Wayne Rash, Smart Cities Require IoT Data to Boost Efficiency, Sustainability, September 14, 2015, Available at :

  • Bernard Marr, How Big Data And The Internet Of Things Create Smarter Cities, May 19, 2015, Available at :

  • Singapore Business News, Making smart cities safer, 29 September 2015, Available at :

  • BW Smart Cities, Protect the Connected – Smart Cities, Data Analytics and Privacy in India, March 27, 2015, Available at:

  • Sandeep Singh, Smart Cities: Governance first, Jun 27, 2015, Available at :




  • Shubhendu Parth, Internet of Things has a large role to play in smart cities, October 29, 2014, Available at:

  • Mathew Idiculla, Crafting “smart cities”: India’s new urban vision, 22 August 2014, Available at :

  • The IT Law Community, The Promise and Perils of Smart Cities, Available at :

  • Ramesh Mamgain, The road to Smart Cities: Data management, August 21, 2015, Available at :

  • Rob Kitchin, Data-driven, networked urbanism, 10th August, 2015, Available at :

  • Ellis Booker, Cities get smart with Big Data, September 25, 2014, Available at :

  • Jonathan Bright, How big data is breathing new life into the smart cities concept, July 23, 2015, Available at :

  • Open Source Consortium for Smart Cities India, Available at

  • Devika Kohli, How Smart Cities Will Force The Poor Out, Jul 06, 2015, Available at :

  • Urban planner: 'Smart cities' are problematic, Available at :

  • Nalaka Gunawardene, Big data can make South Asian cities smarter, March 31, 2015, Available at :



  • Smart Cities, Mission Statement and Guidelines, Ministry of Urban Development, Government of India, June 2015, Available at :

  •, Smart Cities Mission: A step towards Smart India, Available at :

  • Draft Policy on Internet of Things-2015, Department of Electronics & Information Technology (DeitY), Ministry of Communication and Information Technology, Government of India,

Thank you! 