As the UK Government’s promised release of its contact tracing app draws near, questions about its capabilities, its security and privacy protections, and its real value in addressing the pandemic remain unanswered. Whilst startups are racing to build digital ID apps and systems, aka “immunity passports”, to help verify the health status of individuals, it is unclear how, and whether, data ethics fundamentals (i.e. how data is acquired, stored, processed and consumed), data protection and privacy will be enforced and adhered to. Have they agreed on a common interoperability standard for these apps? Are the proposed solutions compatible, allowing health status verification and tracking even when they are provided by different vendors? Will they work beyond UK borders and assist travellers? These concerns are paramount for users who contract the virus, as they are generally required to release all their data.
Practicality of the solution (a wet dream for intelligence agencies)
The response from the science and technology communities to pseudonymous contact tracing has been overwhelming. A consortium of academics and business stakeholders has been collaborating to provide a practical framework on which apps and solutions can be built. In a fast-moving pandemic, it becomes a race against time to protect not just a handful of people but whole countries.
How Anonymous is it?
Contact tracing solutions devised by academia and the tech industry for a hypothetical or near-real-world environment do not necessarily hold up in reality. But first things first: it isn’t anonymous, despite the ambitious privacy-preserving proposals. Doctors must notify health authorities as soon as a verified Covid-19 infection is detected, along with the list of people the patient was in contact with. The authorities then get in touch with everyone on that list, and so on. What happens if the contacts of the infected person refuse to reveal their names or contact details to health authorities? Or if the infected person doesn’t get permission to share the names of their contacts?
Volume and Variety of Personal Data Collected
The claim that contact tracing apps will not collect your location data may be true, but that doesn’t mean a whole raft of other personal data won’t be collected, e.g. public transport ticketing and credit card records, as the authorities do in Singapore. Such data must be controlled and accessed only in exceptional circumstances.
App Uptake / App Misuse
To be useful in tracking Covid-19, the apps would have to be taken up by at least 60 per cent of the population, as recently reported in Science magazine. Bearing in mind this is a voluntary app, how do you incentivise people to install it? I would imagine the majority of us would be law-abiding and use the app within the given guidelines, but neither the app nor the proposals would be able to stop individuals, malicious actors, trolls or even governments from misusing the app and spreading panic. What would stop little Alex from reporting fake Covid-19 symptoms and getting the whole school sent home, or a dog owner from tying a phone to a dog and letting it run amok around the park? The last thing we want is a situation where worried healthy people, alarmed by alerts issued in the app, start phoning the already severely stretched 999 service.
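A back-of-the-envelope way to see why uptake matters so much: a contact can only be recorded when both people involved are running the app. The sketch below assumes, as a simplification that is not part of the Science figure, that installs are independent of who meets whom, so the fraction of contacts the app can see is roughly the uptake squared. At 60 per cent uptake that is only about 36 per cent of contacts.

```python
def pairwise_coverage(uptake: float) -> float:
    """Fraction of contacts recorded, assuming both parties must run the app
    and that installs are independent of who meets whom (a simplification)."""
    if not 0.0 <= uptake <= 1.0:
        raise ValueError("uptake must be a fraction between 0 and 1")
    return uptake ** 2


for uptake in (0.4, 0.6, 0.8):
    print(f"uptake {uptake:.0%} -> contacts recorded {pairwise_coverage(uptake):.0%}")
```

Real contact patterns are not independent (households and workplaces cluster), so this is an illustration of the trend, not a prediction.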
The UK is still struggling with the capacity to test the wider population; the exceptions are celebrities, NHS staff and people admitted to hospital. I know several people here in the UK who were in contact with confirmed Covid-19 cases and had all the symptoms, yet when they called the hospital they were refused an examination or a test. All this data collection through the app is useless if no-one acts on it. If you do get lucky, the turnaround time for results is 3-5 days on average. Not good enough…
App Software Updates to keep up with the changes
How do you push App Store/Google Play updates to a population of 60+ million people running iOS and Android across 100+ device models, for an app that relies on something as unreliable as Bluetooth wireless communication? How do you resolve technical issues across such an inconsistent device ecosystem?
Bluetooth Issues, False Alarms and Crypto Keys
The typical contact tracing app works by using a smartphone’s Bluetooth Low Energy (BLE) feature to track the proximity of other phones. Classic Bluetooth is connection-oriented and maintains a link even when no data is flowing, exposing the device to unnecessary security risk and severely reducing battery life. BLE allows ID exchanges over Bluetooth for a short time, sufficient for the intended handshake, after which devices drop back into a background mode in which further ID exchanges are not allowed. This raises a question: will the app’s design work around this secure-by-design behaviour and keep BLE active all the time, exposing individuals to security risk and draining the phone battery (due to Bluetooth’s continuous broadcasting) in the process?
Bluetooth signals pass readily through walls and other obstacles. What happens in the workplace in the post-pandemic world, when phones start picking up Bluetooth signals from people in nearby offices and meeting rooms on the other side of a wall? This would undoubtedly lead to a flood of false alarms.
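Part of the problem is that an app only sees received signal strength (RSSI) and converts it into an estimated distance; it has no idea whether a wall was in the way. The sketch below assumes the common log-distance path-loss model; the calibration constants, the 2 m contact threshold and the 3 dB wall loss are hypothetical illustrative values, not parameters of any specific app.

```python
import math

# Illustrative, hypothetical calibration values: real ones vary by handset and room.
RSSI_AT_1M_DBM = -59.0      # expected signal strength at 1 m with nothing in the way
PATH_LOSS_EXPONENT = 2.0    # 2.0 corresponds to free space
CONTACT_THRESHOLD_M = 2.0   # distance below which an encounter is logged as a contact


def estimate_distance_m(rssi_dbm: float) -> float:
    """Invert the log-distance path-loss model: d = 10 ** ((RSSI_1m - RSSI) / (10 * n))."""
    return 10 ** ((RSSI_AT_1M_DBM - rssi_dbm) / (10 * PATH_LOSS_EXPONENT))


def is_close_contact(rssi_dbm: float) -> bool:
    return estimate_distance_m(rssi_dbm) <= CONTACT_THRESHOLD_M


def rssi_at(distance_m: float, wall_loss_db: float = 0.0) -> float:
    """Signal a phone would see at a given distance, minus any attenuation from a wall."""
    return RSSI_AT_1M_DBM - 10 * PATH_LOSS_EXPONENT * math.log10(distance_m) - wall_loss_db


# A colleague 1 m away on the other side of a partition wall (assumed 3 dB loss).
behind_wall = rssi_at(1.0, wall_loss_db=3.0)
print(round(estimate_distance_m(behind_wall), 2), is_close_contact(behind_wall))  # 1.41 True
```

The partition only makes the colleague look a little further away, so the encounter is still logged as a close contact.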
As Bluetooth is switched on and broadcasting messages all the time, anyone can see that a Bluetooth device is nearby. A normal Bluetooth receiver can be boosted with a good antenna to watch Bluetooth devices over a large area. A typical iOS setup advertises the owner’s name as the device’s Bluetooth name (e.g. “Dragan’s iPhone”). Given that, by combining ephemeral identifiers (such as DP3T’s EphIDs, or the EBIDs explained below) with the ability to recognise a user by their explicit Bluetooth name, an observer can easily keep a local mapping that deanonymises the target user.
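The linking attack described above needs nothing more than a local lookup table. This is a minimal sketch assuming a passive sniffer that can attribute a device-name broadcast and a contact-tracing broadcast to the same radio, modelled by the hypothetical `radio_source` field; modern handsets randomise their Bluetooth addresses, so whether that attribution succeeds in practice depends on the platform. When a list of infected identifiers (or the keys they are derived from) is later published, the attacker simply looks them up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Sighting:
    """One broadcast captured by a hypothetical passive Bluetooth sniffer."""
    timestamp: int                      # seconds since the capture started
    radio_source: str                   # ASSUMPTION: both broadcasts can be tied to one radio
    device_name: Optional[str] = None   # e.g. "Dragan's iPhone"
    ephemeral_id: Optional[str] = None  # rotating contact-tracing identifier


def build_mapping(sightings: Iterable[Sighting]) -> Dict[str, str]:
    """Map every ephemeral ID seen from a radio source to that source's device name."""
    names: Dict[str, str] = {}
    ids_by_source: Dict[str, set] = defaultdict(set)
    for s in sightings:
        if s.device_name:
            names[s.radio_source] = s.device_name
        if s.ephemeral_id:
            ids_by_source[s.radio_source].add(s.ephemeral_id)
    return {eid: names[src] for src, ids in ids_by_source.items() if src in names for eid in ids}


def deanonymise(published_ids: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Names of people whose IDs appear in a published list of infected identifiers."""
    return sorted({mapping[eid] for eid in published_ids if eid in mapping})


capture = [
    Sighting(100, "radio-1", device_name="Dragan's iPhone"),
    Sighting(101, "radio-1", ephemeral_id="eph-7f3a"),
    Sighting(160, "radio-2", ephemeral_id="eph-91c0"),  # no name seen: stays anonymous
]
mapping = build_mapping(capture)
print(deanonymise(["eph-7f3a", "eph-91c0"], mapping))  # ["Dragan's iPhone"]
```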
In terms of privacy, a typical centralised app assigns each handset a persistent identifier (PUID) that is used to create ephemeral IDs (EBIDs) for the handset, which change periodically. These are created by encrypting the PUID with a global broadcast key that is itself renewed periodically; after X days or weeks, the key is deleted. It’s the ephemeral EBIDs that are broadcast by the phone, and the EBIDs of other phones in close proximity that are recorded. Once a patient is diagnosed, with the patient’s consent and authorisation from a health authority, the app uploads all the EBIDs recorded over the prior X days or weeks to the backend server, along with the time of contact, Bluetooth metadata and some other information. The backend server then uses the global broadcast keys to decrypt the EBIDs, revealing the PUID (and therefore the pseudonymised identity) of every device that was close to the infected person in the specified date range.
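The EBID flow described above can be sketched in a few lines to show why the server is so powerful: whoever holds the broadcast key can turn every EBID back into a PUID. This is a toy under stated assumptions: a 4-round Feistel network built from HMAC-SHA256 stands in for the real block cipher (a deployment would use a vetted cipher such as AES), and the 8-byte PUID, the epoch encoding and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

HALF = 8     # hypothetical layout: 8-byte PUID followed by an 8-byte epoch counter
ROUNDS = 4   # 4 Feistel rounds over a keyed PRF (Luby-Rackoff construction)


def _round_fn(key: bytes, rnd: int, half: bytes) -> bytes:
    # HMAC-SHA256 as the keyed round function, truncated to half a block.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round_fn(key, rnd, right))
    return left + right


def feistel_decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_fn(key, rnd, left)), left
    return left + right


def make_ebid(broadcast_key: bytes, puid: bytes, epoch: int) -> bytes:
    """The identifier broadcast during one epoch (who computes it depends on the design)."""
    if len(puid) != HALF:
        raise ValueError("PUID must be 8 bytes in this sketch")
    return feistel_encrypt(broadcast_key, puid + epoch.to_bytes(HALF, "big"))


def server_resolve(broadcast_key: bytes, ebid: bytes) -> Tuple[bytes, int]:
    """Anyone holding the broadcast key recovers the PUID and epoch from an EBID."""
    plain = feistel_decrypt(broadcast_key, ebid)
    return plain[:HALF], int.from_bytes(plain[HALF:], "big")


broadcast_key = secrets.token_bytes(32)
puid = secrets.token_bytes(HALF)
ebids = [make_ebid(broadcast_key, puid, epoch) for epoch in range(3)]
print(len(set(ebids)))                                       # 3: a different ID every epoch
print(server_resolve(broadcast_key, ebids[2]) == (puid, 2))  # True
```

To a passive observer without the key, the three EBIDs look unrelated; to the key holder they are trivially linkable to one handset.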
Let’s step back a little to understand the implications of sending sensitive data to, and keeping it on, the backend server. The participants in the DP3T ecosystem are:
- users using a communication device (i.e. a smartphone with the required app installed)
- backend server, and
- (health) authority
Backend Server-to-App Channel
The DP3T documentation is ambiguous as to whether the backend server should be trusted:
“This backend server is trusted to not add or remove information shared by the users and to be available. However, it is untrusted with regards to privacy (i.e., collecting and processing of personal data). In other words, the privacy of the users in the system does not depend on the actions of this server. Even if the server is compromised or seized, privacy remains intact.” - (https://github.com/DP-3T/documents)
Based on the above, since adding or removing information on the server has privacy consequences, we conclude that the server should not be trusted. Some of the concerns:
- There is no mechanism for verifying the integrity state of the backend server to ensure that no crypto keys are maliciously deleted.
- The “authorisation scheme” by which the authority will be publishing the crypto keys is not clear.
- What guarantees that data transferred over the communication channels, stored on devices and backend servers, or published by authorities has not been tampered with, and that its integrity is verifiable?
- The backend server is populated by the app, not by the authority (the latter would have been a more secure solution, preventing forged data sets).
Encrypting the channel is essential, but more importantly it requires thorough authentication and authorisation between these two participants, as well as verification that the data reported by the app (e.g. a possible Covid-19 exposure or infection) is genuine and has not been tampered with.
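One way to close the authorisation and integrity gaps listed above is to bind the health authority’s authorisation to a digest of exactly what gets uploaded, so the backend can reject altered or replayed submissions. A minimal sketch, assuming (hypothetically) that the authority and the backend share an HMAC key; a real deployment would more likely use public-key signatures so that the backend cannot mint authorisations itself. Class names and payload fields are illustrative, not part of the DP3T design.

```python
import hashlib
import hmac
import json
import secrets
from typing import List, Set, Tuple


def canonical(record: dict) -> bytes:
    """Serialise deterministically so both sides hash exactly the same bytes."""
    return json.dumps(record, sort_keys=True).encode()


class HealthAuthority:
    def __init__(self, key: bytes) -> None:
        self._key = key

    def authorise(self, payload_digest: bytes) -> Tuple[bytes, bytes]:
        """Issue a one-time authorisation bound to the digest of the exact upload."""
        nonce = secrets.token_bytes(16)
        tag = hmac.new(self._key, nonce + payload_digest, hashlib.sha256).digest()
        return nonce, tag


class Backend:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self._used_nonces: Set[bytes] = set()
        self.accepted: List[bytes] = []

    def accept_upload(self, payload: bytes, nonce: bytes, tag: bytes) -> bool:
        if nonce in self._used_nonces:
            return False  # replayed authorisation
        digest = hashlib.sha256(payload).digest()
        expected = hmac.new(self._key, nonce + digest, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False  # payload altered after authorisation, or forged tag
        self._used_nonces.add(nonce)
        self.accepted.append(payload)
        return True


shared_key = secrets.token_bytes(32)  # ASSUMPTION: provisioned out of band
authority, backend = HealthAuthority(shared_key), Backend(shared_key)

payload = canonical({"ebids": ["a1b2", "c3d4"], "window_days": 14})
nonce, tag = authority.authorise(hashlib.sha256(payload).digest())
tampered = canonical({"ebids": ["a1b2", "c3d4", "ffff"], "window_days": 14})

print(backend.accept_upload(tampered, nonce, tag))  # False: digest no longer matches
print(backend.accept_upload(payload, nonce, tag))   # True
print(backend.accept_upload(payload, nonce, tag))   # False: nonce already used
```

Binding the tag to the payload digest, rather than issuing a bare "you may upload" code, is what lets the backend detect data added after the authority signed off.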
Wider issues - Trust over Truth
> Is anonymous really anonymous?
We trust our Healthcare Providers. In the past we have had semi-data-breaches where the NHS naively believed it was selling anonymised data with no chance of re-identifying the data subjects. A data-sharing agreement showed that Google DeepMind’s collaboration with the NHS went far beyond what had been publicly announced and gave Google access to information on millions of NHS patients. As many examples have shown, there is a high probability of linking ‘anonymised’ data back to the data subject with as few as three simple identifiers (e.g. postcode, date of birth and sex).
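The linkage risk is easy to demonstrate. The sketch below uses made-up records and treats postcode, date of birth and sex as quasi-identifiers (a hypothetical choice). It measures k-anonymity, the size of the smallest group of records sharing the same quasi-identifier values, and joins an “anonymised” extract against a hypothetical public register; a k of 1 means at least one record is unique on those three attributes.

```python
from collections import Counter
from typing import Dict, List

QUASI_IDENTIFIERS = ("postcode", "date_of_birth", "sex")  # hypothetical choice of attributes


def qi_key(record: Dict) -> tuple:
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def k_anonymity(records: List[Dict]) -> int:
    """Size of the smallest group of records sharing identical quasi-identifiers."""
    return min(Counter(qi_key(r) for r in records).values())


def link(anonymised: List[Dict], public_register: List[Dict]) -> Dict[int, str]:
    """Re-identify 'anonymised' rows whose quasi-identifiers match exactly one named person."""
    index: Dict[tuple, List[str]] = {}
    for person in public_register:
        index.setdefault(qi_key(person), []).append(person["name"])
    return {row["record_id"]: index[qi_key(row)][0]
            for row in anonymised if len(index.get(qi_key(row), [])) == 1}


# Made-up example data: names removed, but quasi-identifiers left intact.
anonymised = [
    {"record_id": 1, "postcode": "SW1A 1AA", "date_of_birth": "1980-04-02", "sex": "F", "diagnosis": "asthma"},
    {"record_id": 2, "postcode": "M1 1AE", "date_of_birth": "1975-11-30", "sex": "M", "diagnosis": "diabetes"},
]
public_register = [  # hypothetical named dataset sharing the same attributes
    {"name": "Jane Doe", "postcode": "SW1A 1AA", "date_of_birth": "1980-04-02", "sex": "F"},
    {"name": "John Roe", "postcode": "M1 1AE", "date_of_birth": "1975-11-30", "sex": "M"},
    {"name": "Ann Poe", "postcode": "M1 1AE", "date_of_birth": "1990-01-01", "sex": "F"},
]
print(k_anonymity(anonymised))            # 1
print(link(anonymised, public_register))  # {1: 'Jane Doe', 2: 'John Roe'}
```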
We trust our Telecoms Operators. Current mobile telecommunications infrastructure already enables network operators to track people by their mobile phones and to identify proximity, and firms such as X-Mode and Tectonix specialise in the geo-tracking of smartphones. This means that billions of people currently use devices by which they can be tracked. Network operators are simply forbidden by law from keeping or disclosing this information, and feeling safe about one’s privacy requires assuming that network operators do not break the law. Recently Vodafone launched a 5-step plan to help counter Coronavirus, stating that wherever technically possible and legally permissible, Vodafone will be willing to assist governments in developing insights based on large anonymised data sets. As pointed out above, sharing pseudonymised data in breach of the duty of care is unacceptable and should be sanctioned more severely and consistently by the authorities in the future.
We trust our Governments. Technologists and vendors keep forgetting that designing an app and confirming that it can work in principle is not enough; it has to be grounded in real-life scenarios and address the actual requirements. Otherwise it is like designing a horse carriage to compete in an F1 Grand Prix. Perhaps they have been influenced by the “there’s-an-app-for-that” logic we’ve seen over the last decade or so. One would hope that the World Health Organisation or some overarching consortium would spearhead this initiative and define a truly global approach to Covid-19 or any similar pandemic in the near future. Weren’t SARS and MERS, back in 2003 and 2015 respectively, enough to teach us a lesson and make us take things more seriously?
In 2006/7 I led a project for a major oil & gas client with the objective of addressing the risk of a SARS pandemic for their workforce worldwide. Health and safety is typically paramount in this industry vertical; but if enterprises have been working on long-term solutions, shouldn’t our Governments stop disregarding this chronic issue, put it on the risk register, prioritise it over anti-terrorism and pay it at least some attention? Covid-19 and its variants are not going away anytime soon, so we had better make things manageable for business and society.
We trust our beloved Apples and Googles. We sometimes forget that these guys run their business on data, and frequent instances of gross neglect of people’s privacy remind us that free, heavily commoditised services offered for the greater good come at a price. Regardless of whether we take a centralised or decentralised approach to data collection in combating Covid-19, this, coupled with the mobile telcos, the Googles and Apples of this world and their ability to track us and our behaviours, could give dark forces the opportunity to build a totalitarian wet dream: a “social graph” for everyone who downloads the app, making it trivial to figure out who has been in close proximity to whom, and when.
The biggest universal challenge for all organisations?
Lack of confidence in the quality of your data and, more importantly, your ability to safeguard that data by preventing data theft and leakage, protecting the integrity of data and data-driven business processes and managing privacy risks.
Data (digital asset) security has been an elusive target, impossible to maintain consistently in a world of complex digital supply chains where trust is implied and truth obscured. Transparency and truth about how personal data is collected, processed and shared is the only way to foster adoption of pandemic contact tracing solutions (or any solution that touches confidential or personal data). Technology solutions must respect data protection, security and privacy at all times, and be architected to perform sustainably in an ever-evolving business, legislative and technology landscape to protect key digital assets. Enforcing encryption as the primary means of ensuring integrity is neither effective nor practical; it is not just inappropriate, it is irresponsible. The objective is to strike a balance between two seemingly opposite digital asset management strategies: defensive, which demands control and security, and offensive, which demands flexibility and rapid response to market changes.
The following 6 properties must form the foundational pillars for delivering accountability, transparency and truth over trust. Generating industrial-scale, immutable, empirically verifiable metadata that proves these 6 properties of the underlying data is probably the most effective way of achieving security at scale. These properties define data security as a tangible, not abstract, quality that can be empirically proven and universally maintained, and that guarantees adaptability to organisational and market changes and compliance with current and future regulations.
- Individual’s control over personal data - Bring control back to data owners. Opting in formally and verifiably (whatever the scheme) with a clear understanding of what you are getting yourself into. A single pane of glass into the PII data universe belonging to an individual, with options to see where your data was used, moved to or acquired, and to deal with any aspect of a GDPR subject access request (or any other compliance matter). Provide an effective option to irreversibly purge a data subject’s records when deemed necessary. This is where blockchain can help, by providing an immutable registration and attestation substrate to prove data ownership and irrevocably link identities to datasets.
- Data Traceability - ensure the ability to track a data construct back to the construct it was derived from, as a more concrete instantiation. For example, a physical column may trace back to a logical attribute, which in turn may trace to a business term, which traces back to a concept. Note that Data Traceability is very often confused with Data Lineage and Data Provenance.
- Data Provenance - maintain a record of the data in sufficient detail to allow a specific dataset to be reproduced. Maintain time-stamped data transactions, internally within the boundaries of an organisation as well as externally across the ecosystem of partners and service providers, with an immutable, empirically provable audit trail available to all stakeholders to verify and confirm its authenticity. Data Provenance can be used for many purposes, such as understanding how data was collected so it can be meaningfully used, determining ownership and rights over an object, making judgements about information to determine whether to trust it, verifying that the process and steps used to obtain a result comply with given requirements, and reproducing how something was generated.
- Data Integrity - Data Integrity is probably the most underrated security property, one that on its own could arguably have prevented the vast majority of past data breaches. Modern digital businesses operate with large volumes of data (digital assets) that contain sensitive personal, financial and operational information. The digital supply chains that collect, store and manage these assets nowadays stretch across multiple organisational borders, demanding that security controls be extended and enforced beyond the perimeter. Despite the sophisticated security technologies at our disposal, sadly, none of them address data integrity in a provable and scalable manner.
- Data Anonymisation - truly and irreversibly achieve data anonymisation, without any chance of re-identifying data subjects.
- Secure Data Sharing / Secure Data Transactions - deliver seamless data exchange with external parties, traversing multiple borders and systems without losing enforcement of security controls wherever the data may reside. Create a robust mechanism for transparent, independent verification of data (by your partners, service providers, developers or regulators) without disclosing the data itself (e.g. confirming a medical record’s security compliance or its travels without revealing the record’s specifics). Big data analytics, Cloud APIs, Serverless Computing, Machine Learning and Artificial Intelligence demand new data governance models that in turn require an innovative new approach to security.
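To make the provenance and integrity pillars more concrete, here is a minimal sketch of a tamper-evident, hash-chained audit log: each entry commits to the SHA-256 hash of the previous entry, so editing any past entry breaks verification. This is a single-writer hash chain, not a blockchain, and the field names, actors and timestamps are hypothetical. On its own it only detects tampering if the latest hash is anchored somewhere the writer cannot rewrite, e.g. shared with partners or a regulator.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ProvenanceLog:
    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def append(self, timestamp: str, actor: str, action: str, dataset_digest: str) -> Dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"seq": len(self.entries), "timestamp": timestamp, "actor": actor,
                "action": action, "dataset_digest": dataset_digest, "prev": prev}
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = GENESIS
        for seq, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body.get("seq") != seq or body.get("prev") != prev or entry.get("hash") != _digest(body):
                return False
            prev = entry["hash"]
        return True


log = ProvenanceLog()
log.append("2020-05-01T09:00Z", "data_controller", "collected",
           hashlib.sha256(b"raw extract").hexdigest())
log.append("2020-05-02T14:30Z", "analytics_partner", "pseudonymised",
           hashlib.sha256(b"pseudonymised extract").hexdigest())
print(log.verify())                       # True
log.entries[0]["actor"] = "someone_else"  # rewrite history...
print(log.verify())                       # False: entry 0 no longer matches its hash
```

Recording only a digest of each dataset, rather than the data itself, also hints at how third parties could verify a dataset’s history without seeing its contents.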