Chapter 15 The Necessity of Standards for the Open Social Web

By Harry Halpin

For the first time in human history, the majority of our social communication – from our beloved photographs to our most intimate of chatter – is being captured by digital platforms, most of them closed and proprietary “social” platforms like Facebook and Google. This is a relatively recent but profound shift in the historical playing field, a massive digital accumulation and colonization of social life seemingly without bounds.

This epochal shift is dramatic, but not especially surprising. The adoption of new languages and linguistic techniques – the process of grammatization – has long been at the center of profound political transformations.[1] In Europe, the great transition from feudalism to capitalism was intertwined with the transition in governance from monarchy to the nation-state – a transformation that itself was grounded in the spread of literacy amongst the former peasants.

For generations, Latin could be read only by the clergy, a fact that served to marginalize the vast majority of the population. Yet with the “industrialization of language” made possible by the printing press, literacy escaped the confines of the church and spread across the wider social realm. This new social reality enabled many more people to engage in the “Republic of Letters” that constituted the foundation of the Enlightenment. Let us not forget that something as simple as the mass production of the Bible, in no small part due to Gutenberg’s invention, was enough to cause religious wars throughout Europe.

Yet mass literacy also paved the path for a new civilization: Who would have thought that the formerly illiterate masses would self-organize the French Revolution? Bernard Stiegler points out the striking parallels between this history and the digital revolution of our times. We live on the brink of a similar cataclysm, Stiegler argues, as our language – and eventually our political and economic institutions – is digitized before our very eyes.[2] The algorithms and code that increasingly shape and govern our social existence are mostly controlled by a small corporate oligopoly that, hand in hand with the secret state revealed by Edward Snowden, has established a regime of exploitation and domination based on centralized control of digital communication.

Hope is not lost, however: A new vernacular of our digital age – open standards – offers enormous potential for assuring that freedom and innovation can flourish. Open standards are critical because they prevent anyone from gaining unfair advantages or control over commerce, politics and other realms of life. Anyone is able to create new code and to access data, much as people during the Enlightenment acquired the capacity to access knowledge in their native languages.

The social impact of a new, accessible language of code is not hard to imagine. The Internet and Web are living examples of the catalytic power of open standards: TCP/IP and HTML serve as the literal building blocks of the Internet and Web, respectively, allowing any computer to join the Internet and any person to create a Web page. When anyone is empowered to contribute – not just credentialed “professionals” authorized by centralized, hierarchical institutions – the result is an explosion of creativity that can even overthrow governments: Witness Tahrir Square. However, as post-revolutionary Egypt has shown, the hard problem is perhaps not the overthrow of pre-existing institutions, which seems to come about all too easily, but how a genuinely new social – and today, digital – realm can arise without domination and exploitation.

 Why Open Standards Matter

Large institutions are increasingly using Big Data to assert institutional control over our personal information and, in turn, what we can read, think, create and organize with others: The question is how to take that power back without losing its myriad advantages. To prevent the centralization of our data in the hands of a neofeudal digital regime and all the dangers that this entails, we urgently need to construct a new ecosystem of open standards to allow secure forms of digital identity that everyone from individuals to institutions can deploy without being “locked-in” to existing players. (See Chapter 13, “The ID3 Open Mustard Seed Platform,” by Thomas Hardjono et al.)

These new open standards should not be limited to providing the functions of the current regime of centralized social networking providers (Facebook, Twitter, LinkedIn, etc.), but should go further in empowering individuals to control their own digital identities and digital communications. Simply using these platforms “as is” will not enable a flowering of innovation, because much of the core control over identity – and thus control over how people may interact – will remain in the hands of a few centralized players who control usernames, passwords, personal data, metadata and more. These players naturally wish to control how personal data will be used, because so much of their current institutional sovereignty and revenue depends upon it.

Why shouldn’t users be able to choose – and even create – their own self-sovereign digital identities? Why shouldn’t identity creation and verification be based on open standards, as the Internet is? This is surely the best guarantor against abuses of the data. To achieve this vision, every element of a decentralized identity ecosystem would have to embrace standard protocols to communicate with other systems, much as all Internet and Web users must adhere to the TCP/IP and HTML standards, respectively. Otherwise, users would be locked in to their own system and unable to communicate with the rest of the Web. Ideally, even large social networking sites such as Twitter and Facebook would choose to use open protocols. If they did, those using open standards could take advantage of the “network effects” of the tremendous numbers of users on these commercial platforms, while still having the privacy and power of controlling their own digital identity.

Based on the precedents of the Enlightenment, the Internet and the Web, there is a great likelihood that open standards for data would unleash a new digital enlightenment whose transformative effects we can only speculate about. It is clear that, faced with problems whose structures and complexity are difficult to grasp – global climate change, the financial crisis and the spread of failed states – we desperately need to harness the potential power of an interconnected world. Open standards for identity are the first step.

 The Vital Role of Open Standards Bodies

Open standards such as TCP/IP and HTML were created by bodies such as the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), which rely on a consensus-making process among an open and evolving group of members. The processes followed by these open standards bodies are quite different from those used by conventional standard-setting bodies at the national level, such as the American National Standards Institute (ANSI), or at the international level, such as the International Telecommunication Union (ITU). In contrast to the standards bodies of the Internet, these pre-Internet standardization bodies normally adopt standards through formal processes, via majority votes by representatives of a closed group, such as nation-states.

The open multistakeholder process used by the IETF and W3C is never simple, but it seems to work remarkably well for technical standards. In light of the great success of the TCP/IP stack of protocols over the alternative ITU-backed OSI (Open Systems Interconnection) network stack, an open multistakeholder process has proven itself superior to traditional processes in creating effective, widely accepted open standards. Perhaps most interesting is that multistakeholder standards bodies allow individual or institutional participation based on informality and merit, not on political credentials or government roles. In the words of David Clark, the first Chair of the Internet Architecture Board: “We reject kings, presidents and voting. We believe in rough consensus and running code.”[3]

The Internet has achieved its stunning technical interoperability and global reach by bringing to the table a complex social network of interlocking institutions, some of them even adversaries, ranging from open source projects to companies such as Google and telecommunications providers. These institutions work together by agreeing to use a number of standardized protocols that are “loosely connected” (allowing for modest variations) and to respect the rather vaguely defined principles of Web and Internet architecture.[4] A mixture of hackers, government bureaucrats and corporate representatives create and maintain these protocols via a small number of interlocking standards bodies such as the IETF and W3C. Through their technical protocols, these standards bodies play a vital role in defending the often-implicit guiding principles of the Internet and Web, such as net neutrality and the “end-to-end” principle, which are widely deemed responsible for the Internet’s astounding growth.

When the Internet was first being built, the Internet Engineering Task Force functioned as an informal network of graduate students who posted “Requests for Comments” (RFCs) for early Internet protocols. Frustrated with the large number of incompatible protocols and identification schemes produced by the IETF, Tim Berners-Lee had the vision of a universal information space that he called the World Wide Web.[5] He built the Web as a quick prototype while working part-time at the European Organization for Nuclear Research, known as CERN. Berners-Lee sent the core draft specifications for his Web prototype (URL, HTML and HTTP) to the IETF as “experimental” specifications, despite the rejection of his original academic paper by the 1991 ACM (Association for Computing Machinery) Hypertext Conference.

The IETF declined to approve a “Universal” or “Uniform” resource identifier scheme (URIs), and large corporations started entering the IETF, potentially compromising the integrity of the standard-setting process. This prompted Berners-Lee to establish his own standards-setting body, the World Wide Web Consortium (W3C) to manage the growth of the Web. With offices in Europe, the United States, Japan, and even China, as well as a paid neutral technical staff of over seventy employees and a more formal process than the IETF, the W3C has managed to fend off monopolistic control of Web standards and foster the development of the Web, including the adoption of such technologies as HTML5.

The W3C and IETF now work closely together. Along with ICANN, these standard-setting bodies form the core of the multistakeholder process of Internet governance, and the W3C and IETF are among the bodies that have affirmed the “OpenStand” principles (www.open-stand.org).

The standardization process is concerned not only with the technical development of standards, but also with ensuring that patents do not obstruct their use. In the United States, unfortunately, software patents have become so expansive in scope that even free and open source software is sometimes accused of infringing on a patented idea, triggering demands for licensing fees from patent-holders. Since patented software has become a large industry, it is important for any open standard to be freely usable by developers without fear of licensing fees or patent trolls.

Open standards bodies such as the W3C are committed to patent policies that allow both commercial and open source software to implement the standards and still interoperate. For example, the royalty-free licensing commitments covering HTML5 under the W3C’s patent policy allow both commercial closed-source browsers such as Internet Explorer and open source browsers such as Mozilla Firefox to render the same webpage in a similar fashion. This ensures that users are not stuck viewing the Web with a particular browser and that patent claims will not impede the future development of the Web.

The IETF deals with patents through what has been called the “Note Well” agreement. The general point is that “in all matters of copyright and document procedures, the intent is to benefit the Internet community and the public at large, while respecting the legitimate rights of others.”[6] However, the IETF does not guarantee royalty-free licensing via legally binding agreements.

Given the high level of corporate competition on the Web, the W3C constituted itself as a membership consortium so that these legal agreements can be made (while inviting participation by open source developers, academics, government experts and small companies via its “Invited Expert” process). Under these agreements, participating members commit to license any patents essential to W3C standards on a royalty-free basis, so the W3C acts in effect as a “patent war chest” for the Open Web and as a guarantor that such patents will be licensed royalty-free to developers everywhere.[7]

The “social web” – websites dedicated to social networking – is currently a very fragmented landscape. It has no central standards body and a bewildering number of possible protocols and formats. In order for a new layer of social and identity protocols to be incorporated into the rest of the Web via open standardization, it would ideally be necessary to establish a single set of standards for each step in how digital identity and social networking are currently managed in existing closed, centralized data silos, and then to adapt them to an open and decentralized world.

 Open Standards and Digital Identity

Identity is the connection between descriptive data and a human or social institution. As such, identity essentially serves as the “digital name” of some entity, and particular ways of encoding that name are identifiers. Identity systems go far beyond simple natural language names such as “Tim Berners-Lee.” Berners-Lee also has a phone number, which as a U.S. number consists of ten digits, or eleven including the international calling code. These digits are not connected to the Internet in any obvious way. However, with the advent of the Internet, a number of new identification schemes have come into being, such as email addresses like timbl@w3.org or even Facebook accounts like “Tim Berners-Lee” (https://www.facebook.com/tim.bernerslee).

Interestingly enough, while one’s natural language “proper name” is generally registered and controlled by the government as a matter of law, identifiers ranging from phone numbers to email addresses to Facebook accounts tend to be controlled by private institutions such as corporations. For evidence, simply look at what comes after the “@” symbol in any email address! This proliferation of identifiers that have no standard way of interoperating has led some technical observers to propose the establishment of an identity ecosystem in which the various identities and relevant data of persons and organizations could be integrated. This in turn would enable new services and more efficient transactions, while continuing to allow people the freedom to use pseudonyms or remain anonymous.

One strategy for achieving this vision is to choose a common identifier to bind together all of a user’s identities; the choice of identifier in turn determines who controls the identity. The earliest technical paper to really broach the question of user-controlled identity and personal data is the 2003 article “The Augmented Social Network: Building Identity and Trust into the Next-Generation Internet,” by K. Jordan et al.[8] The authors proposed to “build identity and trust into the architecture of the Internet, in the public interest, in order to facilitate introductions between people who share affinities or complementary capabilities across social networks.” The ultimate goal was to create “a form of online citizenship for the Information Age.”

Although the paper was ambitious in scope and wide-ranging in its vision for revitalizing democracy, the concept was never taken to a standards body. Instead, an entrepreneur, Drummond Reed of a company called InterMinds, created a new kind of identifier called XRIs (Extensible Resource Identifiers). This protocol was designed to replace user-centric identifiers with a for-profit monopoly on identity controlled by Reed himself.[9] When Reed claimed on a public mailing list that there were patents on XRIs, Tim Berners-Lee called for them to be rejected, and the W3C intervened so that the proposed XRI standard was indeed rejected by the OASIS standards body (Organization for the Advancement of Structured Information Standards).[10] As an alternative, Berners-Lee supports the use of URIs (“Uniform Resource Identifiers,” a generalization of URLs or “Uniform Resource Locators”) as identifiers not just for webpages, but for all sorts of things that could be connected to the Web. For example, Berners-Lee’s own URI is http://www.w3.org/People/Berners-Lee/card#i. The idea is to use URIs to leverage the infrastructure of the Web to enable even more versatile functions and services. Yet very few people use URIs to identify themselves, and standards that relied on URIs as personal identifiers failed to build a decentralized social web.

In response to the lack of uptake of URIs as identifiers, developers considered email addresses as portable identifiers instead. The reason is simple: Email addresses are personal, and because they are tied to a concrete, everyday activity (checking and reading one’s inbox), users remember them naturally, unlike URIs. However, while both URIs and email addresses depend on the domain name system, users do not actually control their own email addresses; the owner of the domain name does. So for the email address timbl@w3.org, it is the W3C, as the holder of the domain name w3.org, that controls the address.
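The dependence on the domain name system can be shown in a few lines. The Python sketch below uses a hypothetical function, `controlling_domain`, with deliberately simplified parsing (it ignores quoted local parts and other corner cases of email syntax) to extract the domain on which either kind of identifier rests.

```python
from urllib.parse import urlsplit


def controlling_domain(identifier: str) -> str:
    """Return the DNS domain that an identifier ultimately depends on.

    Simplified sketch: handles URIs with a host (such as http or https)
    and plain local@domain email addresses only.
    """
    if "://" in identifier:
        # For a URI, the host names the domain. The fragment (e.g. "#i")
        # picks out the person described in the document but does not
        # change who hosts that document.
        host = urlsplit(identifier).hostname
        if not host:
            raise ValueError(f"URI has no host: {identifier!r}")
        return host
    if identifier.count("@") == 1:
        # For an email address, whoever controls the domain after "@"
        # controls the mailbox, and therefore the identifier.
        local, domain = identifier.split("@")
        if local and domain:
            return domain.lower()
    raise ValueError(f"unrecognized identifier: {identifier!r}")


for ident in ["http://www.w3.org/People/Berners-Lee/card#i", "timbl@w3.org"]:
    print(ident, "->", controlling_domain(ident))
```

Both identifiers resolve to the w3.org domain (the URI to its www.w3.org host), so in both cases it is the W3C, not Berners-Lee personally, that ultimately controls them.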

In the case of services such as Facebook, Twitter and Google, the identity of the user is completely controlled by the corporation, and the user has no rights over their digital identity – a power even greater than that exercised by nation-states (over passports, for example). Corporations exercise these powers over identity even though they do not own their domain names outright, but lease them through domain registrars accredited by ICANN, which performs the IANA (Internet Assigned Numbers Authority) function under contract with the U.S. Department of Commerce.

On the other end of the spectrum, there have been successful attempts to create fully decentralized identifier systems using peer-to-peer techniques such as distributed hash tables. But neither Namecoin nor telehash, two such solutions, has been standardized, and both require users to use an indecipherable cryptographic hash instead of a human-memorable identifier for their identity. While Tim Berners-Lee may not think timbl@w3.org is a great identifier, he would surely balk at using f4d8b1b7f4e3ec7449822bd80ce61165 as his identifier!
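Hash-based identifiers of this kind are often self-certifying: they are derived by hashing a public key, so no registrar is needed to issue them, but no human can remember them either. The Python sketch below is a hypothetical illustration of that idea, not the actual scheme used by Namecoin or telehash. It hashes some key bytes with SHA-256 and truncates the result to 32 hexadecimal characters purely to match the length of the example above.

```python
import hashlib
import secrets


def hash_identifier(public_key: bytes, hex_chars: int = 32) -> str:
    """Derive a self-certifying identifier from key material (illustrative only)."""
    digest = hashlib.sha256(public_key).hexdigest()  # 64 hexadecimal characters
    # Truncating to 32 hex characters (128 bits) is an assumption made only to
    # mirror the length of the example identifier in the text.
    return digest[:hex_chars]


key = secrets.token_bytes(32)  # random bytes standing in for a real public key
print(hash_identifier(key))    # 32 hex characters, different on every run
```

Anyone can check that a given public key matches such an identifier by recomputing the hash, so no central authority is needed; the price is a string that no one will remember.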

The main purpose of an identity ecosystem is to enable the use of personal data: that is, any data pertaining to a particular human being or institution under autonomous control. Currently, social networking “silos” such as Facebook, Google+ and Twitter mostly trade in low-quality social data, such as names and lists of friends – as well as shopping preferences. However, there have been moves towards enforcing higher quality standards and verified personal data, such as the use of a “real name” policy in Google+. Google+ and Facebook have also sought to link phone numbers as well as geolocation to identities in their proprietary silos.

Notwithstanding these gambits, high-value data such as credit histories and medical records are to a large extent still controlled by traditional institutions such as banks and hospitals. The thesis put forward by the World Economic Forum in reports such as “Personal Data: The Emergence of a New Asset Class” is that high-quality personal data currently “locked” away in traditional institutions could serve as a valuable input into data-driven innovation.[11] This in turn could enable a whole new realm of efficient transactions and community-driven social innovation.

The vision is that users should control their own data via personal data stores, also called “personal data lockers.” These personal data stores consist of attributes, such as full name, phone number, bank balance and medical attributes. These attributes can be double-checked by various means, including machine learning and background checks, to create verified attributes. By controlling their own data, users could then enter into contracts that would enable powerful services in exchange for their data. Users could also establish their own self-organized “trust frameworks” via algorithmically backed, legally binding agreements. (See Chapter 13, “The ID3 Open Mustard Seed Platform,” by Thomas Hardjono et al.)
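The arrangement described above can be sketched as a small data structure. Every name in this Python sketch (`PersonalDataStore`, `Attribute`, the example parties and values) is hypothetical, and a real store would add encryption, authentication and legally binding terms. The point is only that the attributes, their verification status and the user’s sharing decisions all live on the user’s side.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Attribute:
    value: str
    verified_by: Optional[str] = None  # party that checked the value, if any


@dataclass
class PersonalDataStore:
    """Hypothetical user-held store: attributes plus the user's own sharing grants."""
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # relying party -> attribute names

    def put(self, name: str, value: str, verified_by: Optional[str] = None) -> None:
        self.attributes[name] = Attribute(value, verified_by)

    def grant(self, party: str, *names: str) -> None:
        # Only the user calls this; nothing is shared without an explicit grant.
        self.grants.setdefault(party, set()).update(names)

    def revoke(self, party: str) -> None:
        self.grants.pop(party, None)

    def release(self, party: str, requested: Iterable[str]) -> Dict[str, Attribute]:
        # Return only attributes that were both requested and granted.
        allowed = self.grants.get(party, set())
        return {n: self.attributes[n] for n in requested
                if n in allowed and n in self.attributes}


store = PersonalDataStore()
store.put("full_name", "Alice Example", verified_by="bank.example")
store.put("bank_balance", "1200")
store.grant("shop.example", "full_name")
print(store.release("shop.example", ["full_name", "bank_balance"]))  # only full_name
```

Because the relying party only ever receives what `release` returns, revoking a grant cuts off future access without involving any central provider.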

The act of claiming and then using an identity can often be broken down into two distinct elements: authentication and authorization. The first step, authentication, occurs when some proof of identity is offered, often thought of as a credential such as a password or even secret cryptographic key material stored in a smartcard. Services called identity providers require authentication to access personal data, and may also associate (possibly verified) attributes with an identity.

Note that authentication does not necessarily reveal any identifying characteristics and so may keep the authenticating entity anonymous. One effective technique for this is the “zero-knowledge proof,” which allows a user to authenticate his or her identity without revealing a legal name or other attributes to the identity provider.[12] Of course, different kinds of identity providers, or even the same identity provider, may host different personas, and different levels of security may require different kinds of credentials to authenticate.
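To illustrate the idea, the Python sketch below implements the basic Fiat-Shamir identification protocol, the scheme that the cited Feige-Fiat-Shamir paper refines. The prover convinces a verifier that she knows a secret number s whose square modulo N is her public value v, without ever sending s: only the commitment x and the response y cross the wire. The tiny modulus, the number of rounds and the function names are illustrative assumptions; this is a teaching sketch, not a usable implementation.

```python
import secrets
from math import gcd
from typing import Tuple

# Toy modulus built from two small primes. Real systems use moduli of 2048 bits
# or more, whose factors are kept secret; these numbers offer no security.
P, Q = 10007, 10009
N = P * Q


def random_unit() -> int:
    """Pick a random number in [2, N-1] that is coprime to N."""
    while True:
        x = secrets.randbelow(N - 2) + 2
        if gcd(x, N) == 1:
            return x


def keygen() -> Tuple[int, int]:
    s = random_unit()        # private: known only to the prover
    return s, pow(s, 2, N)   # public: v = s^2 mod N


def identify(s: int, v: int, rounds: int = 40) -> bool:
    """Simulate both parties; s is whatever secret the (possibly dishonest) prover holds."""
    for _ in range(rounds):
        r = random_unit()
        x = pow(r, 2, N)                 # prover commits to x = r^2
        e = secrets.randbelow(2)         # verifier's random challenge bit
        y = (r * pow(s, e, N)) % N       # prover answers y = r * s^e
        if pow(y, 2, N) != (x * pow(v, e, N)) % N:  # verifier checks y^2 = x * v^e
            return False
    return True


secret, public = keygen()
print(identify(secret, public))    # True: an honest prover always passes
impostor, _ = keygen()
print(identify(impostor, public))  # almost certainly False
```

Why does an impostor fail? Someone who does not know s can pass a single round only by guessing the challenge bit e in advance, so each round halves their chances; after 40 rounds the odds are about one in a trillion. The verifier, meanwhile, learns nothing about s beyond the fact that the prover knows it.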

Right now the primary method of authentication is the username and password, but due to a number of serious security breaches, this standard technique is likely to be improved. Current approaches try to associate either some private cryptographic key material (such as that on a smartcard or mobile phone) or even biometric data with a credential. Still, there are as yet no open standards for this approach. The W3C Web Cryptography API may eventually make high-value authentication possible with open standards by exposing lower-level cryptographic primitives to Web applications.[13]
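For context on the password baseline described here, the Python sketch below shows a widely used way for a service to store passwords so that a database breach does not expose them directly: a random per-user salt and a deliberately slow PBKDF2 hash from the standard library, checked with a constant-time comparison. This is common practice rather than anything the chapter prescribes, and the iteration count is an assumption. It does nothing about phishing or password reuse, which is part of why key-based credentials are attractive.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune to current guidance and hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store instead of the password itself."""
    salt = secrets.token_bytes(16)  # unique per user, defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("password123", salt, digest))                   # False
```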

The second step of identification is authorization, which occurs after a user has been successfully authenticated. With authorization, a user can authorize the transfer of attributes between services: the identity provider can provide personal data in the form of attributes to a relying party, the service that wants identity attributes. Consider the case of a user who wishes to log in to a new service and wants his or her profile – including name and picture – to show up in the new service. The user may authorize an existing identity provider such as Facebook to transfer identity attributes to the relying party. Typically, the user is redirected to Facebook and, if not already logged in, authenticates with a username and password via the proprietary Facebook Connect; Facebook then asks the user to explicitly approve the relying party’s access to attributes stored on Facebook. After the user accepts the transfer of personal data, he or she is redirected back to the now-personalized site.

From a privacy standpoint, the identity provider (Facebook in this example) observes all personal data transactions with all relying parties and so is able to build a map of the user’s web services. Worse, there is nothing technically preventing an identity provider from conducting personal data transactions without the user’s consent. Today, a number of open standards exist for authorization, the most prominent of which is OAuth, managed by the IETF.[14] A profile of OAuth for exchanging personal data, called OpenID Connect, has been developed and extended by the OpenID Foundation.
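The redirect-and-approve flow described above corresponds, in OAuth 2.0 (RFC 6749), to the authorization code grant. The Python sketch below shows the relying party’s two steps: building the authorization request and checking the response that comes back. The endpoint, client ID and redirect URI are hypothetical; the `openid` scope is the value OpenID Connect uses to request identity information.

```python
import secrets
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values: a real relying party registers these with its identity provider.
AUTHORIZE_ENDPOINT = "https://idp.example/authorize"
CLIENT_ID = "relying-party-123"
REDIRECT_URI = "https://rp.example/callback"


def authorization_url(scopes: List[str]) -> Tuple[str, str]:
    """Step 1: build the URL the user's browser is redirected to."""
    state = secrets.token_urlsafe(16)  # ties the response to this request (CSRF protection)
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


def handle_callback(callback_url: str, expected_state: str) -> str:
    """Step 2: after the user approves, the provider redirects back with a code."""
    query: Dict[str, List[str]] = parse_qs(urlsplit(callback_url).query)
    if query.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch: response does not match our request")
    if "error" in query:  # e.g. "access_denied" if the user refused
        raise PermissionError(query["error"][0])
    return query["code"][0]


url, state = authorization_url(["openid", "profile"])
print(url)
# Simulated redirect back from the provider after the user clicks "approve":
code = handle_callback(f"{REDIRECT_URI}?code=abc123&state={state}", state)
print(code)  # abc123
```

In a full flow, the relying party would then exchange the code for an access token at the provider’s token endpoint (RFC 6749, section 4.1.3) and use that token to fetch the approved attributes; that step is omitted here. Note that the provider still sees every such request, which is exactly the privacy problem raised in the paragraph above.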

Given that users currently have no control over what data is being processed about them via authorized transactions, a set of standards called User-Managed Access (UMA) is in development, which aims to put the entire flow of personal data under user control.[15] Together, this suite of standards, though still under development and needing considerably more work, has the potential to produce open standards for the authorization of personal data transactions.

One of the features of social networking is the movement of real-time data, such as status updates, given as “activity streams” by sites such as Facebook and Twitter. While the traditional Web has focused on static content served via webpages, the social web is moving towards a “real-time Web” of heavily personalized content. While Twitter, Facebook and Google all have proprietary methods for real-time updating in their social networks, a number of standards have been proposed to tackle the much harder problem of a decentralized real-time Web. The key challenge of the real-time Web in the context of decentralized social networking and personal data stores is to dynamically update other nodes in the network based on social activities or the appearance of new data. The most popular format for open status updates is ActivityStreams[16] and a wide variety of architectures have been proposed for distributing and re-collating these streams. Work on standardizing these formats, along with APIs that make them easy for developers to use, has begun in the W3C Social Web effort.[17]
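To make the idea concrete, the Python sketch below builds a single status update and pushes it to subscribers as soon as it is published. The field names (`published`, `actor`, `verb`, `object`, `objectType`, `displayName`) are modeled on the JSON Activity Streams 1.0 format cited in the notes. The `Hub` class, the URIs and the delivery callbacks are hypothetical stand-ins for the networked distribution architectures mentioned above, which would deliver over HTTP between servers rather than through in-process callbacks.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List


def make_activity(actor_id: str, actor_name: str, content: str) -> dict:
    """Build a status update in the style of JSON Activity Streams 1.0."""
    return {
        "published": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": {"objectType": "person", "id": actor_id, "displayName": actor_name},
        "verb": "post",
        "object": {"objectType": "note", "content": content},
    }


class Hub:
    """Hypothetical in-memory hub that pushes new activities to subscribers."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, actor_id: str, deliver: Callable[[str], None]) -> None:
        self.subscribers.setdefault(actor_id, []).append(deliver)

    def publish(self, activity: dict) -> int:
        """Serialize once, push to every subscriber of the actor, return the count."""
        payload = json.dumps(activity)
        targets = self.subscribers.get(activity["actor"]["id"], [])
        for deliver in targets:
            deliver(payload)
        return len(targets)


hub = Hub()
bobs_inbox: List[str] = []
hub.subscribe("https://social.example/alice", bobs_inbox.append)
hub.publish(make_activity("https://social.example/alice", "Alice", "Hello, open social Web!"))
print(json.loads(bobs_inbox[0])["object"]["content"])  # Hello, open social Web!
```

Pushing at publication time, rather than having every follower poll, is what makes the stream “real-time”; in a decentralized setting each node would run its own hub for the people it hosts.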

There has been far less research than needed on privacy and security in decentralized social networks. In fact, decentralized social networks are not a priori more secure than centralized silos: because messages travel between nodes across the network, a global passive adversary can uncover users’ social relationships with much less trouble than it could from a centralized social networking silo. Currently, no decentralized social network avoids revealing such valuable data to traffic analysis.

Hope is not lost: It is plausible that a decentralized identity system could be created that resists even the pervasive surveillance of the NSA. However, this will require considerable new research and difficult implementation work, especially for anonymity measures like cover traffic, which mixes dummy messages into the network so that the real pattern of communication is hidden from global passive adversaries like the NSA.
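A toy model makes both points concrete. The Python sketch below assumes an adversary who sees the sender and receiver of every message but not its encrypted contents, and uses the simplest (and most expensive) form of cover traffic, in which every node sends one fixed-size message to every other node in each round. The node names and functions are hypothetical.

```python
import itertools
from typing import FrozenSet, Iterable, List, Set, Tuple

Message = Tuple[str, str]  # (sender node, receiver node) as visible on the network


def observed_edges(traffic: Iterable[Message]) -> Set[FrozenSet[str]]:
    """What a global passive adversary learns: who talks to whom.

    Content may be encrypted, but sender and receiver addresses are not.
    """
    return {frozenset((src, dst)) for src, dst in traffic if src != dst}


def cover_round(real: Iterable[Message], nodes: List[str]) -> List[Tuple[str, str, bool]]:
    """One round of naive cover traffic: every node sends one fixed-size
    message to every other node. Real messages replace dummies, so the
    wire pattern is identical whatever the users actually do.
    Returns (sender, receiver, is_real); only the first two are observable.
    """
    real_set = set(real)
    return [(a, b, (a, b) in real_set) for a, b in itertools.permutations(nodes, 2)]


nodes = ["alice", "bob", "carol", "dave"]
real = [("alice", "bob"), ("carol", "alice")]

print(len(observed_edges(real)))  # 2: the real relationships leak
wire = cover_round(real, nodes)
print(len(observed_edges((a, b) for a, b, _ in wire)))  # 6: every pair looks alike
```

The cost of this naive scheme grows with the square of the number of nodes (n(n-1) messages per round), which illustrates why practical cover traffic remains a research problem rather than a solved one.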

One can easily imagine a possible future where, rather than having our digital identities controlled by either nation-states or major corporations, we control our own identities and determine how we interact digitally with the wider world. Access to our online lives has become such an essential foundation of our everyday lives that access to and control of one’s own data may soon be considered as natural by “digital natives” as control over one’s own body is in any post-slavery society.[18] For the moment, this vision remains utopian, serving mostly to inspire and guide a few programmers who are trying to correct our dangerous socio-economic trajectory towards centralized control over our social lives. Programmers dream in code.

Yet there are also reasons to believe that this utopia has a fighting chance of becoming reality. Open standards bodies like the IETF and W3C are taking up the task of defining new identity standards. The language of open standards that could serve as the new vernacular for our digital age is being created in the here-and-now by concerned engineers. What is conspicuously absent is a larger vision that can appeal to the vast swathes of humanity that are not already part of the technical standardization process, and so could galvanize a new social movement suited for this digital age. Strangely enough, in this era of cybernetic governance, we are suffering from a failure of communication. Those who can comprehend the current dangers of having our identities outside of our own control and those who understand the latent potential for standards-based autonomous digital identity must surely at some point find the words to express themselves in a way that can be widely comprehended. After all, the revolutionary call for an open social Web is being driven by the self-same collective feeling that historically has driven innumerable revolutions before: the desire for freedom.

Harry Halpin is a Research Scientist at W3C/M.I.T., where he leads efforts in cryptography and social standards. He is also a visiting researcher at L’Institut de recherche et d’innovation (IRI) du Centre Pompidou, where he works with Bernard Stiegler on the philosophy of the Web. He holds a Ph.D. in Informatics from the University of Edinburgh and his thesis has been published as Social Semantics (Springer, 2013).

 Notes

[1] Auroux, Sylvain, La révolution technologique de la grammatisation (Éditions Mardaga, Paris, 1994).

[2] Stiegler, Bernard, Prendre soin (Flammarion, Paris, 2008).

[3] http://www.ietf.org/tao.html.

[4] http://www.ietf.org/rfc/rfc1958.txt.

[5] Berners-Lee, Tim, Weaving the Web (Texere Publishing, London, 2000).

[6] The entire Note Well agreement – http://www.ietf.org/about/note-well.html – refers to RFC 5378 and RFC 4879, which explain this in much more detail. See https://www.rfc-editor.org/rfc/rfc5378.txt and https://www.rfc-editor.org/rfc/rfc4879.txt.

[7] The W3C Royalty-Free patent policy is publicly available at http://www.w3.org/Consortium/Patent-Policy-20040205. The entire standardization process is described in the W3C Process Document and is available at http://www.w3.org/2005/10/Process-20051014.

[8] Jordan, K., Hauser, J., and Foster, S., “The Augmented Social Network: Building Identity and Trust into the Next-Generation Internet,” First Monday, 8(8) (August 4, 2003), available at http://firstmonday.org/ojs/index.php/fm/article/view/1068/988.

[9] https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xri.

[10] http://danbri.org/words/2008/01/29/266.

[11] World Economic Forum, Personal Data: The Emergence of a New Asset Class (2011).

[12] Feige, U., Fiat, A., and Shamir, A., “Zero Knowledge Proofs of Identity,” in Proceedings of the ACM Symposium on Theory of Computing (STOC ’87) (ACM Press, New York City, 1987), pp. 210–217.

[13] http://www.w3.org/TR/WebCryptoAPI.

[14] http://tools.ietf.org/html/rfc6749.

[15] https://kantarainitiative.org/confluence/display/uma/Home.

[16] http://activitystrea.ms.

[17] http://www.w3.org/2013/socialweb/social-wg-charter.html.

[18] Berners-Lee, T., and Halpin, H., “Defend the Web,” in Digital Enlightenment Yearbook, J. Bus, M. Crompton, M. Hildebrandt, and G. Metakides, Eds. (IOS Press, Amsterdam, 2012), pp. 3–7.
