Data is often described as the oil of the new era. The analogy can be traced back to 2006, when the British mathematician Clive Humby proposed it. In 2017, The Economist devoted a cover story to data, not only arguing that data is the most valuable resource in the world but also raising the issue of data and new competition rules.
As a new factor of production, data is vitally important in the digital economy and the Internet environment, and it is closely tied to competition online. The laws, administrative regulations and local regulations formulated and implemented in recent years show how much importance is attached to data at every level. Many localities have even begun experimenting with the registration, confirmation and trading of data. Reports on developments abroad are also common. For example, as AI-generated content attracts growing attention worldwide, some network platforms that hold data have begun to voice dissatisfaction: their data is crawled every day to train AI, yet the results of that training are enjoyed by others. Some platforms have already started to refuse or limit free access to their data and have openly proposed charging for access. Since the various ideas and experiments around data are still at an exploratory stage, we may not yet be able to see clearly what the market and its rules will eventually look like. However, treating data as an important business resource of market entities, and looking to the market for corresponding trading rules and prices, may well be an inevitable trend.
When we talk about data, what exactly are we talking about? A quick online search shows that in many contexts people discuss data and information together without drawing a clear distinction between them. The same is true of many materials that set out to define the two terms. What is the relationship between information and data in different professional fields, and how do they differ? The answers seem to vary.
For example, what is information? Some sources define it this way: “refers to a description of facts or events by data, facts or knowledge that can be transmitted, stored, processed and used… It can be text, sound, image, video or any other form of data.” Others answer: “refers to the content contained in messages, instructions, data, symbols and so on emitted by things…” What is data? Some online resources explain: “refers to raw, unprocessed numbers, symbols or words… It needs to be processed, interpreted and analyzed to be converted into useful information.” And what is the relationship between the two? The answer we are likely to find is this: “Information is the interpretation and understanding of data, which is the useful and meaningful content extracted from data… Data is the basis of information, while information is the useful content extracted from data.” In these definitions, information and data are commonly explained in terms of each other. This certainly reflects their close connection, but we must also grasp their essential differences accurately. Take these article titles: “EU plans to require cloud service data to be stored and processed in EU, Amazon, Google, Microsoft and others affected”, “2022 Annual Legal Regulations Publication and Application Data Analysis Report”, and “Is spring really that short? Data tells you”. All three use the word “data”, but the term probably differs in connotation and scope from one title to the next. In daily life, confusing information with data causes no obvious misunderstanding. In the study and analysis of legal issues, however, confusing them may lead to fundamentally different conclusions.
In terms of current legal provisions, Article 127 of the Civil Code does mention data, stating that “where the law has provisions on the protection of data and network virtual property, those provisions shall be followed”, but this is only a statement of principle and is declaratory in nature. A more direct and explicit provision is Article 3 of the Data Security Law: “The term ‘data’ as used in this Law refers to any record of information by electronic or other means.” It is worth noting that “record” here is a noun rather than a verb, referring to the result or carrier of recording information. Many experts and scholars have also recognized the importance of distinguishing between information and data. Professor Shi Jianzhong of China University of Political Science and Law, Professor Mei Xiaying of University of International Business and Economics, and Professor Shen Weixing of Tsinghua University, among others, have written on the topic from various angles. For example, Professor Shen Weixing pointed out in his article “On Data Usufruct”: “The essence of data is the carrier of information, while information is the source of knowledge … In the digital economy era, data should be a digital description of known or unknown information (along with metadata), and technically can be an object of digital operations (processing, storage and transmission), which is an electronic information record that exists in a machine-readable way … To correctly discuss the data ownership issue, we must strictly distinguish data from information in terms of the object, and mixing them together will cause many unnecessary misunderstandings …”
Returning to the specific scenario under discussion today, platform data competition, it is all the more important to distinguish clearly between data and information.
For example, building on the analogy between data and oil, some people go a step further: since data is like oil, and both are natural resources, everyone has the right to obtain, develop and use data freely. First, this view confuses the different states and natures of oil. Is it oil that lies unexploited under the seabed or in rock formations, crude oil that has been extracted from wells and barreled, or gasoline and other commodities produced by refining? Although all of these derive from the natural resource of oil, the input and processing involved in extraction, refining and other procedures fundamentally change their nature, especially their attributes as commodities. Second, simply transposing this reasoning onto data is too crude. Moreover, given the concept of data and its relationship with information described above, we should recognize that the data at issue here cannot possibly have the natural character of a natural resource. It is invariably the result of human activity such as collection, storage, use, processing, transmission, provision and disclosure. We can also understand data in the context of platform competition this way: even if it is water, it is not the natural water flowing in rivers, lakes and seas, but drinking water that has been processed, filtered and packaged as a commodity on supermarket shelves.
Information takes various forms; some information can constitute works, and some cannot. For information that can constitute works, its relationship with the corresponding data is equivalent to the familiar relationship between a work and its carrier: between a novel and a book, a song and a record, a painting and a canvas. On this basis, it is not difficult to see that the same information can correspond to different data, and to different data holders or controllers, and that each of these holders and controllers has separate, coexisting rights in the data it holds and controls. In this case, even if a subject controls some data, it does not necessarily control the corresponding information. A further question is worth considering: when we advocate “sharing” and “circulation” on the Internet, are we referring to information, or to the electronic data that carries it? Which of the two has legitimacy, necessity and rationality?
Once information and data are distinguished, the boundaries between many concepts naturally become clear. For example, public information and public data cannot be equated. Other concepts also need to be told apart. Public data does not equal common data; although the two terms differ by only one word, they are essentially different. Providing information or data publicly does not mean that it must be provided for free, nor that it must be provided unconditionally to every subject on the Internet, nor that any subject who obtains the information or data may use it in any way.
Finally, as for the rules on crawling and using platform data, although we can discuss them from different perspectives and introduce different theories, I am afraid those rules cannot go against basic principles and common sense.
Once we enter the network world, do human relationships, behavioral norms, and rights and obligations change fundamentally? Do we really need to break with, or even go against, the original rules and consensus and build a new set of behavioral norms? I doubt this very much. Of course, we can continue to study and explore how to extend existing rules soundly so that they cover the network environment more comprehensively and apply better to online behavior. But over a long period, people have gradually formed and accepted certain basic common sense and rules, or at least certain boundaries of behavior. These retain their practical significance even in the digital age and the network environment.