For the first time, China has released a plan to allow the use of public data for labelling as the nation aims to empower its fast-growing digital economy and facilitate artificial intelligence (AI) development amid an intensifying international rivalry.
According to a 13-point circular jointly released by four government bodies on Monday, China will promote the systematic labelling and utilisation of public data, while addressing the data needs of key sectors such as agriculture, manufacturing and information technology.
“[The government should] support cross-sectoral, cross-regional and cross-administrative-level use of public data; encourage collaboration between government and enterprises in data labelling and training for large language models tailored to government affairs; and also promote the inclusion of data-labelling services into government procurement,” said the circular, jointly issued by the National Development and Reform Commission (NDRC), the National Bureau of Statistics, the Ministry of Finance, and the Ministry of Human Resources and Social Security.
It added that the labelling of public data should be conducted in an orderly manner and in accordance with the law.
Data annotation – the process of categorising and labelling different data types such as text, audio, images and video – is often considered a foundation for enabling AI systems to produce accurate and reliable outcomes. The lack of high-quality data has been one of the challenges in developing large language models.
With the sector valued at 80 billion yuan (US$10.91 billion) in 2023, data annotation is widely applicable in advancing fields such as autonomous driving, low-altitude economies, smart manufacturing and intelligent healthcare.
China has vowed to increase the compound annual growth rate of the data-annotation industry to 20 per cent by 2027, according to the guidelines.