'Sovereign AI language database' due before year-end: Digital ministry

07/15/2025 05:19 PM
Chuang Ming-fen, head of the Department of Data Innovation at the Ministry of Digital Affairs. CNA photo July 15, 2025

Taipei, July 15 (CNA) Taiwan's Ministry of Digital Affairs (MODA) said Tuesday it plans to release the first version of its "sovereign AI language database" in the fourth quarter of this year.

The release will be based on licensing terms for the training corpus -- the body of data used to train an AI model -- that the ministry has drafted to help agencies identify suitable data for inclusion while addressing copyright issues, MODA said at a news conference.

Government ministries are currently reviewing their datasets, and both public and private sector entities will be able to apply to access the database once it goes online, according to the ministry.

Chuang Ming-fen (莊明芬), head of MODA's Department of Data Innovation, said the ministry began preparing the licensing terms to address copyright concerns that have arisen over AI training.

She said the corpus is expected to include open government data, policy reports and government publications.

Rather than measuring the dataset by volume, the ministry plans to use tokens as the unit for quantifying the data, Chuang added.

She noted that only around 1,000 of the more than 50,000 open datasets currently available are textual in nature, which is the type of data large language models require.

Agencies such as the Hakka Affairs Council (HAC), Ministry of Education (MOE), Council of Indigenous Peoples (CIP), and Ministry of Culture (MOC) are among those now reviewing language data for possible inclusion, she said.

The announcement came at a press conference introducing draft legislation on promoting data innovation and utilization, which is open for public comment until Aug. 15.

Chuang said the draft act focuses on four key areas: expanding open data for AI, promoting cross-industry data-sharing mechanisms, lowering agency data costs, and building a data innovation ecosystem, including requiring municipalities to appoint chief data officers.

Deputy Digital Affairs Minister Lin Yi-jing (林宜敬) said the proposed law aims to "train more AI models with Taiwanese perspectives" by allowing copyrighted data to be released with privacy safeguards.

(By Su Szu-yun and James Thompson)
