Accelerating HPC Industry Innovation

AsianScientist (Oct. 16, 2024) – Supported by supercomputers and democratized cloud-based computing, industries are reinventing themselves to make processes smarter, faster and cheaper. Digital twins, for instance, are helping maritime researchers create smarter autonomous ships and urban planners build more comfortable residential towns.

High-performance computing (HPC) is also becoming increasingly accessible. In healthcare, biomedical scientists are working together to improve the diagnosis of brain diseases by using HPC on an open cloud-based platform to analyze brain imaging data. As industries digitalize, stakeholders are working to ensure that underserved regions are not left behind; the creation of large language models (LLMs) that cater to Southeast Asian languages is one such example.

Across sectors, HPC technologies and advancements are rapidly shaping and steering the digital transformation.

SMART SHIPS AT SEA

Since the invention of the steam engine in the 1700s, technological advancements have dramatically transformed the maritime industry. Today, digitalization is making maritime transportation smarter, as evidenced by maritime autonomous surface ships (MASS), which range from remotely controlled ships to crewless ships with operating systems capable of complex decision-making.

MASS has the potential to enhance efficiency, improve safety and lower operational costs by reducing manpower needs. The Republic of Singapore Navy, for example, deploys unmanned surface vessels that detect and dispose of mines on the seabed.

These vessels can be launched in 10 minutes, compared with 30 minutes for manned vessels. Furthermore, whereas a manned vessel requires a 32-person crew, an unmanned vessel can be controlled by just three people from the safety of an onshore station.

To spearhead research in MASS, the Centre of Excellence for Autonomous & Remotely Operated Vessels at the Technology Centre for Offshore and Marine, Singapore (TCOMS) is tapping into the petascale power of supercomputers at the National Supercomputing Centre (NSCC) Singapore.

Digital twins help researchers anticipate and assess future scenarios, improving the ability of MASS to adapt to ever-changing sea states. This is akin to seafarers drawing on experience to respond to real-world situations, especially challenging sea conditions. Using computational fluid dynamics simulations, TCOMS researchers have created a digital twin of a MASS vessel to understand its maneuverability and overall behavior in rough weather involving currents, waves and winds. Such a model would ultimately improve the capabilities of MASS in autonomous navigation, optimal route planning and collision avoidance.

By running these simulations in parallel on hundreds of central processing units at NSCC Singapore, the researchers were able to predict the vessel’s motion and velocity over time in response to specific steering actions.

ENVIRONMENTAL URBAN PLANNING

From seashores to concrete jungles, digital twin technologies have also transformed urban planning by simulating the intricacies of the built environment and its interactions with environmental factors such as wind, temperature, solar irradiance and noise. One example of such technologies is the integrated environmental modeller (IEM), built by engineers from Singapore’s Housing & Development Board alongside researchers from the Institute of High Performance Computing and the Institute for Infocomm Research.

As a high-resolution simulation tool, IEM integrates key urban features, such as water bodies, green spaces and buildings, with multiple natural environment parameters, including road traffic noise. Compared with commercial models that assess only one or two environmental factors, IEM can simulate complex multi-physics interactions between these features and parameters in a single three-dimensional model. As a result, IEM more closely mimics real-world conditions, helping planners optimize urban design.

A notable application of IEM is Tengah, the upcoming “forest town” and the first public housing town in Singapore to be designed using smart technologies from the outset. Powered by ASPIRE 1, Singapore’s first national petascale supercomputer, IEM was applied to the planning of Tengah back in 2016. As Tengah is gradually completed, residents can experience its biophilic design, which blends green spaces and urban features to reduce solar heat gain and keep the town cooler.

After garnering the President’s Technology Award in 2019, IEM was licensed in 2021 to local ICT systems provider Wizvision. The company adapted the software to make it accessible to the broader architecture, engineering and construction industry, commercializing it as IEMSim™.

Meanwhile, scientists have built on the success of IEM with a second phase, IEM2. Advancements include expanding the environmental physics parameters to cover mean radiant temperature and a thermal comfort index, factors that are increasingly important as climate change amplifies the urban heat island effect.

THE MATRIX OF A MIND

Beyond creating digital avatars of the external environment, scientists are looking inside the human body and accelerating brain imaging analysis, in the hope of developing a “digital brain” that could revolutionize the way we fight neurological diseases.

Magnetic resonance imaging (MRI) offers insights into brain function and structure. MRI data allow us to track changes in the developing brain and to diagnose diseases such as brain tumors, stroke and neurodegenerative diseases like Alzheimer’s and Parkinson’s. This wealth of data could enable the creation of a digital twin of the brain that allows neurological conditions to be simulated and predicted. However, preprocessing raw magnetic resonance signals remains a bottleneck that requires heavy computation.

Scientists from Japan’s RIKEN Center for Computational Science are tackling this challenge head-on with the help of the world-famous Fugaku supercomputer. The team first maximized the performance of brain image preprocessing steps, such as noise reduction and the correction of data distortion, on a single Fugaku node, before using parallel computing across multiple nodes to accelerate the preprocessing of large numbers of images. Using just 177 of Fugaku’s 158,976 nodes, the scientists processed 1,410 brain images in 17.33 hours, a task that would take existing software programs several years.
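The scale of that run can be put in perspective with simple arithmetic. The short Python sketch below only recombines the numbers reported above (177 nodes, 158,976 total nodes, 1,410 images, 17.33 hours); it is not RIKEN’s preprocessing pipeline. The per-node figures rest on a hypothetical assumption that the images were spread evenly and every node stayed busy for the whole run.

```python
# Reported figures for the Fugaku brain-MRI preprocessing run (from the article).
NODES_USED = 177
TOTAL_NODES = 158_976
IMAGES = 1_410
WALL_HOURS = 17.33


def summarize(nodes_used, total_nodes, images, wall_hours):
    """Derive simple throughput figures from the reported run.

    Per-node numbers assume (hypothetically) an even split of images across
    nodes and that every node was busy for the entire wall-clock time.
    """
    node_hours = nodes_used * wall_hours  # total compute consumed, in node-hours
    return {
        "share_of_machine_pct": 100 * nodes_used / total_nodes,
        "images_per_hour": images / wall_hours,
        "images_per_node": images / nodes_used,
        "node_hours": node_hours,
        "node_hours_per_image": node_hours / images,
    }


if __name__ == "__main__":
    stats = summarize(NODES_USED, TOTAL_NODES, IMAGES, WALL_HOURS)
    for key, value in stats.items():
        print(f"{key}: {value:.3f}")
```

On these assumptions, about 0.11 percent of Fugaku’s nodes sustained roughly 81 images per hour, or about 2.2 node-hours per image.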

With Fugaku, Japan is poised to establish a “digital brain” as part of a government-backed, six-year research project to develop drug treatments for dementia. In a partnership between Fujitsu and GMO Research Activity Support & Technology, an analysis environment for brain MRI data was deployed on Fujitsu’s Computing-as-a-Service cloud platform in March 2024. The analysis environment provides both open data and software developed on Fugaku that help end users such as researchers and engineers preprocess brain MRI data at supercharged speeds. Because users can also feed in their own data, they are now empowered to develop AI tools for brain MRI analysis more efficiently. Through collective efforts in this space, early and automated detection of neuropsychiatric disorders could become a reality.

MAKING AI AWARE

English speakers instantly read “LOL” in a text message as “laughing out loud.” However, many may not recognize “5555” or “wkwkwk” as expressions of laughter that are commonplace in Thailand and Indonesia, respectively. Understanding such cultural nuances requires exposure to local languages, exposure that existing LLMs powering chatbots such as ChatGPT lack.

Despite an increasingly crowded generative AI space, with developers racing to build bigger and smarter language models, cultural bias in LLMs remains a challenge. Many LLMs are trained predominantly on English-language data, including information scraped from the internet that is often Western-centric in its cultural, societal and political beliefs.

To address the underrepresentation of Southeast Asian languages in LLMs, AI Singapore (AISG) launched the Southeast Asian Languages in One Network (SEA-LION) family of LLMs in December 2023. The collaborative effort brought together Singapore’s Agency for Science, Technology and Research (A*STAR); the National Research Foundation; and the Infocomm Media Development Authority (IMDA).

SEA-LION uses AISG’s proprietary SEABPETokenizer, which is tailored for Southeast Asian languages, to break text into subword pieces (tokens) that are used to train the LLM. Besides using a tokenizer optimized for Southeast Asia, 13 percent of SEA-LION’s training data was in local languages such as Thai, Indonesian, Malay and Vietnamese, in sharp contrast to 0.5 percent for Meta’s Llama 2.
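The article does not describe how SEABPETokenizer works internally, so the Python sketch below is only an illustration under a stated assumption: it implements generic byte-pair encoding (BPE), which the tokenizer’s name suggests but which is not confirmed here, trained on a hypothetical handful of Indonesian words. It shows the core idea of subword tokenization: the most frequent adjacent symbol pairs in a language’s text are repeatedly fused into larger tokens, so common words in that language are split into fewer pieces.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every non-overlapping occurrence of `pair` in a symbol tuple."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy, character-level)."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split a word into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus of Indonesian words ("eat", "food", "drink", ...).
    corpus = ["makan", "makanan", "dimakan", "makan", "minum", "minuman"]
    merges = learn_bpe(corpus, num_merges=10)
    print(tokenize("makanan", merges))
```

In this toy setup, the merges learned from Indonesian text compress Indonesian words into fewer tokens; a tokenizer trained mostly on English would learn different merges and typically split the same words into more pieces.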

In a head-to-head test, SEA-LION and Llama 2 were both asked in Indonesian what ASEAN is. SEA-LION gave the correct answer in Indonesian, whereas Llama 2 did not understand what ASEAN is and responded with a longer answer in English.

To train SEA-LION, AISG tapped into Amazon Web Services cloud infrastructure. Using 256 NVIDIA A100 Tensor Core graphics processing units (GPUs), AISG trained a three-billion-parameter version of SEA-LION in 14 days and a scaled-up seven-billion-parameter version in 22 days. Though SEA-LION is smaller than many existing LLMs, it is nimble.
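To put these training runs in perspective, the sketch below converts the reported wall-clock times into GPU-hours. It assumes, since the article does not say otherwise, that both runs used all 256 A100 GPUs continuously for the full period, so the results are upper-bound estimates rather than reported figures.

```python
NUM_GPUS = 256  # A100 GPUs reported for SEA-LION training


def gpu_hours(num_gpus: int, days: float) -> float:
    """GPU-hours if every GPU runs for the whole wall-clock period (an assumption)."""
    return num_gpus * days * 24


# Wall-clock training days reported in the article.
runs = {"SEA-LION 3B": 14, "SEA-LION 7B": 22}

if __name__ == "__main__":
    for name, days in runs.items():
        print(f"{name}: {gpu_hours(NUM_GPUS, days):,.0f} GPU-hours")
    # SEA-LION 3B: 86,016 GPU-hours
    # SEA-LION 7B: 135,168 GPU-hours
```

Actual utilization is never perfect, so real GPU-hours consumed would likely be somewhat lower than these ceilings.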

AISG has made SEA-LION available to the public, democratizing LLMs and allowing enterprises, including those in underserved regions, to deploy and fine-tune these compact LLMs more quickly and cost-effectively. For example, Indonesian e-commerce platform Tokopedia is using SEA-LION to generate product descriptions in Southeast Asian languages, creating a better customer experience for a more diverse audience. Meanwhile, SEA-LION is helping Singapore IT company NCS translate content in its legal compliance process into Thai and Indonesian more accurately than the English-based LLMs that NCS had previously used.

To continue advancing regional LLMs, Singapore launched a S$70 million (~US$53 million) initiative, the National Multimodal LLM Programme (NMLP), in December 2023. A partnership between IMDA, AISG and A*STAR backed by NSCC Singapore’s HPC resources, NMLP aims to scale SEA-LION into an LLM with 30 to 50 billion parameters and to extend it into a multimodal speech-to-text model.

Hot on the heels of SEA-LION’s release, DAMO Academy, the research institute of China’s leading technology company Alibaba, launched its own open-source LLM focused on Southeast Asian languages, known as SeaLLM. This model and its counterpart fine-tuned for conversation, SeaLLM-chat, come in 7-billion- and 13-billion-parameter versions.

SeaLLM can interpret and process text up to nine times longer than English-centric LLMs such as ChatGPT-3.5 can, and is reportedly cheaper to operate. Across benchmarks covering translation and comprehension, mathematical reasoning, and answering general and locally specific questions, SeaLLM-13B outperformed ChatGPT-3.5 in languages such as Thai, Burmese, Lao and Khmer.

As HPC becomes more accessible, industry players will continue to harness its capabilities to improve their operations against the backdrop of an increasingly digitalized economy. In parallel, issues like equitable access and AI governance will need to be addressed as industries that touch our daily lives undergo transformative facelifts.

—

This article was first published in the print version of Supercomputing Asia, July 2024.

Disclaimer: This article does not necessarily reflect the views of AsianScientist or its staff.