Saturday, June 15, 2024

IBM to test Southeast Asian LLM and facilitate localization efforts



Didier Marti/Getty Images

IBM has inked an agreement with AI Singapore (AISG) to test the latter’s Southeast Asian large language model (LLM) and make it available for developers to build customized artificial intelligence (AI) applications.

Under the partnership, IBM will test the Southeast Asian Languages in One Network (SEA-LION) model using Big Blue’s AI technology and data platform, Watsonx, and work with AISG to fine-tune the LLM. The goal is to help organizations choose suitable AI models for their business requirements, IBM and AISG said in a joint statement on Tuesday.

IBM will also make SEA-LION available in its AI use case library, dubbed Digital Self-Serve Co-Create Experience (DSCE), enabling developers and data scientists to build localized generative AI (GenAI) applications.

An open-source LLM developed by AISG, SEA-LION is designed to be smaller, more flexible, and faster than other LLMs, according to AISG. Its current iteration runs on two base models: a 3-billion-parameter model and a 7-billion-parameter model. The LLM’s training data comprises 981 billion language tokens, which AISG defines as fragments of words created from breaking down text during the tokenization process. These fragments include 623 billion English tokens, 128 billion Southeast Asian tokens, and 91 billion Chinese tokens.
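
To put those token counts in proportion, the short Python sketch below converts the reported figures into percentage shares of the 981 billion total. The three itemized categories add up to 842 billion, so the remaining 139 billion tokens are not broken down in the figures above; the label "Unspecified" and the function name `token_shares` are illustrative assumptions, not AISG terminology.

```python
# Token counts in billions, as reported in the article.
TOTAL_TOKENS_B = 981
REPORTED_B = {"English": 623, "Southeast Asian": 128, "Chinese": 91}


def token_shares(total, counts):
    """Return each category's share of the total, in percent (one decimal)."""
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    # Whatever the itemized categories do not cover is lumped together here.
    # "Unspecified" is a hypothetical label chosen for this sketch.
    remainder = total - sum(counts.values())
    shares["Unspecified"] = round(100 * remainder / total, 1)
    return shares


shares = token_shares(TOTAL_TOKENS_B, REPORTED_B)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On these figures, English accounts for roughly 63.5% of the training tokens, Southeast Asian languages for about 13.0%, and Chinese for about 9.3%, with about 14.2% not itemized.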

With SEA-LION, Singapore aims to drive the development of LLMs that better reflect Southeast Asia’s societal mix and exhibit stronger contextual understanding of the region’s cultures and languages.

The partnership aims to push forward a “customized foundation model” for Southeast Asia, made by Southeast Asians, according to Leslie Teo, AISG’s senior director of AI products. The two organizations will also look to build use cases, fuel SEA-LION’s adoption, and help organizations “scale AI safely and responsibly,” Teo said.

The collaboration encompasses efforts to incorporate AI governance into SEA-LION, so businesses can better navigate compliance, risk management, and model lifecycle management, even as government regulations on AI continue to evolve.

“[IBM] believes further progress of GenAI will bring greater performance in smaller language models, with users given the opportunity to personalize models based on their business and industry requirements,” Catherine Lian, IBM Asean’s general manager and technology leader, said in a statement.

“No one model is a one-size-fits-all for businesses, and organizations need to be empowered with a choice to use their models based on their needs,” Lian said. “[The] SEA-LION LLM is a big step forward in creating an open AI system and addressing the Asean language challenges that companies and governments face when working with AI.”

In March, AISG also announced a partnership with Google to enhance datasets used to train, fine-tune, and assess AI models in languages specific to Southeast Asia. Called Project Southeast Asian Languages in One Network Data, the initiative aims to “improve cultural context awareness” in LLMs built for the region.

Initially, the project will focus on Indonesian, Thai, Tamil, Filipino, and Burmese, the languages for which AISG and Google will develop translocalization and translation models. They will also build tools to help scale translocalization capabilities, share best practices for tuning datasets, and publish pre-training guides for Southeast Asian languages.


