Multilingual Standards: XLIFF Lessons



Alexander O'Connor (ADAPT@DCU—IE)
@uberalex Alexander.OConnor@dcu.ie

Get the Data Out and Get it back in again, even if it's different.

The data is always different, and it can be very different.
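The extract-and-merge round trip can be sketched roughly as follows. This is a minimal illustration, not a complete XLIFF processor: it assumes XLIFF 2.0 core markup with exactly one segment per unit and no inline codes, and the sample document, the unit ids and the `extract` / `merge` helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"
ET.register_namespace("", XLIFF_NS)

# Hypothetical sample document: two units, one segment each, no inline markup.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="greeting">
      <segment>
        <source>Hello</source>
      </segment>
    </unit>
    <unit id="farewell">
      <segment>
        <source>Goodbye</source>
      </segment>
    </unit>
  </file>
</xliff>"""


def extract(root):
    """Get the data out: map each unit id to its source text."""
    out = {}
    for unit in root.iter(f"{{{XLIFF_NS}}}unit"):
        source = unit.find(f"{{{XLIFF_NS}}}segment/{{{XLIFF_NS}}}source")
        out[unit.get("id")] = source.text or ""
    return out


def merge(root, translations):
    """Get it back in: write a <target> wherever a translation exists.

    Returns the ids that came back without a translation, because what
    returns is not always what was sent out.
    """
    missing = []
    for unit in root.iter(f"{{{XLIFF_NS}}}unit"):
        uid = unit.get("id")
        if uid not in translations:
            missing.append(uid)
            continue
        segment = unit.find(f"{{{XLIFF_NS}}}segment")
        target = segment.find(f"{{{XLIFF_NS}}}target")
        if target is None:
            target = ET.SubElement(segment, f"{{{XLIFF_NS}}}target")
        target.text = translations[uid]
    return missing


root = ET.fromstring(SAMPLE)
sources = extract(root)
missing = merge(root, {"greeting": "Bonjour"})  # "farewell" was not translated
merged_xml = ET.tostring(root, encoding="unicode")
```

Returning the untranslated ids instead of raising an error is a design choice for the sketch: the merge step has to tolerate data that differs from what was extracted.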

XLIFF Modules

  • Translation Candidates (matches, mtc:)
  • Format Style (fs:)
  • Metadata (mda:)
  • Resource Data (res:)
  • Change Tracking (ctr:)
  • Size Restriction (slr:)
  • Validation (val:)

Numerous stakeholders (from individuals to large companies), crossing many technical formats (software strings, word to text and back). Challenges to vendor adoption in industry. Different objectives.

TMX: Translation Memory eXchange

SRX: Segmentation Rules eXchange

Good examples of complementary and duplicated features across the standards. SRX allows sometimes-complex rules to be defined, for mixing bi-directional text or for handling that improves leverage. Worth noting that they are also very quiet standards.
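The kind of rule SRX describes can be sketched in a few lines. SRX itself is an XML format in which each rule carries a break flag plus a "before break" and an "after break" regular expression. The sketch below keeps only that idea as Python tuples, with the first matching rule at a position deciding the outcome. The two rules and the `segment` function are illustrative assumptions, not taken from any real SRX file.

```python
import re

# (is_break, before_pattern, after_pattern). Order matters: the first rule
# that matches at a position decides. Both rules are hypothetical examples.
RULES = [
    (False, r"\bMr\.", r"\s"),  # exception: never break after the abbreviation "Mr."
    (True, r"[.?!]", r"\s"),    # break after sentence-final punctuation followed by whitespace
]


def segment(text, rules=RULES):
    """Split text at every position where a break rule wins."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match the text ending at pos, "after" the text starting there.
            if before.search(text[:pos]) and after.match(text[pos:]):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides, whether break or exception
    segments.append(text[start:])
    return segments


result = segment("Mr. Smith arrived. He sat down.")
```

Listing the exception before the general rule is what stops "Mr." from ending a segment. Fewer spurious breaks mean segments are more likely to match earlier translations, which is where the leverage comes from.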

TBX: TermBase eXchange

Interchange

Dissemination

Analysis

TBX is designed to facilitate the following use cases:

  • Interchange, such as that required to support
    • the flow of terminological data between technologies and systems
    • integration of terminological data from multiple sources
    • data conversion necessitated by a change in applications or technologies
  • Dissemination, including
    • querying multiple terminological databases through a single user interface by passing data through a common intermediate format on a batch or dynamic basis
    • placing data on an online site for download by interested parties
    • making entries which require some work available for public feedback
    • making terminology available dynamically in networked applications through a Web service
  • Analysis and representation, including
    • comparing the contents of various terminological databases
    • studying how lossless a conversion between two terminology databases can be
    • designing a new terminological database intended to minimize loss during conversion

The technology described in this document, the “Internationalization Tag Set (ITS) 2.0”, enhances the foundation to integrate automated processing of human language into core Web technologies.

Includes terminology; a good illustration of the breadth of use cases that internationalisation implies.
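One of the ITS 2.0 data categories, Translate, gives a feel for how this markup is consumed. The sketch below handles only the local `its:translate` attribute in XML, with its default of "yes" for element content and inheritance by child elements. It assumes no global rules (`its:translateRule`) are in play and ignores attribute values. The sample document and the `translatable_text` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: one untranslatable inline element, and one
# untranslatable block that re-enables translation for a child.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press <code its:translate="no">Ctrl+S</code> to save.</p>
  <note its:translate="no">Internal: <b its:translate="yes">check wording</b></note>
</doc>"""


def translatable_text(elem, inherited=True):
    """Collect the text runs that local its:translate markup leaves translatable."""
    flag = elem.get(f"{{{ITS_NS}}}translate")
    translate = inherited if flag is None else (flag == "yes")
    runs = []
    if translate and elem.text and elem.text.strip():
        runs.append(elem.text.strip())
    for child in elem:
        runs.extend(translatable_text(child, translate))
        # A child's tail follows the child but is content of this element.
        if translate and child.tail and child.tail.strip():
            runs.append(child.tail.strip())
    return runs


runs = translatable_text(ET.fromstring(SAMPLE))
```

A real extractor would also keep the position of the untranslatable inline elements, for instance as placeholders, so that they can be restored on merge.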

Lemon is heavily used (despite being revised in OntoLex). TBX and multilingual (as opposed to bilingual) dictionaries. Best-practice guidelines from W3C.

OASIS XLIFF Object Model and Other Serializations (XLIFF OMOS) Technical Committee

  1. JSON
  2. TMX
  3. Other Formats
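The idea of an object model with more than one serialization can be illustrated with a deliberately tiny sketch. The `Unit` class, its field names and the JSON shape are hypothetical: they are not the schema defined by the XLIFF OMOS TC, only an illustration that the same in-memory objects can round-trip through JSON without loss.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Unit:
    """Hypothetical in-memory model of one translation unit."""
    id: str
    source: str
    target: Optional[str] = None  # None until a translation exists


def to_json(units):
    """Serialize the object model to a JSON string."""
    return json.dumps({"units": [asdict(u) for u in units]},
                      ensure_ascii=False, sort_keys=True)


def from_json(payload):
    """Rebuild the object model from its JSON serialization."""
    return [Unit(**item) for item in json.loads(payload)["units"]]


units = [Unit("greeting", "Hello", "Bonjour"), Unit("farewell", "Goodbye")]
payload = to_json(units)
restored = from_json(payload)
```

Keeping the model separate from any one wire format means an XML serialization could be written against the same `Unit` objects.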

Some of these standards are ageing. LISA is gone, and there are new efforts around things like AR. The idea of separating transport and processing representations, and of creating object or serialised formats, might be worth expanding on.