Platform‐independent code conversion within the C++ locale framework
Author(s) - Engebretsen Lars
Publication year - 2006
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.734
Subject(s) - Unicode, computer science, character encoding, programming language, character (mathematics), code (set theory), byte, source code, operating system, set (abstract data type), natural language processing, geometry, mathematics
This paper describes some of the author's experiences from a C++ implementation of a concordance program for texts in Old West Norse (also known as Old Icelandic) and Runic Swedish. Since the input to the program used a character repertoire that no standard one‐byte character encoding covers, it was natural to use Unicode to represent data both inside the program and in external files. Inside the program, each character was represented with C++ ‘wide characters’; input and output files were encoded in UTF‐8. The author constructed C++ code conversion facets that convert data between those two representations during file I/O. This enabled him to compile and run the concordance program on both Linux (Fedora Core 3 with gcc 3.4.2) and Windows XP (using Visual C++ .NET 2003). The only source change needed when switching platform was isolated to the lines that select which code conversion facet to use; all other code remained unchanged. In particular, the author could still use the standard C++ locale framework for collation and code conversion, even though the library‐provided code conversion facets had been replaced. Copyright © 2006 John Wiley & Sons, Ltd.
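
To make the idea of a code conversion facet concrete, here is a minimal sketch of a facet that converts between UTF-8 bytes and wide characters. It is not the author's code: the names utf8_wide_codecvt, utf8_to_wide and wide_to_utf8 are hypothetical, the sketch assumes a C++11 compiler and a wchar_t of at least 32 bits (as with gcc on Linux; Visual C++ uses a 16-bit wchar_t, which would need a UTF-16 variant), and it does not reject overlong sequences or surrogate code points.

```cpp
#include <cassert>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet: UTF-8 <-> wide characters, assuming wchar_t holds a full code point.
class utf8_wide_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit utf8_wide_codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    // External representation (UTF-8 bytes) -> internal representation (wide characters).
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_next == to_end) return partial;           // output buffer full
            unsigned char lead = static_cast<unsigned char>(*from_next);
            int extra;                                       // number of continuation bytes
            unsigned long cp;                                // code point being assembled
            if (lead < 0x80)                { extra = 0; cp = lead; }
            else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
            else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
            else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
            else return error;                               // not a valid lead byte
            if (from_end - from_next <= extra) return partial;  // sequence not complete yet
            for (int i = 1; i <= extra; ++i) {
                unsigned char c = static_cast<unsigned char>(from_next[i]);
                if ((c & 0xC0) != 0x80) return error;        // not a continuation byte
                cp = (cp << 6) | (c & 0x3F);
            }
            *to_next++ = static_cast<wchar_t>(cp);
            from_next += extra + 1;
        }
        return ok;
    }

    // Internal representation (wide characters) -> external representation (UTF-8 bytes).
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        static const unsigned char lead_bits[] = {0, 0, 0xC0, 0xE0, 0xF0};
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            unsigned long cp = static_cast<unsigned long>(*from_next);
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : cp < 0x110000 ? 4 : 0;
            if (len == 0) return error;                      // outside the Unicode range
            if (to_end - to_next < len) return partial;      // output buffer full
            if (len == 1) {
                to_next[0] = static_cast<char>(cp);
            } else {
                to_next[0] = static_cast<char>(lead_bits[len] | (cp >> (6 * (len - 1))));
                for (int i = 1; i < len; ++i)
                    to_next[i] = static_cast<char>(0x80 | ((cp >> (6 * (len - 1 - i))) & 0x3F));
            }
            to_next += len;
            ++from_next;
        }
        return ok;
    }

    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 0; }    // variable-width encoding
    int do_max_length() const noexcept override { return 4; }  // longest UTF-8 sequence
};

// Decode a complete UTF-8 string through the facet's public in() member.
std::wstring utf8_to_wide(const std::string& bytes) {
    utf8_wide_codecvt cvt(1);                    // refs = 1: not owned by any locale
    std::wstring out(bytes.size(), L'\0');       // one wide character per byte is an upper bound
    std::mbstate_t st = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    if (cvt.in(st, bytes.data(), bytes.data() + bytes.size(), from_next,
               &out[0], &out[0] + out.size(), to_next) != std::codecvt_base::ok)
        throw std::runtime_error("invalid or truncated UTF-8");
    assert(from_next == bytes.data() + bytes.size());  // ok means all input was consumed
    out.resize(to_next - &out[0]);
    return out;
}

// Encode a wide string as UTF-8 through the facet's public out() member.
std::string wide_to_utf8(const std::wstring& text) {
    utf8_wide_codecvt cvt(1);
    std::string out(4 * text.size(), '\0');      // at most four bytes per code point
    std::mbstate_t st = std::mbstate_t();
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    if (cvt.out(st, text.data(), text.data() + text.size(), from_next,
                &out[0], &out[0] + out.size(), to_next) != std::codecvt_base::ok)
        throw std::runtime_error("code point outside the Unicode range");
    out.resize(to_next - &out[0]);
    return out;
}
```

For file I/O, a facet like this would typically be installed by imbuing a wide stream with std::locale(stream.getloc(), new utf8_wide_codecvt) before the first read or write. A facet meant for that use would likely also override do_length and do_unshift, which this sketch leaves at their defaults. Selecting a different facet per platform at that single point is the kind of isolated change the abstract describes.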