ImageEn, unit iexBitmaps

TIOParams.DICOM_CharacterSet


Declaration

property DICOM_CharacterSet: string;


Description

Returns the value of the DICOM "Specific Character Set" tag (0008,0005).
If the value is "ISO_IR 192", DICOM string tags may contain Unicode characters (encoded as UTF-8).
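If the value is "ISO_IR 100" (or most other values), the string tags hold single-byte text: 7-bit US-ASCII in the lower half plus, for "ISO_IR 100", the Latin-1 (ISO 8859-1) characters in the upper half.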
ImageEn does not convert other character sets, e.g. "ISO 2022 IR 87" (Japanese), to Unicode.


Example

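// Detect text that cannot be represented in the ANSI code page
s := Edit1.Text;
isUnicode := s <> string(AnsiString(s));
// Only "ISO_IR 192" (UTF-8) files can store Unicode text
if isUnicode and (ImageEnView1.IO.Params.DICOM_CharacterSet <> 'ISO_IR 192') then
  raise Exception.Create('Unicode text cannot be specified for this file');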


More Information

The "Specific Character Set" tag (0008,0005) provides the only method to determine the encoding used in the dataset.
It lists the character sets that may be used to encode the value representations affected by the choice of character set, including person names (PN). See PS 3.5 6.1.2.2 for the full list of affected value representations.
The "Specific Character Set" tag (0008,0005) is usually absent, which restricts strings to ISO 646 (7-bit US-ASCII). When present, it usually has a single value, meaning that all strings use the specified character set and no other.
It may be multi-valued only when code extension techniques are in use, which so far means using ISO 2022 escape sequences to switch between the character sets listed in the tag (and only those).
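
The single-valued and multi-valued cases described above can be told apart by splitting the tag value on the DICOM value delimiter, a backslash. The following is a minimal sketch only: SplitCharacterSets is a hypothetical helper, not part of ImageEn, and it assumes the value is passed in exactly as stored in the file (whether DICOM_CharacterSet returns multi-valued content in this form is also an assumption). An empty term stands for the default repertoire (ISO 646).

```pascal
program SplitCharsetDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

// Hypothetical helper (not an ImageEn API): splits a "Specific Character Set"
// value on the DICOM value delimiter "\" and trims padding from each term.
procedure SplitCharacterSets(const Value: string; Terms: TStrings);
var
  Rest: string;
  P: Integer;
begin
  Terms.Clear;
  Rest := Value;
  repeat
    P := Pos('\', Rest);
    if P = 0 then
    begin
      Terms.Add(Trim(Rest));   // last (or only) term
      Break;
    end;
    Terms.Add(Trim(Copy(Rest, 1, P - 1)));
    Delete(Rest, 1, P);        // drop the term and its delimiter
  until False;
end;

var
  Terms: TStringList;
  i: Integer;
begin
  Terms := TStringList.Create;
  try
    // Sample value: default repertoire first, then a Japanese character set
    SplitCharacterSets('\ISO 2022 IR 87', Terms);
    for i := 0 to Terms.Count - 1 do
      if Terms[i] = '' then
        Writeln(i, ': (default repertoire, ISO 646)')
      else
        Writeln(i, ': ', Terms[i]);
    // More than one term implies ISO 2022 escape sequences may appear in strings
    if Terms.Count > 1 then
      Writeln('Code extension techniques are in use');
  finally
    Terms.Free;
  end;
end.
```

With a single-valued input such as 'ISO_IR 100', the list holds one term and no escape sequences are expected in the strings.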

The most common value for the tag is the single value "ISO_IR 100", which indicates that the Latin-1 (ISO 8859-1) characters are available in the upper half (G1) of the 8-bit character range, in addition to ISO 646 (US-ASCII) in the lower half (G0).
ISO 8859-1 covers most Western European characters, hence its popularity. ISO 8859-1 is equivalent to part of ECMA 94.

Unicode was later added to DICOM (CP 252) to support Chinese characters. It is selected by specifying "ISO_IR 192" for the "Specific Character Set" tag (0008,0005).
A side effect of CP 252 is that, in theory, "ISO_IR 192" could now be used all the time, so that all values, including European characters, are encoded as UTF-8.
However, European characters are encoded differently in "ISO_IR 100" (ISO 8859-1) strings than in "ISO_IR 192" (UTF-8) strings. Only the lower 7-bit characters (US-ASCII) are encoded the same way in both.
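
The difference can be seen by encoding the same text both ways. This is an illustrative sketch using the Delphi RTL's TEncoding class, not ImageEn code. It assumes a Unicode Delphi version on Windows, where code page 28591 corresponds to ISO 8859-1. BytesToHex is a helper written for this example.

```pascal
program CharsetBytesDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Example helper: formats bytes as space-separated hex, e.g. 'C3 A9'
function BytesToHex(const B: TBytes): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(B) do
  begin
    if i > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(B[i], 2);
  end;
end;

var
  Latin1: TEncoding;
  s: string;
begin
  s := 'caf' + #$00E9;  // "cafe" with e-acute (U+00E9) as the last character
  Latin1 := TEncoding.GetEncoding(28591);  // assumed: Windows code page for ISO 8859-1
  try
    // One byte per character; e-acute is the single byte E9
    Writeln('ISO_IR 100: ', BytesToHex(Latin1.GetBytes(s)));         // 63 61 66 E9
    // ASCII bytes are unchanged; e-acute becomes the two bytes C3 A9
    Writeln('ISO_IR 192: ', BytesToHex(TEncoding.UTF8.GetBytes(s))); // 63 61 66 C3 A9
  finally
    Latin1.Free;  // encodings returned by GetEncoding are owned by the caller
  end;
end.
```

The first three bytes (63 61 66) are identical in both outputs. Only the accented character differs, so reading a string with the wrong character set affects only the non-ASCII characters.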


See Also

- GetTagString
- dicom.nema.org/dicom/2013/output/chtml/part05/chapter_6.html
- dicom.nema.org/dicom/2013/output/chtml/part05/chapter_J.html
- dicom.nema.org/dicom/2013/output/chtml/part05/sect_6.2.html
- dicom.innolitics.com/ciods/cr-image/sop-common/00080005