NIPO ODIN Version 5.17

Multi-byte Character Fields (MBCSFields Setting)

Note: With the setting

The NIPO CATI / Web Master receives U-file and O-file data from the NIPO CATI Clients and/or Web Client in Unicode format. By default, unless the configuration is set to store the files as Unicode, the NIPO CATI / Web Master converts these files to a non-Unicode (ASCII) format using the configured code page.

Some code pages, mostly Asian ones, use multi-byte characters: a single character may be stored in one, two or three bytes. Text files store these characters with variable length, so two lines with the same number of characters may need a different number of bytes (positions) to store the text. During conversion to the configured code page, multi-byte character encoding in the U-file can therefore cause data to be ‘horizontally shifted’. This happens in

The setting

Consider the following script:

*Q 10 *ALPHA 61L15

In a single-byte character set (SBCS), such as those used for West-European languages, the interviewer may enter a text of up to 15 characters. After conversion from Unicode to the configured code page, the U-file record may look like this:

pos 61 76 80

When the same questionnaire runs in the NIPO Fieldwork System with a multi-byte character set (MBCS) such as Japanese, the interviewer is still allowed to enter a text of up to 15 characters. For storage, however, each character may need more than one position; the exact length depends on the number of bytes required per character. The code page conversion is unaware of the data length, so the positions in the U-file may end up like this (hypothetical example):

pos 61 76 86 89

While the Japanese example text consists of only 10 characters, most of these characters need more than one byte in an MBCS. The length of the converted data therefore varies with the characters used, and it differs from the length of the text the interviewer entered.
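The variable byte length described above can be illustrated with a short Python sketch. It assumes, hypothetically, that the 15-character field from `*ALPHA 61L15` is padded to its width in characters before conversion, and it uses Python's `cp1252` (Windows Western European) and `cp932` (Microsoft's Shift_JIS variant for Japanese) codecs as stand-ins for the configured code page. The function names are illustrative only and are not part of NIPO ODIN.

```python
# Sketch only: assumes the field is padded to its width in characters and
# then converted to the code page. This is a hypothetical model, not the
# documented NIPO CATI / Web Master conversion routine.
FIELD_START = 61  # first position of the field in "*ALPHA 61L15"
FIELD_CHARS = 15  # field width in characters in "*ALPHA 61L15"


def convert_field(text: str, codepage: str) -> bytes:
    """Pad `text` to FIELD_CHARS characters, then encode it in `codepage`."""
    if len(text) > FIELD_CHARS:
        raise ValueError("text is longer than the field")
    return text.ljust(FIELD_CHARS).encode(codepage)


def next_position(text: str, codepage: str) -> int:
    """1-based position at which the data following the field would start."""
    return FIELD_START + len(convert_field(text, codepage))


if __name__ == "__main__":
    samples = [
        ("Amsterdam", "cp1252"),  # West-European text, single-byte code page
        ("ﾄｳｷｮｳ", "cp932"),       # 5 half-width katakana, 1 byte each in cp932
        ("東京 渋谷", "cp932"),   # 5 characters, 4 of them 2 bytes each in cp932
    ]
    for text, codepage in samples:
        print(text, codepage, "-> next field at", next_position(text, codepage))
```

Run as a script, it reports position 76 for the first two samples and 80 for the third: the katakana and kanji samples both contain five characters in the same code page, yet only the kanji sample shifts the data that follows.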
In short: the length of each U-file record in bytes is determined by the characters it contains. The content thus determines the length; in a worst-case scenario, no two records have the same length. The data on position after an

Simply storing the data in Unicode format would solve this issue, as each character in UTF-16 occupies the same number of bytes (two bytes for any character in the Basic Multilingual Plane, which includes the common CJK characters), so a 15-character field would always be converted to a 30-byte text. However, some customers need to keep the non-Unicode format because their data analysis products do not support Unicode. Customers that use multi-byte character code pages need a solution to properly store the

The solution to this is two-fold: the NIPO ODIN script writer needs to reserve dummy positions in the Q-file, after the

Survey configuration file setting for MBCS fields

[Config]

This setting changes the way the NIPO CATI / Web Master stores U-file data, in particular the conversion from the (internally used) Unicode format to the MBCS file format.
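The difference between a fixed-width UTF-16 conversion and a variable-width code page conversion, and the idea of reserving dummy positions, can be sketched in Python. The reservation rule in `dummy_positions` (enough extra positions for every character to use the code page's maximum number of bytes) is an assumption made for illustration; the required number of dummy positions is not specified here. All names are hypothetical.

```python
# Sketch comparing UTF-16 with a multi-byte code page for the 15-character
# field from "*ALPHA 61L15". The worst-case reservation rule is hypothetical.
FIELD_CHARS = 15


def utf16_bytes(text: str) -> int:
    """Bytes used by the padded field in UTF-16 (little-endian, no BOM)."""
    return len(text.ljust(FIELD_CHARS).encode("utf-16-le"))


def codepage_bytes(text: str, codepage: str) -> int:
    """Bytes used by the padded field after conversion to `codepage`."""
    return len(text.ljust(FIELD_CHARS).encode(codepage))


def dummy_positions(field_chars: int, max_bytes_per_char: int) -> int:
    """Hypothetical helper: extra positions to reserve after a field so that
    a text in which every character needs `max_bytes_per_char` bytes still
    fits without shifting the data that follows."""
    return field_chars * (max_bytes_per_char - 1)


if __name__ == "__main__":
    for text in ("Amsterdam", "東京 渋谷"):
        print(text, "UTF-16:", utf16_bytes(text), "cp932:", codepage_bytes(text, "cp932"))
    # cp932 uses at most 2 bytes per character.
    print("dummy positions for 15 characters:", dummy_positions(FIELD_CHARS, 2))
```

Both samples take 30 bytes in UTF-16 regardless of content, while the cp932 length depends on the characters used. Under the assumed rule, a 15-character field in cp932 would need 15 dummy positions, so the field and its reserve together span 30 positions.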