Need a Win32 function to convert UTF-8 to ANSI

Discussion:

Lilian Pigallio

2004-05-07 13:25:26 UTC

Hi all,

I use PowerBuilder and I need a Win32 function (or a DLL) to convert a
UTF-8 string to an ANSI string.

Lilian.

Jochen Kalmbach

2004-05-07 13:40:27 UTC

Post by Lilian Pigallio
I use PowerBuilder and I need a Win32 function (or a DLL) to convert a
UTF-8 string to an ANSI string.

See: WideCharToMultiByte in kernel32.dll
http://msdn.microsoft.com/library/en-us/intl/unicode_2bj9.asp

--
Greetings
Jochen

Lilian Pigallio

2004-05-07 15:01:59 UTC

Thanks, but I don't really understand this function :((

Does it convert a UTF-8 string to ANSI? With which parameters?
CodePage = CP_UTF8?

Lilian.

Mihai N.

2004-05-08 05:29:47 UTC

Post by Lilian Pigallio
Does it convert a UTF-8 string to ANSI? With which parameters?
CodePage = CP_UTF8?

No.
First convert UTF-8 to UTF-16 using MultiByteToWideChar (code page CP_UTF8),
then convert UTF-16 to ANSI using WideCharToMultiByte (code page CP_ACP).
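
The two-step idea is not specific to Win32, so here is a minimal sketch of it in Python, used only for illustration. It assumes "ANSI" means Windows-1252; on a real system the ANSI code page is whatever CP_ACP resolves to. The function name utf8_to_ansi is hypothetical.

```python
def utf8_to_ansi(utf8_bytes: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Convert UTF-8 bytes to ANSI bytes by going through Unicode text.

    Assumption: the ANSI code page is Windows-1252 unless told otherwise.
    """
    # Step 1: UTF-8 -> Unicode text (the role MultiByteToWideChar plays
    # with CP_UTF8). Malformed input raises UnicodeDecodeError.
    text = utf8_bytes.decode("utf-8")
    # Step 2: Unicode text -> ANSI (the role WideCharToMultiByte plays).
    # Characters missing from the code page are replaced with "?".
    return text.encode(ansi_codec, errors="replace")


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
print(utf8_to_ansi("caf\u00e9".encode("utf-8")))
```

The second step can lose data: any character that the ANSI code page cannot represent is replaced, so the conversion is not reversible in general.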

Warning: CP_UTF8 is not supported by Windows 95.
Solutions:
1. Use the Microsoft Layer for Unicode (MSLU, from
http://www.microsoft.com/globaldev/handson/dev/mslu_announce.mspx)
and explicitly call the MultiByteToWideChar exported by unicows.dll.
2. Use the conversion code provided by the Unicode Consortium
(ftp://ftp.unicode.org/Public/PROGRAMS/CVTUTF/).

------------------------------------------------
Another option is MLang, which requires at least IE 4.0.

One of the interfaces MLang exposes is IMLangConvertCharset.

Among its members are:
DoConversionFromUnicode =~ WideCharToMultiByte
DoConversionToUnicode =~ MultiByteToWideChar
DoConversion = direct conversion from one code page to another

--
Mihai

Lilian Pigallio

2004-05-10 08:44:37 UTC

OK, that works now :))

Thanks.

Lilian Pigallio

2004-05-10 07:43:05 UTC

Declare this Win32 function:

Function Long MultiByteToWideChar(UnsignedLong CodePage, Ulong dwFlags, string lpMultiByteStr, Long cbMultiByte, REF blob lpWideCharStr, Long cchWideChar) Library "kernel32.dll"

and copy this script:

// Reserve room for 2000 UTF-16 characters (4000 bytes)
lblb_wide_chars = blob( space(4000) )

// Convert UTF-8 (code page 65001 = CP_UTF8) to UTF-16.
// -1 means the input is null-terminated; the call returns 0 on failure,
// for example when the result does not fit in 2000 characters.
li_return = MultiByteToWideChar(65001, 0, ls_line, -1, lblb_wide_chars, 2000)
IF li_return > 0 THEN
	// Convert UTF-16 to ANSI
	ls_line = FromUnicode( lblb_wide_chars ) // PB9 function
END IF

Lilian Pigallio

2004-05-10 12:51:07 UTC

But another question: how do I detect UTF-8 strings?

IsTextUnicode seems to work only with UTF-16 strings (it always returns
false for UTF-8 strings).

Lilian.

Carl Appellof

2004-05-10 21:34:02 UTC

Post by Lilian Pigallio
But another question: how do I detect UTF-8 strings?
IsTextUnicode seems to work only with UTF-16 strings (it always returns
false for UTF-8 strings).
Lilian.

Given an arbitrary array of bytes, there's no real way to tell if it
contains useful information encoded as UTF-8. You may notice that the
description of IsTextUnicode() says that it is not ***guaranteed*** to give
you the right answer. It can only make a good guess based on statistical
information, or can tell you that all 16-bit characters look like UNICODE
versions of ANSI characters.

With UTF-8, it's even more difficult. The only thing you know for sure is
that a UTF-8 string is terminated by a byte with a value of 0. (Also, a 0
byte can never fall inside a multibyte character: every lead and continuation
byte has its high bit set, so all bytes of a multibyte character are >= 128.)
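
That said, a common heuristic is to check whether the bytes form well-formed UTF-8 and contain at least one multibyte sequence. The sketch below shows that check in Python, for illustration only; looks_like_utf8 is a hypothetical name, and the result is still a guess, because some ANSI text happens to be valid UTF-8 as well.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic guess: valid UTF-8 that contains a multibyte sequence."""
    try:
        # Strict decoding rejects stray lead bytes, missing continuation
        # bytes, overlong forms and encoded surrogates.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 but equally valid ANSI, so it proves nothing.
    return any(byte >= 0x80 for byte in data)


print(looks_like_utf8("caf\u00e9".encode("utf-8")))   # UTF-8 input
print(looks_like_utf8("caf\u00e9".encode("cp1252")))  # ANSI input
```

Accented ANSI text usually fails the check because a byte such as 0xE9 would have to be followed by continuation bytes in UTF-8, which ordinary text rarely provides.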

Do you know anything else about the string? Do you know its length, at
least?

A simple example to show the problem: a byte array with values 0x41, 0x00,
0x42, 0x00, 0x00, 0x00 could be interpreted as a UTF-8 string "A"
(terminating at the first 0 byte), or as a Unicode (UTF-16) string "AB"
(terminating at the first double zero). Which do you pick?
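
The two readings of that byte array can be reproduced with a short Python sketch (illustrative only; it assumes little-endian UTF-16, which is what Windows uses):

```python
data = bytes([0x41, 0x00, 0x42, 0x00, 0x00, 0x00])

# Reading 1: a null-terminated UTF-8 string ends at the first 0 byte.
as_utf8 = data.split(b"\x00", 1)[0].decode("utf-8")

# Reading 2: a null-terminated UTF-16LE string ends at the first
# 16-bit unit that is entirely zero.
units = [data[i:i + 2] for i in range(0, len(data), 2)]
end = units.index(b"\x00\x00")
as_utf16 = b"".join(units[:end]).decode("utf-16-le")

print(repr(as_utf8), repr(as_utf16))  # prints 'A' 'AB'
```

Both readings are well-formed, so the bytes alone cannot tell you which encoding was intended.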

Carl