lightning-sunbird  0.9+nobinonly
nsUnicodeToUTF8 Class Reference

A character set converter from Unicode to UTF-8.

#include <nsUnicodeToUTF8.h>

Public Types

enum  { kOnError_Signal, kOnError_CallBack, kOnError_Replace }

Public Member Functions

 nsUnicodeToUTF8 ()
 Class constructor.
NS_IMETHOD FillInfo (PRUint32 *aInfo)
NS_IMETHOD Convert (const PRUnichar *aSrc, PRInt32 *aSrcLength, char *aDest, PRInt32 *aDestLength)
 Converts the data from Unicode to a Charset.
NS_IMETHOD Finish (char *aDest, PRInt32 *aDestLength)
 Finishes the conversion.
NS_IMETHOD GetMaxLength (const PRUnichar *aSrc, PRInt32 aSrcLength, PRInt32 *aDestLength)
 Returns a quick estimation of the size of the buffer needed to hold the converted data.
NS_IMETHOD Reset ()
 Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.
NS_IMETHOD SetOutputErrorBehavior (PRInt32 aBehavior, nsIUnicharEncoder *aEncoder, PRUnichar aChar)
 Specify what to do when a character cannot be mapped into the dest charset.

Protected Attributes

PRUnichar mHighSurrogate

Detailed Description

A character set converter from Unicode to UTF-8.

05/Apr/1999

Author:
Catalin Rotaru [CATA]

Definition at line 60 of file nsUnicodeToUTF8.h.


Member Enumeration Documentation

anonymous enum [inherited]
Enumerator:
kOnError_Signal 
kOnError_CallBack 
kOnError_Replace 

Definition at line 136 of file nsIUnicodeEncoder.h.

  {
    kOnError_Signal,        // on an error, stop and signal
    kOnError_CallBack,      // on an error, call the error handler
    kOnError_Replace        // on an error, replace with a different character
  };

Constructor & Destructor Documentation

nsUnicodeToUTF8::nsUnicodeToUTF8 ( )

Class constructor.

Definition at line 69 of file nsUnicodeToUTF8.h.


Member Function Documentation

NS_IMETHODIMP nsUnicodeToUTF8::Convert ( const PRUnichar *  aSrc,
PRInt32 *  aSrcLength,
char *  aDest,
PRInt32 *  aDestLength
) [virtual]

Converts the data from Unicode to a Charset.

About the byte ordering:

  • The input stream is Unicode, in the native byte order of the machine on which the converter is running.
  • For output, if the converter cares (this depends on the charset; a single-byte charset, for example, ignores byte ordering), it should assume network order. If necessary and requested, we can add a method SetOutputByteOrder() so that the reverse order can be used, too. That method would default to the assumed network order.

Unless there is not enough output space, this method must consume all the available input data! We don't have partial input for the Unicode charset. For the last converted char, even if there is not enough output space, a partial output must be done until all available space is used. The rest of the output should be buffered until more space becomes available. However, this does not hold for the error handling method!!! So be very, very careful...

Parameters:
aSrc  [IN] the source data buffer
aSrcLength  [IN/OUT] the length of the source data buffer; after conversion it will contain the number of Unicode characters read
aDest  [OUT] the destination data buffer
aDestLength  [IN/OUT] the length of the destination data buffer; after conversion it will contain the number of bytes written
Returns:
NS_OK_UENC_MOREOUTPUT if only a partial conversion was done; more output space is needed to continue
NS_ERROR_UENC_NOMAPPING if a character without mapping was encountered and the behavior was set to "signal"

Implements nsIUnicodeEncoder.

Definition at line 68 of file nsUnicodeToUTF8.cpp.

{
  const PRUnichar * src = aSrc;
  const PRUnichar * srcEnd = aSrc + *aSrcLength;
  char * dest = aDest;
  PRInt32 destLen = *aDestLength;
  PRUint32 n;

  //complete remaining of last conversion
  if (mHighSurrogate) {
    if (src == srcEnd) { // no input left to pair with the pending high surrogate
      *aDestLength = 0;
      return NS_OK_UENC_MOREINPUT;
    }
    if (*aDestLength < 4) {
      *aSrcLength = 0;
      *aDestLength = 0;
      return NS_OK_UENC_MOREOUTPUT;
    }
    if (*src < (PRUnichar)0xdc00 || *src > (PRUnichar)0xdfff) { //not a pair
      *dest++ = (char)0xe0 | (mHighSurrogate >> 12);
      *dest++ = (char)0x80 | ((mHighSurrogate >> 6) & 0x003f);
      *dest++ = (char)0x80 | (mHighSurrogate & 0x003f);
      destLen -= 3;
    } else { 
      n = ((mHighSurrogate - (PRUnichar)0xd800) << 10) + 
              (*src - (PRUnichar)0xdc00) + 0x10000;
      *dest++ = (char)0xf0 | (n >> 18);
      *dest++ = (char)0x80 | ((n >> 12) & 0x3f);
      *dest++ = (char)0x80 | ((n >> 6) & 0x3f);
      *dest++ = (char)0x80 | (n & 0x3f);
      ++src;
      destLen -= 4;
    }
    mHighSurrogate = 0;
  }

  while (src < srcEnd) {
    if ( *src <= 0x007f) {
      if (destLen < 1)
        goto error_more_output;
      *dest++ = (char)*src;
      --destLen;
    } else if (*src <= 0x07ff) {
      if (destLen < 2)
        goto error_more_output;
      *dest++ = (char)0xc0 | (*src >> 6);
      *dest++ = (char)0x80 | (*src & 0x003f);
      destLen -= 2;
    } else if (*src >= (PRUnichar)0xD800 && *src < (PRUnichar)0xDC00) {
      if ((src+1) >= srcEnd) {
        //we need another surrogate to complete this unicode char
        mHighSurrogate = *src;
        *aDestLength = dest - aDest;
        return NS_OK_UENC_MOREINPUT;
      }
      //handle surrogate
      if (destLen < 4)
        goto error_more_output;
      if (*(src+1) < (PRUnichar)0xdc00 || *(src+1) > 0xdfff) { //not a pair
        *dest++ = (char)0xe0 | (*src >> 12);
        *dest++ = (char)0x80 | ((*src >> 6) & 0x003f);
        *dest++ = (char)0x80 | (*src & 0x003f);
        destLen -= 3;
      } else {
        n = ((*src - (PRUnichar)0xd800) << 10) + (*(src+1) - (PRUnichar)0xdc00) + (PRUint32)0x10000;
        *dest++ = (char)0xf0 | (n >> 18);
        *dest++ = (char)0x80 | ((n >> 12) & 0x3f);
        *dest++ = (char)0x80 | ((n >> 6) & 0x3f);
        *dest++ = (char)0x80 | (n & 0x3f);
        destLen -= 4;
        ++src;
      }
    } else { 
      if (destLen < 3)
        goto error_more_output;
      //treat rest of the character as BMP
      *dest++ = (char)0xe0 | (*src >> 12);
      *dest++ = (char)0x80 | ((*src >> 6) & 0x003f);
      *dest++ = (char)0x80 | (*src & 0x003f);
      destLen -= 3;
    }
    ++src;
  }

  *aDestLength = dest - aDest;
  return NS_OK;

error_more_output:
  *aSrcLength = src - aSrc;
  *aDestLength = dest - aDest;
  return NS_OK_UENC_MOREOUTPUT;
}
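
The bit arithmetic in the listing above can be tried in isolation. The following is a minimal sketch, not the class's own code: EncodeUtf16ToUtf8 is a hypothetical helper that works on a complete std::u16string instead of the PRUnichar/PRInt32 buffers, so it has no output-space checks. It also emits a high surrogate at the end of the input immediately as three bytes, where Convert() would hold it in mHighSurrogate and return NS_OK_UENC_MOREINPUT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of nsUnicodeToUTF8: encodes a complete
// UTF-16 string using the same shifts and masks as the listing above.
std::string EncodeUtf16ToUtf8(const std::u16string& src) {
  std::string out;
  for (std::size_t i = 0; i < src.size(); ++i) {
    const std::uint32_t u = src[i];
    // True when u is a high surrogate followed by a low surrogate.
    const bool hasLow = u >= 0xd800 && u < 0xdc00 && i + 1 < src.size() &&
                        src[i + 1] >= 0xdc00 && src[i + 1] <= 0xdfff;
    if (u <= 0x7f) {                 // 1 byte:  0xxxxxxx
      out.push_back(static_cast<char>(u));
    } else if (u <= 0x7ff) {         // 2 bytes: 110xxxxx 10xxxxxx
      out.push_back(static_cast<char>(0xc0 | (u >> 6)));
      out.push_back(static_cast<char>(0x80 | (u & 0x3f)));
    } else if (hasLow) {             // 4 bytes: combine the surrogate pair
      const std::uint32_t lo = src[++i];
      const std::uint32_t n = ((u - 0xd800) << 10) + (lo - 0xdc00) + 0x10000;
      out.push_back(static_cast<char>(0xf0 | (n >> 18)));
      out.push_back(static_cast<char>(0x80 | ((n >> 12) & 0x3f)));
      out.push_back(static_cast<char>(0x80 | ((n >> 6) & 0x3f)));
      out.push_back(static_cast<char>(0x80 | (n & 0x3f)));
    } else {                         // 3 bytes: rest of the BMP, or a lone surrogate
      out.push_back(static_cast<char>(0xe0 | (u >> 12)));
      out.push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3f)));
      out.push_back(static_cast<char>(0x80 | (u & 0x3f)));
    }
  }
  return out;
}
```

For example, the pair 0xD83D 0xDE00 combines to n = 0x1F600 and encodes as the four bytes F0 9F 98 80, while U+20AC encodes as E2 82 AC.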

NS_IMETHODIMP nsUnicodeToUTF8::FillInfo ( PRUint32 *  aInfo ) [virtual]

Definition at line 62 of file nsUnicodeToUTF8.cpp.

{
  memset(aInfo, 0xFF, (0x10000L >> 3));
  return NS_OK;
}
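
The memset above writes 0x10000 >> 3 = 8192 bytes, which is one bit for each of the 65,536 BMP code points, so the caller's buffer must hold at least 2048 32-bit words. The sketch below only checks that arithmetic. MakeFullBmpInfo is a hypothetical name, and reading a set bit as "this code point is representable" is an assumption not stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical caller-side buffer: one bit per BMP code point.
// 0x10000 bits / 32 bits per word = 2048 words = 8192 bytes.
std::vector<std::uint32_t> MakeFullBmpInfo() {
  std::vector<std::uint32_t> info(0x10000 / 32, 0);
  // Same call as in the listing: set every bit in the 8192-byte map.
  std::memset(info.data(), 0xFF, (0x10000L >> 3));
  return info;
}
```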

NS_IMETHODIMP nsUnicodeToUTF8::Finish ( char *  aDest,
PRInt32 *  aDestLength
) [virtual]

Finishes the conversion.

The converter has the possibility to write some extra data and flush its final state.

Parameters:
aDest  [OUT] the destination data buffer
aDestLength  [IN/OUT] the length of the destination data buffer; after conversion it will contain the number of bytes written
Returns:
NS_OK_UENC_MOREOUTPUT if only a partial conversion was done; more output space is needed to continue

Implements nsIUnicodeEncoder.

Definition at line 165 of file nsUnicodeToUTF8.cpp.

{
  char * dest = aDest;

  if (mHighSurrogate) {
    if (*aDestLength < 3) {
      *aDestLength = 0;
      return NS_OK_UENC_MOREOUTPUT;
    }
    *dest++ = (char)0xe0 | (mHighSurrogate >> 12);
    *dest++ = (char)0x80 | ((mHighSurrogate >> 6) & 0x003f);
    *dest++ = (char)0x80 | (mHighSurrogate & 0x003f);
    mHighSurrogate = 0;
    *aDestLength = 3;
    return NS_OK;
  } 

  *aDestLength  = 0;
  return NS_OK;
}
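
To make the flush concrete, here is a small sketch of what the listing above writes for a pending high surrogate. PendingSurrogate is a hypothetical stand-in that models only the mHighSurrogate member; it returns a std::string rather than filling aDest, and it has no output-space check.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the converter state; only the pending
// high surrogate (mHighSurrogate in the listing) is modelled.
struct PendingSurrogate {
  char16_t high = 0;

  // Mirrors the listing: write a pending high surrogate with the
  // three-byte pattern, then clear the state. Empty when nothing is pending.
  std::string Flush() {
    std::string out;
    if (high) {
      out.push_back(static_cast<char>(0xe0 | (high >> 12)));
      out.push_back(static_cast<char>(0x80 | ((high >> 6) & 0x3f)));
      out.push_back(static_cast<char>(0x80 | (high & 0x3f)));
      high = 0;
    }
    return out;
  }
};
```

With high set to 0xD83D, Flush() yields the bytes ED A0 BD. Well-formed UTF-8 does not permit encoded surrogates, so a strict decoder may reject this sequence.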

NS_IMETHODIMP nsUnicodeToUTF8::GetMaxLength ( const PRUnichar *  aSrc,
PRInt32  aSrcLength,
PRInt32 *  aDestLength
) [virtual]

Returns a quick estimation of the size of the buffer needed to hold the converted data.

Remember: this estimate is always greater than or equal to the actual size of the buffer needed. It is computed for the worst case.

Parameters:
aSrc  [IN] the source data buffer
aSrcLength  [IN] the length of the source data buffer
aDestLength  [OUT] the needed size of the destination buffer
Returns:
NS_OK_UENC_EXACTLENGTH if an exact length was computed
NS_OK if all we have is an approximation

Implements nsIUnicodeEncoder.

Definition at line 49 of file nsUnicodeToUTF8.cpp.

{
  // aSrc is interpreted as UTF-16; 3 bytes per code unit is normally enough.
  // But when the previous buffer ended with only the first half of a
  // surrogate pair, we need to complete it here. If the first word in the
  // following buffer is not in the valid surrogate range, we need to convert
  // the remainder of the last buffer to 3 bytes.
  *aDestLength = 3*aSrcLength + 3;
  return NS_OK;
}
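
The bound above can be checked against the cases in Convert(). A BMP code unit needs at most three bytes, a surrogate pair uses two code units for four bytes, and a high surrogate left over from a previous call adds at most three more. The sketch below restates the formula with standard types; MaxUtf8Length is a hypothetical name, not part of the class.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as the listing: three bytes per UTF-16 code unit, plus
// three bytes for a high surrogate possibly pending from a previous buffer.
constexpr std::int32_t MaxUtf8Length(std::int32_t srcLength) {
  return 3 * srcLength + 3;
}
```

For a one-unit buffer that follows a pending, unpaired high surrogate, the output is 3 + 3 = 6 bytes, which is exactly MaxUtf8Length(1).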

NS_IMETHOD nsUnicodeToUTF8::Reset ( ) [inline, virtual]

Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.

Implements nsIUnicodeEncoder.

Definition at line 83 of file nsUnicodeToUTF8.h.

{mHighSurrogate = 0; return NS_OK;}

NS_IMETHOD nsUnicodeToUTF8::SetOutputErrorBehavior ( PRInt32  aBehavior,
nsIUnicharEncoder *  aEncoder,
PRUnichar  aChar
) [inline, virtual]

Specify what to do when a character cannot be mapped into the dest charset.

Parameters:
aBehavior  [IN] the behavior; taken from the enum

Implements nsIUnicodeEncoder.

Definition at line 85 of file nsUnicodeToUTF8.h.

{return NS_OK;};

Member Data Documentation

PRUnichar nsUnicodeToUTF8::mHighSurrogate [protected]

Definition at line 86 of file nsUnicodeToUTF8.h.


The documentation for this class was generated from the following files:

  • nsUnicodeToUTF8.h
  • nsUnicodeToUTF8.cpp