lightning-sunbird  0.9+nobinonly
nsGBKToUnicode Class Reference

A character set converter from GBK to Unicode.

#include <nsGBKToUnicode.h>

Public Types

enum  { kOnError_Recover, kOnError_Signal }

Public Member Functions

 nsGBKToUnicode ()
 Class constructor.
NS_IMETHOD Convert (const char *aSrc, PRInt32 *aSrcLength, PRUnichar *aDest, PRInt32 *aDestLength)
 Converts the data from one Charset to Unicode.
NS_IMETHOD Reset ()
 Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.
NS_IMETHOD GetMaxLength (const char *aSrc, PRInt32 aSrcLength, PRInt32 *aDestLength)
 Returns a quick estimation of the size of the buffer needed to hold the converted data.

Protected Member Functions

NS_IMETHOD ConvertNoBuff (const char *aSrc, PRInt32 *aSrcLength, PRUnichar *aDest, PRInt32 *aDestLength)
 Convert method but without the buffer management stuff.
virtual void CreateExtensionDecoder ()
virtual void Create4BytesDecoder ()
PRBool TryExtensionDecoder (const char *aSrc, PRUnichar *aDest)
PRBool Try4BytesDecoder (const char *aSrc, PRUnichar *aDest)
virtual PRBool DecodeToSurrogate (const char *aSrc, PRUnichar *aDest)
void FillBuffer (const char **aSrc, PRInt32 aSrcLength)
void DoubleBuffer ()

Protected Attributes

nsGBKConvUtil mUtil
nsCOMPtr< nsIUnicodeDecoder > mExtensionDecoder
nsCOMPtr< nsIUnicodeDecoder > m4BytesDecoder
char * mBuffer
 Internal buffer for partial conversions.
PRInt32 mBufferCapacity
PRInt32 mBufferLength
PRUint32 mMaxLengthFactor

Detailed Description

A character set converter from GBK to Unicode.

07/Sept/1999

Author:
Yueheng Xu, Yuehe.nosp@m.ng.X.nosp@m.u@int.nosp@m.el.c.nosp@m.om

Definition at line 55 of file nsGBKToUnicode.h.


Member Enumeration Documentation

anonymous enum [inherited]
Enumerator:
kOnError_Recover 
kOnError_Signal 

Definition at line 98 of file nsIUnicodeDecoder.h.

       {
    kOnError_Recover,       // on an error, recover and continue
    kOnError_Signal         // on an error, stop and signal
  };
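
The two policies named by this enum can be illustrated with a deliberately tiny decoder. This is only a sketch: `DecodeAscii` and the `OnError` type name are hypothetical, and the toy charset (only bytes below 0x80 are valid) is an assumption made for brevity. It shows "recover" substituting U+FFFD and continuing, and "signal" stopping at the first bad byte.

```cpp
#include <cassert>
#include <string>

enum OnError { kOnError_Recover, kOnError_Signal };  // enumerator names from the listing

// Hypothetical toy decoder: only bytes below 0x80 are valid. Returns false
// when it stops on an invalid byte in "signal" mode.
bool DecodeAscii(const std::string& src, OnError mode, std::u16string& out) {
    for (char c : src) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char16_t>(b));
        } else if (mode == kOnError_Recover) {
            out.push_back(u'\uFFFD');  // replacement character, then continue
        } else {
            return false;              // stop and signal
        }
    }
    return true;
}
```

In "signal" mode the output keeps whatever was decoded before the offending byte, so a caller can still use the partial result.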

Constructor & Destructor Documentation

nsGBKToUnicode::nsGBKToUnicode ( ) [inline]

Class constructor.

Definition at line 62 of file nsGBKToUnicode.h.


Member Function Documentation

NS_IMETHODIMP nsBufferDecoderSupport::Convert ( const char *  aSrc,
PRInt32 *  aSrcLength,
PRUnichar *  aDest,
PRInt32 *  aDestLength 
) [virtual, inherited]

Converts the data from one Charset to Unicode.

About the byte ordering:

  • For input, if the converter cares about byte order (this depends on the charset; a single-byte charset, for example, ignores byte ordering), it should assume network order. If necessary and requested, a SetInputByteOrder() method can be added so that the reverse order can be used as well. That method would default to the assumed network order.
  • The output stream is Unicode, in the native byte order of the machine on which the converter is running.

Unless there is not enough output space, this method must consume all the available input data! Any incomplete final character data is stored internally in the converter and used when the method is called again to continue the conversion. This way, the caller does not have to manage incomplete input data by merging it with the next buffer.

Error conditions: If the value read does not belong to this character set, it should be replaced with the Unicode replacement character 0xFFFD. When an actual input error is encountered, such as a format error, the converter stops and returns an error. However, keep in mind that decoding needs to be lax.

Required converter behavior, in this order: when the output space is full, return right away. When the input data is wrong, return with the input pointer right after the wrong byte. When the input is partial, it is consumed and cached. At all times, the lengths reported back show how much input was actually consumed and how much output was actually written.

Parameters:
    aSrc         [IN] the source data buffer
    aSrcLength   [IN/OUT] the length of source data buffer; after conversion will contain the number of bytes read
    aDest        [OUT] the destination data buffer
    aDestLength  [IN/OUT] the length of the destination data buffer; after conversion will contain the number of Unicode characters written
Returns:
    NS_PARTIAL_MORE_INPUT if only a partial conversion was done; more input is needed to continue
    NS_PARTIAL_MORE_OUTPUT if only a partial conversion was done; more output space is needed to continue
    NS_ERROR_ILLEGAL_INPUT if an illegal input sequence was encountered and the behavior was set to "signal"

Implements nsIUnicodeDecoder.

Definition at line 114 of file nsUCSupport.cpp.

{
  // we do all operations using pointers internally
  const char * src = aSrc;
  const char * srcEnd = aSrc + *aSrcLength;
  PRUnichar * dest = aDest;
  PRUnichar * destEnd = aDest + *aDestLength;

  PRInt32 bcr, bcw; // byte counts for read & write;
  nsresult res = NS_OK;

  // do we have some residual data from the last conversion?
  if (mBufferLength > 0) if (dest == destEnd) {
    res = NS_OK_UDEC_MOREOUTPUT;
  } else for (;;) {
    // we need new data to add to the buffer
    if (src == srcEnd) {
      res = NS_OK_UDEC_MOREINPUT;
      break;
    }

    // fill that buffer
    PRInt32 buffLen = mBufferLength;  // initial buffer length
    FillBuffer(&src, srcEnd - src);

    // convert that buffer
    bcr = mBufferLength;
    bcw = destEnd - dest;
    res = ConvertNoBuff(mBuffer, &bcr, dest, &bcw);
    dest += bcw;

    if ((res == NS_OK_UDEC_MOREINPUT) && (bcw == 0)) {
        res = NS_ERROR_UNEXPECTED;
#if defined(DEBUG_yokoyama) || defined(DEBUG_ftang)
        NS_ASSERTION(0, "This should not happen. Internal buffer may be corrupted.");
#endif
        break;
    } else {
      if (bcr < buffLen) {
        // we didn't convert that residual data - unfill the buffer
        src -= mBufferLength - buffLen;
        mBufferLength = buffLen;
#if defined(DEBUG_yokoyama) || defined(DEBUG_ftang)
        NS_ASSERTION(0, "This should not happen. Internal buffer may be corrupted.");
#endif
      } else {
        // the buffer and some extra data was converted - unget the rest
        src -= mBufferLength - bcr;
        mBufferLength = 0;
        res = NS_OK;
      }
      break;
    }
  }

  if (res == NS_OK) {
    bcr = srcEnd - src;
    bcw = destEnd - dest;
    res = ConvertNoBuff(src, &bcr, dest, &bcw);
    src += bcr;
    dest += bcw;

    // if we have partial input, store it in our internal buffer.
    if (res == NS_OK_UDEC_MOREINPUT) {
      bcr = srcEnd - src;
      // make sure buffer is large enough
      if (bcr > mBufferCapacity) {
          // somehow we got into an error state and the buffer is growing out of control
          res = NS_ERROR_UNEXPECTED;
      } else {
          FillBuffer(&src, bcr);
      }
    }
  }

  *aSrcLength   -= srcEnd - src;
  *aDestLength  -= destEnd - dest;
  return res;
}
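
The residual-data handling in the listing above can be modelled in a much smaller form. The sketch below is not the Mozilla implementation: `ToyDecoder`, `Status`, and the simplified format (bytes below 0x80 are single characters, any other byte starts a two-byte character) are hypothetical. It only illustrates how an incomplete trailing character is cached by one call and completed by the next.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical status codes, standing in for NS_OK / NS_OK_UDEC_MOREINPUT.
enum class Status { Ok, MoreInput };

// Toy decoder: bytes < 0x80 are one-byte characters; any other byte starts a
// two-byte character. An incomplete trailing character is cached internally.
class ToyDecoder {
public:
    // Decodes `src` and appends one char16_t per character to `out`.
    Status Convert(const std::string& src, std::u16string& out) {
        std::string data = mPending + src;  // prepend residual bytes
        mPending.clear();
        std::size_t i = 0;
        while (i < data.size()) {
            unsigned char b = static_cast<unsigned char>(data[i]);
            if (b < 0x80) {
                out.push_back(static_cast<char16_t>(b));
                i += 1;
            } else if (i + 1 < data.size()) {
                unsigned char b2 = static_cast<unsigned char>(data[i + 1]);
                // Placeholder mapping: pack both bytes into one code unit.
                out.push_back(static_cast<char16_t>((b << 8) | b2));
                i += 2;
            } else {
                mPending.assign(1, data[i]);  // cache the partial character
                return Status::MoreInput;
            }
        }
        return Status::Ok;
    }

    // Discards cached bytes, as Reset() does by zeroing mBufferLength.
    void Reset() { mPending.clear(); }

private:
    std::string mPending;  // plays the role of mBuffer / mBufferLength
};
```

A caller feeding a stream in arbitrary chunks never has to re-split the input at character boundaries; the decoder carries the leftover byte across calls.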

NS_IMETHODIMP nsGBKToUnicode::ConvertNoBuff ( const char *  aSrc,
PRInt32 *  aSrcLength,
PRUnichar *  aDest,
PRInt32 *  aDestLength 
) [protected, virtual]

Convert method but without the buffer management stuff.

Implements nsBufferDecoderSupport.

Definition at line 142 of file nsGBKToUnicode.cpp.

{
  PRInt32 i=0;
  PRInt32 iSrcLength = (*aSrcLength);
  PRInt32 iDestlen = 0;
  nsresult rv=NS_OK;
  *aSrcLength = 0;
  
  for (i=0;i<iSrcLength;i++)
  {
    if ( iDestlen >= (*aDestLength) )
    {
      rv = NS_OK_UDEC_MOREOUTPUT;
      break;
    }
    // The valid range for the 1st byte is [0x81,0xFE] 
    if(LEGAL_GBK_MULTIBYTE_FIRST_BYTE(*aSrc))
    {
      if(i+1 >= iSrcLength) 
      {
        rv = NS_OK_UDEC_MOREINPUT;
        break;
      }
      // To make sure, the second byte has to be checked as well.
      // In GBK, the second byte range is [0x40,0x7E] and [0x80,0XFE]
      if(LEGAL_GBK_2BYTE_SECOND_BYTE(aSrc[1]))
      {
        // Valid GBK code
        *aDest = mUtil.GBKCharToUnicode(aSrc[0], aSrc[1]);
        if(UCS2_NO_MAPPING == *aDest)
        { 
          // We cannot map in the common mapping, let's call the
          // delegate 2 byte decoder to decode the gbk or gb18030 unique 
          // 2 byte mapping
          if(! TryExtensionDecoder(aSrc, aDest))
          {
            *aDest = UCS2_NO_MAPPING;
          }
        }
        aSrc += 2;
        i++;
      }
      else if (LEGAL_GBK_4BYTE_SECOND_BYTE(aSrc[1]))
      {
        // from the first 2 bytes, it looks like a 4 byte GB18030
        if(i+3 >= iSrcLength)  // make sure we got 4 bytes
        {
          rv = NS_OK_UDEC_MOREINPUT;
          break;
        }
        // 4 bytes pattern
        // [0x81-0xfe][0x30-0x39][0x81-0xfe][0x30-0x39]
        // preset the 
 
        if (LEGAL_GBK_4BYTE_THIRD_BYTE(aSrc[2]) &&
            LEGAL_GBK_4BYTE_FORTH_BYTE(aSrc[3]))
        {
           if ( ! FIRST_BYTE_IS_SURROGATE(aSrc[0])) 
           {
             // let's call the delegated 4 byte gb18030 converter to convert it
             if(! Try4BytesDecoder(aSrc, aDest))
               *aDest = UCS2_NO_MAPPING;
           } else {
              // let's try supplement mapping
             NS_ASSERTION(( (iDestlen+1) <= (*aDestLength) ), "no enouth output memory");
             if ( (iDestlen+1) <= (*aDestLength) )
             {
               if(DecodeToSurrogate(aSrc, aDest))
               {
                 // surrogate pair: two PRUnichar
                 iDestlen++;
                 aDest++;
               }  else {
                 *aDest = UCS2_NO_MAPPING;
              }
             } else {
               *aDest = UCS2_NO_MAPPING;
             }
           }
        } else {
          *aDest = UCS2_NO_MAPPING; 
        }
        aSrc += 4;
        i+=3;
      }
      else if ((PRUint8) aSrc[0] == (PRUint8)0xA0 )
      {
        // stand-alone (not followed by a valid second byte) 0xA0 !
        // treat it as valid a la Netscape 4.x
        *aDest = CAST_CHAR_TO_UNICHAR(*aSrc);
        aSrc++;
      } else {
        // Invalid GBK code point (second byte should be 0x40 or higher)
        *aDest = UCS2_NO_MAPPING;
        aSrc++;
      }
    } else {
      if(IS_ASCII(*aSrc))
      {
        // The source is an ASCII
        *aDest = CAST_CHAR_TO_UNICHAR(*aSrc);
        aSrc++;
      } else {
        if(IS_GBK_EURO(*aSrc)) {
          *aDest = UCS2_EURO;
        } else {
          *aDest = UCS2_NO_MAPPING;
        }
        aSrc++;
      }
    }
    iDestlen++;
    aDest++;
    *aSrcLength = i+1;
  }
  *aDestLength = iDestlen;
  return rv;
}
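
The byte-range checks in the listing can be summarised as a small classifier. The definitions of the `LEGAL_GBK_*` macros are not shown on this page, so the sketch below is an assumption built only from the ranges stated in the code comments; `Classify`, `SeqKind`, and `InRange` are hypothetical names. The euro and stand-alone 0xA0 special cases are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Result of inspecting the bytes at the start of a buffer.
enum class SeqKind { Ascii, TwoByte, FourByte, Incomplete, Invalid };

inline bool InRange(std::uint8_t lo, std::uint8_t b, std::uint8_t hi) {
    return lo <= b && b <= hi;
}

// Classifies the sequence starting at p[0], given `len` available bytes.
// Ranges follow the comments in ConvertNoBuff; the function itself is a
// hypothetical helper, not part of nsGBKToUnicode.
SeqKind Classify(const std::uint8_t* p, std::size_t len) {
    assert(p != nullptr && len > 0);
    if (p[0] < 0x80) return SeqKind::Ascii;
    if (!InRange(0x81, p[0], 0xFE)) return SeqKind::Invalid;
    if (len < 2) return SeqKind::Incomplete;       // need a second byte
    if (InRange(0x40, p[1], 0x7E) || InRange(0x80, p[1], 0xFE))
        return SeqKind::TwoByte;
    if (InRange(0x30, p[1], 0x39)) {               // looks like 4-byte GB18030
        if (len < 4) return SeqKind::Incomplete;
        if (InRange(0x81, p[2], 0xFE) && InRange(0x30, p[3], 0x39))
            return SeqKind::FourByte;
    }
    return SeqKind::Invalid;
}
```

The `Incomplete` result corresponds to the points where the listing returns NS_OK_UDEC_MOREINPUT, which is what lets the buffered Convert method cache the partial sequence.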

void nsGBKToUnicode::Create4BytesDecoder ( ) [protected, virtual]

Reimplemented in nsGB18030ToUnicode.

Definition at line 269 of file nsGBKToUnicode.cpp.

void nsGBKToUnicode::CreateExtensionDecoder ( ) [protected, virtual]

Reimplemented in nsGB18030ToUnicode.

Definition at line 265 of file nsGBKToUnicode.cpp.

PRBool nsGBKToUnicode::DecodeToSurrogate ( const char *  aSrc,
PRUnichar *  aDest 
) [protected, virtual]

Reimplemented in nsGB18030ToUnicode.

Definition at line 332 of file nsGBKToUnicode.cpp.

{
  return PR_FALSE;
}
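
The GBK base class never produces surrogates, so the method above simply returns PR_FALSE; the nsGB18030ToUnicode override is not shown on this page. As a rough illustration of what such an override has to compute, the sketch below maps a GB18030 four-byte sequence to a UTF-16 surrogate pair. It assumes the GB18030 rule that supplementary code points are laid out linearly starting at 0x90 0x30 0x81 0x30 = U+10000; `DecodeSupplementary` is a hypothetical name and this is not the Mozilla code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decodes a GB18030 four-byte sequence in the
// supplementary range (first byte 0x90..0xE3) into a UTF-16 surrogate pair.
// Returns false if the bytes do not encode U+10000..U+10FFFF.
bool DecodeSupplementary(const std::uint8_t src[4], char16_t dest[2]) {
    assert(src != nullptr && dest != nullptr);
    if (src[0] < 0x90 || src[0] > 0xE3) return false;
    if (src[1] < 0x30 || src[1] > 0x39) return false;
    if (src[2] < 0x81 || src[2] > 0xFE) return false;
    if (src[3] < 0x30 || src[3] > 0x39) return false;

    // Linear offset from 0x90 0x30 0x81 0x30, which encodes U+10000.
    std::uint32_t offset = (src[0] - 0x90u) * 12600u   // 10 * 126 * 10
                         + (src[1] - 0x30u) * 1260u    // 126 * 10
                         + (src[2] - 0x81u) * 10u
                         + (src[3] - 0x30u);
    if (offset > 0xFFFFFu) return false;  // beyond U+10FFFF

    dest[0] = static_cast<char16_t>(0xD800u + (offset >> 10));     // high surrogate
    dest[1] = static_cast<char16_t>(0xDC00u + (offset & 0x3FFu));  // low surrogate
    return true;
}
```

Because one four-byte input yields two output code units, the caller must check that two slots are free in the destination before calling a decoder like this.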

void nsBufferDecoderSupport::DoubleBuffer ( ) [protected, inherited]

Definition at line 102 of file nsUCSupport.cpp.

{
  mBufferCapacity *= 2;
  char * newBuffer = new char [mBufferCapacity];
  if (mBufferLength > 0) memcpy(newBuffer, mBuffer, mBufferLength);
  delete [] mBuffer;
  mBuffer = newBuffer;
}

void nsBufferDecoderSupport::FillBuffer ( const char **  aSrc,
PRInt32  aSrcLength 
) [protected, inherited]

Definition at line 94 of file nsUCSupport.cpp.

{
  PRInt32 bcr = PR_MIN(mBufferCapacity - mBufferLength, aSrcLength);
  memcpy(mBuffer + mBufferLength, *aSrc, bcr);
  mBufferLength += bcr;
  (*aSrc) += bcr;
}
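
The bounded copy in FillBuffer and the growth step in DoubleBuffer can be sketched together with standard containers. `ResidualBuffer` and its members are hypothetical stand-ins for mBuffer, mBufferCapacity, and mBufferLength; unlike the original, `Fill` returns the number of bytes copied so the effect is easy to observe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for mBuffer / mBufferCapacity / mBufferLength.
struct ResidualBuffer {
    std::vector<char> data;   // storage; data.size() is the capacity
    std::size_t length = 0;   // number of bytes currently held

    explicit ResidualBuffer(std::size_t capacity) : data(capacity) {}

    // Appends as many bytes as fit, advances *src past them, and returns
    // the number of bytes copied (FillBuffer itself returns nothing).
    std::size_t Fill(const char** src, std::size_t srcLength) {
        assert(src != nullptr && *src != nullptr);
        std::size_t n = std::min(data.size() - length, srcLength);
        if (n > 0) std::memcpy(data.data() + length, *src, n);
        length += n;
        *src += n;
        return n;
    }

    // Doubles the capacity while preserving the bytes already held.
    void Double() { data.resize(data.size() * 2); }
};
```

Advancing the caller's pointer by exactly the number of bytes copied is what allows Convert to "unget" input later by moving the pointer back.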

NS_IMETHODIMP nsBufferDecoderSupport::GetMaxLength ( const char *  aSrc,
PRInt32  aSrcLength,
PRInt32 *  aDestLength 
) [virtual, inherited]

Returns a quick estimation of the size of the buffer needed to hold the converted data.

Remember: this estimate is always greater than or equal to the actual size of the buffer needed. It is computed for the worst case.

Parameters:
    aSrc         [IN] the source data buffer
    aSrcLength   [IN] the length of source data buffer
    aDestLength  [OUT] the needed size of the destination buffer
Returns:
    NS_EXACT_LENGTH if an exact length was computed
    NS_OK if all we have is an approximation

Implements nsIUnicodeDecoder.

Definition at line 203 of file nsUCSupport.cpp.

{
  NS_ASSERTION(mMaxLengthFactor != 0, "Must override GetMaxLength!");
  *aDestLength = aSrcLength * mMaxLengthFactor;
  return NS_OK;
}
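
The worst-case estimate is a single multiplication, which a caller would typically use to size the destination buffer before calling Convert. The sketch below mirrors that computation in a hypothetical helper, `MaxDestLength`, and adds an overflow check that the listing above does not perform. The actual value of mMaxLengthFactor for this class is not shown on this page, so the factors used with it are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper mirroring `aSrcLength * mMaxLengthFactor`, with an
// added overflow check that the original listing does not perform.
// Returns false if the product would not fit in a 32-bit signed integer.
bool MaxDestLength(std::int32_t srcLength, std::uint32_t factor,
                   std::int32_t* destLength) {
    assert(destLength != nullptr);
    if (srcLength < 0 || factor == 0) return false;
    std::int64_t product = static_cast<std::int64_t>(srcLength) * factor;
    if (product > std::numeric_limits<std::int32_t>::max()) return false;
    *destLength = static_cast<std::int32_t>(product);
    return true;
}
```

Since the result is an upper bound, a caller can allocate that many code units once and then trim to the count Convert reports back in aDestLength.
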

NS_IMETHODIMP nsBufferDecoderSupport::Reset ( ) [virtual, inherited]

Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.

Implements nsIUnicodeDecoder.

Reimplemented in nsBasicUTF7Decoder.

Definition at line 197 of file nsUCSupport.cpp.

{
  mBufferLength = 0;
  return NS_OK;
}

PRBool nsGBKToUnicode::Try4BytesDecoder ( const char *  aSrc,
PRUnichar *  aDest 
) [protected]

Definition at line 336 of file nsGBKToUnicode.cpp.

{
  if(!m4BytesDecoder)
    Create4BytesDecoder();
  if(m4BytesDecoder)
  {
    nsresult res = m4BytesDecoder->Reset();
    NS_ASSERTION(NS_SUCCEEDED(res), "4 bytes unique conversoin reset failed");
    PRInt32 len = 4;
    PRInt32 dstlen = 1;
    res = m4BytesDecoder->Convert(aSrc,&len, aOut, &dstlen); 
    NS_ASSERTION(NS_FAILED(res) || ((len==4) && (dstlen == 1)), 
       "some strange conversion result");
     // if we failed, we then just use the 0xfffd 
     // therefore, we ignore the res here. 
    if(NS_SUCCEEDED(res)) 
      return PR_TRUE;
  }
  return  PR_FALSE;
}

PRBool nsGBKToUnicode::TryExtensionDecoder ( const char *  aSrc,
PRUnichar *  aDest 
) [protected]

Definition at line 311 of file nsGBKToUnicode.cpp.

{
  if(!mExtensionDecoder)
    CreateExtensionDecoder();
  NS_ASSERTION(mExtensionDecoder, "cannot creqte 2 bytes unique converter");
  if(mExtensionDecoder)
  {
    nsresult res = mExtensionDecoder->Reset();
    NS_ASSERTION(NS_SUCCEEDED(res), "2 bytes unique conversoin reset failed");
    PRInt32 len = 2;
    PRInt32 dstlen = 1;
    res = mExtensionDecoder->Convert(aSrc,&len, aOut, &dstlen); 
    NS_ASSERTION(NS_FAILED(res) || ((len==2) && (dstlen == 1)), 
       "some strange conversion result");
     // if we failed, we then just use the 0xfffd 
     // therefore, we ignore the res here. 
    if(NS_SUCCEEDED(res)) 
      return PR_TRUE;
  }
  return  PR_FALSE;
}


Member Data Documentation

nsCOMPtr<nsIUnicodeDecoder> nsGBKToUnicode::m4BytesDecoder [protected]

Definition at line 77 of file nsGBKToUnicode.h.

char* nsBufferDecoderSupport::mBuffer [protected, inherited]

Internal buffer for partial conversions.

Definition at line 131 of file nsUCSupport.h.

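PRInt32 nsBufferDecoderSupport::mBufferCapacity [protected, inherited]

Definition at line 132 of file nsUCSupport.h.

PRInt32 nsBufferDecoderSupport::mBufferLength [protected, inherited]

Definition at line 133 of file nsUCSupport.h.

nsCOMPtr<nsIUnicodeDecoder> nsGBKToUnicode::mExtensionDecoder [protected]

Definition at line 76 of file nsGBKToUnicode.h.

PRUint32 nsBufferDecoderSupport::mMaxLengthFactor [protected, inherited]

Definition at line 135 of file nsUCSupport.h.

nsGBKConvUtil nsGBKToUnicode::mUtil [protected]

Definition at line 75 of file nsGBKToUnicode.h.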


The documentation for this class was generated from the following files: