lightning-sunbird  0.9+nobinonly
nsUnicodeToGBKNoAscii Class Reference

A character set converter from Unicode to GBK (no ascii).

#include <nsUnicodeToGBKNoAscii.h>

Public Types

enum  { kOnError_Signal, kOnError_CallBack, kOnError_Replace }

Public Member Functions

NS_IMETHOD FillInfo (PRUint32 *aInfo)
 A character set converter from Unicode to GBK (no ascii).
NS_IMETHOD Convert (const PRUnichar *aSrc, PRInt32 *aSrcLength, char *aDest, PRInt32 *aDestLength)
 Converts the data from Unicode to a Charset.
NS_IMETHOD Finish (char *aDest, PRInt32 *aDestLength)
 Finishes the conversion.
NS_IMETHOD Reset ()
 Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.
NS_IMETHOD SetOutputErrorBehavior (PRInt32 aBehavior, nsIUnicharEncoder *aEncoder, PRUnichar aChar)
 Specify what to do when a character cannot be mapped into the dest charset.
NS_IMETHOD GetMaxLength (const PRUnichar *aSrc, PRInt32 aSrcLength, PRInt32 *aDestLength)
 Returns a quick estimation of the size of the buffer needed to hold the converted data.

Protected Member Functions

NS_IMETHOD ConvertNoBuff (const PRUnichar *aSrc, PRInt32 *aSrcLength, char *aDest, PRInt32 *aDestLength)
 Convert method but without the buffer management stuff and with error handling stuff.
NS_IMETHOD ConvertNoBuffNoErr (const PRUnichar *aSrc, PRInt32 *aSrcLength, char *aDest, PRInt32 *aDestLength)
 Convert method but without the buffer management stuff and without error handling stuff.
virtual void CreateExtensionEncoder ()
virtual void Create4BytesEncoder ()
PRBool TryExtensionEncoder (PRUnichar aChar, char *aDest, PRInt32 *aOutLen)
PRBool Try4BytesEncoder (PRUnichar aChar, char *aDest, PRInt32 *aOutLen)
virtual PRBool EncodeSurrogate (PRUnichar aSurrogateHigh, PRUnichar aSurrogateLow, char *aDest)
NS_IMETHOD FinishNoBuff (char *aDest, PRInt32 *aDestLength)
 Finish method but without the buffer management stuff.
nsresult FlushBuffer (char **aDest, const char *aDestEnd)
 Copy as much as possible from the internal buffer to the destination.

Protected Attributes

nsCOMPtr< nsIUnicodeEncoder > mExtensionEncoder
nsCOMPtr< nsIUnicodeEncoder > m4BytesEncoder
PRUnichar mSurrogateHigh
nsGBKConvUtil mUtil
char * mBuffer
 Internal buffer for partial conversions.
PRInt32 mBufferCapacity
char * mBufferStart
char * mBufferEnd
PRInt32 mErrBehavior
 Error handling stuff.
nsCOMPtr< nsIUnicharEncoder > mErrEncoder
PRUnichar mErrChar
PRUint32 mMaxLengthFactor

Detailed Description

A character set converter from Unicode to GBK (no ascii).

Jan/24/2001

Author:
Brian Stell bstel.nosp@m.l@ne.nosp@m.tscap.nosp@m.e.co.nosp@m.m

Definition at line 54 of file nsUnicodeToGBKNoAscii.h.


Member Enumeration Documentation

anonymous enum [inherited]
Enumerator:
kOnError_Signal 
kOnError_CallBack 
kOnError_Replace 

Definition at line 136 of file nsIUnicodeEncoder.h.

  {
    kOnError_Signal,        // on an error, stop and signal
    kOnError_CallBack,      // on an error, call the error handler
    kOnError_Replace        // on an error, replace with a different character
  };

Member Function Documentation

NS_IMETHODIMP nsEncoderSupport::Convert ( const PRUnichar *  aSrc,
PRInt32 *  aSrcLength,
char *  aDest,
PRInt32 *  aDestLength 
) [virtual, inherited]

Converts the data from Unicode to a Charset.

About the byte ordering:

  • The input stream is Unicode, in the native byte order of the machine on which the converter is running.
  • For output, if the converter cares about byte order (this depends on the charset; for example, a single-byte charset ignores byte ordering), it should assume network order. If necessary and requested, a SetOutputByteOrder() method could be added so that the reverse order can be used too; its default would be network order.

Unless there is not enough output space, this method must consume all of the available input data; there is no partial input for the Unicode charset. For the last converted character, even if there is not enough output space, partial output must be written until all available space is used, and the rest of the output is buffered until more space becomes available. This does not hold for the error handling method, so be very careful.

Parameters:
aSrc [IN] the source data buffer
aSrcLength [IN/OUT] the length of the source data buffer; after conversion, contains the number of Unicode characters read
aDest [OUT] the destination data buffer
aDestLength [IN/OUT] the length of the destination data buffer; after conversion, contains the number of bytes written
Returns:
NS_OK_UENC_MOREOUTPUT if only a partial conversion was done; more output space is needed to continue
NS_ERROR_UENC_NOMAPPING if a character without a mapping was encountered and the behavior was set to "signal"

Implements nsIUnicodeEncoder.

Definition at line 475 of file nsUCSupport.cpp.

{
  // we do all operations using pointers internally
  const PRUnichar * src = aSrc;
  const PRUnichar * srcEnd = aSrc + *aSrcLength;
  char * dest = aDest;
  char * destEnd = aDest + *aDestLength;

  PRInt32 bcr, bcw; // byte counts for read & write;
  nsresult res;

  res = FlushBuffer(&dest, destEnd);
  if (res == NS_OK_UENC_MOREOUTPUT) goto final;

  bcr = srcEnd - src;
  bcw = destEnd - dest;
  res = ConvertNoBuff(src, &bcr, dest, &bcw);
  src += bcr;
  dest += bcw;
  if ((res == NS_OK_UENC_MOREOUTPUT) && (dest < destEnd)) {
    // convert exactly one character into the internal buffer
    // at this point, there should be at least a char in the input
    for (;;) {
      bcr = 1;
      bcw = mBufferCapacity;
      res = ConvertNoBuff(src, &bcr, mBuffer, &bcw);

      if (res == NS_OK_UENC_MOREOUTPUT) {
        delete [] mBuffer;
        mBufferCapacity *= 2;
        mBuffer = new char [mBufferCapacity];
      } else {
        src += bcr;
        mBufferStart = mBufferEnd = mBuffer;
        mBufferEnd += bcw;
        break;
      }
    }

    res = FlushBuffer(&dest, destEnd);
  }

final:
  *aSrcLength   -= srcEnd - src;
  *aDestLength  -= destEnd - dest;
  return res;
}
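
The contract described above (consume the input, write partial output, keep the remainder buffered) can be illustrated with a small stand-alone sketch. It does not use the Mozilla classes: ToyEncoder, Status and EncodeAll are hypothetical names, and the toy encoding (two big-endian bytes per UTF-16 unit) is chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for NS_OK and NS_OK_UENC_MOREOUTPUT.
enum class Status { Ok, MoreOutput };

// Toy encoder (not a Mozilla class): emits each UTF-16 unit as two bytes in
// network (big-endian) order and parks bytes that do not fit in pending_.
class ToyEncoder {
 public:
  // In:  *srcLen units available, *destLen bytes of space.
  // Out: *srcLen units consumed,  *destLen bytes written.
  Status Convert(const char16_t* src, int* srcLen, char* dest, int* destLen) {
    int read = 0, written = 0;
    Flush(dest, *destLen, &written);  // leftovers from the previous call first
    while (pending_.empty() && read < *srcLen) {
      const char16_t c = src[read++];
      pending_.push_back(static_cast<char>(c >> 8));
      pending_.push_back(static_cast<char>(c & 0xFF));
      Flush(dest, *destLen, &written);
    }
    *srcLen = read;
    *destLen = written;
    return pending_.empty() ? Status::Ok : Status::MoreOutput;
  }

  // Writes whatever is still buffered; may need several calls.
  Status Finish(char* dest, int* destLen) {
    int written = 0;
    Flush(dest, *destLen, &written);
    *destLen = written;
    return pending_.empty() ? Status::Ok : Status::MoreOutput;
  }

 private:
  // Copies as many pending bytes as fit and drops them from the buffer.
  void Flush(char* dest, int destCap, int* written) {
    std::size_t n = 0;
    while (n < pending_.size() && *written < destCap) {
      dest[(*written)++] = pending_[n++];
    }
    pending_.erase(0, n);
  }
  std::string pending_;
};

// Caller loop: keep offering the unread input and a fresh output chunk,
// then drain the encoder with Finish. Assumes chunk >= 1.
std::string EncodeAll(ToyEncoder& enc, const std::u16string& text, int chunk) {
  std::string out;
  std::vector<char> buf(chunk);
  std::size_t pos = 0;
  while (pos < text.size()) {
    int srcLen = static_cast<int>(text.size() - pos);
    int destLen = chunk;
    enc.Convert(text.data() + pos, &srcLen, buf.data(), &destLen);
    pos += srcLen;
    out.append(buf.data(), destLen);
  }
  for (;;) {
    int destLen = chunk;
    const Status s = enc.Finish(buf.data(), &destLen);
    out.append(buf.data(), destLen);
    if (s == Status::Ok) break;
  }
  return out;
}
```

With an output chunk smaller than one encoded character, the caller still makes progress on every call, because either input is consumed or buffered bytes are written.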

NS_IMETHODIMP nsUnicodeToGBK::ConvertNoBuff ( const PRUnichar *  aSrc,
PRInt32 *  aSrcLength,
char *  aDest,
PRInt32 *  aDestLength 
) [protected, inherited]

Convert method but without the buffer management stuff and with error handling stuff.

Reimplemented from nsEncoderSupport.

Definition at line 402 of file nsUnicodeToGBK.cpp.

{
  PRInt32 iSrcLength = 0;
  PRInt32 iDestLength = 0;
  PRUnichar unicode;
  nsresult res = NS_OK;
  while (iSrcLength < *aSrcLength )
  {
    unicode = *aSrc;
    //if unicode's hi byte has something, it is not ASCII, must be a GB
    if(IS_ASCII(unicode))
    {
      // this is an ASCII
      *aDest = CAST_UNICHAR_TO_CHAR(*aSrc);
      aDest++; // increment 1 byte
      iDestLength +=1;
    } else {
      char byte1, byte2;
      if(mUtil.UnicodeToGBKChar( unicode, PR_FALSE, &byte1, &byte2))
      {
        // make sure we still have 2 bytes for output first
        if(iDestLength+2 > *aDestLength)
        {
          res = NS_OK_UENC_MOREOUTPUT;
          break;
        }
        aDest[0] = byte1;
        aDest[1] = byte2;
        aDest += 2;  // increment 2 bytes
        iDestLength +=2;
      } else {
        PRInt32 aOutLen = 2;
        // make sure we still have 2 bytes for output first
        if(iDestLength+2 > *aDestLength)
        {
          res = NS_OK_UENC_MOREOUTPUT;
          break;
        }
        // we cannot map in the common mapping. Let's try to
        // call the delegated 2 byte converter for the gbk or gb18030
        // unique 2 byte mapping
        if(TryExtensionEncoder(unicode, aDest, &aOutLen))
        {
          iDestLength += aOutLen;
          aDest += aOutLen;
        } else {
          // make sure we still have 4 bytes for output first
          if(iDestLength+4 > *aDestLength)
          {
            res = NS_OK_UENC_MOREOUTPUT;
            break;
          }
          // we still cannot map. Let's try to
          // call the delegated GB18030 4 byte converter 
          aOutLen = 4;
          if( IS_HIGH_SURROGATE(unicode) )
          {
            if((iSrcLength+1) < *aSrcLength ) {
              if(EncodeSurrogate(aSrc[0],aSrc[1], aDest)) {
                // since we got a surrogate pair, we need to increment src.
                iSrcLength++ ; 
                aSrc++;
                iDestLength += aOutLen;
                aDest += aOutLen;
              } else {
                // only get a high surrogate, but not a low surrogate
                res = NS_ERROR_UENC_NOMAPPING;
                iSrcLength++;   // include length of the unmapped character
                break;
              }
            } else {
              mSurrogateHigh = aSrc[0];
              break; // this will go to afterwhileloop
            }
          } else {
            if( IS_LOW_SURROGATE(unicode) )
            {
              if(IS_HIGH_SURROGATE(mSurrogateHigh)) {
                if(EncodeSurrogate(mSurrogateHigh, aSrc[0], aDest)) {
                  iDestLength += aOutLen;
                  aDest += aOutLen;
                } else {
                  // only get a high surrogate, but not a low surrogate
                  res = NS_ERROR_UENC_NOMAPPING;
                  iSrcLength++;   // include length of the unmapped character
                  break;
                }
              } else {
                // only get a low surrogate, but not a high surrogate
                res = NS_ERROR_UENC_NOMAPPING;
                iSrcLength++;   // include length of the unmapped character
                break;
              }
            } else {
              if(Try4BytesEncoder(unicode, aDest, &aOutLen))
              {
                NS_ASSERTION((aOutLen == 4), "we should always generate 4 bytes here");
                iDestLength += aOutLen;
                aDest += aOutLen;
              } else {
                res = NS_ERROR_UENC_NOMAPPING;
                iSrcLength++;   // include length of the unmapped character
                break;
              }
            }
          }
        }
      } 
    }
    iSrcLength++ ; // Each unicode char just count as one in PRUnichar string;        
    mSurrogateHigh = 0;
    aSrc++;
    if ( iDestLength >= (*aDestLength) && (iSrcLength < *aSrcLength) )
    {
      res = NS_OK_UENC_MOREOUTPUT;
      break;
    }
  }
//afterwhileloop:
  *aDestLength = iDestLength;
  *aSrcLength = iSrcLength;
  return res;
}
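
The surrogate handling in the loop above, including the case where a high surrogate is the last unit of one input buffer and its partner arrives in the next, can be sketched in isolation. This is a minimal illustration that only joins pairs into code points; it does not produce GBK or GB18030 bytes. IsHighSurrogate, IsLowSurrogate, CombineSurrogates and DecodeChunk are hypothetical helpers, and the carry parameter plays the role that mSurrogateHigh plays in the class.

```cpp
#include <cassert>
#include <string>
#include <vector>

inline bool IsHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool IsLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Standard UTF-16 pairing: 10 bits from each half, offset by 0x10000.
inline char32_t CombineSurrogates(char16_t high, char16_t low) {
  return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10) +
         (static_cast<char32_t>(low) - 0xDC00);
}

// Appends the code points found in chunk to *out and returns the number of
// unpaired surrogates seen. *carry holds a high surrogate that ended the
// previous chunk (0 if none) and is updated for the next call.
int DecodeChunk(const std::u16string& chunk, char16_t* carry,
                std::vector<char32_t>* out) {
  int unpaired = 0;
  for (char16_t unit : chunk) {
    if (*carry != 0) {  // a high surrogate is waiting for its partner
      const char16_t high = *carry;
      *carry = 0;
      if (IsLowSurrogate(unit)) {
        out->push_back(CombineSurrogates(high, unit));
        continue;
      }
      ++unpaired;  // the waiting high surrogate had no partner
    }
    if (IsHighSurrogate(unit)) {
      *carry = unit;  // remember it; the partner may be in the next chunk
    } else if (IsLowSurrogate(unit)) {
      ++unpaired;  // low surrogate with no preceding high surrogate
    } else {
      out->push_back(unit);  // ordinary BMP character
    }
  }
  return unpaired;
}
```

Unlike the converter above, which stops with NS_ERROR_UENC_NOMAPPING at the first unpaired surrogate, the sketch just counts them and continues.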

NS_IMETHOD nsUnicodeToGBK::ConvertNoBuffNoErr ( const PRUnichar *  aSrc,
PRInt32 *  aSrcLength,
char *  aDest,
PRInt32 *  aDestLength 
) [inline, protected, virtual, inherited]

Convert method but without the buffer management stuff and without error handling stuff.

Implements nsEncoderSupport.

Definition at line 75 of file nsUnicodeToGBK.h.

  {
    return NS_OK;
  };  // just make it not abstract;

void nsUnicodeToGBK::Create4BytesEncoder ( ) [protected, virtual, inherited]

Reimplemented in nsUnicodeToGB18030.

Definition at line 339 of file nsUnicodeToGBK.cpp.

void nsUnicodeToGBK::CreateExtensionEncoder ( ) [protected, virtual, inherited]

Reimplemented in nsUnicodeToGB18030.

Definition at line 335 of file nsUnicodeToGBK.cpp.

PRBool nsUnicodeToGBK::EncodeSurrogate ( PRUnichar  aSurrogateHigh,
PRUnichar  aSurrogateLow,
char *  aDest 
) [protected, virtual, inherited]

Reimplemented in nsUnicodeToGB18030.

Definition at line 394 of file nsUnicodeToGBK.cpp.

{
  return PR_FALSE; // GBK cannot encode Surrogate, let the subclass encode it.
} 

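NS_IMETHOD nsUnicodeToGBKNoAscii::FillInfo ( PRUint32 *  aInfo )

Fills in the representability info by calling nsUnicodeToGBK::FillInfo, then clears the first 128 bits (4 x 32 bits), so that this converter reports none of the first 128 characters (the ASCII range) as representable.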

Reimplemented from nsUnicodeToGBK.

Definition at line 52 of file nsUnicodeToGBKNoAscii.cpp.

{
  nsresult rv = nsUnicodeToGBK::FillInfo(aInfo); // call the super class
  if(NS_SUCCEEDED(rv))
  {
    // mark the first 128 bits as 0. 4 x 32 bits = 128 bits
    aInfo[0] = aInfo[1] = aInfo[2] = aInfo[3] = 0;
  }
  return rv;
}
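
The "4 x 32 bits = 128 bits" comment reads naturally if the info array is treated as a bitmap with one bit per 16-bit character. The sketch below shows that reading. The bit layout (word index c >> 5, bit index c & 0x1F) and the helper names are assumptions made for illustration; the page itself does not show how the bits are assigned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: one bit per 16-bit character, 32 characters per word.
constexpr std::size_t kInfoWords = 0x10000 / 32;

// Marks character c as representable.
inline void SetRepresentable(std::uint32_t* info, char16_t c) {
  info[c >> 5] |= (std::uint32_t{1} << (c & 0x1F));
}

// Returns true if character c is marked as representable.
inline bool IsRepresentable(const std::uint32_t* info, char16_t c) {
  return ((info[c >> 5] >> (c & 0x1F)) & 1u) != 0;
}

// Clears the bits for characters 0x00..0x7F, as FillInfo above does.
inline void ClearAsciiRange(std::uint32_t* info) {
  info[0] = info[1] = info[2] = info[3] = 0;
}
```

Under this layout, words 0 to 3 hold exactly the bits for 0x00 to 0x7F, so zeroing them leaves every character from 0x80 upward untouched.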

NS_IMETHODIMP nsEncoderSupport::Finish ( char *  aDest,
PRInt32 *  aDestLength 
) [virtual, inherited]

Finishes the conversion.

The converter has the possibility to write some extra data and flush its final state.

Parameters:
aDest [OUT] the destination data buffer
aDestLength [IN/OUT] the length of the destination data buffer; after conversion, contains the number of bytes written
Returns:
NS_OK_UENC_MOREOUTPUT if only a partial conversion was done; more output space is needed to continue

Implements nsIUnicodeEncoder.

Definition at line 526 of file nsUCSupport.cpp.

{
  // we do all operations using pointers internally
  char * dest = aDest;
  char * destEnd = aDest + *aDestLength;

  PRInt32 bcw; // byte count for write;
  nsresult res;

  res = FlushBuffer(&dest, destEnd);
  if (res == NS_OK_UENC_MOREOUTPUT) goto final;

  // do the finish into the internal buffer.
  for (;;) {
    bcw = mBufferCapacity;
    res = FinishNoBuff(mBuffer, &bcw);

    if (res == NS_OK_UENC_MOREOUTPUT) {
      delete [] mBuffer;
      mBufferCapacity *= 2;
      mBuffer = new char [mBufferCapacity];
    } else {
      mBufferStart = mBufferEnd = mBuffer;
      mBufferEnd += bcw;
      break;
    }
  }

  res = FlushBuffer(&dest, destEnd);

final:
  *aDestLength  -= destEnd - dest;
  return res;
}

NS_IMETHODIMP nsEncoderSupport::FinishNoBuff ( char *  aDest,
PRInt32 *  aDestLength 
) [protected, inherited]

Finish method but without the buffer management stuff.

Reimplemented in nsBasicUTF7Encoder, and nsUnicodeToISO2022JP.

Definition at line 443 of file nsUCSupport.cpp.

{
  *aDestLength = 0;
  return NS_OK;
}

nsresult nsEncoderSupport::FlushBuffer ( char **  aDest,
const char *  aDestEnd 
) [protected, inherited]

Copy as much as possible from the internal buffer to the destination.

Definition at line 450 of file nsUCSupport.cpp.

{
  PRInt32 bcr, bcw; // byte counts for read & write;
  nsresult res = NS_OK;
  char * dest = *aDest;

  if (mBufferStart < mBufferEnd) {
    bcr = mBufferEnd - mBufferStart;
    bcw = aDestEnd - dest;
    if (bcw < bcr) bcr = bcw;
    memcpy(dest, mBufferStart, bcr);
    dest += bcr;
    mBufferStart += bcr;

    if (mBufferStart < mBufferEnd) res = NS_OK_UENC_MOREOUTPUT;
  }

  *aDest = dest;
  return res;
}

NS_IMETHODIMP nsEncoderSupport::GetMaxLength ( const PRUnichar *  aSrc,
PRInt32  aSrcLength,
PRInt32 *  aDestLength 
) [virtual, inherited]

Returns a quick estimation of the size of the buffer needed to hold the converted data.

Remember: this estimate is always greater than or equal to the actual size of the buffer needed. It is computed for the worst case.

Parameters:
aSrc [IN] the source data buffer
aSrcLength [IN] the length of the source data buffer
aDestLength [OUT] the needed size of the destination buffer
Returns:
NS_OK_UENC_EXACTLENGTH if an exact length was computed
NS_OK if all we have is an approximation

Implements nsIUnicodeEncoder.

Definition at line 582 of file nsUCSupport.cpp.

{
  *aDestLength = aSrcLength * mMaxLengthFactor;
  return NS_OK;
}
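
The worst-case estimate is a single multiplication, so a caller can size the output buffer once and avoid NS_OK_UENC_MOREOUTPUT. A minimal sketch of the same calculation follows. MaxEncodedLength is a hypothetical helper, and the overflow check is an addition that the code above does not perform.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Computes srcLength * maxFactor into *destLength.
// Returns false (leaving *destLength untouched) if srcLength is negative
// or the product does not fit in a 32-bit signed integer.
bool MaxEncodedLength(std::int32_t srcLength, std::uint32_t maxFactor,
                      std::int32_t* destLength) {
  if (srcLength < 0) return false;
  const std::int64_t bytes = std::int64_t{srcLength} * maxFactor;
  if (bytes > std::numeric_limits<std::int32_t>::max()) return false;
  *destLength = static_cast<std::int32_t>(bytes);
  return true;
}
```

For example, with a factor of 2 (an assumed value for a double-byte charset; the actual mMaxLengthFactor is not shown on this page), 10 input characters need at most 20 bytes.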

NS_IMETHODIMP nsEncoderSupport::Reset ( ) [virtual, inherited]

Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.

Implements nsIUnicodeEncoder.

Reimplemented in nsBasicUTF7Encoder, and nsUnicodeToISO2022JP.

Definition at line 561 of file nsUCSupport.cpp.

NS_IMETHODIMP nsEncoderSupport::SetOutputErrorBehavior ( PRInt32  aBehavior,
nsIUnicharEncoder *  aEncoder,
PRUnichar  aChar 
) [virtual, inherited]

Specify what to do when a character cannot be mapped into the dest charset.

Parameters:
aBehavior [IN] the behavior; taken from the enum

Implements nsIUnicodeEncoder.

Definition at line 567 of file nsUCSupport.cpp.

{
  if (aBehavior == kOnError_CallBack && aEncoder == nsnull) 
    return NS_ERROR_NULL_POINTER;

  mErrEncoder = aEncoder;
  mErrBehavior = aBehavior;
  mErrChar = aChar;
  return NS_OK;
}
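
How the three behaviors differ is easiest to see on a toy encoder. The sketch below is not the Mozilla implementation: OnError, ErrorPolicy, IsValid and EncodeAscii are hypothetical names, and the "charset" is plain ASCII so that any character at or above 0x80 counts as unmappable. IsValid mirrors the null check above, where the call-back behavior requires a handler.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical counterpart of kOnError_Signal / _CallBack / _Replace.
enum class OnError { Signal, CallBack, Replace };

struct ErrorPolicy {
  OnError behavior = OnError::Signal;
  std::function<std::string(char16_t)> callback;  // used only for CallBack
  char replacement = '?';                         // used only for Replace
};

// The call-back behavior is only valid when a handler is supplied.
bool IsValid(const ErrorPolicy& p) {
  return p.behavior != OnError::CallBack || static_cast<bool>(p.callback);
}

// Toy encoder: copies ASCII through and applies the policy to anything else.
// Returns false if the policy is invalid or conversion stopped on Signal.
bool EncodeAscii(const std::u16string& src, const ErrorPolicy& policy,
                 std::string* out) {
  if (!IsValid(policy)) return false;
  for (char16_t c : src) {
    if (c < 0x80) {
      out->push_back(static_cast<char>(c));
      continue;
    }
    switch (policy.behavior) {
      case OnError::Signal:
        return false;  // stop and report; output so far is kept
      case OnError::CallBack:
        out->append(policy.callback(c));  // let the handler decide
        break;
      case OnError::Replace:
        out->push_back(policy.replacement);  // substitute a fixed character
        break;
    }
  }
  return true;
}
```

With the Replace behavior and the default replacement character, an input of "a", U+4E2D, "b" comes out as "a?b"; with Signal, conversion stops after "a".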

PRBool nsUnicodeToGBK::Try4BytesEncoder ( PRUnichar  aChar,
char *  aDest,
PRInt32 *  aOutLen 
) [protected, inherited]

Definition at line 368 of file nsUnicodeToGBK.cpp.

{
  if( IS_HIGH_SURROGATE(aChar) || 
      IS_LOW_SURROGATE(aChar) )
  {
    // performance tune for surrogate characters
    return PR_FALSE;
  }
  if(! m4BytesEncoder )
    Create4BytesEncoder();
  if(m4BytesEncoder) 
  {
    PRInt32 len = 1;
    nsresult res = NS_OK;
    res = m4BytesEncoder->Convert(&aChar, &len, aDest, aOutLen);
    NS_ASSERTION(NS_FAILED(res) || ((1 == len) && (4 == *aOutLen)),
      "unexpect conversion length");
    if(NS_SUCCEEDED(res) && (*aOutLen > 0))
      return PR_TRUE;
  }
  return PR_FALSE;
}

PRBool nsUnicodeToGBK::TryExtensionEncoder ( PRUnichar  aChar,
char *  aDest,
PRInt32 *  aOutLen 
) [protected, inherited]

Definition at line 343 of file nsUnicodeToGBK.cpp.

{
  if( IS_HIGH_SURROGATE(aChar) || 
      IS_LOW_SURROGATE(aChar) )
  {
    // performance tune for surrogate characters
    return PR_FALSE;
  }
  if(! mExtensionEncoder )
    CreateExtensionEncoder();
  if(mExtensionEncoder) 
  {
    PRInt32 len = 1;
    nsresult res = NS_OK;
    res = mExtensionEncoder->Convert(&aChar, &len, aDest, aOutLen);
    if(NS_SUCCEEDED(res) && (*aOutLen > 0))
      return PR_TRUE;
  }
  return PR_FALSE;
}


Member Data Documentation

nsCOMPtr<nsIUnicodeEncoder> nsUnicodeToGBK::m4BytesEncoder [protected, inherited]

Definition at line 87 of file nsUnicodeToGBK.h.

char* nsEncoderSupport::mBuffer [protected, inherited]

Internal buffer for partial conversions.

Definition at line 332 of file nsUCSupport.h.

PRInt32 nsEncoderSupport::mBufferCapacity [protected, inherited]

Definition at line 333 of file nsUCSupport.h.

char* nsEncoderSupport::mBufferEnd [protected, inherited]

Definition at line 335 of file nsUCSupport.h.

char* nsEncoderSupport::mBufferStart [protected, inherited]

Definition at line 334 of file nsUCSupport.h.

PRInt32 nsEncoderSupport::mErrBehavior [protected, inherited]

Error handling stuff.

Definition at line 340 of file nsUCSupport.h.

PRUnichar nsEncoderSupport::mErrChar [protected, inherited]

Definition at line 342 of file nsUCSupport.h.

nsCOMPtr<nsIUnicharEncoder> nsEncoderSupport::mErrEncoder [protected, inherited]

Definition at line 341 of file nsUCSupport.h.

nsCOMPtr<nsIUnicodeEncoder> nsUnicodeToGBK::mExtensionEncoder [protected, inherited]

Definition at line 86 of file nsUnicodeToGBK.h.

PRUint32 nsEncoderSupport::mMaxLengthFactor [protected, inherited]

Definition at line 343 of file nsUCSupport.h.

PRUnichar nsUnicodeToGBK::mSurrogateHigh [protected, inherited]

Definition at line 89 of file nsUnicodeToGBK.h.

nsGBKConvUtil nsUnicodeToGBK::mUtil [protected, inherited]

Definition at line 90 of file nsUnicodeToGBK.h.


The documentation for this class was generated from the following files: