
lightning-sunbird  0.9+nobinonly
nsEUCJPToUnicodeV2 Class Reference

#include <nsJapaneseToUnicode.h>


Public Types

enum  { kOnError_Recover, kOnError_Signal }

Public Member Functions

 nsEUCJPToUnicodeV2 ()
virtual ~nsEUCJPToUnicodeV2 ()
NS_IMETHOD Convert (const char *aSrc, PRInt32 *aSrcLength, PRUnichar *aDest, PRInt32 *aDestLength)
 Converts the data from one Charset to Unicode.
NS_IMETHOD GetMaxLength (const char *aSrc, PRInt32 aSrcLength, PRInt32 *aDestLength)
 Returns a quick estimation of the size of the buffer needed to hold the converted data.
NS_IMETHOD Reset ()
 Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.

Protected Member Functions

void setMapMode ()

Protected Attributes

const PRUint16 *const *mMapIndex

Private Attributes

PRInt32 mState
PRInt32 mData

Detailed Description

Definition at line 86 of file nsJapaneseToUnicode.h.


Member Enumeration Documentation

anonymous enum [inherited]
Enumerator:
kOnError_Recover 
kOnError_Signal 

Definition at line 98 of file nsIUnicodeDecoder.h.

  {
    kOnError_Recover,       // on an error, recover and continue
    kOnError_Signal         // on an error, stop and signal
  };
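The two enumerators name a decoder's error policy. A minimal sketch of what they mean in practice, using a hypothetical ASCII-only decoder `DecodeAsciiOnly` (not part of this class; nsEUCJPToUnicodeV2::Convert itself always substitutes U+FFFD and never stops on bad bytes):

```cpp
#include <cassert>
#include <string>

// Local copy of the policy enum, for illustration only.
enum ErrorPolicy { kOnError_Recover, kOnError_Signal };

// Hypothetical decoder: accepts 7-bit ASCII; any byte >= 0x80 is "illegal".
// Returns false if it stopped on an error under kOnError_Signal.
bool DecodeAsciiOnly(const std::string& src, ErrorPolicy policy,
                     std::u16string& out) {
  for (unsigned char b : src) {
    if (b < 0x80) {
      out.push_back(static_cast<char16_t>(b));
    } else if (policy == kOnError_Recover) {
      out.push_back(u'\uFFFD');  // replace the bad byte and keep going
    } else {
      return false;              // stop and signal the error to the caller
    }
  }
  return true;
}
```

With input `"A\x80" "B"`, the recovering policy yields `u"A\uFFFDB"`, while the signalling policy stops after `u"A"` and reports failure.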

Constructor & Destructor Documentation

nsEUCJPToUnicodeV2::nsEUCJPToUnicodeV2 ( ) [inline]

Definition at line 90 of file nsJapaneseToUnicode.h.

     { 
          mState=0; mData=0; 
          setMapMode();
     };


virtual nsEUCJPToUnicodeV2::~nsEUCJPToUnicodeV2 ( ) [inline, virtual]

Definition at line 95 of file nsJapaneseToUnicode.h.

{};

Member Function Documentation

NS_IMETHODIMP nsEUCJPToUnicodeV2::Convert ( const char *  aSrc,
PRInt32 *  aSrcLength,
PRUnichar *  aDest,
PRInt32 *  aDestLength 
) [virtual]

Converts the data from one Charset to Unicode.

About the byte ordering:

  • For input, if the converter cares about byte order (that depends on the charset; a single-byte converter, for example, ignores it), it should assume network order. If necessary and requested, a SetInputByteOrder() method could be added so that the reverse order can be used, too; its default would be the assumed network order.
  • The output stream is Unicode, in the byte order native to the machine on which the converter is running.
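Because the output is UTF-16 in host byte order, a caller that needs network (big-endian) order has to serialize each unit explicitly. A minimal sketch; `ToNetworkOrder` is a hypothetical helper, not part of the converter API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits each UTF-16 unit into bytes, high byte first (network order).
// Shifting and masking give the same result on little- and big-endian hosts.
std::vector<unsigned char> ToNetworkOrder(const std::u16string& units) {
  std::vector<unsigned char> bytes;
  bytes.reserve(units.size() * 2);
  for (char16_t u : units) {
    bytes.push_back(static_cast<unsigned char>(u >> 8));    // high byte
    bytes.push_back(static_cast<unsigned char>(u & 0xFF));  // low byte
  }
  return bytes;
}
```

For example, `u"\u3042A"` serializes to the bytes 0x30 0x42 0x00 0x41.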

Unless it runs out of output space, this method must consume all of the available input data. Any incomplete character at the end of the input is stored internally in the converter and used when the method is called again to continue the conversion. This way, the caller does not have to manage incomplete input data by merging it with the next buffer.
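The "consume everything, cache the incomplete tail" contract can be sketched with a hypothetical `MiniEucJpDecoder` that understands only ASCII and the two-byte 0x8E sequence (JIS X 0201 half-width katakana). Like nsEUCJPToUnicodeV2 with its mState member, it remembers a pending lead byte between calls. It is a simplification: unlike the real converter, it does not re-emit an ASCII byte that follows an invalid 0x8E sequence.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, NOT the Mozilla implementation.
class MiniEucJpDecoder {
 public:
  // Decodes all of |src| and appends the UTF-16 units to |out|.
  void Convert(const std::string& src, std::u16string& out) {
    for (unsigned char b : src) {
      if (mPendingSS2) {                 // byte following 0x8E
        mPendingSS2 = false;
        if (b >= 0xA1 && b <= 0xDF)
          out.push_back(static_cast<char16_t>(0xFF61 - 0xA1 + b));
        else
          out.push_back(u'\uFFFD');
      } else if (b == 0x8E) {
        mPendingSS2 = true;              // incomplete: cache it for the next call
      } else if (b < 0x80) {
        out.push_back(static_cast<char16_t>(b));  // ASCII passes through
      } else {
        out.push_back(u'\uFFFD');        // outside this sketch's coverage
      }
    }
  }

 private:
  bool mPendingSS2 = false;  // true while waiting for the byte after 0x8E
};
```

Feeding `"A\x8E"` and then `"\xB1"` in two separate calls produces `u"A\uFF71"` (HALFWIDTH KATAKANA LETTER A), just as if the three bytes had arrived in one buffer.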

Error conditions: If a value read does not belong to this character set, it should be replaced with the Unicode replacement character U+FFFD. When an actual input error, such as a format error, is encountered, the converter stops and returns an error. However, keep in mind that decoding needs to be lenient.

Converter required behavior, in order of precedence: when the output space is full, return right away. When the input data is invalid, return with the input pointer positioned right after the offending byte. When the input ends with a partial character, consume and cache it. At all times, the returned lengths show how much input was actually consumed and how much output was actually written.
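From the caller's side, this contract leads to a simple loop: convert, keep what was written, advance past what was consumed, and call again while the decoder reports that it needs more output space. A sketch under stated assumptions: `ConvertAscii` is a hypothetical ASCII-only stand-in that follows the same in/out length convention as Convert(), and `Status` stands in for NS_OK / NS_OK_UDEC_MOREOUTPUT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical status codes standing in for NS_OK and NS_OK_UDEC_MOREOUTPUT.
enum class Status { kOk, kMoreOutput };

// On entry *srcLen and *destLen are buffer sizes; on return they are the
// bytes consumed and the UTF-16 units written.
Status ConvertAscii(const char* src, std::int32_t* srcLen,
                    char16_t* dest, std::int32_t* destLen) {
  std::int32_t i = 0, o = 0;
  while (i < *srcLen) {
    if (o == *destLen) break;  // output space full: return right away
    unsigned char b = static_cast<unsigned char>(src[i++]);
    dest[o++] = b < 0x80 ? static_cast<char16_t>(b) : u'\uFFFD';
  }
  Status s = (i < *srcLen) ? Status::kMoreOutput : Status::kOk;
  *srcLen = i;
  *destLen = o;
  return s;
}

// Caller loop: resubmit the unconsumed input with a fresh output chunk
// until everything has been converted. |chunk| must be positive.
std::u16string DecodeAll(const std::string& in, std::int32_t chunk) {
  std::u16string result;
  std::u16string buf(static_cast<std::size_t>(chunk), u'\0');
  std::size_t pos = 0;
  for (;;) {
    std::int32_t srcLen = static_cast<std::int32_t>(in.size() - pos);
    std::int32_t destLen = chunk;
    Status s = ConvertAscii(in.data() + pos, &srcLen, &buf[0], &destLen);
    result.append(buf, 0, static_cast<std::size_t>(destLen));
    pos += static_cast<std::size_t>(srcLen);
    if (s == Status::kOk) return result;
  }
}
```

With a two-unit output buffer, `DecodeAll("Hello", 2)` takes three calls ("He", "ll", "o") and returns `u"Hello"`.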

Parameters:
aSrc[IN] the source data buffer
aSrcLength[IN/OUT] the length of source data buffer; after conversion will contain the number of bytes read
aDest[OUT] the destination data buffer
aDestLength[IN/OUT] the length of the destination data buffer; after conversion will contain the number of Unicode characters written
Returns:
NS_PARTIAL_MORE_INPUT if only a partial conversion was done; more input is needed to continue
NS_PARTIAL_MORE_OUTPUT if only a partial conversion was done; more output space is needed to continue
NS_ERROR_ILLEGAL_INPUT if an illegal input sequence was encountered and the behavior was set to "signal"

Implements nsIUnicodeDecoder.

Definition at line 226 of file nsJapaneseToUnicode.cpp.

{
   static const PRUint8 sbIdx[256] =
   {
/* 0x0X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x1X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x2X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x3X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x4X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x5X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x6X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x7X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x8X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0x9X */
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
     0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
/* 0xAX */
     0xFF, 0,    1,    2,    3,    4,    5,    6,  
     7,    8 ,   9,    10,   11,   12,   13,   14,
/* 0xBX */
     15,   16,   17,   18,   19,   20,   21,   22, 
     23,   24,   25,   26,   27,   28,   29,   30, 
/* 0xCX */
     31,   32,   33,   34,   35,   36,   37,   38, 
     39,   40,   41,   42,   43,   44,   45,   46, 
/* 0xDX */
     47,   48,   49,   50,   51,   52,   53,   54, 
     55,   56,   57,   58,   59,   60,   61,   62, 
/* 0xEX */
     63,   64,   65,   66,   67,   68,   69,   70, 
     71,   72,   73,   74,   75,   76,   77,   78, 
/* 0xFX */
     79,   80,   81,   82,   83,   84,   85,   86, 
     87,   88,   89,   90,   91,   92,   93,   0xFF, 
   };

   const unsigned char* srcEnd = (unsigned char*)aSrc + *aSrcLen;
   const unsigned char* src =(unsigned char*) aSrc;
   PRUnichar* destEnd = aDest + *aDestLen;
   PRUnichar* dest = aDest;
   while((src < srcEnd))
   {
       switch(mState)
       {
          case 0:
          if(*src & 0x80  && *src != (unsigned char)0xa0)
          {
            mData = JIS0208_INDEX[*src & 0x7F];
            if(mData != 0xFFFD )
            {
               mState = 1; // two byte JIS0208
            } else {
               if( 0x8e == *src) {
                 // JIS 0201
                 mState = 2; // JIS0201
               } else if(0x8f == *src) {
                 // JIS 0212
                 mState = 3; // JIS0212
               } else {
                 // others 
                 *dest++ = 0xFFFD;
                 if(dest >= destEnd)
                   goto error1;
               }
            }
          } else {
            // ASCII
            *dest++ = (PRUnichar) *src;
            if(dest >= destEnd)
              goto error1;
          }
          break;

          case 1: // Index to table
          {
            PRUint8 off = sbIdx[*src];
            if(0xFF == off) {
              *dest++ = 0xFFFD;
               // if the first byte is valid for EUC-JP but the second
               // is not, while being in the range 0x00-0x3F (what the test
               // below checks), save it instead of eating it up!
               if ( ! (*src & 0xc0)  )
                 *dest++ = (PRUnichar) *src;
            } else {
               *dest++ = gJapaneseMap[mData+off];
            }
            mState = 0;
            if(dest >= destEnd)
              goto error1;
          }
          break;

          case 2: // JIS 0201
          {
            if((0xA1 <= *src) && (*src <= 0xDF)) {
              *dest++ = (0xFF61-0x00A1) + *src;
            } else {
              *dest++ = 0xFFFD;             
              // if 0x8e is not followed by a valid JIS X 0201 byte
              // but by a valid US-ASCII, save it instead of eating it up.
              if ( (PRUint8)*src < (PRUint8)0x7f )
                 *dest++ = (PRUnichar) *src;
            }
            mState = 0;
            if(dest >= destEnd)
              goto error1;
          }
          break;

          case 3: // JIS 0212
          {
            if(*src & 0x80)
            {
              mData = JIS0212_INDEX[*src & 0x7F];
              if(mData != 0xFFFD )
              {
                 mState = 4; 
              } else {
                 mState = 5; // error
              }
            } else {
              mState = 5; // error
            }
          }
          break;
          case 4:
          {
            PRUint8 off = sbIdx[*src];
            if(0xFF == off) {
               *dest++ = 0xFFFD;
            } else {
               *dest++ = gJapaneseMap[mData+off];
            }
            mState = 0;
            if(dest >= destEnd)
              goto error1;
          }
          break;
          case 5: // two bytes undefined
          {
            *dest++ = 0xFFFD;
            mState = 0;
            if(dest >= destEnd)
              goto error1;
          }
          break;
       }
       src++;
   }
   *aDestLen = dest - aDest;
   return NS_OK;
error1:
   *aDestLen = dest-aDest;
   src++;
   if ((mState == 0) && (src == srcEnd)) {
     return NS_OK;
   } 
   *aSrcLen = src - (const unsigned char*)aSrc;
   return NS_OK_UDEC_MOREOUTPUT;
}
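Two of the lookups in Convert() reduce to simple arithmetic. The sbIdx[] table maps an EUC-JP trail byte 0xA1..0xFE to a column offset 0..93 (added to the row base mData before indexing gJapaneseMap) and every other byte to 0xFF, meaning "invalid". The JIS X 0201 branch (state 2) maps 0xA1..0xDF linearly onto U+FF61..U+FF9F. A sketch of both; `TrailIndex` and `HalfwidthKatakana` are illustrative names, not functions in the source:

```cpp
#include <cassert>
#include <cstdint>

// Same mapping as the sbIdx[256] table: 0xA1..0xFE -> 0..93, else 0xFF.
constexpr std::uint8_t TrailIndex(std::uint8_t b) {
  return (b >= 0xA1 && b <= 0xFE) ? static_cast<std::uint8_t>(b - 0xA1)
                                  : 0xFF;
}

// Same formula as state 2: 0x8E followed by 0xA1..0xDF gives U+FF61..U+FF9F.
// The caller must check the range first, as Convert() does.
constexpr char16_t HalfwidthKatakana(std::uint8_t b) {
  return static_cast<char16_t>(0xFF61 - 0xA1 + b);
}
```

The table form trades 256 bytes of static data for a single load per trail byte; the arithmetic form makes the valid range explicit.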
NS_IMETHOD nsEUCJPToUnicodeV2::GetMaxLength ( const char *  aSrc,
PRInt32  aSrcLength,
PRInt32 *  aDestLength 
) [inline, virtual]

Returns a quick estimation of the size of the buffer needed to hold the converted data.

Remember: this estimate is always greater than or equal to the actual buffer size needed, because it is computed for the worst case.

Parameters:
aSrc[IN] the source data buffer
aSrcLength[IN] the length of source data buffer
aDestLength[OUT] the needed size of the destination buffer
Returns:
NS_EXACT_LENGTH if an exact length was computed
NS_OK if all we have is an approximation

Implements nsIUnicodeDecoder.

Definition at line 99 of file nsJapaneseToUnicode.h.

     {
        *aDestLength = aSrcLength;
        return NS_OK;
     };
NS_IMETHOD nsEUCJPToUnicodeV2::Reset ( ) [inline, virtual]

Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.

Implements nsIUnicodeDecoder.

Definition at line 105 of file nsJapaneseToUnicode.h.

     {
        mState = 0;
        setMapMode();
        return NS_OK;
     };
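Reset() matters when a converter is reused after a buffer ended in the middle of a multibyte sequence: without it, the cached state would be applied to the first byte of the next, unrelated buffer. A sketch with a hypothetical `TinyDecoder` (ASCII plus the 0x8E xx sequence only), not the real class:

```cpp
#include <cassert>
#include <string>

class TinyDecoder {
 public:
  std::u16string Convert(const std::string& src) {
    std::u16string out;
    for (unsigned char b : src) {
      if (mState == 2) {  // waiting for the byte after 0x8E
        mState = 0;
        out.push_back(b >= 0xA1 && b <= 0xDF
                          ? static_cast<char16_t>(0xFF61 - 0xA1 + b)
                          : u'\uFFFD');
      } else if (b == 0x8E) {
        mState = 2;       // partial character, cached across calls
      } else {
        out.push_back(b < 0x80 ? static_cast<char16_t>(b) : u'\uFFFD');
      }
    }
    return out;
  }
  // Same idea as nsEUCJPToUnicodeV2::Reset(): drop any leftover state.
  void Reset() { mState = 0; }

 private:
  int mState = 0;
};
```

If a stream is abandoned after a lone 0x8E, the next buffer's 0xB1 is wrongly decoded as U+FF71; calling Reset() in between makes it decode as U+FFFD instead.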


void nsJapaneseToUnicode::setMapMode ( ) [protected, inherited]

Definition at line 54 of file nsJapaneseToUnicode.cpp.

{
  nsresult res;

  mMapIndex = gIndex;

  nsCOMPtr<nsIPrefBranch> prefBranch = do_GetService(NS_PREFSERVICE_CONTRACTID);
  if (!prefBranch) return;
  nsXPIDLCString prefMap;
  res = prefBranch->GetCharPref("intl.jis0208.map", getter_Copies(prefMap));
  if (!NS_SUCCEEDED(res)) return;
  nsCaseInsensitiveCStringComparator comparator;
  if ( prefMap.Equals(NS_LITERAL_CSTRING("cp932"), comparator) ) {
    mMapIndex = gCP932Index;
  } else if ( prefMap.Equals(NS_LITERAL_CSTRING("ibm943"), comparator) ) {
    mMapIndex = gIBM943Index;
  }
}
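setMapMode() starts from the default index table and switches to the CP932 or IBM943 variant only when the intl.jis0208.map preference matches one of those names case-insensitively; any failure to read the preference leaves the default in place. The decision logic can be sketched in plain C++. `MapVariant`, `EqualsIgnoreCase`, and `SelectMap` are hypothetical names, and the preference lookup is reduced to a nullable string:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Stand-ins for the tables gIndex, gCP932Index, and gIBM943Index.
enum class MapVariant { kDefault, kCP932, kIBM943 };

// ASCII case-insensitive comparison, in the role of
// nsCaseInsensitiveCStringComparator.
static bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i])))
      return false;
  }
  return true;
}

// |prefValue| is nullptr when the preference could not be read.
MapVariant SelectMap(const char* prefValue) {
  if (!prefValue) return MapVariant::kDefault;
  std::string v(prefValue);
  if (EqualsIgnoreCase(v, "cp932")) return MapVariant::kCP932;
  if (EqualsIgnoreCase(v, "ibm943")) return MapVariant::kIBM943;
  return MapVariant::kDefault;  // unknown names keep the default table
}
```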



Member Data Documentation

Definition at line 114 of file nsJapaneseToUnicode.h.

Definition at line 50 of file nsJapaneseToUnicode.h.

Definition at line 110 of file nsJapaneseToUnicode.h.

