lightning-sunbird  0.9+nobinonly
nsUnicodeToTSCII Class Reference

#include <nsUnicodeToTSCII.h>

Public Types

enum  { kOnError_Signal, kOnError_CallBack, kOnError_Replace }

Public Member Functions

 nsUnicodeToTSCII ()
virtual ~nsUnicodeToTSCII ()
NS_IMETHOD Convert (const PRUnichar *aSrc, PRInt32 *aSrcLength, char *aDest, PRInt32 *aDestLength)
 Converts the data from Unicode to a Charset.
NS_IMETHOD Finish (char *aDest, PRInt32 *aDestLength)
 Finishes the conversion.
NS_IMETHOD GetMaxLength (const PRUnichar *aSrc, PRInt32 aSrcLength, PRInt32 *aDestLength)
 Returns a quick estimation of the size of the buffer needed to hold the converted data.
NS_IMETHOD Reset ()
 Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.
NS_IMETHOD SetOutputErrorBehavior (PRInt32 aBehavior, nsIUnicharEncoder *aEncoder, PRUnichar aChar)
 Specify what to do when a character cannot be mapped into the dest charset.
NS_IMETHOD FillInfo (PRUint32 *aInfo)

Private Attributes

PRUint32 mBuffer

Detailed Description

Definition at line 52 of file nsUnicodeToTSCII.h.


Member Enumeration Documentation

anonymous enum [inherited]
Enumerator:
kOnError_Signal 
kOnError_CallBack 
kOnError_Replace 

Definition at line 136 of file nsIUnicodeEncoder.h.

       {
    kOnError_Signal,        // on an error, stop and signal
    kOnError_CallBack,      // on an error, call the error handler
    kOnError_Replace       // on an error, replace with a different character
  };

Constructor & Destructor Documentation

nsUnicodeToTSCII::nsUnicodeToTSCII ( ) [inline]

Definition at line 58 of file nsUnicodeToTSCII.h.

{ mBuffer = 0; };

virtual nsUnicodeToTSCII::~nsUnicodeToTSCII ( ) [inline, virtual]

Definition at line 59 of file nsUnicodeToTSCII.h.

{};

Member Function Documentation

NS_IMETHODIMP nsUnicodeToTSCII::Convert ( const PRUnichar * aSrc,
PRInt32 * aSrcLength,
char *  aDest,
PRInt32 * aDestLength 
) [virtual]

Converts the data from Unicode to a Charset.

About the byte ordering:

  • The input stream is Unicode, in the native byte order of the machine on which the converter is running.
  • For output, if the converter cares (this depends on the charset; a single-byte charset, for example, ignores byte ordering), it should assume network order. If necessary and requested, a SetOutputByteOrder() method could be added so that the reverse order can be used as well; it would default to the assumed network order.

Unless there is not enough output space, this method must consume all the available input data; there is no partial input for the Unicode charset. For the last converted character, even if there is not enough output space, a partial output must be written until all available space is used, and the rest of the output should be buffered until more space becomes available. This does not apply to the error handling method, however, so be very careful.

Parameters:
aSrc[IN] the source data buffer
aSrcLength[IN/OUT] the length of source data buffer; after conversion will contain the number of Unicode characters read
aDest[OUT] the destination data buffer
aDestLength[IN/OUT] the length of the destination data buffer; after conversion will contain the number of bytes written
Returns:
NS_OK_UENC_MOREOUTPUT if only a partial conversion was done; more output space is needed to continue
NS_ERROR_UENC_NOMAPPING if a character without a mapping was encountered and the behavior was set to "signal"
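
The in/out length convention described above can be sketched with a toy encoder. This is a minimal illustration only, not the Mozilla implementation: ToyResult, ToyConvert and EncodeAll are hypothetical names, the result codes stand in for NS_OK, NS_OK_UENC_MOREOUTPUT and NS_ERROR_UENC_NOMAPPING, and the toy maps ASCII only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the Mozilla result codes; the real values differ.
enum ToyResult { kOk, kMoreOutput, kNoMapping };

// Toy encoder using the same in/out length convention as Convert():
// on entry *srcLen and *destLen hold the buffer sizes; on return they hold
// the number of UTF-16 units read and the number of bytes written.
ToyResult ToyConvert(const char16_t* src, int32_t* srcLen,
                     char* dest, int32_t* destLen) {
  int32_t read = 0, written = 0;
  ToyResult rv = kOk;
  while (read < *srcLen) {
    if (src[read] >= 0x80) { rv = kNoMapping; break; }      // "signal" behavior
    if (written == *destLen) { rv = kMoreOutput; break; }   // output is full
    dest[written++] = static_cast<char>(src[read++]);
  }
  *srcLen = read;
  *destLen = written;
  return rv;
}

// Caller loop: keep feeding the unread input, draining a small output
// buffer after every call, until all input is consumed or mapping fails.
bool EncodeAll(const std::u16string& input, std::string& out) {
  char chunk[4];
  std::size_t pos = 0;
  while (pos < input.size()) {
    int32_t srcLen = static_cast<int32_t>(input.size() - pos);
    int32_t destLen = static_cast<int32_t>(sizeof(chunk));
    ToyResult rv = ToyConvert(input.data() + pos, &srcLen, chunk, &destLen);
    out.append(chunk, static_cast<std::size_t>(destLen));
    pos += static_cast<std::size_t>(srcLen);
    if (rv == kNoMapping) return false;
  }
  return true;
}
```

The caller re-reads both lengths after every call because they are overwritten with the amounts actually consumed and produced.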

Implements nsIUnicodeEncoder.

Reimplemented in nsUnicodeToTamilTTF.

Definition at line 114 of file nsUnicodeToTSCII.cpp.

{
  const PRUnichar * src = aSrc;
  const PRUnichar * srcEnd = aSrc + *aSrcLength;
  char * dest = aDest;
  char * destEnd = dest + *aDestLength;

  nsresult rv = NS_OK;
                      
  while (src < srcEnd && dest < destEnd) {
    PRUnichar ch = *src;
    if (mBuffer) {                        
      // Attempt to combine the last character with this one.
      PRUint32 last = mBuffer;
                            
      // last : consonant 
      if (IS_TSC_CONSONANT(last)) {                      
        if (ch == UNI_VOWELSIGN_U && IS_TSC_CONSONANT1(last)) {                      
          *dest++ = consonant_with_u[last - TSC_KA];
          mBuffer = 0;                  
          ++src;
          continue;
        }                      
  
        if (ch == UNI_VOWELSIGN_UU && IS_TSC_CONSONANT1(last)) {                      
          *dest++ = consonant_with_uu[last - TSC_KA];          
          mBuffer = 0;                  
          ++src;                  
          continue;                  
        }                      
  
        // reorder. vowel sign goes to the left of consonant
        if (IS_UNI_LEFT_VOWELSIGN(ch)) {                      
          if (dest + 2 > destEnd)
            goto error_more_output;
          *dest++ = TSC_LEFT_VOWELSIGN(ch);
          *dest++ = last;                
          mBuffer = 0;                
          ++src;                  
          continue;                  
        }                      
  
        // split and reorder. consonant goes bet. two parts
        if (IS_UNI_2PARTS_VOWELSIGN(ch)) {                      
          if (dest + 3 > destEnd)
            goto error_more_output;
          *dest++ = TSC_LEFT_VOWEL_PART(ch);
          *dest++ = last;                
          *dest++ = TSC_RIGHT_VOWEL_PART(ch);
          mBuffer = 0;                
          ++src;                  
          continue;                  
        }                      
  
        // Virama
        if (ch == UNI_VIRAMA) {                      
          // consonant KA can form a conjunct with consonant SSA(SHA).
          // buffer dead consonant 'K' for the now.
          if (last == TSC_KA) {                 
            mBuffer = TSC_KA_DEAD;
          }
          // SA can form a conjunct when followed by 'RA'. 
          // buffer dead consonant 'S' for the now.
          else if (last == TSC_SA) {
            mBuffer = TSC_SA_DEAD;                
          }
          else {                    
            *dest++ = IS_TSC_CONSONANT1(last) ?
              consonant_with_virama[last - TSC_KA] : last + 5;
            mBuffer = 0;                
          }                    
          ++src;                  
          continue;                  
        }                      

        // consonant TA forms a ligature with vowel 'I' or 'II'.
        if (last == TSC_TA && (ch == UNI_VOWELSIGN_I || ch == UNI_VOWELSIGN_II)) {                      
          *dest++ = ch - (UNI_VOWELSIGN_I - TSC_TI_LIGA);
          mBuffer = 0;                  
          ++src;                  
          continue;                  
        }                      
      }                      
      else if (last == TSC_KA_DEAD) {                      
        // Kd + SSA =  K.SSA
        if (ch == UNI_SSA) {                      
          mBuffer = TSC_KSSA; 
          ++src;                  
          continue;                  
        }                      
      }                      
      else if (last == TSC_SA_DEAD) {                      
        // Sd + RA = S.RA. Buffer RA + Sd. 
        if (ch == UNI_RA) {                      
          mBuffer = 0xc38a;                
          ++src;                  
          continue;                  
        }                      
      }                      
      else if (last == TSC_KSSA) {                      
        if (ch == UNI_VIRAMA) {
          *dest++ = (char) TSC_KSSA_DEAD;
          mBuffer = 0;                  
          ++src;                  
          continue;                  
        }                      

        // vowel splitting/reordering should be done around conjuncts as well.
        // reorder. vowel sign goes to the left of consonant
        if (IS_UNI_LEFT_VOWELSIGN(ch)) {                      
          if (dest + 2 > destEnd)
            goto error_more_output;
          *dest++ = TSC_LEFT_VOWELSIGN(ch);
          *dest++ = last;                
          mBuffer = 0;                
          ++src;                  
          continue;                  
        }                      
  
        // split and reorder. consonant goes bet. two parts
        if (IS_UNI_2PARTS_VOWELSIGN(ch)) {                      
          if (dest + 3 > destEnd)
            goto error_more_output;
          *dest++ = TSC_LEFT_VOWEL_PART(ch);
          *dest++ = last;                
          *dest++ = TSC_RIGHT_VOWEL_PART(ch);
          mBuffer = 0;                
          ++src;                  
          continue;                  
        }                      
      }                      
      else {
        NS_ASSERTION(last == 0xc38a, "No other value can be buffered");
        if (ch == UNI_VOWELSIGN_II) {                      
          *dest++ = (char) TSC_SRII_LIGA;
          mBuffer = 0;                  
          ++src;                  
          continue;                  
        }                      
        else {
          // put back TSC_SA_DEAD and TSC_RA
          *dest++ = (char) TSC_SA_DEAD;
          mBuffer = TSC_RA;
          ++src;                  
          continue;                  
        }  
      }                      
                          
      /* Output the buffered character.  */              
      if (last >> 8) {                      
        if (dest + 2 >  destEnd)
          goto error_more_output;
        *dest++ = last & 0xff;              
        *dest++ = (last >> 8) & 0xff;              
      }                      
      else                      
        *dest++ = last & 0xff;                
      mBuffer = 0;                    
      continue;                    
    }                        
                        
    if (ch < 0x80)   // Plain ASCII character.
      *dest++ = (char)ch;                    
    else if (IS_UNI_TAMIL(ch)) {                        
      PRUint8 t = UnicharToTSCII[ch - UNI_TAMIL_START];
                            
      if (t != 0) {                      
          if (IS_TSC_CONSONANT(t))
            mBuffer = (PRUint32) t;              
          else                    
            *dest++ = t;                  
      }                      
      else if (IS_UNI_2PARTS_VOWELSIGN(ch)) {   
          // actually this is an illegal sequence.
          if (dest + 2 > destEnd)
            goto error_more_output;

          *dest++ = TSC_LEFT_VOWEL_PART(ch);
          *dest++ = TSC_RIGHT_VOWEL_PART(ch);
      }                      
      else {
        *aDestLength = dest - aDest;
        return NS_ERROR_UENC_NOMAPPING;
      }                      
    }                        
    else if (ch == 0x00A9)                  
      *dest++ = (char)ch;                    
    else if (IS_UNI_SINGLE_QUOTE(ch))
      *dest++ = ch - UNI_LEFT_SINGLE_QUOTE + TSC_LEFT_SINGLE_QUOTE;
    else if (IS_UNI_DOUBLE_QUOTE(ch))
      *dest++ = ch - UNI_LEFT_DOUBLE_QUOTE + TSC_LEFT_DOUBLE_QUOTE;
    else {
      *aDestLength = dest - aDest;
      return NS_ERROR_UENC_NOMAPPING;
    }                        
                        
    /* Now that we wrote the output increment the input pointer.  */        
    ++src;                      
  }

  // flush the buffer
  if (mBuffer >> 8) {                      
    // Write out the last character, two bytes. 
    if (dest + 2 > destEnd)
      goto error_more_output;
    *dest++ = (mBuffer >> 8) & 0xff;            
    *dest++ = mBuffer & 0xff;              
    mBuffer = 0;
  }                      
  else if (mBuffer) {
    // Write out the last character, a single byte.
    if (dest >= destEnd)
      goto error_more_output;
    *dest++ = mBuffer & 0xff;              
    mBuffer = 0;
  }                      

  *aSrcLength = src - aSrc;
  *aDestLength = dest - aDest;
  return rv;

error_more_output:
  *aSrcLength = src - aSrc;
  *aDestLength = dest - aDest;
  return NS_OK_UENC_MOREOUTPUT;
}
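
The consonant buffering and vowel sign reordering in the listing above can be illustrated in isolation. The sketch below is an assumption-laden simplification: ToVisualOrder and its helpers are hypothetical names, the output stays in Unicode code points instead of TSCII bytes, and ligatures, virama and conjunct handling are left out. It relies only on the Unicode Tamil block layout and the canonical decompositions of U+0BCA, U+0BCB and U+0BCC.

```cpp
#include <string>

// Tamil consonants fall within U+0B95..U+0BB9.
static bool IsConsonant(char16_t c) { return c >= 0x0B95 && c <= 0x0BB9; }
// Vowel signs E, EE and AI are drawn to the left of the consonant.
static bool IsLeftVowelSign(char16_t c) { return c >= 0x0BC6 && c <= 0x0BC8; }
// Vowel signs O, OO and AU consist of a left part and a right part.
static bool IsTwoPartVowelSign(char16_t c) { return c >= 0x0BCA && c <= 0x0BCC; }

// Rearranges logical (Unicode) order into visual order.
std::u16string ToVisualOrder(const std::u16string& in) {
  std::u16string out;
  char16_t pending = 0;  // buffered consonant, playing the role of mBuffer
  for (char16_t ch : in) {
    if (pending) {
      if (IsLeftVowelSign(ch)) {
        // Reorder: the vowel sign goes to the left of the consonant.
        out += ch;
        out += pending;
        pending = 0;
        continue;
      }
      if (IsTwoPartVowelSign(ch)) {
        // Split and reorder: the consonant goes between the two parts.
        // O = E + AA, OO = EE + AA, AU = E + AU length mark.
        char16_t left = static_cast<char16_t>(ch == 0x0BCB ? 0x0BC7 : 0x0BC6);
        char16_t right = static_cast<char16_t>(ch == 0x0BCC ? 0x0BD7 : 0x0BBE);
        out += left;
        out += pending;
        out += right;
        pending = 0;
        continue;
      }
      // Nothing to combine with: output the buffered consonant.
      out += pending;
      pending = 0;
    }
    if (IsConsonant(ch))
      pending = ch;  // hold it until the next character is known
    else
      out += ch;
  }
  if (pending) out += pending;  // final flush of the buffer
  return out;
}
```

For example, KA followed by vowel sign E comes out as vowel sign E followed by KA, and KA followed by vowel sign O comes out as E, KA, AA.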

NS_IMETHODIMP nsUnicodeToTSCII::FillInfo ( PRUint32 * aInfo ) [virtual]

Implements nsICharRepresentable.

Definition at line 393 of file nsUnicodeToTSCII.cpp.

{
  // Tamil block is so sparse.
  static const PRUint8 coverage[] = {
    0xe8, // 11101000  U+0B87 - U+0B80
    0xc7, // 11000111  U+0B8F - U+0B88
    0x3d, // 00111101  U+0B97 - U+0B90
    0xd6, // 11010110  U+0B9F - U+0B98
    0x18, // 00011000  U+0BA7 - U+0BA0
    0xc7, // 11000111  U+0BAF - U+0BA8
    0xbf, // 10111111  U+0BB7 - U+0BB0
    0xc7, // 11000111  U+0BBF - U+0BB8
    0xc7, // 11000111  U+0BC7 - U+0BC0
    0x3d, // 00111101  U+0BCF - U+0BC8
    0x80, // 10000000  U+0BD7 - U+0BD0
    0x00, // 00000000  U+0BDF - U+0BD8
    0x80, // 10000000  U+0BE7 - U+0BE0
    0xff, // 11111111  U+0BEF - U+0BE8
    0x07, // 00000111  U+0BF7 - U+0BF0
  };

  PRUnichar i;
  for(i = 0; i <  0x78; i++)
    if (coverage[i / 8] & (1 << (i % 8)))
      SET_REPRESENTABLE(aInfo, i + UNI_TAMIL_START);

  // TSCII is a superset of US-ASCII.
  for(i = 0x20; i < 0x7f; i++)
     SET_REPRESENTABLE(aInfo, i);

  // additional characters in TSCII
  SET_REPRESENTABLE(aInfo, 0xA9);   // copyright sign
  SET_REPRESENTABLE(aInfo, UNI_LEFT_SINGLE_QUOTE);
  SET_REPRESENTABLE(aInfo, UNI_RIGHT_SINGLE_QUOTE);
  SET_REPRESENTABLE(aInfo, UNI_LEFT_DOUBLE_QUOTE);
  SET_REPRESENTABLE(aInfo, UNI_RIGHT_DOUBLE_QUOTE);

  return NS_OK;
}
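
The coverage table in the listing above packs one bit per code point, least significant bit first, starting at the beginning of the Tamil block. A minimal sketch of the same lookup as a standalone predicate follows; IsTamilCovered is a hypothetical helper, and 0x0B80 is assumed to be the value of UNI_TAMIL_START, as the range comments in the table suggest.

```cpp
#include <cstdint>

// Coverage bitmap copied from FillInfo(): one bit per code point starting
// at U+0B80, least significant bit first within each byte.
static const std::uint8_t kCoverage[] = {
  0xe8, 0xc7, 0x3d, 0xd6, 0x18, 0xc7, 0xbf, 0xc7,
  0xc7, 0x3d, 0x80, 0x00, 0x80, 0xff, 0x07,
};

// Hypothetical helper: true if the bitmap marks ch as representable.
bool IsTamilCovered(char16_t ch) {
  const unsigned kStart = 0x0B80;  // assumed value of UNI_TAMIL_START
  const unsigned kCount = 0x78;    // 15 bytes * 8 bits, as in the loop bound
  if (ch < kStart || ch >= kStart + kCount) return false;
  unsigned i = ch - kStart;
  return (kCoverage[i / 8] & (1u << (i % 8))) != 0;
}
```

With this table, U+0B85 (TAMIL LETTER A) and U+0B95 (TAMIL LETTER KA) are reported as covered, while the unassigned U+0B80 and U+0B96 are not.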

NS_IMETHODIMP nsUnicodeToTSCII::Finish ( char *  aDest,
PRInt32 * aDestLength 
) [virtual]

Finishes the conversion.

The converter may write some extra data and flush its final state.

Parameters:
aDest[OUT] the destination data buffer
aDestLength[IN/OUT] the length of destination data buffer; after conversion it will contain the number of bytes written
Returns:
NS_OK_UENC_MOREOUTPUT if only a partial conversion was done; more output space is needed to continue

Implements nsIUnicodeEncoder.

Definition at line 343 of file nsUnicodeToTSCII.cpp.

{
  if (!mBuffer) {
    *aDestLength = 0;
    return NS_OK;
  }

  if (mBuffer >> 8) {                      
    // Write out the last character, two bytes. 
    if (*aDestLength < 2) {
      *aDestLength = 0;
      return NS_OK_UENC_MOREOUTPUT;
    }
    *aDest++ = (mBuffer >> 8) & 0xff;            
    *aDest++ = mBuffer & 0xff;              
    mBuffer = 0;
    *aDestLength = 2;
  }                      
  else {                      
    // Write out the last character, a single byte.
    if (*aDestLength < 1) {                    
      *aDestLength = 0;
      return NS_OK_UENC_MOREOUTPUT;
    }
    *aDest++ = mBuffer & 0xff;              
    mBuffer = 0;
    *aDestLength = 1;
  }                      
  return NS_OK;
}

NS_IMETHODIMP nsUnicodeToTSCII::GetMaxLength ( const PRUnichar * aSrc,
PRInt32  aSrcLength,
PRInt32 * aDestLength 
) [virtual]

Returns a quick estimation of the size of the buffer needed to hold the converted data.

Remember: this estimate is greater than or equal to the actual size of the buffer needed. It is computed for the worst case.

Parameters:
aSrc[IN] the source data buffer
aSrcLength[IN] the length of source data buffer
aDestLength[OUT] the needed size of the destination buffer
Returns:
NS_OK_UENC_EXACTLENGTH if an exact length was computed
NS_OK if all we have is an approximation

Implements nsIUnicodeEncoder.

Reimplemented in nsUnicodeToTamilTTF.

Definition at line 383 of file nsUnicodeToTSCII.cpp.

{
  // Some Tamil letters  can be decomposed into 2 glyphs in TSCII.
  *aDestLength = aSrcLength *  2;
  return NS_OK;
}
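
A caller would typically size its destination buffer from this estimate before converting. The sketch below only mirrors the arithmetic; MaxEncodedLength and MakeDestBuffer are hypothetical helpers, not part of the class. Note that the estimate covers the characters passed in, so bytes still buffered from an earlier Convert() call are not accounted for.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the estimate in GetMaxLength(): two output bytes per UTF-16 unit,
// because some Tamil letters are written as two glyphs in TSCII.
std::int32_t MaxEncodedLength(std::int32_t srcLength) {
  return srcLength * 2;
}

// Hypothetical helper: allocates a destination buffer sized from the estimate.
std::vector<char> MakeDestBuffer(std::int32_t srcLength) {
  return std::vector<char>(static_cast<std::size_t>(MaxEncodedLength(srcLength)));
}
```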

NS_IMETHODIMP nsUnicodeToTSCII::Reset ( ) [virtual]

Resets the charset converter so it may be recycled for a completely different and unrelated buffer of data.

Implements nsIUnicodeEncoder.

Definition at line 376 of file nsUnicodeToTSCII.cpp.

{
  mBuffer = 0;
  return NS_OK;
}

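NS_IMETHODIMP nsUnicodeToTSCII::SetOutputErrorBehavior ( PRInt32 aBehavior,
nsIUnicharEncoder * aEncoder,
PRUnichar aChar 
) [virtual]

Specify what to do when a character cannot be mapped into the dest charset.

Parameters:
aBehavior[IN] the behavior; taken from the enum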

Implements nsIUnicodeEncoder.

Reimplemented in nsUnicodeToTamilTTF.

Definition at line 434 of file nsUnicodeToTSCII.cpp.

{
  return NS_OK;
}

Member Data Documentation

PRUint32 nsUnicodeToTSCII::mBuffer [private]

Definition at line 78 of file nsUnicodeToTSCII.h.


The documentation for this class was generated from the following files: