lightning-sunbird  0.9+nobinonly
ConvertUTF8toUTF16 Class Reference

A character sink (see |copy_string| in nsAlgorithm.h) for converting UTF-8 to UTF-16.

#include <nsUTF8Utils.h>

Public Types

typedef nsACString::char_type value_type
typedef nsAString::char_type buffer_type

Public Member Functions

 ConvertUTF8toUTF16 (buffer_type *aBuffer)
size_t Length () const
PRUint32 NS_ALWAYS_INLINE write (const value_type *start, PRUint32 N)
void write_terminator ()

Private Attributes

buffer_type *const mStart
buffer_type * mBuffer
PRBool mErrorEncountered

Detailed Description

A character sink (see |copy_string| in nsAlgorithm.h) for converting UTF-8 to UTF-16.

Definition at line 66 of file nsUTF8Utils.h.
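
The sink shape described above (construct over an output buffer, call write() once per input fragment, then write_terminator()) can be sketched without the Mozilla headers. Everything in this sketch is hypothetical and for illustration only: WidenSink is a stand-in that merely widens each byte to one 16-bit unit, and feedFragments is an assumed driver, not the actual |copy_string| from nsAlgorithm.h. The real converter additionally expects each fragment to end on a character boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in sink with the same member shape as
// ConvertUTF8toUTF16. It does NOT decode UTF-8: it only widens each
// input byte to one 16-bit unit, to show the protocol.
class WidenSink {
public:
  typedef char value_type;       // input unit
  typedef char16_t buffer_type;  // output unit

  explicit WidenSink(buffer_type* aBuffer)
      : mStart(aBuffer), mBuffer(aBuffer) {
    assert(aBuffer != nullptr);
  }

  // Number of output units written so far (terminator not counted).
  std::size_t Length() const {
    return static_cast<std::size_t>(mBuffer - mStart);
  }

  // Consume one fragment; returns the number of input units consumed.
  std::uint32_t write(const value_type* start, std::uint32_t N) {
    for (std::uint32_t i = 0; i < N; ++i)
      *mBuffer++ = static_cast<buffer_type>(
          static_cast<unsigned char>(start[i]));
    return N;
  }

  // Store a NUL at the current position without advancing.
  void write_terminator() { *mBuffer = buffer_type(0); }

private:
  buffer_type* const mStart;  // first unit of the output buffer
  buffer_type* mBuffer;       // next unit to be written
};

// Hypothetical driver: feed every fragment to the sink, terminate the
// output, and report how many units were produced.
template <class Sink>
std::size_t feedFragments(Sink& sink, const char* const* fragments,
                          const std::uint32_t* lengths, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    sink.write(fragments[i], lengths[i]);
  sink.write_terminator();
  return sink.Length();
}
```

The caller owns the buffer and must size it for the output plus one terminating unit, since the sink performs no bounds checking.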


Member Typedef Documentation

typedef nsAString::char_type ConvertUTF8toUTF16::buffer_type

Definition at line 70 of file nsUTF8Utils.h.

typedef nsACString::char_type ConvertUTF8toUTF16::value_type

Definition at line 69 of file nsUTF8Utils.h.


Constructor & Destructor Documentation

ConvertUTF8toUTF16::ConvertUTF8toUTF16 ( buffer_type * aBuffer ) [inline]

Definition at line 72 of file nsUTF8Utils.h.

        : mStart(aBuffer), mBuffer(aBuffer), mErrorEncountered(PR_FALSE) {}

Member Function Documentation

size_t ConvertUTF8toUTF16::Length ( ) const [inline]

Definition at line 75 of file nsUTF8Utils.h.

{ return mBuffer - mStart; }

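PRUint32 NS_ALWAYS_INLINE ConvertUTF8toUTF16::write ( const value_type * start, PRUint32 N ) [inline]

Definition at line 77 of file nsUTF8Utils.h.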

      {
        if ( mErrorEncountered )
          return N;

        // algorithm assumes utf8 units won't
        // be spread across fragments
        const value_type* p = start;
        const value_type* end = start + N;
        buffer_type* out = mBuffer;
        for ( ; p != end /* && *p */; )
          {
            char c = *p++;

            if ( UTF8traits::isASCII(c) )
              {
                *out++ = buffer_type(c);
                continue;
              }

            PRUint32 ucs4;
            PRUint32 minUcs4;
            PRInt32 state = 0;

            if ( UTF8traits::is2byte(c) )
              {
                ucs4 = (PRUint32(c) << 6) & 0x000007C0L;
                state = 1;
                minUcs4 = 0x00000080;
              }
            else if ( UTF8traits::is3byte(c) )
              {
                ucs4 = (PRUint32(c) << 12) & 0x0000F000L;
                state = 2;
                minUcs4 = 0x00000800;
              }
            else if ( UTF8traits::is4byte(c) )
              {
                ucs4 = (PRUint32(c) << 18) & 0x001F0000L;
                state = 3;
                minUcs4 = 0x00010000;
              }
            else if ( UTF8traits::is5byte(c) )
              {
                ucs4 = (PRUint32(c) << 24) & 0x03000000L;
                state = 4;
                minUcs4 = 0x00200000;
              }
            else if ( UTF8traits::is6byte(c) )
              {
                ucs4 = (PRUint32(c) << 30) & 0x40000000L;
                state = 5;
                minUcs4 = 0x04000000;
              }
            else
              {
                NS_ERROR("Not a UTF-8 string. This code should only be used for converting from known UTF-8 strings.");
                mErrorEncountered = PR_TRUE;
                mBuffer = out;
                return N;
              }

            while ( state-- )
              {
                if (p == end)
                  {
                    NS_ERROR("Buffer ended in the middle of a multibyte sequence");
                    mErrorEncountered = PR_TRUE;
                    mBuffer = out;
                    return N;
                  }

                c = *p++;

                if ( UTF8traits::isInSeq(c) )
                  {
                    PRInt32 shift = state * 6;
                    ucs4 |= (PRUint32(c) & 0x3F) << shift;
                  }
                else
                  {
                    NS_ERROR("not a UTF8 string");
                    mErrorEncountered = PR_TRUE;
                    mBuffer = out;
                    return N;
                  }
              }

            if ( ucs4 < minUcs4 )
              {
                // Overlong sequence
                *out++ = UCS2_REPLACEMENT_CHAR;
              }
            else if ( ucs4 <= 0xD7FF )
              {
                *out++ = ucs4;
              }
            else if ( /* ucs4 >= 0xD800 && */ ucs4 <= 0xDFFF )
              {
                // Surrogates
                *out++ = UCS2_REPLACEMENT_CHAR;
              }
            else if ( ucs4 == 0xFFFE || ucs4 == 0xFFFF )
              {
                // Prohibited characters
                *out++ = UCS2_REPLACEMENT_CHAR;
              }
            else if ( ucs4 >= PLANE1_BASE )
              {
                if ( ucs4 >= UCS_END )
                  *out++ = UCS2_REPLACEMENT_CHAR;
                else {
                  *out++ = (buffer_type)H_SURROGATE(ucs4);
                  *out++ = (buffer_type)L_SURROGATE(ucs4);
                }
              }
            else
              {
                *out++ = ucs4;
              }
          }
        mBuffer = out;
        return p - start;
      }
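
The two arithmetic steps in the listing above, accumulating the code point from lead and continuation bytes and then choosing the UTF-16 output, can be tried in isolation. This is a minimal sketch under stated assumptions, not the Mozilla implementation: the constant values standing in for PLANE1_BASE, UCS_END and UCS2_REPLACEMENT_CHAR are assumed (their definitions are not shown on this page), splitSurrogates uses the standard UTF-16 formula on the assumption that H_SURROGATE and L_SURROGATE compute the same pair, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed values; this page uses the names without showing definitions.
const std::uint32_t kPlane1Base = 0x00010000u;  // stands in for PLANE1_BASE
const std::uint32_t kUcsEnd = 0x00110000u;      // stands in for UCS_END
const char16_t kReplacement = 0xFFFD;  // stands in for UCS2_REPLACEMENT_CHAR

// Two-byte case of the accumulation: the lead byte contributes five
// payload bits shifted left by 6, the continuation byte its low six bits.
inline std::uint32_t decodeTwoByte(unsigned char lead, unsigned char cont) {
  std::uint32_t ucs4 = (std::uint32_t(lead) << 6) & 0x000007C0u;
  ucs4 |= std::uint32_t(cont) & 0x3Fu;
  return ucs4;
}

// Standard UTF-16 split of a supplementary-plane code point.
inline void splitSurrogates(std::uint32_t ucs4, char16_t& high,
                            char16_t& low) {
  assert(ucs4 >= kPlane1Base && ucs4 < kUcsEnd);
  const std::uint32_t v = ucs4 - kPlane1Base;
  high = static_cast<char16_t>(0xD800u + (v >> 10));
  low = static_cast<char16_t>(0xDC00u + (v & 0x3FFu));
}

// Mirrors the decision chain at the end of write(). Stores one or two
// units in out and returns how many were stored.
inline std::size_t emitUtf16(std::uint32_t ucs4, std::uint32_t minUcs4,
                             char16_t* out) {
  if (ucs4 < minUcs4) {  // overlong encoding
    out[0] = kReplacement;
    return 1;
  }
  if (ucs4 <= 0xD7FFu) {  // BMP below the surrogate range
    out[0] = static_cast<char16_t>(ucs4);
    return 1;
  }
  if (ucs4 <= 0xDFFFu) {  // surrogate code points are not valid input
    out[0] = kReplacement;
    return 1;
  }
  if (ucs4 == 0xFFFEu || ucs4 == 0xFFFFu) {  // prohibited characters
    out[0] = kReplacement;
    return 1;
  }
  if (ucs4 >= kPlane1Base) {
    if (ucs4 >= kUcsEnd) {  // beyond the last Unicode plane
      out[0] = kReplacement;
      return 1;
    }
    splitSurrogates(ucs4, out[0], out[1]);
    return 2;
  }
  out[0] = static_cast<char16_t>(ucs4);  // remaining BMP code points
  return 1;
}
```

With these helpers, the bytes C3 A9 accumulate to 0xE9 and are stored as a single unit, the overlong pair C0 80 accumulates to 0 (below the two-byte minimum 0x80) and becomes the replacement character, and U+1F600 is stored as the pair D83D DE00.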

void ConvertUTF8toUTF16::write_terminator ( ) [inline]

Definition at line 202 of file nsUTF8Utils.h.

      {
        *mBuffer = buffer_type(0);
      }

Member Data Documentation

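buffer_type* ConvertUTF8toUTF16::mBuffer [private]

Definition at line 209 of file nsUTF8Utils.h.

PRBool ConvertUTF8toUTF16::mErrorEncountered [private]

Definition at line 210 of file nsUTF8Utils.h.

buffer_type* const ConvertUTF8toUTF16::mStart [private]

Definition at line 208 of file nsUTF8Utils.h.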


The documentation for this class was generated from the following file:

nsUTF8Utils.h