lightning-sunbird  0.9+nobinonly
/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is HTML Sanitizer code.
 *
 * The Initial Developer of the Original Code is
 * Ben Bucksch <>.
 * Portions created by the Initial Developer are Copyright (C) 2002
 * the Initial Developer. All Rights Reserved.
 *
 * Contributor(s):
 *   Netscape
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either of the GNU General Public License Version 2 or later (the "GPL"),
 * or the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */

/* Cleans up HTML source from unwanted tags/attributes

   This class implements a content sink, which takes a parsed HTML document
   and removes all tags and attributes that are not explicitly allowed.

   This may improve the viewing experience of the user and/or the
   security/privacy.

   What is allowed is defined by a string (format described before the
   implementation of |mozHTMLSanitizer::ParsePrefs()|). The syntax of the
   definition is not very rich - you can only (dis)allow certain tags and
   attributes, but not where they may appear. (This makes the implementation
   much simpler.) E.g. it is impossible to disallow ordinary text as a
   direct child of the <head> node or to disallow multiple <head> nodes.

   We also remove some known bad attribute values like javascript: URLs.
   Draconian attitude.

   Currently, the output of this class is unparsed (!) HTML source, which
   means that each document has to go through the parser twice. Of course,
   that is a performance killer. There are some reasons for my doing it
   that way:
   * There is, to my knowledge, no interface to hook up such modifiers
     in the document display data flow. We have a nice interface for doing
     the modifications (the DOM), but no place to get the DOM and to invoke
     this code. As I don't want to hack this directly into the html sink,
     I'd have to create a generic interface first, which is too much work for
     me at the moment.
   * It is quite easy to hook up modifiers for the (unparsed) data stream,
     both in netwerk (for the browser) and esp. in libmime (for Mailnews).
   * It seems like the safest method - it is easier to debug (you have the
     HTML source output to check) and is less prone to security-relevant bugs
     and regressions, because in the case of a bug, it will probably fall back
     to not outputting, which is safer than erring on the side of letting
     something slip through (most of the alternative approaches listed below
     are probably vulnerable to the latter).
   * It should be possible to later change this class to output a parsed HTML
     document.
   So, in other words, I had the choice between better design and better
   performance. I chose design. Bad performance has an effect on the users
   of this class only, while bad design has an effect on all users and
   programmers.

   That being said, I have some ideas for how to make it much more efficient,
   but they involve hacking core code.
   * At some point when we have the DOM, but haven't done anything with it yet
     (in particular, haven't loaded any external objects or run any
     JavaScript), walk the DOM and delete everything the user doesn't
     explicitly like.
   * There's this nice GetPref() in the HTMLContentSink. It isn't used exactly
     as I would like, but that should be doable. Basically, before
     processing any tag (e.g. in OpenContainer or AddLeaf), ask that
     function whether the tag is allowed. If not, just return.
   In any case, there's the problem of how the users of the renderer
   (e.g. Mailnews) can tell it to use the sanitizer and which tags are
   allowed (the browser may want to allow more tags than Mailnews).
   That probably means that I have to hack into the docshell (incl. its
   interface) or similar, which I would really like to avoid.
   Any ideas appreciated.
*/
#ifndef _mozISanitizingSerializer_h__
#define _mozISanitizingSerializer_h__

#include "nsISupports.h"
#include "nsAString.h"

/* starting interface:    nsIContentSerializer */
#define MOZ_ISANITIZINGHTMLSERIALIZER_IID_STR "feca3c34-205e-4ae5-bd1c-03c686ff012b"

#define MOZ_ISANITIZINGHTMLSERIALIZER_IID \
  {0xfeca3c34, 0x205e, 0x4ae5, \
    { 0xbd, 0x1c, 0x03, 0xc6, 0x86, 0xff, 0x01, 0x2b }}

class mozISanitizingHTMLSerializer : public nsISupports {
 public:

  NS_IMETHOD Initialize(nsAString* aOutString,
                        PRUint32 aFlags,
                        const nsAString& allowedTags) = 0;
  // This function violates string ownership rules, see impl.
};

#endif
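
The header comment describes two checks: a tag or attribute is kept only if it is explicitly allowed, and known bad attribute values such as javascript: URLs are dropped. The following is a minimal standalone sketch of that idea, not the actual sanitizer code. It uses only the C++ standard library rather than the Mozilla string classes, and the names `AllowList`, `ParseAllowList`, `IsTagAllowed`, `LooksLikeScriptURL`, and `IsAttributeAllowed` are hypothetical. The spec format `"a(href,title) b i p"` is also invented for illustration; the real format accepted by `mozHTMLSanitizer::ParsePrefs()` is documented next to its implementation and is not shown in this header.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the usage checks shown after this block
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical allow-list: tag name -> set of permitted attribute names.
using AllowList = std::map<std::string, std::set<std::string>>;

// Parses an invented format such as "a(href,title) b i p".
// Assumption: this is NOT the format used by mozHTMLSanitizer::ParsePrefs().
AllowList ParseAllowList(const std::string& spec) {
  AllowList result;
  std::istringstream in(spec);
  std::string token;
  while (in >> token) {
    const std::size_t open = token.find('(');
    if (open == std::string::npos) {
      result[token];  // tag allowed with no attributes
      continue;
    }
    std::set<std::string>& attrs = result[token.substr(0, open)];
    const std::size_t close = token.find(')', open);
    const std::size_t count =
        (close == std::string::npos) ? std::string::npos : close - open - 1;
    std::istringstream attrStream(token.substr(open + 1, count));
    std::string attr;
    while (std::getline(attrStream, attr, ',')) {
      if (!attr.empty()) attrs.insert(attr);
    }
  }
  return result;
}

// A tag is kept only if it appears in the allow-list.
bool IsTagAllowed(const AllowList& list, const std::string& tag) {
  return list.find(tag) != list.end();
}

// Simplified check: skips leading whitespace, lowercases, and looks for the
// "javascript:" scheme. A production check would need to handle more
// obfuscation cases than this sketch does.
bool LooksLikeScriptURL(std::string value) {
  const auto first = std::find_if(value.begin(), value.end(), [](unsigned char c) {
    return !std::isspace(c);
  });
  value.erase(value.begin(), first);
  std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return value.compare(0, 11, "javascript:") == 0;
}

// An attribute is kept only if its tag is allowed, the attribute is listed
// for that tag, and its value does not look like a script URL.
bool IsAttributeAllowed(const AllowList& list, const std::string& tag,
                        const std::string& attr, const std::string& value) {
  const auto it = list.find(tag);
  if (it == list.end()) return false;
  if (it->second.find(attr) == it->second.end()) return false;
  return !LooksLikeScriptURL(value);
}
```

With `ParseAllowList("a(href,title) b i p")`, this sketch accepts `<b>` and an `href` on `<a>` pointing at an ordinary URL, and rejects `<script>`, an `onclick` on `<a>`, and an `href` whose value starts with `javascript:`. Anything not listed is rejected by default, which matches the fail-closed preference stated in the header comment.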