
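SGML Declaration

Document Character Set

In the SGML sense, the document character set of HTML 4.0 is the Universal Character Set (UCS) of [ISO10646]. At present it is character-by-character equivalent to the [UNICODE] standard.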

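Data Transfer

When HTML text is transmitted directly in UCS-2 (charset="UNICODE-1-1"), the question of byte order arises: for two-byte characters, is the high-order byte sent first or last? This specification recommends that UCS-2 be transmitted in big-endian byte order (high-order byte first), which agrees both with the established network byte order and with the [UNICODE] recommendation for serialized text data. In addition, to maximize the chance of correct interpretation, it is recommended that text transmitted as UCS-2 begin with a ZERO-WIDTH NON-BREAKING SPACE character (hexadecimal FEFF). Byte-reversed, this becomes FFFE, a character guaranteed never to be assigned. A user agent that receives FFFE as the first octets of a text therefore knows that the bytes of the remainder of the text must be reversed.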
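The byte-order rule can be illustrated with a short Python sketch. It is not part of the specification. The function names `encode_ucs2` and `decode_ucs2` are hypothetical, and treating text without a leading FEFF/FFFE as big-endian is an assumption based on the recommendation above.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian UCS-2, prefixed with FEFF (hypothetical helper)."""
    out = bytearray(b"\xfe\xff")  # ZERO-WIDTH NON-BREAKING SPACE, high byte first
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("character does not fit in a single UCS-2 unit")
        out += code.to_bytes(2, "big")
    return bytes(out)


def decode_ucs2(data: bytes) -> str:
    """Decode UCS-2, using a leading FEFF or FFFE to choose the byte order."""
    if data[:2] == b"\xfe\xff":
        order, body = "big", data[2:]      # bytes arrived in the recommended order
    elif data[:2] == b"\xff\xfe":
        order, body = "little", data[2:]   # FFFE seen: the bytes are reversed
    else:
        order, body = "big", data          # assumption: no marker means big-endian
    if len(body) % 2:
        raise ValueError("UCS-2 data must contain an even number of bytes")
    units = (int.from_bytes(body[i:i + 2], order) for i in range(0, len(body), 2))
    return "".join(chr(unit) for unit in units)
```

A receiver written this way never needs to be told the sender's byte order, as long as the sender emits the leading FEFF.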

The UTF-1 transformation format of [ISO10646] (registered by IANA as ISO-10646-UTF-1) should not be used.

SGML Declaration

   <!SGML  "ISO 8879:1986"
   --
     SGML Declaration for HyperText Markup Language version 4.0

     With support for Unicode UCS-4 and increased limits
     for tag and literal lengths etc.
   --

   CHARSET
      BASESET "ISO Registration Number 177//CHARSET
            ISO/IEC 10646-1:1993 UCS-4 with
            implementation level 3//ESC 2/5 2/15 4/6"
      DESCSET 0   9          UNUSED
              9   2          9
              11  2          UNUSED
              13  1          13
              14  18         UNUSED
              32  95         32
              127 1          UNUSED
              128 32         UNUSED
              160 2147483486 160
   --
    In ISO 10646, the positions with hexadecimal
    values 0000D800 - 0000DFFF, used in the UTF-16
    encoding of UCS-4, are reserved, as well as the last
    two code values in each plane of UCS-4, i.e. all
    values of the hexadecimal form xxxxFFFE or xxxxFFFF.
    These code values or the corresponding numeric
    character references must not be included when
    generating a new HTML document, and they should be
    ignored if encountered when processing a HTML
    document.
   --

   CAPACITY    SGMLREF
           TOTALCAP    150000
           GRPCAP      150000
           ENTCAP      150000

   SCOPE    DOCUMENT
   SYNTAX
      SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
        17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
      BASESET "ISO 646IRV:1991//CHARSET
            International Reference Version
            (IRV)//ESC 2/8 4/2"
      DESCSET 0 128 0

      FUNCTION
             RE      13
             RS      10
             SPACE     32
             TAB SEPCHAR  9

      NAMING   LCNMSTRT ""
             UCNMSTRT ""
             LCNMCHAR ".-"-- ?include "~/_" for URLs? --
             UCNMCHAR ".-"
             NAMECASE GENERAL YES
                      ENTITY  NO
      DELIM  GENERAL  SGMLREF
             SHORTREF SGMLREF
      NAMES    SGMLREF
      QUANTITY SGMLREF
             ATTSPLEN 65536   -- These are the largest values --
             LITLEN   65536   -- permitted in the declaration --
             NAMELEN 65536   -- Avoid fixed limits in actual --
             PILEN   65536   -- implementations of HTML UA's --
             TAGLVL   100
             TAGLEN   65536
             GRPGTCNT 150
             GRPCNT   64

   FEATURES
     MINIMIZE
       DATATAG  NO
       OMITTAG  YES
       RANK     NO
       SHORTTAG YES
     LINK
       SIMPLE   NO
       IMPLICIT NO
       EXPLICIT NO
     OTHER
       CONCUR   NO
       SUBDOC   NO
       FORMAL   YES
   >
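
As a reading aid, the CHARSET section of the declaration can be turned into a small code-point check. The Python sketch below is illustrative only: the names `UNUSED_RANGES` and `is_allowed_code_point` are hypothetical, the ranges are copied from the DESCSET lines marked UNUSED, and the reserved values come from the comment inside the declaration.

```python
# (start, count) pairs copied from the DESCSET entries marked UNUSED.
UNUSED_RANGES = [(0, 9), (11, 2), (14, 18), (127, 1), (128, 32)]

# The last DESCSET entry maps 2147483486 positions starting at 160.
UPPER_LIMIT = 160 + 2147483486


def is_allowed_code_point(cp: int) -> bool:
    """Return True if cp may appear in a document under this declaration."""
    if cp < 0 or cp >= UPPER_LIMIT:
        return False
    for start, count in UNUSED_RANGES:
        if start <= cp < start + count:
            return False          # control characters declared UNUSED
    if 0xD800 <= cp <= 0xDFFF:
        return False              # reserved for the UTF-16 encoding
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
        return False              # last two code values of every plane
    return True
```

For example, `is_allowed_code_point(9)` (TAB) is true, while `is_allowed_code_point(0xFFFE)` is false.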