iXML Community Group Test Suite

23 Oct 2023 (22 Nov 2023)

Top-level catalog for tests in the iXML Community Group Test Suite.

Tests have been contributed from several sources, but the core of the test collection are the tests contributed by Steven Pemberton in December 2021.

Tests producing parse trees

22 Nov 2022

Tests provided by Steven Pemberton in December 2021, with corrections of 21 December. Reorganized by Norm Tovey-Walsh, February 2022.

ixml tests

Created 16 Dec 2021 by SP

Updated 21 Dec 2021 by SP

Corrected input, grammar, or output for 5 tests

Updated 30 Dec 2021 by MSM

Updated catalog, corrected many tests.

Updated 30 May 2022 by MSM

Add whitespace-and-delimiters test sets.

unicode-version-check

Created 15 Nov 2023 by MSM

Invisible XML Grammar

{ An attempt to detect which version of Unicode the ixml processor is
  currently using.

  The input is a sequence of the following code points, separated
  for legibility by whitespace.

  First, some characters whose general category value has changed:

  U+00A7 Section sign (So in 6.0, Po from 6.1)
  U+2308 Left ceiling (Sm in 6.0-6.2, Ps from 6.3)
  U+FD3E..FD3F Ornate L/R parenthesis (Ps,Pe in 6.0-6.3, Pe,Ps from 7)
  U+13DD Cherokee letter TLA (Lo 6.0-7.0, Lu from 8)
  U+1886 Mongolian letter Ali Gali three Baluda (Lo, then Mn from 9)
  
  U+10D0 Georgian small letter AN (Lo in 6-10, Ll from 11)
  U+166D Canadian syllabics CHI sign (Po in 6-11, So from 12)
  U+1734 Hanundo sign PAMUDPOD (Mn in 6-13, Mc in 14-15)

  Then a sequence of characters new in different versions:

  6.0 U+1F086 So Playing card six of Hearts
  6.1 U+2CF3  Ll Coptic small letter BOHAIRIC KHEI
  6.2 U+20BA  Sc Turkish Lira sign
  6.3 adds only format control characters, omitted here
  7.0 U+052B  Ll Cyrillic small letter DZZHE
  8.0 U+13F5  Lu Cherokee letter MV
  
  9.0 U+1715C Lo (not named)
  10.0 U+0860 Lo Syriac letter Malayalam NGA
  11.0 U+0560 Ll Armenian ... turned AYB
  12.0 U+0C77 Po Telugu sign SIDDHAM
  12.1 U+32FF So Square era name REIWA
  
  13.0 U+1FBF7 Nd Segmented digit seven
  14.0 U+2E54 Po Medieval question mark
  15.0 U+11B00 Po Devanagari head mark
  15.1 U+2FFC So ideographic desc char surround from right

  For the record, earlier versions had:
  13.0 U+08BE Lo Arabic ... PAH with small v
  14.0 U+061D Po Arabic end of text mark
  but I'm trying to avoid confusing debuggers with bidi.

}
-unicode-version = unicode-6.0
                | unicode-6.1
                | unicode-6.2
                | unicode-6.3
                | unicode-7.0
                | unicode-8.0
                | unicode-9.0
                | unicode-10.0
                | unicode-11.0
                | unicode-12.0
                | unicode-12.1
                | unicode-13.0
                | unicode-14.0
                | unicode-15.0
                | unicode-15.1
                .

unicode-6.0  = -[So], -[Sm], -[Ps], -[Pe], -[Lo], -[Lo], s,
                {A7    2308   FD3E, FD3F    13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -~[Ll], -~[Sc], -~[Ll], -~[Lu], s,
                {1f086   2cf3    20ba    052b    13f5 }
               -~[Lo], -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-6.1  = -[Po], -[Sm], -[Ps], -[Pe], -[Lo], -[Lo], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -~[Sc], -~[Ll], -~[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -~[Lo], -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-6.2  = -[Po], -[Sm], -[Ps], -[Pe], -[Lo], -[Lo], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -~[Ll], -~[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -~[Lo], -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-6.3  = -[Po], -[Ps], -[Ps], -[Pe], -[Lo], -[Lo], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -~[Ll], -~[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -~[Lo], -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-7.0  = -[Po], -[Ps], -[Pe], -[Ps], -[Lo], -[Lo], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -~[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -~[Lo], -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-8.0  = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Lo], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -~[Lo], -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-9.0  = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -~[Lo], -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-10.0 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Lo], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -~[Ll], -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-11.0 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[Po], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -~[Po], -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-12.0 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[So], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -[Po],  -~[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-12.1 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[So], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -[Po],  -[So], s,
                {1715c   0860    0560    0c77    32ff }
               -~[Nd], -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-13.0 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[So], -[Mn], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -[Po],  -[So], s,
                {1715c   0860    0560    0c77    32ff }
               -[Nd],  -~[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-14.0 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[So], -[Mc], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -[Po],  -[So], s,
                {1715c   0860    0560    0c77    32ff }
               -[Nd],  -[Po], -~[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-15.0 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[So], -[Mc], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -[Po],  -[So], s,
                {1715c   0860    0560    0c77    32ff }
               -[Nd],  -[Po],  -[Po], -~[So], s.
                {1fbf7   2e54    11b00   2ffc }

unicode-15.1 = -[Po], -[Ps], -[Pe], -[Ps], -[Lu], -[Mn], s,
                { A7   2308   FD3E , FD3F   13DD   1886 }
               -[Ll], -[So], -[Mc], s,
                {10d0  166d   1734 }
               -[So],  -[Ll],  -[Sc],  -[Ll],  -[Lu], s,
                {1f086  2cf3     20ba    052b    13f5 }
               -[Lo],  -[Lo],  -[Ll],  -[Po],  -[So], s,
                {1715c   0860    0560    0c77    32ff }
               -[Nd],  -[Po],  -[Po],  -[So], s.
                {1fbf7   2e54    11b00  2ffc }


-s = -[Zs; #9; #A; #D]+.

Test case: unicode-version-15.1-diagnostic

Repository URI: …/tests/correct/test-catalog.xml

Depends on Unicode version 15.1.

Input string (28 characters)

§⌈﴾﴿Ꮭᢆ
ა᙭᜴
🂆ⳳ₺ԫᏵ
𗅜ࡠՠ౷㋿
🯷⹔𑬀⿼

Expected result

<unicode-15.1/>