From: Victor Mote (vic@portagepub.com)
Date: Mon Apr 05 2004 - 09:32:21 PDT
Nikolai Grigoriev wrote:
> the question of U+2028 is a complicated one. The XSL-FO spec
> does not constrain its processing in any way. There is no
> mention that it is subject to normalization, but equally no
> indication that it is expected to produce a line break at
> all. The effects of this character are therefore not
> well-defined: I doubt whether it can be considered a valid linefeed.
I agree that it is not well-defined.
> In XEP, we treat U+000A, U+000D, and U+2028 as complete
> equivalents. (This refers to the data that come to the
> formatter, after linefeed normalization in the processor).
> The logic is
> straightforward: a character is either a linefeed or not;
> linefeeds terminate lines and are subject to the effects of
> linefeed-treatment; non-linefeeds do neither of these.
Ken's distinction between a linefeed and a LINE SEPARATOR is relevant here,
I think. They are different concepts. linefeed-treatment is supposed to
affect *only* U+000A, so the fact that it also affects U+2028 in XEP
indicates some misunderstanding.
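To make the difference concrete, here is a rough Python sketch of the two
readings. It is only an illustration: the set names and split_lines are my
own invention, not anything from XEP or the spec, and it models only the
case where linefeeds are preserved as line ends.

```python
# Characters treated as linefeeds in XEP, per Nikolai's description.
XEP_LINEFEEDS = {"\u000A", "\u000D", "\u2028"}

# Assumption: the stricter reading, under which linefeed-treatment
# applies to U+000A only.
STRICT_LINEFEEDS = {"\u000A"}


def split_lines(text, linefeeds):
    """Split text into lines, ending a line at every character in `linefeeds`."""
    lines, current = [], []
    for ch in text:
        if ch in linefeeds:
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
    lines.append("".join(current))
    return lines


sample = "first\u2028second\nthird"
xep_lines = split_lines(sample, XEP_LINEFEEDS)
strict_lines = split_lines(sample, STRICT_LINEFEEDS)
```

Under the first set the sample yields three lines; under the second, U+2028
stays inside the first line as an ordinary character.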
> One can argue if this is a correct behaviour. However, I
> believe that it is inherently unsafe to rely on Unicode text
> flow control characters in systems that have their own markup
> to express the same semantics. There is no reason to use
I mostly agree with this. However, there really is no construct in XSL-FO
that says "force a line break here". It is true that you can say "start a
new block here", but that really is a different concept.
> U+2028 or U+2029 if you have explicit paragraph structure set
> by <fo:block>s; it is risky to mix LRO/RLO/LRE/RLE with
I think U+2029 really is the same as saying "start a new block", and agree
that there is no good reason to use it in XSL-FO.
> fo:bidi-override. If you need explicit line breaks inside
> non-preformatted text, set a <br/> element in the input XML
> vocabulary and match it to <fo:block/> in the stylesheet. In
> this way, your intent is clear to everybody.
There is a good reason not to take this approach unless it is necessary.
Simply inserting an </fo:block><fo:block> combination does not do the job.
The new block created here may not have the same properties -- things like
space-before, keeps, etc. have great potential to be different. Now, I
acknowledge that this can be worked around in the stylesheet, but it does
add an order of magnitude of complexity.
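One possible shape for such a workaround, sketched in Python rather than
XSLT and not taken from any real stylesheet: keep the paragraph-level
properties on a single outer fo:block and emit each line as a nested
fo:block that sets no properties of its own. The function name
paragraph_to_fo and the use of U+2028 as the input line marker are
assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
LINE_SEPARATOR = "\u2028"

ET.register_namespace("fo", FO_NS)


def paragraph_to_fo(text, block_props=None):
    """Build one outer fo:block holding a nested fo:block per line.

    Paragraph-level properties (space-before, keeps, ...) are set once on
    the outer block; the inner blocks only force the line breaks.
    """
    outer = ET.Element(f"{{{FO_NS}}}block", dict(block_props or {}))
    for line in text.split(LINE_SEPARATOR):
        inner = ET.SubElement(outer, f"{{{FO_NS}}}block")
        inner.text = line
    return outer


para = paragraph_to_fo("line one\u2028line two", {"space-before": "6pt"})
serialized = ET.tostring(para, encoding="unicode")
```

Even in this small form, every paragraph template has to be restructured
around the wrapper, which is the extra complexity I am referring to.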
> One additional consideration: in XML 1.1, U+2028 will be
> subject to parser-side linefeed normalization. It implies
> that you never get it from user text; and if you generate an
> entity just to make it appear after the normalization, why
> not generate a piece of markup instead?
OK. I find this to be persuasive, and it means ultimately that either I or
the authors of the XML 1.1 standard have misunderstood what the Unicode
standard was trying to do with U+2028.
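For reference, my understanding of the XML 1.1 end-of-line rule (section
2.11) is that CR LF, CR NEL, a lone CR, NEL (U+0085), and U+2028 are all
translated to a single U+000A before parsing. A small Python sketch of that
rule follows; the function name is hypothetical, and this is not how any
particular parser implements it.

```python
import re

# Two-character sequences are listed first so they collapse to a single
# linefeed instead of being matched as a lone CR followed by another break.
_XML11_LINE_ENDS = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")


def normalize_xml11_line_ends(text):
    """Translate every XML 1.1 line-end form in literal text to U+000A."""
    return _XML11_LINE_ENDS.sub("\n", text)


raw = "a\r\nb\u2028c\rd\u0085e\r\u0085f"
normalized = normalize_xml11_line_ends(raw)
```

This applies only to literal characters in the input. A character reference
such as &#x2028; is resolved after normalization, which is the "generate an
entity" case Nikolai mentions.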
This leaves only the issue of documentation. I would simply suggest that
Section 7.1 of the document "XSL Formatting Objects in XEP 3.7" be modified
to include your comment above that U+2028 is always treated within XEP as a
linefeed character.
Thanks again to both Nikolai and Ken for your explanations. This is not an
issue I feel strongly about, and I didn't mean for it to turn into a big
deal.
Victor Mote