<div dir="ltr">Simon,<br><div><div class="gmail_extra"><br><div class="gmail_quote">On 23 February 2016 at 14:12, Simon Cozens <span dir="ltr"><<a href="mailto:simon@simon-cozens.org" target="_blank">simon@simon-cozens.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 23/02/2016 13:54, Andrew Cunningham wrote:<br>
> PDF/UA, for instance, leaves the question deliberately ambiguous.<br>
> ActualText is the way to make the content accessible, but developers<br>
> creating tools for PDF do not actually have to process the ActualText.<br>
<br>
</span>Yeah. (Sorry to keep banging the drum but) I've just done some tests<br>
with SILE, which includes some support for tagged/accessible PDFs. Even<br>
when the ActualText includes the correct Devanagari, I am still seeing<br>
the same problems with cut-and-paste. I'm not sure what needs to be done<br>
to get it right.<div dir="ltr"><div><div dir="ltr"><br></div></div></div></blockquote><div><br></div><div>In terms of SILE, supporting generation of other formats such as XPS as an alternative to PDF is probably the only way forward for complex-script languages.<br><br></div><div>If SILE is tagging the PDFs and adding ActualText attributes, then it is doing everything it should be doing. The problems lie with the PDF specification itself: what it was originally designed to be (a pre-print format based on the PostScript language) and the limitations placed on it by the developers of the spec.<br><br></div><div>Andrew <br></div></div>
</div></div></div>