Archive for the 'Extreme03' Category


Extreme Markup 2003: Day 4

Nordström

Linking strategies: provides some useful guidelines on using XLink in real document management. Nothing much to say here, but this would be a useful paper to read. It makes points like autogenerating IDs to ensure that they’re unique across the document set.

The QA session turns into an interesting discussion of training writers to work in the right way. One quote: it took 7 years to train writers not to say ‘see page <xxx>’ and to use ‘<xref…>’ instead.

Henry Thompson

Uniform Access to Infosets via Reflection: the infoset defines the data model of XML, but it can be extended via XML Schema (new properties can be added to it). How can we use XPath to refer to things in this extended data model?

Reflection: where a syntactic form is analysed and re-expressed using the same language, eg. so that the only function is ‘apply’:

(f x y)
(apply f (list x y))
(apply f (apply list ...))

Reflection in infosets: define an XML document which describes the infoset of a document and which uses fewer element names (or only element names and attributes).

<document size="10"/>

<child namespace="" localname="document">
  <attribute namespace="" localname="size" value="10"/>
</child>

Define Edinburgh Normal Form, which encodes the infoset in terms of <child/> and <attribute/>.

Extend XPath to give access to extended infosets via reflection without actually re-serialising. Use an extension fn reflect() to do this, then use XPath on the reflected infoset to access things:

p:reflect()/child[@localname="document"]/attribute[@localname="size"]/@value

Note that reflect takes no args and the namespace determines what kind of reflection is being done. Example (look at paper) adds info on types to the reflection.
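To get a feel for the reflected form, here is a minimal sketch in Python using the standard library's ElementTree. It is not Henry Thompson's implementation: it builds the reflected tree eagerly (the talk's point was precisely to avoid that), it ignores text and the schema-added properties, and the names `reflect`, `split_qname` and the wrapper element `reflection` are my own assumptions.

```python
import xml.etree.ElementTree as ET

def split_qname(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag

def reflect(elem):
    """Re-express an element using only <child/> and <attribute/> (hypothetical helper)."""
    namespace, local = split_qname(elem.tag)
    child = ET.Element("child", {"namespace": namespace, "localname": local})
    for name, value in elem.attrib.items():
        attr_ns, attr_local = split_qname(name)
        ET.SubElement(child, "attribute",
                      {"namespace": attr_ns, "localname": attr_local, "value": value})
    for sub in elem:
        child.append(reflect(sub))
    return child

source = ET.fromstring('<document size="10"/>')

# Wrap the reflected element so a path can be applied to it from a parent node.
root = ET.Element("reflection")
root.append(reflect(source))

# Same shape as the XPath in the notes, minus the p:reflect() step.
hits = root.findall("child[@localname='document']/attribute[@localname='size']")
values = [attr.get("value") for attr in hits]
print(values)
```

The path only ever mentions `child` and `attribute`, which is what makes it possible to ask about properties that plain XPath has no axis for.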

But the PSVI isn’t a tree (it can contain cycles), so how can you reflect it as a tree? In the implementation (based on Saxon, which provides good support for this) they don’t really build the tree, only traverse it lazily. Sure, this is dangerous, but the onus is on the user to avoid the kinds of patterns that would trigger this circularity.

Has done some thinking about how to deal with circularity, in particular for the case where you want to compare the identity of nodes.

Posted on 31st July 2003
Under: Extreme03 | No Comments »

Extreme Markup 2003: Day 3

Paolo Ciancarini

Topic Maps and RDF: talks about converting topic maps to RDF and back again, discusses some of the issues in doing the conversion both ways, and presents an editor for doing the conversion and modifying/browsing the result. Has a Java tool for doing this which might be made available on their website one day.

James Mason

Publication under topic map control: deals with guidance documents for security classification which help people decide on the level of classification to assign to a given document. He’s turned some of these guidance documents into a topic map to be used for assisting in the classification task.

Lars Marius Garshol (Ontopia)

tolog - topic map query: various proposals have been made but there is no standard; the ISO committee has published some requirements and recently a data model document. tolog is a simple idea: topic map associations are like Prolog facts, and queries are really like Prolog queries. Three implementations: one commercial from Ontopia (in the Omnigator), another layered on SQL, and one in TM4J.

tolog was (after the fact) found to be very similar to Datalog (with negation, plus some SQL features). tolog queries match data and return tables (note: not composable?). Example:

born-in($PERSON: person, $PLACE: place),
located-in($PLACE: containee, italy: container) 

Or use projection:

select $PERSON  where
born-in($PERSON: person, $PLACE: place),
located-in($PLACE: containee, italy: container) 
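To make the Prolog analogy concrete, here is a rough Python sketch of how such a query could be evaluated: associations are stored as facts with named roles, each clause is matched against the facts, and shared variables join the clauses together. This is not tolog and not how any of the three implementations work; the data and the function names (`match`, `query`) are invented for illustration.

```python
# Associations as (type, {role: player}) facts. The data is made up.
FACTS = [
    ("born-in", {"person": "puccini", "place": "lucca"}),
    ("born-in", {"person": "grieg", "place": "bergen"}),
    ("located-in", {"containee": "lucca", "container": "italy"}),
    ("located-in", {"containee": "bergen", "container": "norway"}),
]

def match(clause, bindings):
    """Yield every extension of `bindings` that makes one clause true."""
    assoc_type, roles = clause
    for fact_type, players in FACTS:
        if fact_type != assoc_type:
            continue
        new = dict(bindings)
        ok = True
        for role, term in roles.items():
            value = players.get(role)
            if value is None:
                ok = False
            elif term.startswith("$"):
                # Variable: bind it, or check it against an earlier binding.
                ok = new.setdefault(term, value) == value
            else:
                # Constant: must be the player of this role.
                ok = term == value
            if not ok:
                break
        if ok:
            yield new

def query(clauses, select=None):
    """Conjunction of clauses; `select` projects the result table onto some variables."""
    rows = [{}]
    for clause in clauses:
        rows = [extended for row in rows for extended in match(clause, row)]
    if select is not None:
        rows = [{var: row[var] for var in select} for row in rows]
    return rows

clauses = [
    ("born-in", {"person": "$PERSON", "place": "$PLACE"}),
    ("located-in", {"containee": "$PLACE", "container": "italy"}),
]
full = query(clauses)
projected = query(clauses, select=["$PERSON"])
print(full)
print(projected)
```

The result is a list of rows (a table of variable bindings), which matches the remark above that tolog queries return tables rather than topic maps.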

Allows explicit disjunction (A | B) instead of the Prolog way, to make it clearer for users. Defines some built-in predicates, eg. instance-of, direct-instance-of. Has negation but requires all vars to be bound when called (ie. not(composer($FOO)) won’t generate all non-composers). Has full Prolog rules so complex programs are possible.

tolog 0.1 only queries associations with already known types. There are many other bits of the TM data model (ref?) that can’t be queried, eg. find all topics, find all occurrences. Add new built-in predicates to do these, eg. topic-maps($TOPICS), BUT: what if there’s an association called ‘topic-maps’? (The answer is that you need to use NS qualified predicate names, see below.)

Non-binding clauses: find all companies in Oslo and their home pages if they have them. Use an empty ‘or’ branch to indicate an optional clause:

located($COMPANY,oslo), (homepage($COMPANY,$HOMEPAGE) | )
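My reading of the empty branch is "bind $HOMEPAGE if you can, otherwise leave it unbound and keep the row", which is roughly a left outer join. A small Python sketch under that assumption (the company data and the function name are invented, and I haven't checked how tolog itself treats a company that does have a home page):

```python
# Made-up data standing in for the located and homepage associations.
LOCATED = [("acme", "oslo"), ("initech", "oslo"), ("globex", "bergen")]
HOMEPAGE = {"acme": "http://acme.example/"}

def companies_with_optional_homepage(city):
    """Rows of (company, homepage); homepage is None when the optional clause finds nothing."""
    rows = []
    for company, place in LOCATED:
        if place != city:
            continue
        # The optional clause never causes the row to be dropped.
        rows.append((company, HOMEPAGE.get(company)))
    return rows

rows = companies_with_optional_homepage("oslo")
print(rows)
```

Without the empty branch, initech would disappear from the result because the homepage clause would fail for it.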

There’s an issue with referring to topics, since they have fully qualified names (URIs) and these depend on the file that the TM is stored in (and there are three different kinds). Queries become verbose and unmanageable. The answer is to use NS prefixes instead of full URI prefixes.

Defines a string module, but this introduces unsafe (defn: datalog) predicates such as string:length($STR,5) which shouldn’t be evaluated unless $STR is bound.

tolog can query anything: shows RDF and relational queries. TMTL: XSLT for topic maps (proposal). Works like XSLT but without xsl:template, eg: <tmtl:foreach select="…tolog…">…

Liam Quin

XML Query Update: a number of free implementations. XQEngine (Howard Katz) is v. incomplete (see yesterday) but is the only implementation which indexes files. Qexo is part of the Kawa (Scheme based) engine, compiles queries to Java bytecode, and can run in a servlet. IPSI-XQ (Java) does extensive query optimisation (and can show you what it did). Galax (written in OCaml) has schema support and static type checking, supports ML, C, C++ and Java APIs, and was second fastest in tests. Saxon is the fastest, and now has XQuery as well as XSLT.

Since the standard isn’t final yet, all of these implementations differ in which version they support.

Norm Walsh

RDFTwig: there isn’t a good way to work with RDF in XSLT because RDF isn’t trees and XSLT/XPath are optimised to work with trees. You can work with serialised RDF but ultimately you’re working with the wrong data model: it can be worked around but it’s fragile. There’s no unique serialisation. RDFTwig lets you work with different serialisations within your stylesheet. Implemented in Java on top of the Jena RDF store.

Different serialisations (breadth, depth and ‘breadth-first-deep’ — duplicate nodes as long as you avoid cycles). RDFTwig defines a bunch of fns which return trees which can then be styled with XSLT.
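As a note to self on what a depth-style serialisation involves, here is a small Python sketch that unfolds an RDF-like graph into a tree, duplicating nodes that are reachable by more than one route and cutting off when a node reappears on the current path. It is only an illustration of the idea: it does not use Jena, it is not one of RDFTwig's functions, and the triples and names (`build_index`, `depth_tree`) are made up.

```python
from collections import defaultdict

def build_index(triples):
    """Map each subject to its outgoing (predicate, object) pairs."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index

def depth_tree(node, index, path=frozenset()):
    """Unfold the graph below `node` into a tree.

    A node already on the current path is emitted as a reference
    instead of being expanded again, which is what avoids cycles.
    """
    if node in path:
        return {"node": node, "ref": True, "edges": []}
    below = path | {node}
    edges = [(predicate, depth_tree(obj, index, below))
             for predicate, obj in index.get(node, [])]
    return {"node": node, "ref": False, "edges": edges}

def count_nodes(tree):
    """Number of nodes in the unfolded tree, duplicates and references included."""
    return 1 + sum(count_nodes(sub) for _, sub in tree["edges"])

triples = [
    ("doc", "creator", "norm"),
    ("doc", "reviewer", "henry"),
    ("norm", "knows", "henry"),
    ("henry", "knows", "norm"),  # closes a cycle
]
tree = depth_tree("doc", build_index(triples))
print(count_nodes(tree))
```

The graph has three nodes but the tree has seven, since norm and henry each appear under both branches; that duplication is the price of getting something XSLT can walk.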

Also supports Jena’s RDQL — embed RDQL statements in <rt:rdql>

Henry Thompson raises the option of serialising as a flat structure where everything is a reference, ie. don’t use the daughter relation for anything. This would make the XPath expressions more verbose (and perhaps impossible in XPath 1.0) but would avoid having to special case the places where references happen.

Bob Lyons

The schema conformance problem: is there an XML document that conforms to a given schema? There can be schemas for which you can’t generate a conformant document, eg:

<!ELEMENT section  (section+)>   

Schema conformance is NP-hard for DTD, RELAX NG (with XML Schema datatypes) and NRL, and undecidable (ie. worse) for Schematron and XML Schema. It is solvable in linear time for DTDs with no ID/IDREF attributes and for RELAX NG with the base datatypes.
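For the easy DTD case (no ID/IDREF), the question boils down to whether the root element can ever "bottom out" in a finite document. A hedged Python sketch of that check, using a naive fixed-point rather than the linear-time algorithm from the talk: each content model is reduced by hand to alternatives listing only the children that are required, and the representation and the name `productive_elements` are my own.

```python
def productive_elements(required):
    """Return the elements that can appear in some finite conforming document.

    `required` maps an element name to a list of alternatives; each
    alternative lists the child elements that must occur at least once
    (optional and starred particles are left out, since they may match nothing).
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for name, alternatives in required.items():
            if name in productive:
                continue
            # Productive if some alternative needs only productive children.
            if any(all(child in productive for child in alt) for alt in alternatives):
                productive.add(name)
                changed = True
    return productive

dtd = {
    "section": [["section"]],            # <!ELEMENT section (section+)>
    "para": [[]],                        # e.g. (#PCDATA): no required children
    "chapter": [["para"], ["section"]],  # e.g. (para+ | section+)
}
result = productive_elements(dtd)
print(sorted(result))
```

Here section never becomes productive because its only alternative requires another section, which is exactly the problem with the declaration above; chapter survives via its para alternative.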

Posted on 30th July 2003

Extreme Markup 2003: Day 2

Thomas Passin

Bookmark Management with Topic Maps: given large collections of browser bookmarks from different browsers, how do you manage them? Applies topic maps to the problem. Conventional bookmark managers are either flat (use search, but how do you remember the keywords?) or folder based (still hard to find related material).

Goals: navigation, collocation. Uses subject category names which are like folder paths, claiming that this gives context for the end term of the path (eg. Software/Languages/Java vs Database/Interfaces/Java) while avoiding the need to organise things into a real hierarchy. SourceForge project: TM4Jscript.

William Kent

Interesting keynote talk about identity…

Matthew Fuchs

XML Schema: the problem of validating against schemas which use regular-expression-like repetition operators, eg. (a,b){1,3}, which are used in XML Schema. The obvious algorithm is to unroll the alternatives before processing the schema, but this is clearly exponential. Is there a better way? Yes, but I don’t understand enough about the problem to appreciate what’s going on.
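To convince myself about the blow-up, here is a small Python sketch of the naive unrolling (not Matthew Fuchs's better algorithm, which I can't reproduce). A counted repetition is expanded into required copies followed by nested optional copies, and nesting the repetitions multiplies the sizes. The tuple representation and the function name `unroll` are assumptions of mine.

```python
def unroll(particle):
    """Expand counted repetitions into plain sequences and optional groups.

    A particle is an element name (str), ("seq", [particles]),
    or ("rep", particle, minimum, maximum) with maximum >= 1.
    """
    if isinstance(particle, str):
        return particle
    kind = particle[0]
    if kind == "seq":
        return "(" + ",".join(unroll(p) for p in particle[1]) + ")"
    if kind == "rep":
        _, body, minimum, maximum = particle
        inner = unroll(body)
        parts = [inner] * minimum
        optional = ""
        # Each optional copy wraps the previous ones: (x,(x,(x)?)?)?
        for _ in range(maximum - minimum):
            optional = "(" + inner + ("," + optional if optional else "") + ")?"
        if optional:
            parts.append(optional)
        return "(" + ",".join(parts) + ")"
    raise ValueError("unknown particle kind: %r" % (kind,))

ab = ("seq", ["a", "b"])
once = unroll(("rep", ab, 1, 3))
twice = unroll(("rep", ("rep", ab, 1, 3), 1, 3))
thrice = unroll(("rep", ("rep", ("rep", ab, 1, 3), 1, 3), 1, 3))
print(once)
print(once.count("a"), twice.count("a"), thrice.count("a"))
```

Each extra level of {1,3} triples the number of copies of a, so the unrolled model grows exponentially with nesting depth (and a single large upper bound is already bad, since the size grows with the bound itself rather than with the few digits needed to write it).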

Howard Katz

XQuery from the bottom up: XQEngine (Java OSS engine, available from fatdog.com) is currently XPath only and preindexes documents to make queries more efficient. Limited support at the moment (abbrev axes; element, attr and text nodes only).

XPath eval starts at the end of the path, eg. /book/title would find all the titles first (this is the maximal list of nodes that the result could contain), then walk back up the path expr discarding nodes whose ancestors don’t match.
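A minimal Python sketch of that bottom-up strategy, as I understood it: take every node matching the last step as a candidate, then climb parent pointers checking the earlier steps. This is not XQEngine's code; it handles only absolute paths made of child steps, it scans the tree where a real engine would consult its index, and the function name `bottom_up` is mine.

```python
import xml.etree.ElementTree as ET

def bottom_up(root, path):
    """Evaluate an absolute child-step path such as /book/title from the last step backwards."""
    steps = [step for step in path.split("/") if step]
    # ElementTree has no parent pointers, so build them once.
    parent = {child: node for node in root.iter() for child in node}
    # Maximal candidate list: every node matching the final step.
    candidates = [node for node in root.iter() if node.tag == steps[-1]]
    results = []
    for node in candidates:
        current = node
        ok = True
        # Walk back up the path, discarding the candidate on the first mismatch.
        for name in reversed(steps[:-1]):
            current = parent.get(current)
            if current is None or current.tag != name:
                ok = False
                break
        # The first step of an absolute path must be the document element.
        if ok and parent.get(current) is None:
            results.append(node)
    return results

doc = ET.fromstring(
    "<book><title>T1</title><chapter><title>T2</title></chapter></book>"
)
titles = [node.text for node in bottom_up(doc, "/book/title")]
print(titles)
```

Both titles start out as candidates; the one inside chapter is thrown away as soon as its parent turns out not to be book.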

Seems to have made some premature optimisations, eg. the node representations contain forward sibling pointers and parent pointers because so far he’s only doing child/descendant processing.

Presents a demo on Shakespeare plays, indexes all plays on a fast machine in 2.5 seconds. It would be cool to evaluate XXPath against this — how bad are we?

Posted on 29th July 2003

Extreme Markup 2003: Day 1

Day 1 of the main conference saw an interesting range of papers from hard core modal logic applied to document markup to tips for making XSLT writing easier.

B. Tommie Usdin

It’s the Markup, Stupid! Why is XML popular? Don’t believe the hype. XML is just a syntax, data models came later and there are lots of them (at least 3). What XML is: pointy brackets, Unicode, constraint language (DTD), data models, trees, family of specs. These aren’t related to the touted features: platform independence, internationalisation, future proofing, etc. So what’s the story?

What is good/bad XML? All XML is well formed and could be valid — that’s not the difference. BAD XML is stuff like MS Word’s HTML export, but why? Good XML helps us achieve our business goals, but no specs tell us how to make Good XML. The secret rules of Good XML Markup are: Generic Markup and Indirection.

Separating form and content is what enables businesses to achieve their goals, but the XML specs don’t talk about this. (The last standard to talk about this was the GENCODE/SGML spec from 1983.)

Indirection is using names to refer to things: notation declarations, entities, validation rules separate from the document. Name it and point to it: this allows me to change the pointer to enable portability. Indirection powers XML.

Key tool relating to Indirection: OASIS XML Catalogs. A way of mapping logical names to entities outside the document; change the mapping to enable portability.

So, XML succeeds because it is a good way of doing Generic Markup and Indirection.

Allen Renear

Logic based approaches to documents. BECHAMEL Markup Semantics Project: SGML/XML markup makes assertions (this is a section, this is the title of this section). The document is given by the collection of licensed inferences, not the XML markup, not the data structure it serializes. Is an Existential-Conjunctive language (subset of FOL) good enough? Do you need to say more? Likely interesting extensions will be modal.

Can’t express negation, alternatives, conditionals, functions, universal quantification. Do we need to? Maybe: the author of A is German -> ‘the’ implies a universal quantifier (only one author). This implies that more than EC is needed.

Alethic modal logic allows for some things to be always true while others can be true or false... and here I get lost in the details of modal logics for reasoning about documents.

Jeni Tennison

Type related changes in XSLT: while XSLT 2.0 is generally good, this talk critiques the type related changes in the spec. Motivations for these are alignment with XML Schema and allowing static typechecking of documents. The usual story is that type related changes can be ignored and XSLT 1.0 stylesheets will still run. Is this true? Answer one: running existing stylesheets should be OK in ‘compatibility mode’.

The type system is complicated by the fact that schema validation is optional (and probably won’t be commonly available for a while), so the static type checking benefits won’t be available. The type system is largely similar to XML Schema (but not the same) and using it (with some caveats) gives benefits like being able to sort on dates.

Casting makes XSLT 2.0 more complicated since auto-casting isn’t done. Explicit casting is needed in most cases.

Simon St. Laurent

What can you do with half a parser? An amusing talk illustrated by Playmobil figures acting out the drama of XML’s entry into the workplace. Core message is that we can dissect the XML standard and build processing tools that allow us to ignore the parts that aren’t of interest.

Graham Moore

Engineering the Semantic Web: we’re in need of a communication protocol to underlie the semantic web, so that agents can talk to metadata sources, for example. We’ve got lots of tools for storing and serialising data: topic maps, RDF etc. Existing communication is in terms of HTTP and HTML, not direct communication between, eg. data sources/data models. Shows an image of the semantic web layer cake, but there’s nowhere that the communication protocol can fit.

SW Use cases: web clients finding out metadata about a web resource being browsed. Client applications aggregating SW data from multiple sources. Possibly allow update of SW data as well as just query.

Discusses some of the requirements for a SW server: can it be layered upon HTTP? Deployment should be easy…

Identity resolution, RDF has a problem differentiating a ‘resource’ and ‘knowledge about the reification of that resource’ — both are the URI. Topic maps have standardised this.

Evaluate HTTP: don’t want data stored on the server as RDF/XML or XTM. We might accept that these would be delivered, but we don’t want to have to query the XML representations; we need to work with the data store.

Evaluate URIQA (URI Query Agent, Patrick Stickler (Nokia)). HTTP extension that, given a URI, will return a concise bounded description (approx. all things known about the identifier). Easy to deploy and implement, has basic query support. No update or introspection support (ie. what QL do you support). Not a general mechanism for interacting with RDF models. [Note though that this is really just thin layering on top of HTTP and so isn't really different to the previous para.]

RDF Net API: defines a protocol for the semantic web that allows querying and updating RDF models. API supports: Query, GetStatements, InsertStatements, RemoveStatements, PutStatements, UpdateStatements, Options (of the server). Defines an HTTP and a SOAP binding for the protocol. Hopes that this is the SAX enabler for the semantic web.

Kal Ahmed

Topic map design patterns. Subject indicator is a resource (document) which describes a subject to a human reader, might include machine oriented metadata for auto-consumption. Subject identifier defines an identity for a subject, can be compared to determine subject equality, should resolve to a subject indicator. Problems with these: what should be in an indic.? How published? How do I know when something is meant as an identifier?

Published Subject Indicator (PSI): defined by OASIS for defining subjects, required to be stable, accompanied by metadata. Addresses 1.5 of the three objections (publishing as XHTML and PSI URIs are specially defined, but not enough about content of PSI documents).

Missing link relates to prescriptive human readable content…

Design patterns: named generic solutions; having names makes them easier to talk about, compare, etc. Gives better solutions. Topic map design is similar to programming (shared design problems, more than one solution, confusing for the novice) and different from it (about organisation, identity, semantics). So what’s a topic map DP? KA says pretty much all of it, including a PSI ref to an implementation of the DP. This fits into the prescriptive human readable slot noted as missing above (I don’t understand this yet). Diagram TM DPs as UML diagrams. They fill the role of prescriptive subject descriptions: they describe how the data has been organised.

Holman

Literate XSLT: XSLT is transformation by example. Tries to address the challenge of writing stylesheets via literate programming. “When writing stylesheets, focus on the result”. Distinguishes pull (xsl:for-each, xsl:value-of) vs push (xsl:apply-templates) styles of writing stylesheets. Push is modular, pull is monolithic; push is better because it’s more reusable.

Talks about doing XSLT design by annotating the result tree (for a push oriented stylesheet). Achieve this fairly easily by adding markup to mock result, running through literatexslt.xsl to produce the XSLT stylesheet. Advantage is that the mock markup is still viewable (since another namespace is used and, eg. HTML browser or FO processor will ignore). Among other things this allows some validation to be done on XPath expressions by using sample source data which has been generated to contain examples of all possible XPath expressions. Describes the process and the tools developed to follow this process. Sketches a GUI tool that might be used to carry out this kind of process involving interactive design of the mock result followed by drag and drop linking of the source to the mock result.

Posted on 28th July 2003

Extreme Markup 2003: Tutorial

Having arrived in Montreal after an Airport Ordeal in Detroit and slept very well at the Bed and Breakfast I turned up at the Hilton a little late for Jonathan Robie’s all day XQuery tutorial. This was a pretty full-on session with Jonathan working through various ways of querying XML data (or relational data as XML) before going into detail about XQuery. What was useful was to hear about why some things were done in the design of XQuery and finding out about some of the lesser known features of the language. There’s plenty of material here which will turn up in my lectures later this semester!

Afterwards I wandered down to the Old Port of Montréal. There’s a nice feeling down there, although it’s clearly the tourist area of town. Then back to the B&B via Chinatown, where I had my first non-steak dinner in a week.

Posted on 27th July 2003