XHTML-IM

Is there any chance of supporting XHTML-IM in JabRSS? An awful lot of RSS feeds contain HTML, <a href="..."> especially, and these end up looking like crap in Gaim at least. Or am I missing something and Gaim is broken?

Dan

XHTML Support

Does Gaim support XHTML? Which Jabber clients currently support XHTML messages?

Yes, Gaim supports XHTML. The following was generated by Gaim. I did a quick Google search for other Jabber clients that support XHTML, but I couldn't find any. It seems that a few are working on it, though.

<message type='chat' to='...' from='...'>
<x xmlns='jabber:x:event'>
<composing/>
</x>
<body>
hi there
</body>
<html xmlns='http://jabber.org/protocol/xhtml-im'>
<body xmlns='http://www.w3.org/1999/xhtml'>
<em>
hi there
</em>
</body>
</html>
<x xmlns='jabber:x:delay' from='...' stamp='20040719T14:50:33'>Offline Storage</x>
</message>
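
For reference, a stanza with the same shape (a plain-text body plus an XHTML-IM alternative) could be built in Python roughly as below. This is only a sketch: `build_message` is a made-up helper name, not something from JabRSS or Gaim, and it uses `xml.dom.minidom` from the current standard library rather than anything specific to Python 2.2.

```python
from xml.dom.minidom import Document

XHTML_IM_NS = "http://jabber.org/protocol/xhtml-im"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_message(to, sender, text):
    """Return a chat message carrying `text` both as plain text and as <em> XHTML.

    Hypothetical helper for illustration only.
    """
    doc = Document()
    msg = doc.createElement("message")
    msg.setAttribute("type", "chat")
    msg.setAttribute("to", to)
    msg.setAttribute("from", sender)
    doc.appendChild(msg)

    # Plain-text fallback for clients without XHTML-IM support.
    body = doc.createElement("body")
    body.appendChild(doc.createTextNode(text))
    msg.appendChild(body)

    # XHTML-IM wrapper containing an XHTML <body>.
    html = doc.createElement("html")
    html.setAttribute("xmlns", XHTML_IM_NS)
    xbody = doc.createElement("body")
    xbody.setAttribute("xmlns", XHTML_NS)
    em = doc.createElement("em")
    em.appendChild(doc.createTextNode(text))  # minidom escapes <, > and & on output
    xbody.appendChild(em)
    html.appendChild(xbody)
    msg.appendChild(html)
    return msg.toxml()


if __name__ == "__main__":
    print(build_message("user@example.org", "jabrss@example.org", "hi there"))
```

The plain `<body>` is kept alongside the `<html>` element so that clients without XHTML support still have something to display.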

It wouldn't be hard, would it? If you'd like, I could write some Python code to convert HTML to valid XHTML-IM. I guess I could hack it in myself, but I'm too lazy to work out how your code works. Let me know.

Converting HTML to valid XHTML

Converting HTML to valid XHTML doesn't seem to be that easy in Python. I thought the sgmllib module might do the trick, but it doesn't work that well.

If you have an idea how to best do this (possibly with plain Python 2.2), please tell me.

I guess it would take more work than I thought, but I don't think it would involve anything too difficult. I would go about it like this: first I would examine the XHTML-IM schema and construct by hand a mini "schema" data structure in Python, which would say which elements can be children of which other elements. There are only about 20 different elements in XHTML-IM, so I don't think it would be too much work.
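
To make the idea concrete, such a hand-built "schema" could be a plain dictionary mapping each element to the set of elements allowed directly inside it. The sketch below is hypothetical and partial: the element list and the parent/child rules are my own guesses for illustration, not copied from the XHTML-IM schema, and the names `ALLOWED_CHILDREN` and `is_legal_child` are invented here.

```python
# Illustrative, partial content model. NOT taken from the XHTML-IM schema;
# the real rules would have to be checked against the spec.
INLINE = {"a", "br", "em", "strong", "span", "img", "cite"}
BLOCK = {"p", "blockquote", "ul", "ol"}

ALLOWED_CHILDREN = {
    "body": BLOCK | INLINE,
    "p": INLINE,
    "blockquote": BLOCK,
    "ul": {"li"},
    "ol": {"li"},
    "li": BLOCK | INLINE,
    "a": INLINE - {"a"},   # assumption: no nested links
    "em": INLINE,
    "strong": INLINE,
    "span": INLINE,
    "cite": INLINE,
    "br": set(),           # empty elements take no children
    "img": set(),
}


def is_legal_child(parent, child):
    """Return True if `child` may appear directly inside `parent`."""
    return child in ALLOWED_CHILDREN.get(parent, set())
```

Unknown parents fall through to the empty set, so anything outside the table is simply treated as illegal.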

I would parse the HTML with sgmllib and build the output XHTML as a DOM with xml.dom.minidom. Only legal elements would be added to the DOM (as determined by the "schema"). Illegal tags would be ignored, but child tags of those would be processed as per the XHTML-IM spec.

I think I would keep a pointer into the DOM called "last_element" that points to the last element added. When a start tag in the source HTML is encountered, then:

1. If that tag is a legal child of last_element, then it is added as a child to last_element.

2. Else if the tag is not a legal child of last_element but is a legal child of last_element's parent and last_element doesn't require a closing tag (in plain HTML), then the new element is added as a sibling of last_element.

3. Else the tag is ignored (or maybe output in escaped form). The input must be invalid or non-well-formed in this case I think.

When a close tag is encountered, then the last_element pointer would be made to point to the parent element.  

Any character data encountered is added to the output (even character data enclosed by unrecognized tags). I think all the XHTML-IM elements can have character data, but that would need to be checked.

After the DOM is constructed, it is dumped as a string.
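
Here is a rough sketch of the steps above, under several assumptions. It uses `html.parser` from current Python in place of sgmllib (which no longer exists in Python 3), the content model is a small made-up subset rather than the real XHTML-IM schema, and the class and function names are invented. It also deviates from the description in two places: empty elements such as `br` never become last_element, and a close tag only moves the pointer if it matches an open element, so close tags of ignored elements are skipped.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, partial content model; check the XHTML-IM spec for the real one.
INLINE = {"a", "br", "em", "strong", "span"}
ALLOWED_CHILDREN = {
    "body": INLINE | {"p", "ul", "ol"},
    "p": INLINE,
    "ul": {"li"},
    "ol": {"li"},
    "li": INLINE | {"ul", "ol"},
    "a": INLINE - {"a"},
    "em": INLINE,
    "strong": INLINE,
    "span": INLINE,
}
ALLOWED_ATTRS = {"a": {"href"}}   # assumption: drop every other attribute
VOID = {"br"}                     # elements that never contain anything
OPTIONAL_CLOSE = {"p", "li"}      # close tag may be omitted in plain HTML


class XhtmlImBuilder(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = Document()
        self.body = self.doc.createElement("body")
        self.body.setAttribute("xmlns", XHTML_NS)
        self.doc.appendChild(self.body)
        self.last_element = self.body

    def _add(self, parent, tag, attrs):
        elem = self.doc.createElement(tag)
        for name, value in attrs:
            if name in ALLOWED_ATTRS.get(tag, ()) and value is not None:
                elem.setAttribute(name, value)
        parent.appendChild(elem)
        if tag not in VOID:
            self.last_element = elem

    def handle_starttag(self, tag, attrs):
        cur = self.last_element
        if tag in ALLOWED_CHILDREN.get(cur.tagName, ()):
            self._add(cur, tag, attrs)             # step 1: legal child
        elif (cur is not self.body
              and cur.tagName in OPTIONAL_CLOSE
              and tag in ALLOWED_CHILDREN.get(cur.parentNode.tagName, ())):
            self._add(cur.parentNode, tag, attrs)  # step 2: add as sibling
        # step 3: otherwise the tag is ignored; its text is still kept

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name, if any.
        node = self.last_element
        while node is not self.body:
            if node.tagName == tag:
                self.last_element = node.parentNode
                return
            node = node.parentNode

    def handle_data(self, data):
        # Text is kept wherever it occurs, even inside ignored tags.
        self.last_element.appendChild(self.doc.createTextNode(data))


def html_to_xhtml_im(html):
    builder = XhtmlImBuilder()
    builder.feed(html)
    builder.close()
    return builder.body.toxml()


if __name__ == "__main__":
    print(html_to_xhtml_im('<p>Hi <b>bold</b> <a href="http://example.com/">link</a><p>next'))
```

Note that text is appended to whatever element is current, including `ul` and `ol`, so whether every element may really hold character data is still an open question here, as it is in the description.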

That explanation is probably a bit confusing, and I think there would still be issues to think about, but I think it would be doable.