Discovered through Twitter: an interesting blog post from Kroc Camen on how to learn HTML 5. The author gives good, essential guidelines on semantics and elements. The conclusion of his blog post is spot on and shows one of the pain points of the HTML 5 specification:
Once you have made a decent HTML4 site, then you will look at the HTML5 specification, and it will make sense—you will know what to do with it.
A document is being written to fill this gap: The Web Developer’s Guide to HTML 5
HTML 5 is a giant specification. It contains things related to the content model, the APIs, the DOM, the parsing algorithm, etc. We received many comments that it was very hard to read for simple implementers and documentation writers who would like to better understand how html 5 documents are written.
Discover the editor's draft of HTML 5: The Markup Language! Mike Smith has extracted the parts of HTML 5 related to the content model. This document is aimed at people who would like to focus on the content model, such as reviewers, authoring tool implementers and documentation writers.
We hope that it will help everyone to have a better understanding of html 5 content model. An additional document should be provided in the future for learning about html 5 with the name Web Authoring Guidelines.
The HTML 5 working draft defines a parsing algorithm which is robust enough that it will not break for the most common types of errors. Many computing engineers and Web designers think that this feature encourages poor-quality documents. Point taken. But let's look a bit further at what HTML 5 proposes in terms of input and output.
I do mistakes very often. English is not my mother tongue. I try to fix spelling mistakes as much as possible. Still, my English grammar is still behind. Oh, not looking for any excuses, I do mistakes in French too. But I'm glad that our brains have a very robust parsing mechanism which helps us to recover from broken sentences (bad spelling and clumsy grammar). Just imagine for the experiment, that your brain was not able to parse any sentences that would not be grammatically correct. How many sentences in our daily conversation are 100% correct? Not that much.
At W3C, we take minutes (a lot of them) of discussions happening over the phone. During these teleconferences, there are people of different nationalities, with different accents and different levels of English and, on top of that, in different time zones (fatigue). Still the scribe (minute taker) writes down, most of the time, correct English sentences after parsing them. The scribe creates a serialization of what he heard, but modifies it to be correct.
An HTML 5 Tidy library could do the same thing. It could parse a broken document and create a DOM following the HTML 5 parsing algorithm. Then it could serialize it (write it down) following the HTML 5 content model. That would create a conformant HTML 5 document.
This is an important part of the process. What you hear is not what you write. You are stricter, once you have recovered the meaning. In the same way, what the HTML 5 Tidy library has parsed is not what it will serialize. Let's take a practical example with the infamous center element.
<!DOCTYPE html>
<html>
<title>a broken document</title>
<center><p>I want to be in the center of the page.</p></center>
</html>
The innerHTML view (using HTML 5 Live DOM viewer) is:
<!DOCTYPE HTML>
<html>
<head>
<title>a broken document</title>
</head>
<body>
<center><p>I want to be in the center of the page.</p></center>
</body>
</html>
But the document will be invalid and the message given by the experimental HTML 5 validator instance will be
Validation Output: 1 Error
# Error Line 4, Column 7: The center element is obsolete..
Why? Because the center element is defined nowhere in the content model of HTML 5. You can't write an HTML 5 conformant document containing the center element. An HTML 5 Tidy library would emit only elements which are compatible with the HTML 5 content model. In this case that could be
<!DOCTYPE HTML>
<html>
<head>
<title>a broken document</title>
</head>
<body>
<p>I want to be in the center of the page.</p>
</body>
</html>
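To make the "what is parsed is not what is serialized" idea concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the standard library's html.parser is a lenient tokenizer, not an implementation of the HTML 5 parsing algorithm (it will not add the missing head and body elements), and the OBSOLETE rule set and the TidySketch and tidy names are hypothetical, not part of any existing Tidy library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule set: elements the serializer refuses to emit.
OBSOLETE = {"center"}
# A few elements that never take a closing tag.
VOID = {"br", "hr", "img", "input", "link", "meta", "param"}


class TidySketch(HTMLParser):
    """Re-emit the markup that was read, unwrapping obsolete elements."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_starttag(self, tag, attrs):
        if tag in OBSOLETE:
            return  # drop the wrapper, keep its children
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))

    def handle_endtag(self, tag):
        if tag not in OBSOLETE and tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def tidy(markup):
    """Parse leniently, then serialize only what the rule set allows."""
    parser = TidySketch()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(tidy("<center><p>I want to be in the center of the page.</p></center>"))
```

The input is accepted as it comes, but the center wrapper never reaches the output. A real library would build a DOM with the HTML 5 algorithm first and would need many more rules than this single set.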
Some people will argue that everyone will want different rules. Indeed that is possible. Some will want to have double quotes around attributes, some single quotes. And if we take into account the set of documents which have complex mixed markup, it will indeed create a lot of headaches. But that is why I think it would be interesting to define a set of basic rules for emitting HTML 5 after it has been parsed.
Some might propose solutions of the following type for the center element.
<!DOCTYPE HTML>
<html>
<head>
<title>a broken document</title>
</head>
<body>
<p style="text-align:center;">
I want to be in the center of the page.
</p>
</body>
</html>
Daniel Glazman recently proposed something quite similar for HTML attributes, inline style or style rules.
Are there any engineers who would be ready to take on the challenge of designing an HTML 5 Tidy Library (and the canonical rules to fix the output) using the content model of HTML 5? I showed recently that fewer documents will validate with the HTML 5 doctype than with their current doctype. Henri Sivonen rightly commented that the HTML 5 content model is stricter than previous versions of HTML. This will not help the adoption of conformant HTML 5.
An HTML 5 Tidy Library (even an imperfect one) would help people to move forward. If there are no benefits, people will continue to use HTML 4.01 and/or XHTML 1.0 because, in the end, it doesn't matter.
Last month, Brian Wilson published a survey on validation. He took the URIs of the top 500 sites given by Alexa and sent them to the W3C Markup validator. Recently, W3C created a beta instance of an html 5 conformance checker. Brian concluded that 32 of the 487 URLs passed validation (6.57%).
So today I decided to take the January 2008 list of web sites and to send them to the beta instance of the html 5 conformance checker. I wrote a very simple Python script (as usual, if my code horrifies you, any kind suggestions to improve it are welcome). Be careful: you will need to install httplib2. The file alexa.txt contains the list of URIs, one per line. To be sure to check against html 5, I forced the html 5 doctype.
import time

import httplib2

h = httplib2.Http(".cache")
# alexa.txt contains the list of URIs, one per line
with open("alexa.txt", "r") as f:
    urllist = f.readlines()

for url in urllist:
    # wait 10 seconds before the next request - be nice with the validator
    time.sleep(10)
    url = url.strip()
    # force the html 5 doctype so that the page is checked against html 5
    urlrequest = "http://qa-dev.w3.org/wmvs/HEAD/check?doctype=HTML5&uri=" + url
    try:
        resp, content = h.request(urlrequest, "HEAD")
        if resp['x-w3c-validator-status'] == "Abort":
            print(url, "FAIL")
        else:
            print(url, resp['x-w3c-validator-status'], resp['x-w3c-validator-errors'], resp['x-w3c-validator-warnings'])
    except Exception:
        # network error, timeout or missing header: skip this site
        pass
Before I give the results, repeat after me 10 times: the html 5 conformance checker is in beta, which means not stable and in testing. The html 5 specification is a Working Draft, which means highly likely to change. The test is only on the home page of each site.
The January 2008 file contains 485 web sites. 23 (4.7%) could not be validated. Most of the time, the site was too slow. Only 4 (< 1%) sites were declared valid html 5 by the conformance checker. If Henri Sivonen could do the same thing with his instance of the html 5 conformance checker, that would help to know whether my results are silly or in the right range.
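For readers who want to re-check the proportions quoted in this post, the arithmetic is a one-liner. The share helper below is a hypothetical name introduced only for this sketch; the counts are the ones given in the text.

```python
def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Brian Wilson's survey: 32 of 487 URLs passed validation.
print(share(32, 487))
# January 2008 list: 23 of 485 sites could not be validated.
print(share(23, 485))
# January 2008 list: 4 of 485 sites were declared valid html 5.
print(share(4, 485))
```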
Youtube gives a way to insert a video in your pages. You can select a few options and the system gives you a piece of html code to insert in your Web page.
<object width="425" height="349">
<param name="movie" value="http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1&rel=0&color1=0x006699&color2=0x54abd6&border=1"></param>
<param name="allowFullScreen" value="true"></param>
<embed src="http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1&rel=0&color1=0x006699&color2=0x54abd6&border=1" type="application/x-shockwave-flash" allowfullscreen="true" width="425" height="349"></embed>
</object>
embed is an element which is part of HTML 5 Working Draft but not part of XHTML 1.0 or XHTML 1.1. The embed element in this example is a fallback of the object element. It says if the object element is not working, use the embed element. So I decided to just cut the embed element in the XHTML 1.1 page.
<object width="425" height="349">
<param name="movie" value="http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1&rel=0&color1=0x006699&color2=0x54abd6&border=1"></param>
<param name="allowFullScreen" value="true"></param>
</object>
The code stopped working. The video was not displayed at all in the page. It probably means that the object element has no effect at all and embed is always triggered. So I started to explore what was missing. First, the param element is an empty element, so there is no need for a closing tag.
<object width="425" height="349">
<param name="movie" value="http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1&rel=0&color1=0x006699&color2=0x54abd6&border=1"/>
<param name="allowFullScreen" value="true"/>
</object>
Then I copied the information from the param element onto the object element itself, as type and data attributes. Finally, I added a textual description of the video content in case the video does not work properly.
<object width="425" height="349"
type="application/x-shockwave-flash"
data="http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1&rel=0&color1=0x006699&color2=0x54abd6&border=1">
<param name="movie" value="http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1&rel=0&color1=0x006699&color2=0x54abd6&border=1"/>
<p>Interview of Philippe Le Hégaret about Video codec</p>
</object>
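If you need to produce this kind of markup for more than one video, the pattern can be wrapped in a small helper. This is only a sketch: flash_object and its parameters are hypothetical names, not something YouTube provides. One deliberate difference from the hand-written markup above is that the helper escapes the URL, because in XHTML a literal & inside an attribute value is normally written as &amp;amp;.

```python
from html import escape


def flash_object(url, width, height, fallback_text):
    """Build an object element with type and data, plus textual fallback."""
    safe_url = escape(url, quote=True)  # "&" becomes "&amp;"
    return (
        '<object width="%d" height="%d"\n'
        '        type="application/x-shockwave-flash"\n'
        '        data="%s">\n'
        '  <param name="movie" value="%s"/>\n'
        '  <p>%s</p>\n'
        '</object>' % (width, height, safe_url, safe_url, escape(fallback_text))
    )


print(flash_object(
    "http://www.youtube.com/v/ZuNNhOEzJGA&hl=fr&fs=1",
    425, 349,
    "Interview of Philippe Le Hégaret about Video codec",
))
```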
I finally tested it in Camino (Version 1.6.1Int-v2 (1.8.1.14 2008051211)), Opera (9.52, Révision 4916), Firefox (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1), and Safari (Version 3.1.2 (5525.20.1)) and it worked well.
A couple of years ago, Joe Gregorio explained Why so many Python web frameworks? and showed how to create your own Web framework with a few lines of code. The most fundamental bricks are packaged in the standard python library.
There have always been many hypotheses about why the Web was successful. None of them can be verified, because we can't restart the experiment. But retrospectively, it is interesting to look at what has been reused widely. libwww is a library including all the modules needed to create a Web application, such as http, html, etc. It was initially written by Tim Berners-Lee, and was freely available. People could just take the library and put an interface on top of it to make a browser, a Web server, an indexing bot.
One of the very difficult and technical parts of a browser is the layout (or rendering) engine. It takes the Web content and displays it after having processed the style and scripting information. Some of these layout engines are open source such as WebKit, Gecko and KHTML. Others are sometimes sold to third parties for developing specific products. Web developers take them and create new browsers or use them in their applications. For sure it is a tad more complicated than just taking the libraries but it became "easier" to create a browser. It is known as BYOB.
Years ago, Dan Connolly created one of the first HTML validators. Since then, validation has been one of the practical techniques to maintain the quality of your documents. It is not the only one, but it is one which has a lot of influence on the development of a language.
HTML 5 is a specification in development. Many implementers have started to integrate the technology into their own software: browsers, parsers, etc. Henri Sivonen has developed a parser in Java for his conformance checking service.
We are happy to announce that W3C has integrated a version of HTML 5 conformance checker into a beta instance of the W3C Markup validator. That will help us to detect bugs, improve the user interface, and benefit from the large W3C communities.
Kudos to Henri Sivonen, Olivier Théreaux, and Yves Lafon.
I don't particularly care for the rel="profile" design, but one should choose one's battles and I'm not inclined to choose this one. I'm content for the market to choose.
There is an increasing number of people living in a digital era. Not only is the environment becoming digital, but the products of their own lives are digital too. I'm not the earliest adopter, but I have used computers since 1983, the Internet since 1991 and digital photography since 1993. I have accumulated around 418,000 emails and around 45,000 photos.
Emails have basic metadata (author, subject, date), which helps to create proper indexing and search. It could certainly be refined with sophisticated search algorithms. Digital photos now have EXIF data (including the date and the technical parameters of the camera) but not much else. My brain associates these photos, ordered by date, with a list that I maintain of my very rough location. It helps me remember in which city a photo was taken, but nothing more.
Here lies the challenge. Giving more precise metadata to these photographs would be certainly useful for my own consumption but … a fulltime job. That would make me a digital Stakhanovite.
In Musings on photographic metadata, Sean McGrath says:
A great tragedy lies herein. A sad law of this universe seeps forth like an Einstinien nightmare. A law that goes something like this : "the chances of any normal human being taking the time to add incredibly valuable meta-data to the great wads of digital data they create daily, approximates zero." An alternative formulation - using the classic dentistry analogy goes like this: "Most people would prefer root canal work than the utter tedium and ambient feeling of futility that accompanies meta-data creation. Besides everyone is too busy. Oh, and besides that again, it always seems to be more fun to create new stuff than to create stuff about old stuff."
Sean is trying to see what could be done automatically by the devices. Location is certainly one kind of metadata that should be recorded more and more often: see the new Nikon Coolpix P6000 with GPS, and the creation of a geolocation activity being discussed at W3C right now.
The challenges become bigger when you want to share these photos with a larger public. At regular intervals, a debate rages over the alt attribute on the html working group mailing list. I'm not sure there is a perfect solution, and we have to find a way to accommodate the circumstances of this sharing. The difficulty is that the right solution is more social than technical. Giving meaningful alternative information for the images you put online really depends on the context. These are real scenarios.
There are many more possible cases. The big issue is how we design the technology so that it will accommodate a maximum of use cases (social contexts) without making it impossible for others to exist. There is not yet a definitive answer.
I tend to keep an eye on things done at CERN. Not just because this is the Web's mothership, but also because there is always a very slim chance that one of their experiments happens to recreate the big bang, kill us all, re-shape the laws of the universe or something else equally exciting and dreadful. After all, it would really be a waste to plan a release of one of our tools after the end of time. So when I started reading about the countdown to the launch of the Large Hadron Collider for August 8th, 2008 I knew it was time to push that maintenance release of the Markup Validator I had been promising “real soon now” for… the past months.
As it turns out, our friends in Switzerland will only start recreating the time just after the big bang in a month. Ah well. Until then, we will have time to enjoy sports on TV, and the Markup Validator, release 0.8.3.
This is mostly a maintenance release, fixing a few bugs, adding support for recently added or updated document types such as XHTML Basic 1.1, but it does have a number of valuable tricks up its sleeves.
For those of us using the validator not just as a web service but as a web platform, a couple of new features will make our life even easier. First, JSON output has been added to the validator's possible result formats. The format is modeled after the JSON output built by our friends at validator.nu. Try this:
GET "http://validator.w3.org/check?uri=http://qa-dev.w3.org/wmvs/HEAD/dev/tests/2342-opensp_type_X.html&output=json"
…you get:
{
"url": "http://qa-dev.w3.org/wmvs/HEAD/dev/tests/2342-opensp_type_X.html",
"messages": [
{
"type": "info",
"subtype": "warning",
"lastLine": "11",
"lastColumn": 20,
"message": "reference to non-existent ID \"MMIARCH\"",
"messageid": 183,
"explanation": "
[...]
<div class=\"ve mid-183\">
<p>This error can be triggered by:</p>
<ul>
<li>A non-existent input, select or textarea element</li>
<li>A missing id attribute</li>
<li>A typographical error in the id attribute</li>
</ul>
<p>Try to check the spelling and case of the id you are referring to.</p>
</div>
",
}
],
"source": {
"encoding": "utf-8"
}
}
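Once the result is JSON, processing it takes a few lines. The sketch below assumes a response shaped like the sample above, with a "messages" list whose entries carry "type" and, optionally, "subtype". The SAMPLE document is a hypothetical, shortened stand-in written for this example, not real validator output, and summarize is a name introduced here.

```python
import json
from collections import Counter

# Hypothetical, shortened response modeled on the fields shown above.
SAMPLE = """
{
  "url": "http://example.org/page.html",
  "messages": [
    {"type": "info", "subtype": "warning", "lastLine": 11, "lastColumn": 20,
     "message": "reference to non-existent ID \\"MMIARCH\\"", "messageid": 183},
    {"type": "error", "lastLine": 4, "lastColumn": 7,
     "message": "The center element is obsolete."}
  ],
  "source": {"encoding": "utf-8"}
}
"""


def summarize(raw):
    """Count messages by kind, preferring the subtype when there is one."""
    report = json.loads(raw)
    kinds = Counter(m.get("subtype", m["type"]) for m in report["messages"])
    return report["url"], dict(kinds)


print(summarize(SAMPLE))
```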
While we are looking at calling the validator and getting quick, easy-to-process results, did you know that the fastest way to get basic info on validation is the validator's custom HTTP headers? They have been around for a while, are now properly documented, and we have added information about the number of warnings, too. Try this:
HEAD http://validator.w3.org/check?uri=http://qa-dev.w3.org/wmvs/HEAD/dev/tests/2342-opensp_type_X.html
200 OK
Date: Fri, 08 Aug 2008 15:00:49 GMT
Content-Language: en
Content-Type: text/html; charset=utf-8
Client-Date: Fri, 08 Aug 2008 15:00:52 GMT
Client-Peer: 128.30.52.49:80
Client-Response-Num: 1
X-W3C-Validator-Errors: 0
X-W3C-Validator-Recursion: 1
X-W3C-Validator-Status: Valid
X-W3C-Validator-Warnings: 1
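The same HEAD request can be scripted with nothing but the Python standard library. This is a sketch under assumptions: it reuses the check endpoint and the X-W3C-Validator-* header names shown above, which may behave differently on the service as it runs today, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request

CHECK_URL = "http://validator.w3.org/check"  # endpoint used in the example above


def build_head_request(uri):
    """Prepare a HEAD request asking the validator to check uri."""
    query = urllib.parse.urlencode({"uri": uri})
    return urllib.request.Request(CHECK_URL + "?" + query, method="HEAD")


def summarize_headers(headers):
    """Extract the validator's custom headers from any name/value mapping."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "status": lowered.get("x-w3c-validator-status"),
        "errors": int(lowered.get("x-w3c-validator-errors", 0)),
        "warnings": int(lowered.get("x-w3c-validator-warnings", 0)),
    }


def check(uri):
    """Network call: send the HEAD request and summarize the response."""
    with urllib.request.urlopen(build_head_request(uri), timeout=30) as resp:
        return summarize_headers(resp.headers)


# Offline demonstration with the headers from the response shown above.
print(summarize_headers({
    "X-W3C-Validator-Errors": "0",
    "X-W3C-Validator-Status": "Valid",
    "X-W3C-Validator-Warnings": "1",
}))
```

Keeping the header parsing separate from the network call makes it easy to test without hitting the validator, which is also the polite thing to do.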
Another good piece of news. If you have a vested interest in XHTML, you will know this dilemma fairly well:
[...] application/xhtml+xml media type. That XHTML media type has a few issues, however, in particular the fact that the most distributed browser, up to now, still hasn't added support for it.
[...] application/xhtml+xml only to the agents that clearly specify they support this media type, and as text/html, by default, to the others.
[...] Accept header optional, and its absence just means “send me what you've got”.
[...] Accept hack for XHTML, the validator would be served content as text/html, and, since that is not supposed to happen, the validator would yield a warning stating, in essence: are you certain you really want to serve XHTML 1.1 content as text/html?
It may have been a mere warning, but it made a lot, lot, lot of people anxious and upset. So, by popular demand, and also because the XHTML working group are preparing a revised note on XHTML and media types, the warning is gone.
Those interested in HTTP content negotiation beyond the issue with the XHTML media type will be interested in some new features in the validator. In version 0.8.2 we added a way to specify the Accept and Accept-Language headers sent by the validator to the server holding the documents it checks, and in 0.8.3 we also added Accept-Charset and User-Agent. These options are still experimental, but should be useful for content-negotiated resources that do not have a specific URI for each representation.
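For readers who have never seen the server side of this negotiation, here is a minimal sketch of the serving technique described above: application/xhtml+xml goes only to agents that explicitly list it in their Accept header, and everyone else (including clients that send no Accept header at all) gets text/html. The function names are hypothetical and the Accept parsing is deliberately simplified; it is not a complete implementation of HTTP content negotiation.

```python
def accepts_xhtml(accept_header):
    """True only if application/xhtml+xml is listed with a non-zero q value."""
    if not accept_header:
        return False  # header absent: "send me what you've got"
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False  # wildcards such as */* do not count as explicit support


def choose_media_type(accept_header):
    """Pick the media type to serve an XHTML document with."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml"
    return "text/html"


print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_media_type(None))
```

A checker that sends no Accept header falls into the second case, which is how XHTML 1.1 content can end up being served to it as text/html.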
There is more in this version, and more to come. Read the 0.8.3 release notes, learn how to send feedback or participate in the project, and join me in thanking everyone involved in this release.
Technical blogs are usually interesting, but some really push the limits and help you to analyze and understand. Reading these blogs just feels good. A sample of interesting blog posts I have read lately:
When we started the Quality Assurance activity at W3C in 2001, one of the objectives was to find a way to improve the materials for communicating with Web developers. In the QA group, Snorre M. Grimsby (Opera) told me that we might find resources for producing educational materials. The discussion went quiet for a while and restarted in June 2006 with David Storey (Opera). At the same time, some people at WASP started a survey to define requirements for a Web Standards Curriculum.
Finally in March 2008, David introduced me to Chris Mills (Opera) and I had the chance to read and send comments on earlier drafts of the Web Standards Curriculum. It has now been released and it's a wonderful piece of work. I will give it a full read and review in the next month and suggest things to Chris Mills.
Now how can you help? Read it, use it in your Web agency, in your classroom, among your Web developer friends. Note what people misunderstand, suggest techniques to Chris Mills to improve his materials. Publish it on your blog, talk about it. Let it grow in the community. It's cool work which comes from a long story, and it really is a beautiful story.
Thanks to Chris Mills and Opera. They did it.
Good news today from Sunava Dutta of Microsoft's Internet Explorer team in regard to the W3C Access Control for Cross-Site Requests specification: Sunava writes that, as early as IE8 Beta 2,
IE8 will ship the updated section of Access Control that enables public data aggregation (no creds on wildcard) while setting us up on a trajectory to support more in the future (post IE8) using the API flag in an XDR level 2.
That's contingent on getting some understanding that "this area of the spec (public data) will not change significantly unless there are new security concerns."
That means we are now one (big) step closer to ultimately having cross-browser support for Web developers who want to write Web applications for that "public data" use-case of the Access-Control mechanism.
Sunava's announcement is one of several positive outcomes from a three-day face-to-face meeting that the Web Applications Working Group had at the Microsoft offices last week.
So, much thanks to Sunava and to others on the Internet Explorer team for the work they've been doing to help bring us closer to getting this collaboratively developed open standard for client-side cross-site requests out to the widest number of Web developers and end users possible.
In a recent item on IE8 Security, Eric Lawrence, Security Program Manager for Internet Explorer, introduced a work-around to the security risks associated with content-type sniffing: an authoritative=true parameter on the Content-Type header in HTTP. This re-started discussion of the content-type sniffing rules and the Support Existing Content design principle of HTML 5. In response to a challenge asking for evidence that supporting existing content requires sniffing, Adam made a suggestion that I'd like to pass along:
I encourage you to build a copy of Firefox without content sniffing and try surfing the web. I tried this for a while, and I remember there being a lot of broken sites ...
That reminded me of an idea I heard in TAG discussions of MIME types and error recovery: a browser mode for "This is my content, show me problems rather than fixing them for me silently."
Though Adam offered a patch, building firefox is not something I have mastered yet, so I'm interested to learn about run-time configuration options in IE (notes Julian) and Opera (notes Michael). Eric Lawrence's reply points out:
Please do keep in mind, however, that most folks (even the ultra-web engaged on these lists) see but a small fraction of the web, especially considering private address space/intranets, etc.
A report from one developer suggests there's light at the end of the tunnel, at least for sniffing associated with feeds:
I did, partly as an experiment, stop sniffing text/plain in the latest release of SimplePie (which, inevitably, isn't the nicest of things to do, seeming there are tens of thousands of users). Next to nothing broke. I know for a fact this couldn't have been done a year or two ago: things have certainly moved on in terms of the MIME types feeds are served with ...
If you get a chance to try life without MIME type sniffing, please let us know how it goes.
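To picture what the authoritative=true parameter mentioned at the start of this post would change, here is a toy sketch. Everything in it is an assumption made for illustration: the single sniffing rule is invented for the example and is neither the HTML 5 content-type sniffing algorithm nor what IE8 does, and the function names are hypothetical.

```python
def parse_content_type(header):
    """Split a Content-Type value into (media type, {parameter: value})."""
    pieces = [p.strip() for p in header.split(";")]
    params = {}
    for piece in pieces[1:]:
        name, _, value = piece.partition("=")
        params[name.strip().lower()] = value.strip().strip('"').lower()
    return pieces[0].lower(), params


def effective_type(header, body):
    """Decide how to treat body (bytes) given its Content-Type header."""
    media_type, params = parse_content_type(header)
    if params.get("authoritative") == "true":
        return media_type  # the server asked not to be second-guessed
    # Toy rule, NOT a real sniffing algorithm: text/plain that looks like HTML.
    start = body.lstrip()[:5].lower()
    if media_type == "text/plain" and start in (b"<html", b"<!doc"):
        return "text/html"
    return media_type


page = b"<html><body>hello</body></html>"
print(effective_type("text/plain", page))
print(effective_type("text/plain; authoritative=true", page))
```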
You have read a lot about the html 5 specification. You heard that there were hidden dragons and acid rain. But what about seeing for yourself, in practice, how html 5 parsing works? There are already some tools to play with html 5.
The DOM (Document Object Model) is the representation that browsers use in memory to manipulate Web content. Browsers have bugs and the content on the Web is largely non-conforming. The result is very different DOM representations in different browsers. If you are interested in seeing what a document looks like in different browsers, you can use the Live DOM Viewer. Open this link with each browser you know and paste code into the window.
This helps you to see how the Web content is understood today by different tools.
Now you might be interested to see how a document will be represented by a tool implementing html 5 parsing rules. An important note, html 5 is a specification in development. Things might change. The following tools might be incomplete and contain bugs as well. But it will give you an idea of the DOM. It is very practical when you are developing another language which is not html 5 but might be sent as text/html (by mistake or practical choice).
There are at least two online services:
Henri Sivonen developed a standalone application that you can use on your desktop. Here are the instructions to get it running. It worked fine on my macintosh.
Henri gave a list of limitations and bugs
There are for now three implementations of the html 5 parsing algorithm.
There is an attempt at implementing in C# for .Net 2.0, but no code has been released yet.
If you know other tools implementing it, leave a comment.
When software ships, it has bugs. There are many reasons for these bugs: poor in-house development, careless testing, unclear specifications, and many other things. We have to live with these bugs in software.
A bug deployed in software for a long time becomes a feature.
This is especially true in a distributed environment where pieces are loosely joined: the Web. Software is released with its inherent bugs. Content and framework developers are hit by a bug. They modify their own software to accommodate the bug or take advantage of it. No new version of the buggy software is released for a long time. When it is finally time to release a new version, the buggy software has to keep the bug as a feature so as not to break anything on the Web. Eventually, one day, the bug makes its way into a specification like html 5.
It is difficult to change things because they are all intertwined, but in a very loose way, which is the Web's strength. You can try to fix the software knowing that it will break things in many places. You then have to be ready to lose customers if someone else has implemented the bug. Users are not aware of the bug, and they don't really care about it. Fixing also means, in this case, educating people about the issue, and teaching content developers how to fix their content. Content developers will be the hardest to convince. If they fix their content, knowing that it will break things in other software, they will lose customers. So they are not likely to do it.
To keep bugs from becoming features, software has to be released on a short cycle, so that people can't take advantage of bugs. It also means that bugs don't survive many releases.
Can we improve the situation for bugs already deployed?
The solution could be a simultaneous release of software and a campaign educating people. This is challenging. Very challenging. It means agreement between companies at release time and a united front toward unsatisfied customers. I just wonder if it would be possible as an experiment for one or two bugs. For example, in the HTML 5 specification, browsers and Web sites, would it be possible to fix the content-type sniffing on text/plain?
Almost 70 years ago, on Sunday, October 30, 1938, we could hear on the radio:
Ladies and gentlemen, we interrupt our program of dance music to bring you a special bulletin from the Intercontinental Radio News. At twenty minutes before eight, central time, Professor Farrell of the Mount Jennings Observatory, Chicago, Illinois, reports observing several explosions of incandescent gas, occurring at regular intervals on the planet Mars.
Recently, on Monday, June 23, 2008, we could read on a radio site:
hCalendar will be gone from /programmes by the next deploy (probably this Thursday).
In the meantime we'll be looking at the possible use of RDFa (a slightly bigger S semantic web technology similar to microformats but without some of the more unexpected side-effects).
What do the two have in common? They created a big wave of reactions, comments and arguments: a war of the worlds.
I would like to focus on two blog posts which I like in this flood of comments. There are many more interesting ones.
Ed Dumbill says in The BBC, microformats, RDFa and Resig:
One of the wonderful things Resig has done with JavaScript is take time to love it and figure out its corners. Take some of the "confusing" and "advanced" things away and you're not able to achieve the same things. What he's done in jQuery is add a layer of elegance, predictability and accessibility.
I for one would love to see what Resig would do with semantic markup. jQuery really encourages and enables good markup practices, so there's a lot of synergy with his current style.
It is not only jQuery. I once met John Resig in Tokyo. He was giving a talk about new features of the future ECMAScript. It was complex, not necessarily easy to understand, but he presented it in a way that was enlightening. We could see he took pleasure in talking about it. That was refreshing. I decided to put him on the list of good speakers who are worth going to see again.
Then, not so long ago, John ported the Processing visualization language to JavaScript. I love graphics and information processing. It was yet another moment of pleasure, thinking "Some people have talents and creativity in their hands, they do beautiful things with complex objects."
The other blog post is in French and also comments on the affair. Damien Bonvillain gives his take on RDFa and its simplicity:
In fact, RDFa defines only 5 new attributes (about, property, resource, datatype, typeof)
RDFa became a candidate recommendation last week. You can read the Primer or go to the RDFa wiki to learn a bit more about the technology. Yes, indeed, for some people it will take a bit of work to understand the concepts. But it took me time to learn HTML, and I haven't really mastered JavaScript, yet people like John have given me the opportunity to simplify things by developing tools, libraries or authoring tools.
And HTML 5 in all that? Here again there is the story behind the story. The first version of RDFa used a lot of elements like meta and link in the body of a page. But browsers, because of invalid markup found on the Web, have to recover pages and put the link and meta elements back in the head of the document. The RDFa community listened and learned. They modified their model to take a step toward HTML 5, to create an environment with fewer interoperability issues. They took a step in the right direction to be able to work together.
Next week, I will show why it is important and how that can work even if not perfectly. But remember, it is because there are people like John Resig, who create, that complex things become easy. The war of the worlds was a fiction.
When groups of implementors and others (working groups in standards bodies and what have you, or groups of implementors and others with shared interest in a certain set of technologies) gather together publicly for focused technical discussion on a particular topic — or, say, to pool their efforts to produce specifications for new technologies — there's a common scenario they can sometimes find themselves facing.
That scenario starts with the appearance in the group of certain kinds of new arrivals (for lack of a better term) — sincere, well-intentioned people who show up with some ideas that are often pretty interesting but that they've cooked up sorta on their own, in relative isolation. The new arrival — driven by a strong personal conviction that his ideas have real value — then makes a sustained effort for a while to do everything he can to get the rest of the group to pay attention and consider spec’ing out and implementing those ideas.
But the reaction of the rest of the group in such cases can often range from simple indifference to sometimes-polite and sometimes-not-so-polite attempts to point out to the new arrival that his ideas have some fundamental problems that make implementing those ideas impractical or impossible or even just plain undesirable.
In a recent real-world case of something that could be seen as an instance of that kind of scenario, Michael Kay posted a message that makes an interesting analogy:
To be honest, it's a bit like walking up to Boeing and Airbus with some sketches of a new plane and asking them to build it. We can wish you luck, but we won't be placing our bets on it.
So, more often than not, the scenario plays out like this: the new arrival, met with that kind of “We can wish you luck, but we won't be placing our bets” reaction, becomes frustrated, angry, or confused about why the rest of the group just can't see the value that he sees in his ideas. He then either gets marginalized or ignored by the rest of the group (as they grow impatient with the discussion and give up) or leaves altogether.
It is a mistake to dismiss such people outright as dilettantes or dabblers and to simply ignore them. At the same time, it may also be a mistake for groups working on standards, in an effort to avoid offending people or making them feel unwelcome or unappreciated, to adopt completely welcoming “all ideas are created equal” discussion norms that risk encouraging continued, extended discussion of any proposal regardless of its lack of intrinsic merit or implementability.
It is perhaps far better for the group to encourage a discussion atmosphere of evaluating all ideas and proposals based on their technical merit and likelihood of being implemented. That does not mean the group shouldn't remain open to all proposals and new ideas. But it should recognize that there are proposals that some amount of initial discussion will likely reveal as clearly not meeting the group's baseline criteria with regard to technical merit and implementability — and recognize that it may not be the best use of the group’s time and energies to entertain continued discussion of such proposals indefinitely.
Being frank to people with regard to lack of viability of particular proposals in which they are personally (sometimes emotionally) invested may seem cruel — but I think it's far kinder than misleading people into investing further personal time in exploration of ideas that have little chance of actually making it into the final version of a spec, or zero chance of ever getting implemented.
The goal of working together on technical specifications is to produce standards that actually get implemented. We don't make standards for the sake of making standards; we do it with the goal of making them as implementable as possible, and of actually getting them implemented as widely and interoperably as possible. Standards that don't get fully implemented are not real standards. At best, they're just wish lists. And we're not in the business of producing wish lists (or should not be, at least).
Ian Hickson, the editor of the current HTML5 draft, posted an Error handling in URIs message to the uri@w3.org mailing list outlining some issues related to browser error handling behaviour for URIs, and to IRIs and character encodings other than UTF-8 — and asking, “Is there any chance that the URI and IRI specifications might get updated to handle these issues?”.
That posting and question spawned some spirited discussion, with messages from Julian Reschke, Anne van Kesteren, Tim Bray, John Cowan, Frank Ellermann, and Martin Duerst, and provoked comments like the following one:
That’s kind of what I said already, and why I guess that HTML5 will never fly: It tries to reinvent the Web, if not the Internet.
…and from Ian to the above, the following response:
Actually we’re trying to not reinvent the Web, but to document it, so that browser vendors can write browsers that handle existing Web content in a fashion compatible with legacy UAs without reverse-engineering each other.
(It’s true that this is requiring defining things that are at odds with existing specifications, but that’s mostly because those specifications aren’t in fact in line with real usage…)
A few months ago, I explained how you can participate in W3C work in a different way: writing tutorials, writing quick tips. Last week, I found a new and original way to participate in W3C work.
Marcos Cáceres is an invited expert on the Web Application Formats Working Group and he is the editor of a few W3C specifications.
So far, there is nothing really surprising. But I noticed in his bio the following:
I’m currently doing a PhD full-time and also work as a developer for the Creative Industries’ Computing Services, Queensland University of Technology (QUT), Brisbane, Australia. […] My main research interests are in widgets, web widgets, and mobile widgets.
Marcos is presenting his PhD thesis by published papers, i.e. the W3C specifications he is editing. We can read in the confirmation document (pdf) for his PhD thesis:
In addition, this confirmation document describes what methods will be employed to conduct the research, what publications will be produced, and how that knowledge will be disseminated within the two year timeframe remaining to complete this PhD. This confirmation document attempts to meet the requirements of Confirmation of Candidature as described in Section 12 of QUT’s Manual of Policies and Procedures (MOPP) (QUT, 2006). The final form of this PhD will be a Presentation of PhD Thesis by Published Papers as specified in Section 14 of the MOPP.
Kudos to Queensland University of Technology for supporting this original way of contributing to and actively participating in W3C work. I wish him success.
At the news of the official release of Firefox 3 (FF3), I asked David Baron, Mozilla's Advisory Committee Representative at W3C (see photo), a few questions about the browser release and support for standards.
Note: I anticipate interviewing (lots of) other W3C Members about their involvement in W3C work and support for standards in products. Next week: Opera, on its recent browser release.
Q. So is the rumor true that Firefox 3 implements every W3C Recommendation perfectly?
A. No.
Q. Rats! Well, let's continue anyway. Your list of favorite changes mentions some changes related to CSS. What are some that you think authors will like in particular? Are there some noteworthy changes that will make cross-browser authoring easier? Can you say a word about Mozilla's priorities in CSS support for the next year or so (and how they align with those of the CSS Working Group)?
A. Some of the ones I think authors will be most interested in are inline-block and inline-table, font-size-adjust, rgba() and hsla() colors, new values for width, min-width, and max-width, and white-space: pre-wrap. These are the ones I mentioned in that post.
One of the top things that will ease cross-browser authoring is inline-block. But a larger part of the work in easing cross-browser authoring is really the large numbers of small bug fixes that have gone into this release.
As far as priorities for CSS support go, we want to continue improving conformance to CSS 2.1. Fixing bugs in the details makes it more likely authors will find the same behavior in different browsers in real-world use of the specifications.
We're also looking at adding a bunch of additional features over the next year. It's hard to know which features will end up in which release because we don't really know how hard they are or how long they take to implement until we try. But some of the things we're looking at or working on are downloadable fonts (both OpenType and SVG) through @font-face, allowing some CSS properties from SVG (like clip-path, mask, and filter) to be used with other languages, new graphical properties like text shadows, border images, and box shadows, implementing CSS media queries, the remaining selectors in css3-selectors, some of the new functions in css3-values like calc(), and Apple's proposal for CSS transformations, and standardizing and improving the flexible box model that we use to construct the user interface of Firefox and other Mozilla-based applications. I think these align reasonably well with the priorities of the CSS working group, which is actively working on the specifications in many of these areas.
Q. I read that there are several security-related changes (phishing, malware). Mozilla is participating in the W3C Web Security Context (WSC) Working Group, chartered to address this sort of thing. Are FF3 updates influenced by the work of that group (or vice versa)?
A. Since this isn't in my area of expertise, I asked Johnathan Nightingale, who represents Mozilla on that group, for his answer, and he wrote:
Yes, there is definitely influence in both directions. The WSC is an interesting group because it's chartered with tackling things like UI guidelines that have not traditionally been the W3C's focus; that creates a really interesting tension between people who want to document and standardize best practice, and people who want to change the world. We try to pull the group towards the middle, towards a document that can set a good baseline and a high bar for future implementors. The security of users on the web is very important to us, and using this group to make user interfaces more intelligent and more human can help keep people safe, regardless of which browser they choose.
Q. Does FF3 ship with XForms in the default configuration? If not, is there a reason we should be aware of?
A. No. It's a complex specification that depends on a number of other complex specifications. We haven't seen enough demand for it to balance the cost of writing, testing, offering for download, and permanently supporting the code needed to implement all of these.
Q. Can you comment on the state of SVG implementation in FF3?
A. There are a bunch of new SVG features in Firefox 3: see SVG improvements in Firefox 3 for details. One of the highlights is that we now support putting HTML inside svg:foreignObject.
Q. Mozilla has been very actively involved in the work on the W3C Access Control for Cross-Site Requests draft specification, which provides a way to securely make cross-site requests over the Web -- for example, cross-site XHR/Ajax requests. Are you still supporting that work? And if so, can you say a bit about why it's important?
A. Definitely. This spec gives sites a way to opt out of cross-domain access restrictions for public data or other data that they wish to share with other sites. This is a big step forward in allowing Web sites ("mashups") that mix data from different sources to be built easily and securely.
Q. You are a developer. When there is a new feature in a draft specification, when do you start to implement it? How do you proceed? What are the steps of modifying the source code of the browser? I've seen Tantek Çelik modify source code at the dinner table, for example.
A. Well, we're always thinking about what we can do to improve the Web. If there's already a solid specification for what we need, that makes our work easier, but if there isn't we can contribute to writing one. If there are already other implementations, that means there's a shorter path to our implementation being usable by Web authors. So implementation doesn't always start from seeing a feature in a draft specification.
When I start implementing something, my first step is to understand how the feature works and how it interacts with all the other features we implement. This leads naturally either to writing tests, or to designing and writing code. Both need to get done before the feature is complete. For more complex features, there's often more planning and coordination required, since when more work is involved, the time spent planning can save more time later from not having to redo things that were done wrong.
When you see Tantek writing code at the dinner table after a working group meeting, it probably means he already understands the feature well from working group discussion earlier in the day, and probably did at least some of the design in his head during the discussions. It's likely that he's just doing the typing (and the debugging) for a design that he already has in his head.
Q. Some W3C groups receive a lot of comments on specifications, and the more stakeholders the more comments. What changes have you observed at Mozilla with the growth of Firefox?
A. I think open source projects are quite different from the W3C. One of the guiding principles of open source software is that anybody has the source, and thus the ability to take that source and build a better product using it. This means we try to choose to use or ignore input depending on whether we think using it is the most efficient way to improve the software we make. (This applies in the long term, too: encouraging and taking input from new contributors can make us better in the long-term even if it costs us time or even bugs in the short-term.)
This is very different from the W3C, where groups have an obligation to respond to comments, whether or not they're going to improve the specification, or even whether or not they're intended to improve the specification.
So with the growth of Firefox, we do have a lot more people involved than we used to, which means more progress is happening at once. But there's still a relatively small group of people at the center who make the core architectural and planning decisions. I think in some cases these people do more filtering than they used to.
Q. Based on FF3 experience, are there any particular issues that you would like W3C to address as a priority?
A. It's hard to point out one or two particular issues, though one that comes to mind as particularly important is the lack of a widely-acceptable royalty-free video codec that can be used for interoperable video on the Web.
More generally, I'm glad to see W3C actively working to improve and complement the core Web specifications, like HTML, CSS and the DOM, that we already implement and that form a foundation used by most Web pages. I like seeing solid specifications and test suites that improve the Web as a platform as effectively as possible.
Q. The W3C community is currently discussing starting work on "Geolocation" at W3C, with a goal of creating an API that will expose device location-sensing capabilities (for example, GPS data) to Web applications. If work starts, would Mozilla get involved in that work? If so, what do you see as being important about it?
A. A number of people at Mozilla have already been participating in the discussion, so I think we're already involved.
One of the great things that having the Web available on mobile devices can provide is the ability to quickly get information about where you are. But entering location information, potentially without a keyboard, can be slow (and inaccurate, especially if the user is lost and looking for maps). A user who can click a button or two to send location data to a Web site has a faster and easier path to finding local maps, nearby restaurants, train schedules from the nearest station, or other location-specific information. And there's no reason this faster path wouldn't be useful for laptop or desktop users too.
Q. I sometimes read and hear people mentioning "mozilla-central". What is "mozilla-central" and what changes has it brought to the work on Firefox and Mozilla?
A. We've switched version control systems, from CVS to Mercurial, which is a distributed version control system. Distributed version control systems have a lot of advantages over CVS, such as much better ability to work offline and better mechanisms for collaboration. mozilla-central is just the name of the Mercurial repository in which we integrate changes for future releases of Mozilla. Work is pushed to the mozilla-central repository when it's thought to be close to the quality level it needs to reach to be in a shipping release. (Sometimes, of course, it turns out that something isn't as ready as its author thought.) We generate our main nightly builds from the code in mozilla-central.
Many thanks to David for his answers.
Three documents have been published for HTML 5 by the HTML Working Group.
In addition to these three documents, the HTML Working Group also published a W3C Note about Offline Web Applications on May 30, 2008. The abstract is quite clear:
HTML 5 contains several features that address the challenge of building Web applications that work while offline. This document highlights these features (SQL, offline application caching APIs as well as online/offline events, status, and the localStorage API) from HTML 5 and provides brief tutorials on how these features might be used to create Web applications that work offline.
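As a rough illustration of how the storage feature and the online/offline events mentioned in that abstract might work together, here is a minimal TypeScript sketch. It is not taken from the Note: the function names, the `pending-messages` key, and the in-memory store are all assumptions made for the example. The store exposes only the `getItem`, `setItem`, and `removeItem` methods, which `localStorage` also provides.

```typescript
// Minimal subset of the localStorage interface used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// In-memory stand-in (an assumption) so the sketch runs outside a browser.
// In a page, window.localStorage could be passed instead.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = {};
  return {
    getItem: (key: string) =>
      Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null,
    setItem: (key: string, value: string) => {
      data[key] = value;
    },
    removeItem: (key: string) => {
      delete data[key];
    },
  };
}

// Hypothetical key under which unsent messages are kept.
const QUEUE_KEY = "pending-messages";

// Read the stored queue, or an empty list when nothing has been stored yet.
function readQueue(store: KeyValueStore): string[] {
  const raw = store.getItem(QUEUE_KEY);
  return raw === null ? [] : JSON.parse(raw);
}

// While offline: remember the message instead of sending it.
function queueMessage(store: KeyValueStore, message: string): void {
  const pending = readQueue(store);
  pending.push(message);
  store.setItem(QUEUE_KEY, JSON.stringify(pending));
}

// Once back online: send everything in order, clear the queue,
// and report how many messages were sent.
function flushQueue(store: KeyValueStore, send: (message: string) => void): number {
  const pending = readQueue(store);
  pending.forEach((message) => send(message));
  store.removeItem(QUEUE_KEY);
  return pending.length;
}
```

In a browser, `flushQueue` could plausibly be called from a listener for the window's `online` event, with `window.localStorage` as the store.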
If you have any particular questions about these documents, just leave a comment here. If you want to comment on the technologies, send a comment to the appropriate mailing list, public-html-comments@w3.org.
On Google's blog, Mark Davis explains that Google is moving to Unicode 5.1. Unfortunately, the article confuses Unicode and UTF-8, as David Goodger noted in Unicode misinformation. But the really interesting bit is the growth of UTF-8 on the Web. These data should be interesting for the development of HTTP, HTML 5 and validators.
© graph from Google.
I have noticed a discussion among the participants of the HTML WG about vertical layout for text (I have cut some parts for readability).
<Hixie> ok for canvas text my proposal is:
<Hixie> drawHString(x, y, maxWidth, textAlign, s); and drawHString(x, y, maxHeight, textAlign, s);
<Hixie> drawVString(...) for the second one
<Lachy> what's the difference between them? drawVString for vertical stings where the letters are stacked on top of each other, and not just rotated 90 deg?
<Philip`> Hixie: They look complex and hard to use :-p
<Philip`> compared to e.g. translate(x,y);drawString(s)
<Hixie> lachy: drawVString() would be for vertical text (e.g. some CJK)
<Hixie> one is lack of support for vertical text :-)
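To make the distinction Lachy asks about concrete, here is a minimal TypeScript sketch of the "stacked" case, where each character keeps its upright orientation and only its position moves down the column. The helper `layoutVerticalText` and its `lineHeight` parameter are hypothetical. They are not part of the canvas API or of the proposal discussed above, and real vertical layout for CJK text involves much more than this.

```typescript
// Hypothetical result type: where one upright character should be drawn.
interface PlacedGlyph {
  glyph: string;
  x: number;
  y: number;
}

// Stack the characters of `text` top to bottom starting at (x, y),
// advancing by `lineHeight` per character instead of rotating the string.
function layoutVerticalText(
  text: string,
  x: number,
  y: number,
  lineHeight: number
): PlacedGlyph[] {
  // split("") breaks the string into UTF-16 code units, which is enough for
  // characters in the Basic Multilingual Plane (this sketch assumes that).
  return text.split("").map((glyph, index) => ({
    glyph: glyph,
    x: x,
    y: y + index * lineHeight,
  }));
}

// Example: a vertical "HOTEL" sign starting at (10, 20) with 16-unit steps.
const sign = layoutVerticalText("HOTEL", 10, 20, 16);
```

Each entry in `sign` could then be handed to whatever single-string drawing call the final API provides.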
In print media, vertical text has been handled quite well for a long time. Japanese books have some complex layouts mixing Western and Japanese characters.

It happens not only in CJK (Chinese, Japanese, Korean) texts. Think about the neon sign of a hotel with the letters written vertically.
Felix Sasaki is my colleague at W3C/Keio and has worked with the Japanese Layout Task Force. He was sitting next to me when I was reading the logs of the discussion, so I just asked him for some references. He sent me a link to 1.3 Directional Factors in Japanese Text Layout from the Requirements of Japanese Text Layout. He also reminded me about XSL 1.1: 7.29.3 "glyph-orientation-vertical".
Wikipedia has a page on the topic of Horizontal and vertical writing in East Asian scripts, and Unicode has a note on Robust Vertical Text Layout.
All of that should help to define the API for Canvas Text.
Ian Hickson, one of the two editors of the HTML 5 specification, sent this message to the HTML WG mailing list this morning.
Summary:
<font> is gone, style="" is made global.
What does it mean? The font element is part of the list of active formatting elements. Browsers (user agents) have to support the content that is already online, following the guideline "Do not break the Web", but the font element has disappeared from the content model.
Basically, there is no way to use a font element to write a conforming HTML 5 document. You, or the authoring tool, will have to use the style attribute.
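As an illustration of what an authoring tool might do, here is a minimal TypeScript sketch that turns the presentational attributes of a font element into a style attribute on a span. The helper names are hypothetical, the mapping covers only the color and face attributes, and the inputs are assumed to be already safe to place in markup (no escaping is done).

```typescript
// Hypothetical description of the font attributes handled by this sketch.
interface FontAttributes {
  color?: string;
  face?: string;
}

// Translate font attributes into CSS declarations for a style attribute.
function fontAttributesToStyle(attrs: FontAttributes): string {
  const declarations: string[] = [];
  if (attrs.color) {
    declarations.push("color: " + attrs.color);
  }
  if (attrs.face) {
    declarations.push("font-family: " + attrs.face);
  }
  return declarations.join("; ");
}

// Build the replacement markup: a span carrying the equivalent style.
// Assumes `text` and the attribute values need no escaping.
function replaceFontWithSpan(attrs: FontAttributes, text: string): string {
  return '<span style="' + fontAttributesToStyle(attrs) + '">' + text + "</span>";
}

// <font color="red" face="Arial">Warning</font> would become:
const html = replaceFontWithSpan({ color: "red", face: "Arial" }, "Warning");
// html is '<span style="color: red; font-family: Arial">Warning</span>'
```

A stylesheet with a class would usually be a cleaner target than inline styles. The sketch only mirrors the one-to-one replacement described above.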