Posts tagged as:

HTML

Sharps and flats in HTML

by John on March 16, 2009

Apparently there’s no HTML entity for the flat symbol, ♭. In my previous post, I just spelled out B-flat because I thought that was safer; it’s possible not everyone would have the fonts installed to display B♭ correctly.

So how do you display music symbols for flat, sharp, and natural in HTML? You can insert any symbol if you know its Unicode value, though you run the risk that someone viewing the page may not have the necessary fonts installed to view the symbol. Here are the Unicode values for flat, natural, and sharp.

Since the flat sign has Unicode value U+266D, you could enter ♭ into HTML to display that symbol.

The sharp sign raises an interesting question. I’m sure most web pages referring to G-sharp would use the number sign # (U+0023) rather than the sharp sign ♯ (U+266F). And why not? The number sign is conveniently located on a standard keyboard and the sharp sign isn’t. It would be nice if people used sharp symbols rather than number signs. It would make it easier to search on specifically musical terms. But it’s not going to happen.

Related posts:

Entering Unicode characters in Linux
Three ways to enter Unicode characters in Windows
Greek letters and math symbols in (X)HTML

{ 2 comments }

Google Reader and HTML lists

by John on February 5, 2009

Yesterday I wrote a post about how to start numbering a list in HTML at some point other than 1. Mark Reid and Thomas Guest pointed out that my example did not show up correctly in Google Reader.

Here’s how the list shows up when I browse directly to the post using Firefox 3 on Windows XP.

browser screen shot

But here’s what the same list looks like when I look at the post inside Google Reader.

Google Reader screen shot

I don’t understand why the difference. In fact, I don’t understand in general why posts often look different in Google Reader. For example, the screenshots above are centered when you visit the blog directly, but are left-aligned in Google Reader. Also, the space between the images and the text is removed.

Do other RSS readers similarly mangle HTML? Any suggestions how to fix the problem?

Update: Changed the way images are centered per Thomas Guest’s suggestion.

{ 4 comments }

Starting number for HTML lists

by John on February 4, 2009

I recently found out how to make an HTML list start numbering somewhere other than at 1. This is handy when you have a list interrupted by some text and want to continue the original numbering without starting over. I’ve only been using HTML for 15 years. Maybe one of these days I’ll really learn it.

In the <ol> tag, add the attribute start="7", for example, to make the list start numbering with 7.  The start attribute can be any integer, even negative.

For example, the seven dwarfs are

  1. Dopey
  2. Grumpy
  3. Doc
  4. Happy
  5. Bashful
  6. Sneezy

and last but not least

  1. Sleepy.

Update: As pointed out in the comments below, the example in this post may not render correctly in your reader. See this post for a discussion of the problem.

{ 3 comments }

Faint praise for Expression Web

by John on December 16, 2008

I really like Expression Web, when it doesn’t crash. It generates standard-compliant XHTML, it’s pleasant to use, etc. But I’ve had it crash many times. It crashes when I have too many files open, or when I edit too big a file for too long.

I need to find something better, but I haven’t taken the time to evaluate other HTML editing environments. I’d like to keep using it if Microsoft would fix the crashes. They have posted an update, but the description implies it doesn’t fix the problems I’ve seen.

{ 4 comments }

Syntax coloring for code samples in HTML

by John on December 11, 2008

Syntax coloring makes it much easier to read source code, especially when you become accustomed to a particular color scheme. For example, I’m used to the default color scheme in Visual Studio: comments are green, keywords are blue, string literals are red, etc. Once you get used to color-coded source code, it’s hard to go back to black-and-white. However, the code samples here are monochrome and I’ve been thinking about doing something about that.

It’s possible to mark up code samples on the web just like any other chuck of HTML, but that would be time-consuming. Also, you want to leave code samples as plain text so readers can copy the source and paste it into their code. So what people usually do is use client-side JavaScript to change the way code samples are displayed in the browser. That way the code appears to be marked up but it’s still unadorned underneath.

I like the way code samples look on Scott Hanselman’s blog, and so I started to use the JavaScript solution that he recommends, SyntaxHighlighter. However, I decided to hold off when I read this comment on his blog.

I tried to use it. Unfortunately the scrolling errors I see on your and other sites are too much to bear.

More recently I heard that StackOverflow is using prettify.js to add syntax coloring for their code samples. I haven’t noticed any problems on that site, so I’m starting to try out prettify. I’ve heard that the script doesn’t always handle VB code correctly, but that doesn’t matter to me.

The prettify script is easy to use. You don’t have to tell it what programming language you’re highlighting. However, you do have the option of  specifying the language, which presumably can’t hurt. I’ve tried it on a page containing C++ code and on another page containing Python code and they both look OK. My only complaint is that the default color scheme is not what I would have chosen. However, the color scheme can be modified by editing a style sheet and I intend to do that.

I’m going to start by experimenting with static files on my site. I’m more cautious about incorporating syntax coloring with the blog since I don’t know what interaction problems there could be with the WordPress software on the server or with blog reader software on the clients. Scott Hanselman says on his blog

… I couldn’t find a syntax highlighting solution that worked in EVERY feed reader. There are lots of problems with online ones like Google Reader and BlogLines if I, as the publisher, try to get tricky with the CSS.

So I may leave the code samples on the blog alone. Also, I’m also not sure what I’ll do about PowerShell samples. The prettify script works well with C-family languages, but PowerShell syntax may be too far afield from what it expects.

Does anyone have suggestions in general? For PowerShell in particular?

{ 6 comments }

RDFa

by John on December 8, 2008

Phil Windley had a recent interview with Elias Torres and Ben Adida on RDFa. This is an emerging standard for adding semantic information to HTML documents. The “a” in RDFa stands for attributes. Rather than creating new documents, RDFa allows you to add RDF-like semantic information to your existing HTML pages via element attributes. A great deal of thought has gone into how to accomplish this without unwanted side effects such as changing how pages are rendered.

As I listened to the interview, I tried to think of how I would apply this to my web sites. There are standardized vocabularies for expressing friendship relationships, address and calendar information, audio file meta data, etc. However, little of this is relevant to my web sites. I’m ready to jump on the RDFa bandwagon once there are standard vocabularies more applicable to the kind of content I publish.

To learn more about RDFa, listen to Phil WIndley’s podcast or read the RDFa primer.

{ 2 comments }

HTML parsing landmines

by John on November 10, 2008

Phil Haack explains why parsing HTML isn’t as easy as it sounds in The Landmine of Parsing HTML and Stripping HTML comments.

Update: Phil Haack’s follow-up post.

{ 0 comments }

Manipulating the clipboard with PowerShell

by John on October 17, 2008

The PowerShell Community Extensions contain a couple handy cmdlets for working with the Windows clipboard: Get-Clipboard and Out-Clipboard. One way to use these cmdlets is to copy some text to the clipboard, munge it, and paste it somewhere else. This lets you avoid creating a temporary file just to run a script on it.

For example, occasionally I need to copy some C++ source code and paste it into HTML in a <pre> block. While <pre> turns off normal HTML formatting, special characters still need to be escaped: < and > need to be turned into &lt; and &gt; etc. I can copy the code from Visual Studio, run a script html.ps1 from PowerShell, and paste the code into my HTML editor. (I like to use Expression Web.)

The script html.ps1 looks like this.

$a = get-clipboard;
$a = $a -replace "&", "&amp;";
$a = $a -replace "<", "&lt;";
$a = $a -replace ">", "&gt;";
$a = $a -replace '"', "&quot;"
$a = $a -replace "'", "&#39;"
out-clipboard $a

So this C++ code

double& x = y;
char c = 'k';
string foo = "hello";
if (p < q) ...

turns into this HTML code

double&amp; x = y;
char c = &#39;k&#39;;
string foo = &quot;hello&quot;;
if (p &lt; q) ...

Of course the PSCX clipboard cmdlets are useful for more than HTML encoding. For example, I wrote a post a few months ago about using them for a similar text manipulation problem.

If you’re going to do much text manipulation, you may may want to look at these notes on regular expressions in PowerShell.

The only problem I’ve had with the PSCX clipboard cmdlets is copying formatted text. The cmdlets work as expected when copying plain text. But here’s what I got when I copied the word “snippets” from the CodeProject home page and ran Get-Clipboard:

Version:0.9
StartHTML:00000136
EndHTML:00000214
StartFragment:00000170
EndFragment:00000178
SourceURL:http://www.codeproject.com/
<html><body>
<!--StartFragment-->snippets<!--EndFragment-->
</body>
</html>

The Get-Clipboard cmdlet has a -Text option that you might think would copy content as text, but as far as I can tell the option does nothing. This may be addressed in a future release of PSCX; it has been assigned a work item.

{ 0 comments }

I played around with Google’s translator a little after adding some notranslate directives as discussed in my previous post. Google did honor my requests to mark some sections as literal text to not be translated. Google’s translator was also able to recognize my name as a name without special markup. Yahoo, on the other hand, translated my name, turning “Cook” into “Cuisinier” in French.

Google treated text inside <code> tags as literals that should not be translated. That is, Google would leave my source code snippets alone and only translate the English prose surrounding the code. Yahoo, on the other hand, would translate everything, including source code. For example, I had some PowerShell code on my page with the keyword matches that Google left alone but Yahoo translated into “allumettes,” presumably good French prose but not a legal PowerShell keyword.

One puzzling thing about the Google translation engine was that it would change which text was hyperlinked. For example, the text “My résumé” was changed to “Mon CV,” linking on the translation for “my.” Yahoo produced what I expected, “Mon résumé.” There were several other instances in which Google produced odd links, such as hyperlinking the | marker between words that were linked before. For example, the footer of my web site has these links:

Home | Sitemap | My blog | Search

Yahoo turned this into

Maison | Sitemap | Mon blog | Recherche

while Google produced

Accueil | Plan du site | Mon blog | Recherche

So Google incorporated the separator bars as part of words, and moved the last link from “Recherche” to the bar separating “blog” and “Rescherche.”

One advantage of Google’s translation is that it lets you hover your mouse over a line of translated text and see the original text.

{ 2 comments }

Giving hints to automatic translators

by John on October 14, 2008

One problem with machine translation is that machines don’t know when to stop translating. For example Yahoo’s Babel Fish translator translates my last name “Cook” literally to “Cocinero” in Spanish and “Cuisinier” in French.

Today Google announced a way to tell its translator that text should not be translated. Place such text inside a <span> tag with the attribute class="notranslate". I tried this on a web page that explained that a certain piece of code printed out “Hello world.” Since “Hello world” is literal output, it should be left untranslated, not turned into, for example, “Bonjour le monde.” The solution was to modify the HTML to say

The code above prints &ldquo;<span class="notranslate">Hello world</span>.&rdquo;

To prevent an entire page from being translated, add the following tag in the <head> section of the page.

<meta name="google" value="notranslate">

I suppose other machine translation efforts, such as those from Microsoft and Yahoo, will follow Google’s lead and support the class=notranslate directive.

{ 1 comment }

The Holy Grail of CSS

by John on August 14, 2008

Basic tasks are simple in CSS, but even slightly harder tasks can be incredibly difficult. Controlling fonts, margins, and so forth is a piece of cake. But controlling page layout is another matter. In his book Refactoring HTML, Elliotte Rusty Harold describes a technique as

so tricky that it took any smart people quite a few years of experimentation to develop the technique show here.  In fact, so many people searched for this while believing that it didn’t actually exist that this technique goes under the name “The Holy Grail.”

What is the incredibly difficult task that took so many years to discover? Teaching a web browser to play chess using only style sheets? No, three column layout. I kid you not. He goes on to say

The goal is simple: two fixed-width columns on the left and the right and a liquid center for the content in the middle.  (That something so frequently needed was so hard to invent doesn’t speak well of CSS as a language, but it s the language we have to work with.)

You can read more about the Holy Grail of CSS in an article by Matthew Levine.

I appreciate the advantages of CSS, though I do wish it didn’t have such a hockey stick learning curve. I’ve heard people say not to bother learning overly difficult technologies because if you find it too difficult, so will everyone else and it will die off. But CSS seems to be firmly established with no competitor.

{ 5 comments }

Side benefits of accessibility

by John on August 3, 2008

Elliotte Rusty Harold makes the following insightful observation in his new book Refactoring HTML:

Wheelchair ramps are far more commonly used by parents with strollers, students with bicycles, and delivery people with hand trucks than they are by people in wheelchairs. When properly done, increasing accessibility for the disabled increases accessibility for everyone.

For example, web pages accessible to the visually impaired are also more accessible to search engines and mobile devices.

{ 0 comments }

Migrating from HTML to XHTML

by John on August 3, 2008

I migrated the HTML pages on my web site to XHTML this weekend. I’ve been hesitant to do this after hearing a couple horror stories of how a slight error could have big consequences (for example, see how extra slashes caused Google to stop indexing CodeProject) but I took the plunge. Mostly this was a matter of changing <br> to <br /> etc. But I did discover a few errors such as missing </p> tags. I was surprised to find such things because I had previously validated the site as HTML 4.01 strict.

I thought that HTML entities such as &beta; for the Greek letter β were illegal in XHTML. Apparently not. Three validators (Microsoft Expression Web 2, W3C, and WDG) all seem to think they’re OK. Apparently they’re defined in XHTML though not in XML in general. I looked at the official W3C docs and didn’t see anything ruling these out.

Also, I’ve read that <i> and <b> are not allowed in strict XHTML. That’s what Elliotte Rusty Harold says in Refactoring HTML, and he certainly knows more about (X)HTML that I do. But the three validators I mentioned before all approved of these tags on pages marked as XHTML strict. I changed the <i> and <b> tags to <em> and <strong> respectively just to be safe, but I didn’t see anything in the W3C docs suggesting that the <i> and <b> tags were illegal or even deprecated. (I understand that italic and bold refer to presentation rather than content, but it seems pedantic to me to suggest that  <em> and <strong> are any different than their frowned-upon counterparts.)

{ 0 comments }

Validating a web site

by John on August 1, 2008

The W3C HTML validator is convenient for validating HTML on a single page. The site lets you specify a file to validate by entering a URL, uploading a file, or pasting the content into a form. But the site is not designed for bulk validation. There is an offline version of the validator intended for bulk use, but it’s a Perl script and difficult to install, at least on Windows. There is also a web service API, but apparently it has no WSDL file and so is inaccessible from tools requiring such a file.

The WDG HTML validator is easier to use for a whole web site. It lets you enter a URL and it will crawl the entire site report the validation results for each page. (It’s not clear how it knows what pages to check. Maybe it looks for a sitemap or follows internal links.) If you’d like more control over the list of files it validates, you can paste a list of URLs into a form. (This was the motivation for my previous post on filtering an XML sitemap to create a list of URLs.)

{ 0 comments }

I frequently need to look up how to add diacritical marks to letters in HTML, TeX, and Microsoft Word, though not quite frequently enough to commit the information to my long-term memory. So today I wrote up a set of notes on adding accents for future reference. Here’s a chart summarizing the notes.

Accent HTML TeX Word
grave grave \` CTRL + `
acute acute \' CTRL + '
circumflex circ \^ CTRL + ^
tilde tidle \~ CTRL + SHIFT + ~
umlaut uml \" CTRL + SHIFT + :
cedilla cedil \c CTRL + ,
æ, Æ æ, Æ \ae, \AE CTRL + SHIFT + & + a or A
ø, Ø ø, Ø \o, \O CTRL + / + o or O
å, Å å, Å \aa, \AA CTRL + SHIFT + @ + a or A

The notes go into more details about how accents function in each environment and what limitations each has. For example, LaTeX will let you combine any accent with any letter, but MS Word and HTML only support letter/accent combinations that are common in spoken languages.

{ 1 comment }