Trim un-HTML-friendly characters from song names...

wormyrocks (New member; joined Mar 8, 2009; 20 messages)

Hey folks,
For quite a while I've had a script that will take my iTunes library and print the names of the tracks into an HTML file; that way, when anyone asks about my music I can just point them to an automatically generated webpage.
However, when a song has special characters in the title or the artist, the AppleScript doesn't convert them into HTML-friendly ASCII codes, so they show up with weird characters inserted:

Is there any way I can have the AppleScript fix this and convert these characters so they show up correctly on the website? If not, is there any way I can do it manually without having to hunt down every instance of a strange character?
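For what it's worth, the literal request here is doable: HTML accepts numeric character references such as &#233; for é, and a page made only of ASCII plus such references looks the same whatever charset the browser assumes. This is a different route from the one the thread ends up taking (switching the page to UTF-8). A minimal sketch of a conversion handler; the name toNumericEntities is hypothetical, and it assumes AppleScript 2.0 (Mac OS X 10.5 or later), where "id of" returns Unicode code points.

```applescript
-- Hypothetical handler: replaces every non-ASCII character with an HTML
-- numeric character reference (é becomes &#233;), leaving ASCII untouched.
-- Requires AppleScript 2.0 (Mac OS X 10.5+) for "id of" and "character id".
on toNumericEntities(theText)
	set output to ""
	repeat with charRef in (characters of theText)
		set c to contents of charRef
		set codePoints to id of c
		-- A "character" can be a cluster (e.g. "e" plus a combining accent),
		-- in which case "id" returns a list of code points instead of one integer.
		if class of codePoints is not list then set codePoints to {codePoints}
		if (count of codePoints) is 1 and (item 1 of codePoints) < 128 then
			set output to output & c
		else
			repeat with cp in codePoints
				set output to output & "&#" & (contents of cp) & ";"
			end repeat
		end if
	end repeat
	return output
end toNumericEntities

-- Example: "Frédéric" built from code points to avoid source-encoding surprises.
set composer to "Fr" & (character id 233) & "d" & (character id 233) & "ric"
toNumericEntities(composer) -- "Fr&#233;d&#233;ric"
```

Inside the iTunes tell block it would be called as "my toNumericEntities(nom)". Characters such as & and < pass through untouched, so they still need ordinary HTML escaping. Building the string one character at a time is slow on very long text, but track names are short.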
Thanks!
 

S2_Mac (New member; joined Oct 24, 2006; 4,878 messages; location: about 3 feet in front of the monitor)

Quoting wormyrocks: "However, when I have a song with special characters in the title or the artist, they are not converted into HTML-friendly ASCII codes by the applescript"

It's been a while since AppleScript generated ASCII chars*... unless you're running Mac OS X 10.4 (or older), your AS generates Unicode by default. (*Small lie; prior to 10.5, AppleScript's default was MacRoman, not ASCII.) If the script is one of my "Library Lister" scripts, there's a config var in the script you can set to match up the HTML charset header with the encoding of the generated HTML. Dunno about other scripts; link to the one you're using or provide some code samples....

If your web pages are using ISO 8859-1 or Mac Roman encoding, this would be a great time to update to UTF-8. The 7-bit "plain ASCII" characters will render just the same as always, and existing "high ASCII" chars (such as the Option-e-e from your Chopin example) can easily be converted in a modern text editor such as TextWrangler (free) or BBEdit ($$, but worth every penny).

Been a long time since I messed with this stuff... can't recall which encoding is the current AS default (UTF-16 or UTF-8), and the Mac in front of me is too old (10.5.8) to do competent testing (plus, I'm lazy ;-). But the script you're using should be easy to modify: where the HTML output gets written to a file, change the code to something like this:
   write html_output_str to html_file as «class utf8»
and all should be well (might also have to modify whatever code or template provides the charset declaration, to make sure it matches the encoding).
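For reference, a minimal sketch of a reusable write handler built around that line. The handler name writeUTF8 and the test file name are hypothetical; it assumes an HFS-style path string like the ones used elsewhere in this thread.

```applescript
-- Hypothetical handler: writes theText to the file at theFilePath
-- (an HFS-style path string such as "Macintosh HD:Users:me:Desktop:x.html"),
-- replacing any existing contents, encoded as UTF-8.
on writeUTF8(theText, theFilePath)
	set fileRef to open for access file theFilePath with write permission
	try
		set eof of fileRef to 0 -- truncate so leftovers from a longer old file don't linger
		write theText to fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef
		error errMsg number errNum -- re-raise so the caller sees what went wrong
	end try
end writeUTF8

-- Example: write a small test page to the desktop (hypothetical file name).
set sampleText to "<p>Frédéric Chopin</p>"
set outPath to (path to desktop folder as text) & "utf8-test.html"
writeUTF8(sampleText, outPath)
```

Unlike a bare "on error / close access" branch, this version re-raises the error after closing the file, so a failed write doesn't pass silently. Reading the file back with "read file outPath as «class utf8»" should return the original text.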

If you're already using, say, TextWrangler, use it to open one of the problem pages; the text encoding will be displayed in the status bar at the bottom of the window. Might help you get a handle on what's going on.
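If you'd rather check from a script, the command-line file tool on Mac OS X can guess a file's charset. A minimal sketch, assuming the page was saved to the desktop as playlist.html (a hypothetical path; adjust it). Keep in mind that file only guesses from the bytes: a page with no accented characters at all reports us-ascii even when it was written as UTF-8.

```applescript
-- Hypothetical location: point this at one of your generated pages.
set pagePath to POSIX path of ((path to desktop folder as text) & "playlist.html")

-- On Mac OS X, "file -I" prints the MIME type plus the charset it detects,
-- for example "text/html; charset=utf-8".
set report to do shell script "file -I " & quoted form of pagePath
report
```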
 

wormyrocks (New member; joined Mar 8, 2009; 20 messages)

Okay, I really know very little about character sets. When I said "ASCII codes" I just meant that it was generating webpage-unfriendly characters. Sorry for the ambiguity, I thought I knew more than I did there. :p

So, my question is: which encoding will be the most webpage-friendly, and how do I set it? I tried appending '«class utf8»' to the write statement, which generated a page but still didn't properly display special characters. I also tried '«class utf16»', which glitched the compiler, and 'as unicode text', which ran but produced an awful webpage that didn't render properly.
If you remember, you actually helped me write this script in the first place! Thanks a million, it's been invaluable... See if you recognize any of your code. ;)
Code:
set filepath to (path to home folder as string) & "Dropbox:Public:radiosite:playlist.html" as string
--^^Path to 'Advanced' playlist.
tell application "iTunes"
	set use_this_playlist to some playlist whose special kind is Music
	-- we'll use the ready-to-go "time" property instead of making calculations around "duration"
	set avglength to (duration of use_this_playlist) / (count of tracks of use_this_playlist)
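	-- "as" binds tighter than "/" and "mod", so work in whole seconds with div/mod
	set avgsecs to avglength div 1
	set avglengthstr to ((avgsecs div 60) as string) & ":" & text -2 thru -1 of ("0" & (avgsecs mod 60))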
	set out_list to {"<html><head><script type='text/javascript'>
function getURL(name,val) {
if (val==1){
newstr='http://www.amazon.com/s/ref=nb_ss_dmusic?url=search-alias%3Ddigital-music&field-keywords='+name.replace(/ /g,'+')+'&x=0&y=0';
win=window.open(newstr,'mywindow');
}
else{
newstr='http://www.lyrics007.com/cgi-bin/s.cgi?q='+name.replace(/ /g,'+')+'&submit=go';
win=window.open(newstr,'mywindow');
}
}
</script>
<link rel='stylesheet' type='text/css'
href='basestyle.css'>
</head>
<body bgcolor='#666666'>
<h1>Music</h1>" & "Number of tracks: " & (count of tracks of use_this_playlist) & "<br />" & return & "Last Updated: " & (do shell script "date +'%m/%d/%y'") & "<br />" & return & "Total time: " & time of use_this_playlist & "<br />" & return & "Average song length: " & avglengthstr & "<br />" & return & "<br />" & return & "<table><tr><td><b>Name</b></td><td><b>Artist</b></td><td><b>Time</b></td><td><b>Album</b></td><td><b>Genre</b></td><td><b>Search on Amazon</b></td></tr>" & return}
	set out_list_2 to {"<html><head>
<link rel='stylesheet' type='text/css'
href='basestyle.css'>
</head>
<body bgcolor='#666666'>
<h1>Music</h1>" & "Number of tracks: " & (count of tracks of use_this_playlist) & "<br />" & return & "Last Updated: " & (do shell script "date +'%m/%d/%y'") & "<br />" & return & "Total time: " & time of use_this_playlist & "<br />" & return & "Average song length: " & avglengthstr & "<br />" & return & "<br />" & return}
	
	repeat with i from 1 to count of tracks of use_this_playlist
		tell track i of use_this_playlist to set {art, nom, dur, alb, gen} to {artist, name, time, album, genre}
		set end of out_list to ("<tr><td>" & nom & "</td><td>" & art & "</td><td>" & dur & "</td><td>" & alb & "</td><td>" & gen & "</td>" & "<td><img src='amazon.gif' onClick=\"getURL('" & nom & " " & art & "',1)\" width='20' /> <img src='lyrics.gif' onClick=\"getURL('" & nom & " " & art & "',2)\" width='20' /></td></tr>" & return)
	end repeat
	set end of out_list to "</table><br/><p>Credit to <a href='http://old.casualcollective.com/#profiles/WormyRocks'>WormyRocks</a> and <a href='http://forums.ilounge.com/member.php?u=106653'>S2_Mac</a> for the format script.</p></body></html>"
	
	set some_file to filepath
	try
		set file_handle to open for access some_file with write permission
	on error
		display dialog "Couldn’t open the file “" & some_file & "”!" buttons {"OK"} default button 1
		return 0
	end try
	try
		set eof of file_handle to 0
		write (out_list as string) to file_handle as «class utf8»
		close access file_handle
	on error
		close access file_handle
	end try
end tell
 

S2_Mac (New member; joined Oct 24, 2006; 4,878 messages; location: about 3 feet in front of the monitor)

Quoting wormyrocks: "If you remember, you actually helped me write this script in the first place!"

Super; it's my fault ;-) Sorry for not catching this in the first place.

Text encodings are a world of fun (not!). The current best-to-use is UTF-8, IMO: every browser can deal with it, pretty much every text processor (even TextEdit is Unicode-aware) can use it, AppleScript uses Unicode... it's good sauce, and easy to cook. UTF-8, one of the standard ways of storing Unicode text, uses anywhere from 1 to 4 bytes to store the value of a glyph. The encoding you're used to using (Mac Roman) uses only one byte, and is thus limited to only 256 different glyphs, not nearly enough in these days of internationalism. By setting your web page to use UTF-8 encoding, track info in just about any language, using just about any goofy character, is going to display just fine.

Your fix for the file-writing line (write (out_list as string) to file_handle as «class utf8») is spot-on; you're now producing a file encoded with UTF-8. The rest of the fix is to tell web browsers to render the page as UTF-8, by supplying an appropriate charset declaration in the <head> section of the web page. To do that, add the <meta> line shown below, right after the closing </script> tag:
Code:
}
}
</script>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<link rel='stylesheet' type='text/css'

and you should be fine.

(I ran two tests of this fix: track names that contained e-with-acute-accent (é), a 2-byte UTF-8 code (utf8 C3A9); and track names containing heavy-rightwards-arrow (➙), a three-byte UTF-8 code (utf8 E29E99) -- and both displayed fine in Safari and Firefox.)
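Those byte sequences are easy to check from AppleScript itself. A minimal sketch, assuming the stock xxd and iconv command-line tools that ship with Mac OS X; the handler names are hypothetical, and MACINTOSH is iconv's name for the Mac Roman encoding. It also shows the one-byte versus multi-byte contrast: Mac Roman spends one byte on é, UTF-8 spends two.

```applescript
-- Hypothetical helpers. do shell script hands its command to the shell as
-- UTF-8, so printf emits the UTF-8 bytes of theText and xxd -p prints them in hex.
on utf8Hex(theText)
	return do shell script "printf %s " & quoted form of theText & " | xxd -p"
end utf8Hex

-- Same text, re-encoded to Mac Roman (one byte per character) before the dump.
on macRomanHex(theText)
	return do shell script "printf %s " & quoted form of theText & " | iconv -f UTF-8 -t MACINTOSH | xxd -p"
end macRomanHex

set eAcute to character id 233 -- é (U+00E9)
set heavyArrow to character id 10137 -- ➙ (U+2799)

utf8Hex(eAcute) -- "c3a9": two bytes
utf8Hex(heavyArrow) -- "e29e99": three bytes
macRomanHex(eAcute) -- "8e": one byte
```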
 

wormyrocks (New member; joined Mar 8, 2009; 20 messages)

Hey, thanks sooooo much for the explanation... but this still isn't working for me. :(
It still displays incorrectly in Safari and Chrome (although it looks different now) and I checked the source of the HTML file to ensure that the <meta> tag was added. It appears exactly as such:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

[Screenshot: Screen shot 2011-10-24 at 22:58:55.png]
 

S2_Mac (New member; joined Oct 24, 2006; 4,878 messages; location: about 3 feet in front of the monitor)

Your screenshot indicates two things:
1) the page source is indeed UTF-8
2) the browser that rendered the page was reading it as ISO 8859-1 (aka "Western" or "ISO Latin 1")
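Point 2 is the classic symptom: each byte of a UTF-8 sequence gets shown as a separate Latin 1 character, so é (bytes C3 A9) turns into "Ã©". A minimal sketch that reproduces the effect with the stock iconv tool, handy for recognizing it at a glance; the handler name is hypothetical.

```applescript
-- Hypothetical helper: mimics a browser that reads UTF-8 bytes as ISO 8859-1.
-- printf emits the UTF-8 bytes; iconv reinterprets each byte as a Latin 1
-- character and converts the result back to UTF-8 for AppleScript to show.
on misreadAsLatin1(theText)
	return do shell script "printf %s " & quoted form of theText & " | iconv -f ISO-8859-1 -t UTF-8"
end misreadAsLatin1

set composer to "Fr" & (character id 233) & "d" & (character id 233) & "ric"
misreadAsLatin1(composer) -- "FrÃ©dÃ©ric"
```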

So, the good news is that the web server isn't doing something silly behind the scenes, like converting everything to ASCII or Windows Latin 1. However, the server may be sending a Content-Type header that tells browsers to treat the page as ISO 8859-1, and a charset given in that header generally takes precedence over the <meta> declaration in the page. Still, the most likely cause of the funky rendering is that your browsers aren't set up to use UTF-8...
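To see whether the server is sending such a header, look at the raw response headers. A minimal sketch using the curl tool that ships with Mac OS X; the URL is a placeholder for wherever your playlist page actually lives. If the Content-Type line carries charset=ISO-8859-1, the server is the culprit.

```applescript
-- Hypothetical URL: substitute the public address of your playlist page.
set pageURL to "http://www.example.com/playlist.html"

-- curl -s (silent) -I (headers only) shows exactly what the server sends.
try
	set headerText to do shell script "curl -sI " & quoted form of pageURL
on error
	set headerText to "" -- no response (offline, bad URL, etc.)
end try

-- Pull out the Content-Type line; "starts with" ignores case by default,
-- so both "Content-Type:" and "content-type:" match.
set contentTypeLine to "(no response or no Content-Type header)"
repeat with lineRef in paragraphs of headerText
	set thisLine to contents of lineRef
	if thisLine starts with "Content-Type:" then set contentTypeLine to thisLine
end repeat
contentTypeLine
```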

I don't have access to a Chrome browser (and I'm surrounded by production machines, so not going to install it), and thus can't comment on its text encoding settings. In Firefox, load the page, then open the View menu->Character Encoding submenu and choose the "Unicode (UTF-8)" item. Refresh the page and all should be well; if not, perhaps the server is pulling a stunt with headers.

In Safari, load the page and then open View menu->Text Encoding and choose "Unicode (UTF-8)". Refresh the page and all should be well; if not, there's definitely something going on with the server. (If you're simply dropping these files onto a browser window right on your Mac, and they still aren't rendering correctly after you've verified the encoding setting, then there's something wrong with the Mac.)

If neither Firefox nor Safari renders the page properly with its text encoding set explicitly to UTF-8, the page will need further modification. Sample code and explanation are at the end of this post.

Setting browser encoding defaults
These days, it's hard to go wrong choosing UTF-8 as a browser's default text encoding. Folks using accented characters in their pages (like you) can write to UTF-8 and every platform will render the content properly (no need to serve up separate pages for Windows Latin 1 and Mac Latin 1, for instance). For older pages, UTF-8 shares the same first 128 character codes (0 through 127) as ASCII, so no penalties there. Just about any text editor these days can deal with Unicode/UTF-8 (even WordPad).

And, of course, non-Latin languages/alphabets benefit hugely from Unicode; no more need to engage a special-purpose charset (ISO 2022-KR, or Korean-Windows, or Korean-Mac) when UTF-8 can do (mostly) all of their jobs.

With Firefox, using UTF-8 is just about painless. Open View menu->Character Encoding->Auto-Detect; at the bottom of the sub-list should be an item named "Universal". Check that item and kick back... FF does a great job of auto-detecting.

I use an ancient Safari (from the v4 era) which doesn't have auto-detect; maybe that's changed in newer versions. Still, setting Prefs->Appearance->Default Encoding: to "Unicode (UTF-8)" is p'bly the most versatile option you can choose.

Chrome is an unknown to me....

New Code
If your generated page still doesn't render properly after manually specifying the UTF-8 encoding scheme from the browsers' View menus, we've got to get more standards-correct with the page structure. (Unless the screenshot you posted was from a page that you simply dropped onto a browser window right on your Mac; if you can't get a page loaded that way to render correctly, the solution lies in that browser's encoding settings.)

Copy'n'save this code into an empty script window; run it as-is and see if the results are any different. (I cleaned up some cruft, and rearranged a couple of things to meet W3C standards. Also added a page title that features a Unicode char; if the page title doesn't render correctly, something's up with your Mac. Oh yeah, also changed the date format, just for fun.)

Rather than spend time waiting for the whole Music playlist to get processed, this code builds a page from whatever playlist is currently being displayed (so's you can pick a short one). Also, for convenience, the output file is saved to the desktop. Generate a page, set Safari to use UTF-8, and drop the file onto a Safari window -- should render OK. Also set Firefox to use "Universal" as the Auto-Detect encoding, and drop it onto a Firefox window -- should render OK. If not, we'll bring in the big guns...

Here's the code:
Code:
--set filepath to (path to home folder as string) & "Dropbox:Public:radiosite:playlist.html" as string -- ## 1
--^^Path to 'Advanced' playlist.
set filepath to (path to desktop folder as string) & "playlist.html" as string -- ## 1
tell application "iTunes"
	--set use_this_playlist to some playlist whose special kind is Music -- ## 2
	set use_this_playlist to view of front browser window -- ## 2
	-- we'll use the ready-to-go "time" property instead of making calculations around "duration"
	set avglength to (duration of use_this_playlist) / (count of tracks of use_this_playlist)
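	-- "as" binds tighter than "/" and "mod", so work in whole seconds with div/mod
	set avgsecs to avglength div 1
	set avglengthstr to ((avgsecs div 60) as string) & ":" & text -2 thru -1 of ("0" & (avgsecs mod 60))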
	set out_list to {"<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />
<title>WormyRocks ♫ Library</title>
<script type='text/javascript'>
function getURL(name,val) {
if (val==1){
newstr='http://www.amazon.com/s/ref=nb_ss_dmusic?url=search-alias%3Ddigital-music&field-keywords='+name.replace(/ /g,'+')+'&x=0&y=0';
win=window.open(newstr,'mywindow');
}
else{
newstr='http://www.lyrics007.com/cgi-bin/s.cgi?q='+name.replace(/ /g,'+')+'&submit=go';
win=window.open(newstr,'mywindow');
}
}
</script>
<link rel='stylesheet' type='text/css' href='basestyle.css'>
</head>
<body bgcolor='#666666'>
<h1>Music</h1>" & "Number of tracks: " & (count of tracks of use_this_playlist) & "<br />" & return & "Last Updated: " & (do shell script "date +'%B %d, %Y'") & "<br />" & return & "Total time: " & time of use_this_playlist & "<br />" & return & "Average song length: " & avglengthstr & "<br />" & return & "<br />" & return & "<table><tr><td><b>Name</b></td><td><b>Artist</b></td><td><b>Time</b></td><td><b>Album</b></td><td><b>Genre</b></td><td><b>Search on Amazon</b></td></tr>" & return}
	
	repeat with i from 1 to count of tracks of use_this_playlist
		tell track i of use_this_playlist to set {art, nom, dur, alb, gen} to {artist, name, time, album, genre}
		set end of out_list to ("<tr><td>" & nom & "</td><td>" & art & "</td><td>" & dur & "</td><td>" & alb & "</td><td>" & gen & "</td>" & "<td><img src='amazon.gif' onClick=\"getURL('" & nom & " " & art & "',1)\" width='20' /> <img src='lyrics.gif' onClick=\"getURL('" & nom & " " & art & "',2)\" width='20' /></td></tr>" & return)
	end repeat
	set end of out_list to "</table><br/><p>Credit to <a href='http://old.casualcollective.com/#profiles/WormyRocks'>WormyRocks</a> and <a href='http://forums.ilounge.com/member.php?u=106653'>S2_Mac</a> for the format script.</p></body></html>"
	
	set some_file to filepath
	try
		set file_handle to open for access some_file with write permission
	on error
		display dialog "Couldn’t open the file “" & some_file & "”!" buttons {"OK"} default button 1
		return 0
	end try
	try
		set eof of file_handle to 0
		write (out_list as string) to file_handle as «class utf8»
		close access file_handle
	on error
		close access file_handle
	end try
end tell
Still doesn't render properly?
If pages generated by that code aren't making it, replace this line:
set out_list to {"<html>

with these 5 lines:
set out_list to {"<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<!DOCTYPE html
PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"
\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">


Even a Safari set to "Western (ISO Latin 1)" should render the page OK....
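Separately from encoding, track names can still trip up the generated page: an apostrophe in a name (Don't Stop, say) ends the single-quoted JavaScript string inside the onClick attribute, and a bare & or < is not valid HTML text. A minimal sketch of escaping handlers, assuming AppleScript's text item delimiters for the find-and-replace; the handler names replaceText, escapeHTML and escapeJS are hypothetical.

```applescript
-- Hypothetical helper: find-and-replace using AppleScript's text item delimiters.
on replaceText(theText, findStr, replaceStr)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findStr
	set theItems to text items of theText
	set AppleScript's text item delimiters to replaceStr
	set newText to theItems as text
	set AppleScript's text item delimiters to savedTIDs -- restore for other code
	return newText
end replaceText

-- Escape characters that are special in HTML text and attribute values.
-- "&" goes first so the entities added afterwards aren't escaped twice.
on escapeHTML(theText)
	set theText to replaceText(theText, "&", "&amp;")
	set theText to replaceText(theText, "<", "&lt;")
	set theText to replaceText(theText, ">", "&gt;")
	set theText to replaceText(theText, "\"", "&quot;")
	set theText to replaceText(theText, "'", "&#39;")
	return theText
end escapeHTML

-- Make text safe inside a single-quoted JavaScript string literal.
-- Backslashes go first so the ones added for apostrophes aren't doubled.
on escapeJS(theText)
	set theText to replaceText(theText, "\\", "\\\\")
	set theText to replaceText(theText, "'", "\\'")
	return theText
end escapeJS

escapeHTML("Tom & Jerry's <Theme>") -- "Tom &amp; Jerry&#39;s &lt;Theme&gt;"
```

Inside the iTunes tell block the handlers need "my", e.g. "<tr><td>" & my escapeHTML(nom) & ..., and for the onClick argument my escapeHTML(my escapeJS(nom & " " & art)). The JavaScript escaping is applied first because the browser undoes the HTML escaping in the attribute before the script ever sees the string.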
 