Archive for the “Miscellaneous” Category
In this modern day, HTML entities can reference arbitrary Unicode codepoints. For example, &#x2603; is the entity for ☃. Not surprisingly, WebKit appears to use UTF-16 internally to represent Unicode strings, or at the very least when interpreting HTML entities. One of the big benefits of using UTF-16 is that every character seems to be represented by 2 bytes (the 16 in UTF-16 means 16 bits). Contrast this with UTF-8, where a single character can be represented by anywhere from 1 to 4 bytes, or UTF-32, where every character requires 4 bytes (i.e. twice as much as UTF-16). Clearly UTF-16 seems to be useful, as it’s not too much larger than UTF-8 for ASCII strings (only double the size), and you can jump to any character with a simple index into the string. However, one of the often-ignored aspects of UTF-16 is the surrogate pair. Unicode contains more than 0xFFFF codepoints, and yet a single UTF-16 “character” (or unichar) can only reference up to U+FFFF. The solution to this is to take the codepoints in Unicode planes 1-16 (U+10000 - U+10FFFF) and represent them as 2 unichars. This is a surrogate pair. You can find more information on this in the Wikipedia entry for UTF-16, but to put it simply, a surrogate pair uses a range of codepoints that don’t represent real characters (U+D800 - U+DFFF) and uses them in combination to represent all the characters in the other planes.
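The arithmetic behind a surrogate pair is small enough to sketch. The following is an illustration only, not WebKit's code; the function names are made up for this example, and U+1D367 is just a convenient codepoint outside plane 0.

```python
def to_surrogate_pair(codepoint):
    """Split a codepoint from planes 1-16 into (high, low) UTF-16 unichars."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint is not in planes 1-16")
    offset = codepoint - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the first unichar
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the second
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the original codepoint."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1D367)
print(hex(high), hex(low))  # 0xd834 0xdf67
```

The first unichar always falls in U+D800 - U+DBFF and the second in U+DC00 - U+DFFF, which is how a decoder can tell which half of a pair it is looking at.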
The reason this is interesting is that it exposes a quirk in how WebKit interprets HTML entities. WebKit properly converts entities that represent characters outside of plane 0 into a surrogate pair, such as &#x1D367; (𝍧). This gets converted into 0xD834DF67. The quirk is that if you give it the surrogate pair codepoints directly, it doesn’t realize they’re not real characters individually and passes them through unscathed, so that same character can be written as &#xD834;&#xDF67; (𝍧). Now this doesn’t seem particularly harmful, except that if you only write the first of these entities, WebKit gets very confused. It ends up throwing away the entire rest of the line of rendered text. Interestingly, it starts displaying text again after a line break, even if it’s just an implicit line break.
The ideal behavior here is that WebKit should just silently ignore any entities which reference a codepoint that’s part of a surrogate pair. The fact that it doesn’t isn’t really harmful, but I thought it was worth documenting.
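As a rough sketch of that ideal behavior (and only a sketch: this is not how WebKit is implemented, and the function names and regular expression are assumptions made for illustration), a numeric-entity decoder could drop any reference that lands in the surrogate range:

```python
import re

# Matches numeric entities only: hexadecimal (&#x2603;) or decimal (&#9731;).
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def is_surrogate(codepoint):
    """True for codepoints reserved for UTF-16 surrogate pairs."""
    return 0xD800 <= codepoint <= 0xDFFF


def decode_numeric_entities(text):
    """Decode numeric entities, silently ignoring surrogate codepoints."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_surrogate(codepoint) or codepoint > 0x10FFFF:
            return ""  # not a real character on its own, so drop it
        return chr(codepoint)
    return NUMERIC_ENTITY.sub(replace, text)


print(decode_numeric_entities("before &#xD834; after"))  # before  after
```

With this approach a lone first half of a pair simply disappears instead of affecting the rest of the line.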
Update: A question was raised on Twitter about how surrogate pairs affect indexing into a UTF-16 string. I didn’t know the answer, and strangely, I couldn’t find information on how to handle it with Google either, so I tested empirically.
NSString uses UTF-16 internally, so it was a great way to test. And what I found was that each half of a surrogate pair is counted as a separate character. The -length of the NSString is increased by 2 when you add a surrogate pair, and -substringFromIndex: will happily split up the surrogate pair for you. Of course, if you do split a surrogate pair, then attempting to convert the NSString into another encoding, even with the simple -UTF8String, will return NULL as such a conversion is illegal (when you generate a unicode stream it has to be well-formed, and so you cannot generate a stream with half of a surrogate pair - and half of a surrogate pair in UTF-16 will be converted into a single invalid 3-byte UTF-8 sequence).
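The same experiment can be approximated outside of Cocoa. The sketch below uses Python rather than NSString, so it only mirrors the behavior described above; `utf16_units` is a helper invented for this example, and the `surrogatepass` error handler is used purely to show the bytes a strict encoder refuses to produce.

```python
def utf16_units(text):
    """Return the UTF-16 code units (unichars) of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


text = "a\U0001D367b"
units = utf16_units(text)
# Three characters, but four unichars: the surrogate pair counts twice.
print(len(text), len(units))  # 3 4

# Take just the first half of the pair, as a careless substring might.
lone_half = chr(units[1])
try:
    lone_half.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False  # a well-formed stream cannot contain half a pair
print(encodable)  # False

# Forcing the conversion yields the single invalid 3-byte sequence.
raw = lone_half.encode("utf-8", "surrogatepass")
print(len(raw))  # 3
```

Indexing by unichar is still constant time, but an index is only safe to split at if the unichar there is not the second half of a pair.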
Comments Off on WebKit and handling of surrogate pairs in HTML entities
I picked up an iPhone on Friday and I’ll try to remember to write up my thoughts later, but for now I have a quick tip.
For those of you who are like me and listen to full albums and lament the lack of a Compilations preference for the iPhone, learn to love the CoverFlow view. Since this view is album-centric it handles compilations properly. The only real flaw with this view is it’s ugly if you don’t have album art.
Comments Off on Coverflow in the iPhone
Am I alone in thinking that Safari should be able to show web pages for upcoming iCal events in its bookmarks pane, just like it does for Address Book and Bonjour? The way it works right now, it’s quite annoying to open up an event web page because I have to launch iCal first.
Comments Off on Safari bookmarks and iCal
I’m not going to write a long entry about it, but I am the proud owner of a
brand new Wii. Suffice to say, it’s very fun, especially the Tennis
game in Wii Sports.
I ended up camping out at Best Buy starting at 6 PM on Saturday. If I had wanted to,
I probably could have shown up around 4 AM or so and still gotten a unit, but
I figured if I was going to camp, I’d do it right. I hung out with a group of
guys from WPI (guys I hadn’t met before), and it was a lot of fun. Unfortunately,
it was also freezing cold. From about 2AM to 5AM everybody was huddled under blankets
trying to stay warm. Some people actually managed to sleep, but I didn’t. Still,
I think it was worth it.
Note: Mac OS X 10.4.9 seems to fix the bug described here.
pmTool, the process run by Activity Monitor to actually collect stats, appears to leak memory. If I leave Activity Monitor running for a good period of time, when I check up on it pmTool is often using over 100MiB of Real Memory.
I just checked my laptop, and pmTool was using over 100MiB of Real Memory. Right now on my desktop it’s using 41MiB of Real Memory, but I don’t remember how long it’s been running for. I also believe a good deal of memory is currently paged out.
After checking up on it, pmTool on my desktop has a Private Memory size of 91MiB.
Recently I purchased Ninja Gaiden Black, on the advice of a friend. I had heard of this game before, described as ridiculously hard, but also supposed to be pretty fun. Well, what I had heard was wrong. It’s not ridiculously hard, it’s impossibly hard (and the fact that it’s Ninja Gaiden Black and not plain old Ninja Gaiden doesn’t help). And it’s not pretty fun, it’s ridiculously fun. I played it for about an hour after I bought it, but when the boss of the second level (the first being training) kept whipping my butt with incredible ease I put it down. Two days ago my friend started playing it, and yesterday he finally managed to beat the boss (after spending a lot of time trying). So, knowing it’s actually possible, I picked it back up again today and not only beat the boss, but spent probably about 4 or 5 more hours playing it and beating the next 3 levels. I only put it down because my roommate wanted to watch TV.
Not only is that game a lot of fun, it’s also a really visually appealing game (surprisingly beautiful for an Xbox game - I can only wonder at how it would look if it took advantage of the Xbox 360). All the moves look great, all the attacks have a lot of style and flair, and Ryu (the main character) is an incredible badass.
Usually when games are really hard and I can’t seem to progress past a certain point, I get discouraged. But ever since my success with that first boss, the seeming impasses in this game only drive me to try harder. There are points in this game where you have to do the same segment of the level over and over, and over, and over, ad infinitum. Again, such repetition would normally discourage me, but instead I use it as the chance to practice and hone my fighting skills until I can breeze past the previously-impossible segments with nary a scratch.
If you have never tried Ninja Gaiden, and you have a chance to (say, you or a friend own an Xbox), you owe it to yourself to purchase Ninja Gaiden Black and give it a try.
Comments Off on Late to the Ninja Gaiden party
eddienull on #macdev posted a great link to a video of George Bush singing Sunday Bloody Sunday. I thought it was pretty great.
Comments Off on Sunday Bloody Sunday by George Bush
The DS Lite was released today, and I picked up my pre-order. I have yet to actually play with it (I had to wait for it to charge), but it’s finally ready. I took some pictures of it first, though.
Comments Off on DS Lite arrives
This morning, Tom Cruise flew over (in his private WWII bomber plane) and came onto the Yahoo! campus to give a talk as part of Yahoo!’s Inspirational Speakers program. And let me tell you, it was pretty darn cool. Turns out Tom is a really nice guy, and pretty funny too. Tom and Terry (CEO of Yahoo!) talked about Tom’s movie career, and shared a few amusing anecdotes about the two of them. Quite enjoyable.
After the event I managed to get a few pictures of Tom with my cameraphone.
Comments Off on Tom Cruise at Yahoo!
Tristan and I went to see V for Vendetta with a couple friends tonight. And I was completely blown away. I was told the movie was great, but it exceeded all of my expectations. I went into it knowing absolutely nothing about the plot, and I liked it that way so I won’t say anything here about it. To put it simply, you should watch this movie.
Comments Off on V for Very Good