Dr Nic

Another use for const_missing – generating unicode characters in strings

I just spotted something very sexy on the RubyForge News-vine (subscribe to all RubyForge news and new releases): a new use for Ruby’s const_missing method – charesc.

If the unicode character for ü is #00FC, then you can insert that into a string using the global constant U00FC. Where is U00FC defined? It’s not. You ask for it and a unicode string is returned. Oohlala.
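How might that work? I haven't read the charesc source, so what follows is only a minimal sketch of the general trick, not the gem's actual implementation. It assumes constant names of the form U plus 4 to 6 uppercase hex digits, and it hooks const_missing on Object directly (so it covers top-level code and ordinary classes, but not constants looked up inside a plain module):

```ruby
# Hypothetical sketch, NOT charesc's real code.
class Object
  # Ruby calls const_missing when a constant lookup fails.
  def self.const_missing(name)
    # Assumed naming scheme: "U" followed by 4-6 uppercase hex digits.
    hex = name.to_s[/\AU([0-9A-F]{4,6})\z/, 1]
    # Anything else falls through to the normal NameError.
    return super unless hex
    # Turn the code point into a UTF-8 encoded string.
    [hex.to_i(16)].pack("U")
  end
end

puts "charesc is made by Martin D#{U00FC}rst"
```

Nothing is ever stored in this sketch: each reference to U00FC misses the lookup again and builds a fresh string. Any other unknown constant still raises NameError as usual, because the method hands off to super.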

For the demo, any old irb will do, as long as you set $KCODE to "u" first (the one relevant thing the Rails console does for you) to make the output look nicer:

$ sudo gem install charesc
$ irb
>> $KCODE = "u"  # thanks Paul Battley
>> require 'rubygems'
>> require 'charesc'
>> "charesc is made by Martin D#{U00FC}rst"
=> "charesc is made by Martin Dürst"
>> U00FC
=> "ü"

If you have any more fun uses for const_missing please let me know. Feed my fetish.

9 Responses to “Another use for const_missing – generating unicode characters in strings”

  1. Paul Battley says:

    Very clever!

    (The only thing script/console does that’s relevant here is to set $KCODE='u', by the way.)

  2. This reminds me of something that I worked on recently for one of our clients that deals with publishing. Their editors aren’t going to want to remember all the unicode characters and we’re also using textile, so we created a library that converts things like so:

    (u`) (u~) for about eighty characters. To take it a step further, as they start to type in their edit panels, it provides a help menu of available options. Hoping that we’ll be able to demo some of that stuff soon. :-)

  3. Dr Nic says:

    @paul – thx; example updated. (NOTE: need to require ‘rubygems’ to access the gem now we’re not in the comfortable world of Rails console)

    @robby – that’s nice. I’m not sure if I like the surrounding parentheses, personally, because I often forget that I want a variation of u, for example, and I’d need to press the back arrow to put the bracket in, then forwards again etc. But I assume (u~) is easier to parse, than u~.

  4. Dr Nic says:

    @paul – your “death miffy” gravatar is disturbing. I don’t think I’ll be able to look at my son’s miffy doll the same way again… :)

  5. @dr nic,

    It’s based on the convention that Textile already has. For example, ™creates a trademark symbol and we wanted to keep it consistent with new characters. In a nutshell, we extended the Textile process with another 80 characters/symbols.

  6. Ah, your blog renders textile.. ;-)

    ((tm)) => ™

  7. sans the extra parens… argh. :-p

  8. Paul Battley says:

    That’s not ‘Death Miffy’ – it’s Borg Kitty!

    (You’re right about needing ‘require "rubygems"’; I didn’t need it because I have ‘RUBYOPT=rubygems’ in my environment variables, but I expect that most people don’t.)

  9. Dr Nic says:

    @robby – ah ok I understand where you’re coming from. Now I also realise that I don’t like ( TM ) either! Perhaps TM* or TM# instead. I’ll talk to W3C about fixing that… ;)