<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Ruby 1.9 Encodings: A Primer and the Solution for Rails</title>
	<atom:link href="http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/feed/" rel="self" type="application/rss+xml" />
	<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/</link>
	<description>Random Geek-Related Thoughts</description>
	<lastBuildDate>Sat, 20 Apr 2013 07:23:38 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Arne Brasseur</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-26058</link>
		<dc:creator>Arne Brasseur</dc:creator>
		<pubDate>Mon, 15 Apr 2013 17:03:24 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-26058</guid>
		<description><![CDATA[The reason the Japanese don&#039;t like simply converting everything to Unicode is a process adopted by the Unicode Consortium called Han unification. Unicode encodes abstract characters (&quot;graphemes&quot;) rather than concrete shapes (&quot;glyphs&quot;). For example, the letter &#039;a&#039; can be written as a circle with a line to the right, or it can have an extra curl at the top. These are two glyphs of the same grapheme, so there is only one Unicode code point for both; the difference is in the font that renders it.

Chinese, Japanese, traditional Korean and to some extent Vietnamese all use Han characters (known as Chinese characters, Hanzi or Kanji). Often the same grapheme is used in all four languages, but with small regional variants in the glyphs. So to correctly display a Japanese text encoded in Unicode, you need to use a font that&#039;s specific to Japanese, or you might end up seeing Chinese variants, for instance. These variations might seem very small to a Western eye, but they are considered an important part of these nations&#039; culture and history.

The Japanese especially have always strongly opposed Han unification. It is, however, possible to do a lossless round-trip conversion from JIS to UTF-8 and back; in fact, Unicode has several code points that were added specifically to make this kind of round-trip conversion lossless.]]></description>
		<content:encoded><![CDATA[<p>The reason the Japanese don&#8217;t like simply converting everything to Unicode is a process adopted by the Unicode Consortium called Han unification. Unicode encodes abstract characters (&#8220;graphemes&#8221;) rather than concrete shapes (&#8220;glyphs&#8221;). For example, the letter &#8216;a&#8217; can be written as a circle with a line to the right, or it can have an extra curl at the top. These are two glyphs of the same grapheme, so there is only one Unicode code point for both; the difference is in the font that renders it.</p>
<p>Chinese, Japanese, traditional Korean and to some extent Vietnamese all use Han characters (known as Chinese characters, Hanzi or Kanji). Often the same grapheme is used in all four languages, but with small regional variants in the glyphs. So to correctly display a Japanese text encoded in Unicode, you need to use a font that&#8217;s specific to Japanese, or you might end up seeing Chinese variants, for instance. These variations might seem very small to a Western eye, but they are considered an important part of these nations&#8217; culture and history.</p>
<p>The Japanese especially have always strongly opposed Han unification. It is, however, possible to do a lossless round-trip conversion from JIS to UTF-8 and back; in fact, Unicode has several code points that were added specifically to make this kind of round-trip conversion lossless.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Serge Bedzhik</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-25226</link>
		<dc:creator>Serge Bedzhik</dc:creator>
		<pubDate>Tue, 13 Mar 2012 07:20:21 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-25226</guid>
		<description><![CDATA[Hi there! Thanks for the great article :)

I have a little question. I’m trying to deal with ASCII-8BIT data (as detected) coming from a TCP socket with Russian symbols like \xD1, \xD0 etc., but I get scrambled output when I call force_encoding(&#039;UTF-8&#039;) on the BINARY data, exactly like you say. What can I do to get normal UTF-8 output when I send these symbols over telnet? I&#039;m talking about TCPServer from stdlib.]]></description>
		<content:encoded><![CDATA[<p>Hi there! Thanks for the great article :)</p>
<p>I have a little question. I’m trying to deal with ASCII-8BIT data (as detected) coming from a TCP socket with Russian symbols like \xD1, \xD0 etc., but I get scrambled output when I call force_encoding(&#8216;UTF-8&#8217;) on the BINARY data, exactly like you say. What can I do to get normal UTF-8 output when I send these symbols over telnet? I&#8217;m talking about TCPServer from stdlib.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonas Elfström</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-25172</link>
		<dc:creator>Jonas Elfström</dc:creator>
		<pubDate>Sun, 05 Feb 2012 22:55:51 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-25172</guid>
		<description><![CDATA[Please consider clarifying, or giving a source that explains, the complicated reasons you mentioned why UTF-8 does not map perfectly to SHIFT-JIS.]]></description>
		<content:encoded><![CDATA[<p>Please consider clarifying, or giving a source that explains, the complicated reasons you mentioned why UTF-8 does not map perfectly to SHIFT-JIS.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Rose</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-23308</link>
		<dc:creator>Mike Rose</dc:creator>
		<pubDate>Wed, 24 Aug 2011 00:42:03 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-23308</guid>
		<description><![CDATA[Just for the sanity of those visiting this blog article in hopes of dealing with MySQL, ASCII-8BIT &amp; UTF-8: I found that the mysql2 gem actually solves all of the issues I had when using the default mysql gem for Ruby-MySQL interactions. Give that a try if you get odd UnknownConversion errors :)]]></description>
		<content:encoded><![CDATA[<p>Just for the sanity of those visiting this blog article in hopes of dealing with MySQL, ASCII-8BIT &amp; UTF-8: I found that the mysql2 gem actually solves all of the issues I had when using the default mysql gem for Ruby-MySQL interactions. Give that a try if you get odd UnknownConversion errors :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maher Sllam</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-23171</link>
		<dc:creator>Maher Sllam</dc:creator>
		<pubDate>Wed, 10 Aug 2011 05:02:14 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-23171</guid>
		<description><![CDATA[Thank you for this great article.

I think that Ruby should add an encoding detection method. Most Arabic websites use Windows-1256. Some websites read about the benefits of UTF-8 and tried to switch to it... Can you guess what they did? They just changed the encoding in their HTML meta tag. It worked for websites with static content, but failed for the other websites, which used database content.

Some were lucky enough to revert before new data made its way into the database (because they saw weird text all over the place when opening their websites); others were not. Consequently, support forums are now filled with help requests asking how to correct this kind of error.

In these situations, an encoding detection method would be AWESOME! Writing such functionality might be impossible, but I think there should be something that can help in these situations.]]></description>
		<content:encoded><![CDATA[<p>Thank you for this great article.</p>
<p>I think that Ruby should add an encoding detection method. Most Arabic websites use Windows-1256. Some websites read about the benefits of UTF-8 and tried to switch to it&#8230; Can you guess what they did? They just changed the encoding in their HTML meta tag. It worked for websites with static content, but failed for the other websites, which used database content.</p>
<p>Some were lucky enough to revert before new data made its way into the database (because they saw weird text all over the place when opening their websites); others were not. Consequently, support forums are now filled with help requests asking how to correct this kind of error.</p>
<p>In these situations, an encoding detection method would be AWESOME! Writing such functionality might be impossible, but I think there should be something that can help in these situations.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-22665</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Wed, 04 May 2011 08:43:54 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-22665</guid>
		<description><![CDATA[I&#039;d also love to see a reference for the part about SHIFT-JIS to UTF-8 being lossy, as I searched and couldn&#039;t find anything (but perhaps that&#039;s because I can&#039;t speak Japanese ;)).]]></description>
		<content:encoded><![CDATA[<p>I&#8217;d also love to see a reference for the part about SHIFT-JIS to UTF-8 being lossy, as I searched and couldn&#8217;t find anything (but perhaps that&#8217;s because I can&#8217;t speak Japanese ;)).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Towfiq</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-22642</link>
		<dc:creator>Mark Towfiq</dc:creator>
		<pubDate>Mon, 25 Apr 2011 18:08:05 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-22642</guid>
		<description><![CDATA[&quot;For a variety of complicated reasons, Japanese encoding, such as SHIFT-JIS, are not considered to losslessly encode into UTF-8.&quot;

What are the complicated reasons? I can&#039;t find references to them. They appear to be the root cause of a major &quot;configuration over convention&quot; choice which runs counter to core Ruby philosophy.]]></description>
		<content:encoded><![CDATA[<p>&#8220;For a variety of complicated reasons, Japanese encoding, such as SHIFT-JIS, are not considered to losslessly encode into UTF-8.&#8221;</p>
<p>What are the complicated reasons? I can&#8217;t find references to them. They appear to be the root cause of a major &#8220;configuration over convention&#8221; choice which runs counter to core Ruby philosophy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pankaj</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-22640</link>
		<dc:creator>Pankaj</dc:creator>
		<pubDate>Sat, 16 Apr 2011 10:58:44 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-22640</guid>
		<description><![CDATA[Hi Yehuda,
 I have been searching for this for many days. Thanks for such a great article.
 I have come across another issue: my application has an admin side and a client side, both using the same database. The database contains names like &quot;Crème Fraîche&quot;.
When I force the encoding with r.name.force_encoding(&quot;ISO-8859-1&quot;).encode(&quot;UTF-8&quot;),
the admin side displays the data properly, but the client side gives the error &quot;incompatible character encodings: ISO-8859-1 and UTF-8&quot;.

Before forcing the encoding, the names appear correctly in the logs on both the admin and client sides.

Am I missing something on the client side? Some other change might be required. Please help; I have been stuck on this problem for the last 5 days.]]></description>
		<content:encoded><![CDATA[<p>Hi Yehuda,<br />
 I have been searching for this for many days. Thanks for such a great article.<br />
 I have come across another issue: my application has an admin side and a client side, both using the same database. The database contains names like &#8220;Crème Fraîche&#8221;.<br />
When I force the encoding with r.name.force_encoding(&#8220;ISO-8859-1&#8221;).encode(&#8220;UTF-8&#8221;),<br />
the admin side displays the data properly, but the client side gives the error &#8220;incompatible character encodings: ISO-8859-1 and UTF-8&#8221;.</p>
<p>Before forcing the encoding, the names appear correctly in the logs on both the admin and client sides.</p>
<p>Am I missing something on the client side? Some other change might be required. Please help; I have been stuck on this problem for the last 5 days.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shevy</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-22207</link>
		<dc:creator>shevy</dc:creator>
		<pubDate>Sat, 05 Feb 2011 01:42:29 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-22207</guid>
		<description><![CDATA[The encoding stuff bores me to no end and is total crap anyway.

I want the default behaviour of Ruby 1.8.x

I don&#039;t care about the change in 1.9

I am not Japanese, I don&#039;t even use UTF-8

Why am I forced to deal with encoding in Ruby 1.9 now???]]></description>
		<content:encoded><![CDATA[<p>The encoding stuff bores me to no end and is total crap anyway.</p>
<p>I want the default behaviour of Ruby 1.8.x</p>
<p>I don&#8217;t care about the change in 1.9</p>
<p>I am not Japanese, I don&#8217;t even use UTF-8</p>
<p>Why am I forced to deal with encoding in Ruby 1.9 now???</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shir</title>
		<link>http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/comment-page-1/#comment-22193</link>
		<dc:creator>shir</dc:creator>
		<pubDate>Sun, 23 Jan 2011 12:04:54 +0000</pubDate>
		<guid isPermaLink="false">http://yehudakatz.com/?p=476#comment-22193</guid>
		<description><![CDATA[Hi Yehuda,
 can you explain to me how to use Hebrew characters with Ruby 1.9.2?
Thanks]]></description>
		<content:encoded><![CDATA[<p>Hi Yehuda,<br />
 can you explain to me how to use Hebrew characters with Ruby 1.9.2?<br />
Thanks</p>
]]></content:encoded>
	</item>
</channel>
</rss>
