wilcosky
Joined: 28 Jun 2008
Posts: 11
Posted: Mon Dec 06, 2010 8:17 am
Hi,
My feed appears to be correctly encoded in UTF-8. However, every now and then certain characters show up like:
Doc’s
or
juste à côté du Bataclan
My feed is:
http://i.ndiesho.ws/rss/rss-latest.php
I even tried forcing the text string itself to be UTF-8 by using PHP's utf8_encode() function, but that just caused the characters to become even more garbled, which is odd.
 |
 |
wilcosky
Joined: 28 Jun 2008
Posts: 11
Posted: Mon Dec 06, 2010 8:24 am
Also, I just want to add that it is not my database. I know this because all the characters are fine on my site. The following link goes to the same event that appears in my RSS feed above with the garbled character in the word "Doc's".
http://i.ndiesho.ws/event/3171/
You'll see that "Doc's" looks just fine there, but in my feed the apostrophe is garbled. So it's not garbled in my database; for some reason the sql2rss script is having trouble reading certain types of apostrophes and other characters, even though, again, everything looks like it is in UTF-8. In fact, the script uses UTF-8 by default, right? I haven't changed that.
It's so weird...
 |
 |
wilcosky
Joined: 28 Jun 2008
Posts: 11
Posted: Mon Dec 06, 2010 8:31 am
Sorry to keep replying to my own topic, but I also just noticed that this person was having the exact same problem:
http://www.feedforall.com/forum/viewtopic.php?t=3016
But it seems they must have figured out how to fix it, because their feed looks fine. Hmmm...
 |
 |
Tech Support
Joined: 27 Aug 2004
Posts: 2791
Posted: Tue Dec 07, 2010 11:09 am
 |
 |
wilcosky
Joined: 28 Jun 2008
Posts: 11
Posted: Wed Dec 08, 2010 5:20 pm
Thank you for the suggestion. I could not get the iconv function to change the weird characters into the correct characters, so instead I used iconv to simply remove them. Not the best fix, but it will work for now. Basically I used:
$rssString = iconv("UTF-8", "ASCII//IGNORE", $rssString);
This converts the text in my RSS feed from UTF-8 to plain ASCII; any character that has no ASCII equivalent is ignored and removed from the string. So now, instead of seeing "Doc’s" I see "Docs". Even though it strips out the apostrophe, the text is more readable because all those strange characters are gone.
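To make the trade-off of that workaround concrete, here is a minimal Python sketch of the same idea. It is only a rough analogue of the iconv call above, not the code running on the site, and the helper name strip_non_ascii is made up for illustration. It shows that every non-ASCII character is dropped, not just the garbled ones, so accented words lose letters.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that has no ASCII equivalent.

    Rough analogue of converting to ASCII and ignoring failures:
    nothing is transliterated, characters are simply removed.
    (Hypothetical helper, for illustration only.)
    """
    return text.encode("ascii", errors="ignore").decode("ascii")


# The curly apostrophe (U+2019) disappears entirely.
print(strip_non_ascii("Doc\u2019s"))                # Docs

# Accented letters disappear too, which damages French text.
print(strip_non_ascii("juste à côté du Bataclan"))  # juste  ct du Bataclan
```

So the feed becomes readable for English titles, but any legitimately accented text is mangled in a different way.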
After hours and hours of testing, this is the best I could come up with.
It's still weird: my site is able to read the text from my database and interpret it correctly as UTF-8. The only script that has trouble is this sql2rss script. Maybe the sql2rss script tries to convert to UTF-8 too many times, or maybe it's just not converting to UTF-8 correctly. I'm not sure... All I know is, once again, my website can read my database fine, but my RSS feed cannot.
Maybe in the future you guys can look into updating and recoding this script?
 |
 |
Tech Support
Joined: 27 Aug 2004
Posts: 2791
Posted: Thu Dec 09, 2010 10:53 am
 |
 |
wilcosky
Joined: 28 Jun 2008
Posts: 11
Posted: Fri Dec 10, 2010 2:10 pm
That didn't work either. I read the following on a site called ask-leo.com:
"Now, what happens when the UTF-8 series of numbers is interpreted as if it were ISO-8859-1?
’
Look familiar?
0xE28099 breaks down as 0xE2 (â), 0x80 (€) and 0x99 (™). What was one character in UTF-8 (’) gets mistakenly displayed as three (’) when misinterpreted as ISO-8859-1."
Maybe that is what is happening?
Everything in my feed appears to be encoded in UTF-8. Even the browser says it is UTF-8. But maybe something is causing the feed script to think it's ISO-8859-1?
I don't know. I'm totally lost.
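The mechanism described in that quote can be reproduced in a few lines. The following Python sketch is only an illustration, and the helper names are made up. It encodes text as UTF-8 and then decodes the bytes with the wrong single-byte encoding. It uses cp1252 (Windows-1252) rather than strict ISO-8859-1, on the assumption that the reader behaves like most browsers, which treat an ISO-8859-1 label as Windows-1252; that is where the € and ™ come from.

```python
def misread_utf8(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding.

    Note: a few byte values (such as 0x81) are undefined in cp1252 and
    would raise UnicodeDecodeError; the samples below avoid them.
    """
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo a single round of misreading by reversing the two steps."""
    return garbled.encode(wrong_encoding).decode("utf-8")


# U+2019 is three bytes in UTF-8: 0xE2 0x80 0x99.
print("Doc\u2019s".encode("utf-8"))           # b'Doc\xe2\x80\x99s'
print(misread_utf8("Doc\u2019s"))             # Doc’s
print(misread_utf8("côté"))                   # cÃ´tÃ©
print(repair(misread_utf8("Doc\u2019s")))     # Doc’s
```

Both garbled samples from the first post match this pattern, which suggests the bytes in the feed are valid UTF-8 and only the label the reader trusts is wrong.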
 |
 |
Tech Support
Joined: 27 Aug 2004
Posts: 2791
Posted: Wed Dec 15, 2010 9:50 am
I think I may have found your problem.
If you use a tool to check the headers that your webserver returns (like http://www.seoconsultants.com/tools/headers) and look at the headers when your feed is accessed, you will see that your webserver says the feed file is in the ISO-8859-1 character set.
The headers your webserver is returning for your feed are:
HTTP Status Code: HTTP/1.1 200
Date: Wed, 15 Dec 2010 14:49:37 GMT
Server: Apache/2.2.0 (Unix)
Connection: close
Content-length: 493
Content-Type: text/html; charset=ISO-8859-1
You might ask your web host about that.
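As a rough way to spot this kind of mismatch, the Python sketch below compares the charset in a Content-Type header with the encoding named in a feed's XML declaration. The header value is the one quoted above; sample_body is a hypothetical stand-in, since the thread does not show the feed's actual XML declaration, and the helper names are made up for illustration.

```python
import re
from email.message import Message


def header_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, lower-cased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def xml_declared_encoding(body: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = re.match(rb'\s*<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']', body)
    return match.group(1).decode("ascii").lower() if match else None


# Header value copied from the server response quoted above.
content_type = "text/html; charset=ISO-8859-1"

# Hypothetical feed body: an assumption, not the real feed content.
sample_body = b'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"></rss>'

http_cs = header_charset(content_type)
xml_cs = xml_declared_encoding(sample_body)
print(http_cs, xml_cs, "mismatch" if http_cs != xml_cs else "consistent")
```

If the host cannot change the server default, a PHP script can usually send its own header before any output, for example header('Content-Type: application/rss+xml; charset=UTF-8'). Whether sql2rss already does this is not something this thread confirms.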