FeedForAll Forum Index  
UTF-8 not working?
wilcosky



Joined: 28 Jun 2008
Posts: 11

Posted: Mon Dec 06, 2010 8:17 am

Hi,

My feed appears to be correctly encoded in UTF-8; however, every now and then certain characters show up like:
Doc’s

or

juste côté du Bataclan

My feed is:
http://i.ndiesho.ws/rss/rss-latest.php


I even tried forcing the text string itself to be UTF-8 by using PHP's utf8_encode() function, but that just caused the characters to become more messed up, which is odd.
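
A possible explanation for the encode call making things worse, sketched under the assumption that the string was already valid UTF-8: utf8_encode() treats its input as ISO-8859-1, so every byte of a multi-byte character gets converted a second time. The snippet below uses mb_convert_encoding() with the same source and target encodings to show the effect. The $alreadyUtf8 value is a made-up sample, not data taken from the feed.

```php
<?php
// Hypothetical sample: "é" already encoded as UTF-8 (bytes C3 A9).
$alreadyUtf8 = "\xC3\xA9";

// utf8_encode() converts ISO-8859-1 to UTF-8; this call performs the same conversion.
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($alreadyUtf8), "\n";   // c3a9
echo bin2hex($doubleEncoded), "\n"; // c383c2a9, which displays as "Ã©"

// mb_check_encoding() can guard against converting text that is already UTF-8.
$safe = mb_check_encoding($alreadyUtf8, 'UTF-8')
    ? $alreadyUtf8
    : mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');
```

If the guard reports valid UTF-8, the string is left alone, which avoids the double encoding.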
wilcosky



Joined: 28 Jun 2008
Posts: 11

Posted: Mon Dec 06, 2010 8:24 am

Also, I just want to add that it is not my database. I know this because all the characters are fine on my site. The following link will take you to the same event that is shown in my RSS feed above with the messed up character in the word "Doc's".

http://i.ndiesho.ws/event/3171/


You'll see that "Doc's" looks just fine. But, in my feed, the apostrophe is messed up. So, it's not messed up in my database, for some reason the sql2rss script is having trouble reading certain types of apostrophes and other characters even though again, it looks like everything is in UTF-8. In fact the script comes with UTF-8 by default right? I haven't changed that.

It's so weird...
wilcosky



Joined: 28 Jun 2008
Posts: 11

Posted: Mon Dec 06, 2010 8:31 am

Sorry to keep replying to my own topic but I also just noticed that this person was having this exact same problem:

http://www.feedforall.com/forum/viewtopic.php?t=3016

But, it seems like they must have figured out how to fix it because their feed looks fine. Hmmm...
Tech Support



Joined: 27 Aug 2004
Posts: 2782

Posted: Tue Dec 07, 2010 11:09 am

Perhaps the PHP iconv function will work.

_________________
Create RSS Feeds
Audio Recording and Editing
wilcosky



Joined: 28 Jun 2008
Posts: 11

Posted: Wed Dec 08, 2010 5:20 pm

Thank you for the suggestion. I could not get the iconv function to change the weird characters into the correct characters so instead, I used iconv to simply remove any weird characters. Not the best fix, but it will work for now. Basically I used:

$rssString = iconv("UTF-8", "ASCII//IGNORE", $rssString);

This takes the text in my RSS feed and converts it to plain ASCII; any characters that have no ASCII equivalent are ignored and removed from the string. So now, instead of seeing "Doc’s" I see "Docs". Even though it is stripping out the apostrophe, it is still more readable since all those strange characters are gone.
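
A less lossy variation on the same workaround, as a rough sketch: typographic punctuation could be mapped to ASCII look-alikes before anything is stripped, so the apostrophe survives. This assumes the string really holds a UTF-8 apostrophe at that point. The asciiPunctuation() helper and its character list are hypothetical, not part of sql2rss.

```php
<?php
// Map common typographic punctuation (UTF-8 bytes) to ASCII look-alikes.
// The list is illustrative, not exhaustive.
function asciiPunctuation(string $text): string
{
    return strtr($text, [
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark (apostrophe)
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x93" => '-', // U+2013 en dash
        "\xE2\x80\x94" => '-', // U+2014 em dash
    ]);
}

$rssString = "Doc\xE2\x80\x99s";           // "Doc’s" encoded as UTF-8
$rssString = asciiPunctuation($rssString); // now "Doc's"
echo $rssString, "\n";
```

The iconv call can still run afterwards to drop whatever the map does not cover.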

After hours and hours of testing this is the best I could come up with.

It's still weird: my site is able to read the text from my database and interpret it correctly as UTF-8. The only script that has trouble is this sql2rss script. Maybe this sql2rss script tries to convert to UTF-8 too many times, or maybe it's just not converting to UTF-8 correctly. I'm not sure... All I know is, once again, my website can read my database fine, but my RSS feed cannot.

Maybe in the future you guys can look into updating and re-coding this script?
Tech Support



Joined: 27 Aug 2004
Posts: 2782

Posted: Thu Dec 09, 2010 10:53 am

One more option would be the mb_convert_encoding function. Maybe something like this:

mb_convert_encoding($text, "HTML-ENTITIES", "UTF-8")

wilcosky



Joined: 28 Jun 2008
Posts: 11

Posted: Fri Dec 10, 2010 2:10 pm

That didn't work either. I read the following on a site called ask-leo.com:

"Now, what happens when the UTF-8 series of numbers is interpreted as if it were ISO-8859-1?


Look familiar?

0xE28099 breaks down as 0xE2 (â), 0x80 (€) and 0x99 (™). What was one character in UTF-8 (’) gets mistakenly displayed as three (’) when misinterpreted as ISO-8859-1."


Maybe that is what is happening?

Everything in my feed appears to be encoded in UTF-8. Even the browser says it is in UTF-8. But maybe something is causing the feed script to think it's ISO-8859-1?
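
The misreading described in that quote can be reproduced in a few lines, as a minimal sketch. It decodes the bytes as Windows-1252, which is what browsers commonly apply to content labelled ISO-8859-1; strict ISO-8859-1 maps 0x80 and 0x99 to invisible control characters instead. The variable names are made up for the illustration.

```php
<?php
// U+2019 (right single quotation mark) is three bytes in UTF-8.
$apostrophe = "\xE2\x80\x99";
echo bin2hex($apostrophe), "\n"; // e28099

// Treat each of those bytes as a separate Windows-1252 character
// and re-encode the result as UTF-8.
$misread = mb_convert_encoding($apostrophe, 'UTF-8', 'Windows-1252');

echo $misread, "\n";                     // ’
echo mb_strlen($misread, 'UTF-8'), "\n"; // 3 characters instead of 1
```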


I don't know. I'm totally lost.
Tech Support



Joined: 27 Aug 2004
Posts: 2782

Posted: Wed Dec 15, 2010 9:50 am

I think I may have found your problem.

If you use a tool to check the headers that your webserver is returning (like http://www.seoconsultants.com/tools/headers), and check out the headers when your feed is accessed, you will see that your webserver is saying that the feed file is in the ISO-8859-1 character set.

The headers your webserver is returning for your feed are:

HTTP Status Code: HTTP/1.1 200
Date: Wed, 15 Dec 2010 14:49:37 GMT
Server: Apache/2.2.0 (Unix)
Connection: close
Content-length: 493
Content-Type: text/html; charset=ISO-8859-1

You might ask your web host about that.
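
If changing the host's default is not an option, the feed script could declare the charset itself. The following is only a sketch, assuming the feed is produced by a PHP script that can send its own header before any output. charsetFromContentType() is a hypothetical helper for reading the charset out of a header value, and application/rss+xml is a commonly used media type for RSS rather than something this thread confirms.

```php
<?php
// Pull the charset parameter out of a Content-Type header value.
function charsetFromContentType(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // no charset declared
}

// Header value copied from the response above.
echo charsetFromContentType('text/html; charset=ISO-8859-1'), "\n"; // ISO-8859-1

// In the feed script, before any output is sent, the declared charset
// could be overridden explicitly.
if (!headers_sent()) {
    header('Content-Type: application/rss+xml; charset=UTF-8');
}
```

In a real feed script the header() call has to come before the first byte of output, otherwise PHP cannot change the headers.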
