PHP: fixing character encoding problems

When dealing with character encoding in PHP, the first thing to do is synchronize your PHP pages with your database: the pages must use the same character encoding as the database. If your database is in UTF-8, your pages must be in UTF-8 too. Do not rely on HTML meta elements for this. A meta element only tells the browser to interpret the page in a given encoding; it does not guarantee that the file is actually saved in that encoding. So first check whether your PHP editor encodes your pages correctly. Do this simple test:

  1. create a blank page that contains no meta element declaring the encoding, such as <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  2. save your page in the desired encoding (here UTF-8)
  3. open your page in a web browser and check the page info section
    1. if your browser reports that the page is in UTF-8, the file was saved correctly
    2. otherwise, your PHP editor is not saving your pages in the specified encoding.
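
The browser check above can be complemented by inspecting the saved bytes directly. The following is only a minimal sketch: describeFileEncoding() is a hypothetical helper name, not something from the original post, and it assumes the mbstring extension is available. Note that a file containing only ASCII characters looks the same in UTF-8 and ISO-8859-1, so the test is meaningful only when the page contains at least one non-ASCII character.

```php
<?php
// Hypothetical helper: describes how a string of bytes (for example the
// contents of a saved page) is actually encoded.
function describeFileEncoding(string $bytes): string
{
    // A UTF-8 byte order mark is the three bytes EF BB BF.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 with BOM';
    }
    // Pure ASCII is identical in UTF-8 and ISO-8859-1, so it proves nothing.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ASCII only';
    }
    // mb_check_encoding() validates the byte sequence against UTF-8.
    return mb_check_encoding($bytes, 'UTF-8') ? 'valid UTF-8' : 'not UTF-8';
}

echo describeFileEncoding("caf\xC3\xA9"), "\n"; // prints "valid UTF-8"
echo describeFileEncoding("caf\xE9"), "\n";     // prints "not UTF-8"
```

For a real page you would pass in the file contents, for instance describeFileEncoding(file_get_contents('page.php')), where page.php stands for whatever file you saved in step 2.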

Many PHP editors claim to save your pages in UTF-8 without actually doing so. What can you do then?

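  1. use the utf8_encode() and utf8_decode() PHP functions (they convert only between ISO-8859-1 and UTF-8, and are deprecated as of PHP 8.2)
  2. use the iconv() PHP function, which converts between any pair of encodings it supports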
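
As a rough illustration of the second option, here is a small conversion sketch. It assumes the source text is ISO-8859-1; that assumption is mine, and you should substitute whatever encoding your data really uses. It also shows mb_convert_encoding(), an mbstring function not mentioned in the original post, as an equivalent route.

```php
<?php
// Assumption for this sketch: the input is ISO-8859-1 (Latin-1).
$latin1 = "caf\xE9"; // "café" with é stored as the single byte E9

// iconv(from, to, string) returns the converted string, or false on failure.
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $latin1);

// mb_convert_encoding(string, to, from) requires the mbstring extension.
$viaMb = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// In UTF-8, é becomes the two bytes C3 A9.
var_dump($viaIconv === "caf\xC3\xA9"); // bool(true)
var_dump($viaIconv === $viaMb);        // bool(true)

// Converting back restores the original single-byte form.
$back = iconv('UTF-8', 'ISO-8859-1', $viaIconv);
var_dump($back === $latin1);           // bool(true)
```

Unlike utf8_encode(), both functions name the source encoding explicitly, so the same code works for other encodings by changing one argument.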

These functions should be used either when you save something to your database or when you retrieve something from it. If you properly escape user input, saving should cause no problems. When displaying characters from the database, on the other hand, you can use the solution proposed by Sergio in his post:

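function fixEncoding($in_str)
{
  // Guess the encoding; the guess is trusted only if validation confirms it.
  $cur_encoding = mb_detect_encoding($in_str);

  // Already valid UTF-8: return the string untouched.
  if ($cur_encoding === "UTF-8" && mb_check_encoding($in_str, "UTF-8")) {
    return $in_str;
  }

  // Otherwise treat the input as ISO-8859-1, the only source encoding
  // utf8_encode() understands (the function is deprecated as of PHP 8.2).
  return utf8_encode($in_str);
}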
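
Because utf8_encode() is deprecated as of PHP 8.2, the same idea can be sketched with mbstring alone. This is a variant of the function above, not Sergio's original: the name toUtf8() and the $fallback parameter are hypothetical, and the default of ISO-8859-1 is an assumption about what your non-UTF-8 data contains.

```php
<?php
// Hypothetical variant of fixEncoding(): returns valid UTF-8 unchanged and
// converts anything else from an assumed fallback encoding.
function toUtf8(string $in, string $fallback = 'ISO-8859-1'): string
{
    // Valid UTF-8 (including plain ASCII) needs no conversion.
    if (mb_check_encoding($in, 'UTF-8')) {
        return $in;
    }
    // Otherwise assume the fallback encoding and convert from it.
    return mb_convert_encoding($in, 'UTF-8', $fallback);
}

// A Latin-1 "café" is converted; a UTF-8 "café" passes through as is.
var_dump(toUtf8("caf\xE9") === "caf\xC3\xA9");     // bool(true)
var_dump(toUtf8("caf\xC3\xA9") === "caf\xC3\xA9"); // bool(true)
```

Checking validity first keeps text that is already UTF-8 from being encoded twice, which is the usual source of garbled output such as "Ã©" in place of "é".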

Posted by Gabriele Romanato.