Solved the Telugu character display problem

Browsed quite a bit on UTF-8 and PHP. There is a lot of material on it. Apparently PHP's built-in UTF-8 support is not very good, i.e. it looks like you need to do some additional programming for it. It was quite a pain, as a lot of the suggestions did not work.
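A small illustration of the kind of "additional programming" involved: PHP's plain string functions work on bytes, so a single Telugu letter counts as three. This is only a sketch, assuming the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// TELUGU LETTER A (U+0C05) written as its three UTF-8 bytes, so the
// example does not depend on how this file itself is saved.
$telugu = "\xE0\xB0\x85";

$byte_count = strlen($telugu);              // strlen() counts bytes
$char_count = mb_strlen($telugu, 'UTF-8');  // mb_strlen() counts characters

echo "bytes: $byte_count, characters: $char_count\n";
```

The same byte-versus-character difference applies to functions such as substr() and strtoupper(), which is why the mb_* variants are usually suggested for UTF-8 text.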

Finally this link http://akrabat.com/php/utf8-php-and-mysql/ gave me the solution for a PHP-only page using mysqli. All I needed was:
mysqli_set_charset($con, "utf8");
where $con is the link identifier returned by mysqli_connect().
Now the Telugu characters show up properly in the PHP page.

PHP page source code:

<?php
// Send the charset header before any output; header() cannot change headers once HTML has been sent.
header('Content-Type: text/html; charset=UTF-8');
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Untitled Document</title>
</head>

<body>

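<?php
$con = mysqli_connect("localhost", "root", "");
if (!$con)
{
    // Stop here: the calls below need a valid connection.
    die("Could not connect to MySQL: " . mysqli_connect_error());
}
mysqli_select_db($con, "ci");
// Switch the connection to UTF-8 so the Telugu text comes back intact.
mysqli_set_charset($con, "utf8");
$result = mysqli_query($con, "SELECT * FROM telugu2english");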

echo "<br><table border='1'>
<tr>
<th>Telugu sentence in Telugu script</th>
<th>English sentence in English script</th>
<th> English sentence audio </th>
</tr>";

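while ($row = mysqli_fetch_array($result))
{
    echo "<tr>";
    $a = $row['telugu'];
    // $a = "అ";   // hard-coded Telugu letter, handy for testing the page without the database
    // echo "<td>" . $row['id'] . "</td>";
    // Escape the values so characters like < or & in the data cannot break the table markup.
    echo "<td>" . htmlspecialchars($a, ENT_QUOTES, 'UTF-8') . "</td>";
    echo "<td>" . htmlspecialchars($row['english'], ENT_QUOTES, 'UTF-8') . "</td>";
    echo "<td>" . htmlspecialchars($row['english_audio_path'], ENT_QUOTES, 'UTF-8') . "</td>";
    echo "</tr>";
}
echo "</table>";
mysqli_close($con);
?>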
</body>
</html>

Now need to look at the CodeIgniter framework based code and see how to fix that.
http://philsturgeon.co.uk/blog/2009/08/UTF-8-support-for-CodeIgniter has some useful info.

Update on 21st Sept. 2011:

Had a weird experience! Initially the code as given in https://ravisiyer.wordpress.com/2011/09/20/display-telugu2english/ was displaying junk characters. I played around with it by changing it to match what was being used in the above-mentioned PHP-only page (or the equivalent in the CI framework). If I recall correctly, I did the following:

  • In t2e_view.php, added HTML tags including the meta tag where the charset is specified as UTF-8
  • Included this PHP code in t2e_view.php: header('Content-Type: text/html; charset=utf-8');
  • Included this CI code in t2e_model.php: $db['default']['char_set'] = "utf8";

Then the CI program showed Telugu characters properly! Problem solved!

But I wanted to know which directive did the trick. So I removed all three of the above changes, rebooted the system so that any transient database settings would be forgotten, and then tried again, expecting to see junk characters. But I saw Telugu characters!

Did the code in the last bullet change the database default char_set permanently (not just for the program run/PHP page execution)? Well, phpMyAdmin shows the collation as latin1_swedish_ci for the two tables that the ci database has (including the telugu2english table). Of course, the telugu column of the telugu2english table is shown as having collation utf8_general_ci. So nothing has changed from what it was before.
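For reference, in a CodeIgniter install the char_set setting from the last bullet normally lives in application/config/database.php rather than in a model. A rough sketch of the relevant lines, assuming a stock CI 2.x layout; the host, user and database values are the ones from the PHP-only page above, and the driver is a guess:

```php
<?php
// application/config/database.php (sketch only; assumes a stock CodeIgniter 2.x layout)
$db['default']['hostname'] = 'localhost';
$db['default']['username'] = 'root';
$db['default']['password'] = '';
$db['default']['database'] = 'ci';
$db['default']['dbdriver'] = 'mysqli';          // assumption: the driver actually used is not stated
$db['default']['char_set'] = 'utf8';            // character set CI asks for on the connection
$db['default']['dbcollat'] = 'utf8_general_ci'; // collation CI asks for on the connection
```

Stock copies of this file usually ship with char_set already set to utf8. If that was the case here, it might explain why removing the line from t2e_model.php made no visible difference, but that is a guess and not something verified. As for the collations: latin1_swedish_ci on the table is only the default applied to new columns, while the utf8_general_ci shown on the telugu column is what governs how that column's text is stored.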

Right now things work. But my knowledge of the underlying UTF-8 handling in both MySQL and the CI/PHP database access functions/classes is poor. Will need to understand it properly somewhere down the line.
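One way to start digging, sketched here as a hypothetical helper (describe_text is not part of the app): look at the raw bytes PHP actually receives for a value. Intact UTF-8 for a Telugu letter is a three-byte sequence, whereas a lossy conversion on the MySQL side typically leaves plain "?" characters behind. The sketch assumes the mbstring extension is available.

```php
<?php
// Hypothetical debugging helper: summarises what a string fetched from
// the database actually contains at the byte level.
function describe_text($value)
{
    return array(
        // Raw bytes as hex, e.g. "e0b085" for TELUGU LETTER A.
        'bytes_hex' => bin2hex($value),
        // True when the bytes form well-formed UTF-8.
        'valid_utf8' => mb_check_encoding($value, 'UTF-8'),
        // True when the value is non-empty and made up only of "?" characters.
        'only_question_marks' => $value !== '' && trim($value, '?') === '',
    );
}

print_r(describe_text("\xE0\xB0\x85")); // intact Telugu letter, given as UTF-8 bytes
print_r(describe_text("?"));            // what a lossy charset conversion tends to leave
```

Calling describe_text($row['telugu']) inside the fetch loop, once with and once without the charset settings, would show whether the bytes themselves change or only the way the browser interprets them.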

