Java - Characters with diacritics are converted to ASCII


I am trying to fetch the value of a textarea that allows a maximum of 500 characters. The problem I am facing is with characters that have diacritics: each special character is replaced by 4 characters. As a consequence, if I enter a text of 500 characters that includes French characters, the actual length required to persist it in the database (field length 500) exceeds 500 chars, and the transaction fails.

Some examples:

  • oubliée is converted to oubliée
  • désiriez is converted to désiriez

Can anyone correct me if I am doing something wrong, or tell me how to fetch the actual text entered by the user in Java code? I am using the following snippets of code:

Form definition:

<form id="contform" method="post" name="formcont"
    action="/wps/customforms/participationrequest"
    enctype="multipart/form-data">

Textarea definition:

<div class="spec textarea small" id="inpspec">
    <label class="label" for="inp"><%=content.getlangmap().get(langcode)%>
        <span class="required">*</span> <span class="hint"></span>
    </label>
    <div class="value">
        <div class="control">
            <textarea cols="5" rows="3"
                id="<%=String.valueOf(content.getid())%>"
                name="<%=String.valueOf(content.getid())%>"></textarea>
        </div>
    </div>
</div>

Servlet snippet:

List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);
String description = null;
for (FileItem item : items) {
    if (item.isFormField()) {
        if (item.getFieldName().equalsIgnoreCase(String.valueOf(content.getid()))) {
            // Here I get invalid data for French characters.
            description = item.getString();
        }
    }
}

Additional information:

  • Application server used:
  • Platform: Linux
  • I tried to set the character encoding of the request to UTF-8, ISO-8859-1... but it doesn't seem to work.

All characters above 127 are encoded in UTF-8 as 2 or more bytes. You seem to have an encoding mismatch between the data writer and the data reader:

  • A text showing Ã© in place of é is typical of French accented characters saved with UTF-8 encoding and later read as Latin-1 (aka ISO-8859-1): é (hex E9) in Latin-1, when saved in UTF-8, becomes C3 A9 ( = Ã© if displayed as "raw" characters).
  • But if that Ã© is in turn saved as UTF-8, it becomes C3 83 C2 A9, which shows as four characters (Ã, an invisible control character, Â, ©) when displayed as raw characters or in a Latin encoding.
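
The byte sequences above can be reproduced with the JDK alone. This is only a minimal sketch: the class name DoubleEncodingDemo and the helpers hex and misread are made up for illustration, and misread simply simulates a reader that assumes ISO-8859-1 for bytes that were written as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    /** Formats bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Simulates text written as UTF-8 but read back as ISO-8859-1 (hypothetical helper). */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9";      // é
        String once = misread(original); // one round of mismatch
        String twice = misread(once);    // two rounds of mismatch

        System.out.println(hex(original.getBytes(StandardCharsets.ISO_8859_1))); // e9
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));      // c3 a9
        System.out.println(hex(once.getBytes(StandardCharsets.UTF_8)));          // c3 83 c2 a9
        System.out.println(once.length());                                       // 2
        System.out.println(twice.length());                                      // 4
    }
}
```

Two rounds of mismatch turn one character into four, which matches the "replaced by 4 characters" symptom in the question.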

I don't know your application server, but the text is written as UTF-8 and then read as plain ISO-8859-1 / Latin-1 text.
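
If that diagnosis is right, one thing to try is decoding the field bytes explicitly as UTF-8 instead of relying on a default. Commons FileUpload's FileItem has a getString(String encoding) overload for this, and as far as I know the no-argument getString() falls back to ISO-8859-1 when the part declares no charset. The sketch below uses only the JDK and is not the servlet code itself: the class FieldDecodingSketch and its methods are hypothetical, and the byte array stands in for the raw content of the form field.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FieldDecodingSketch {

    /** Decodes raw field bytes with an explicitly chosen charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Undoes one round of "UTF-8 bytes read as ISO-8859-1" (hypothetical helper). */
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes a browser sends for "oubliée" from a UTF-8 page.
        byte[] raw = "oubli\u00e9e".getBytes(StandardCharsets.UTF_8);

        String wrong = decode(raw, StandardCharsets.ISO_8859_1);
        String right = decode(raw, StandardCharsets.UTF_8);

        System.out.println(wrong.length());                 // 8
        System.out.println(right.length());                 // 7
        System.out.println(repair(wrong).equals(right));    // true
    }
}
```

Decoding correctly at the point where the bytes are read is preferable to repairing afterwards; repair only works while no byte has been lost along the way.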

