java - Characters with diacritics are converted to ASCII
I am trying to fetch the value of a textarea limited to 500 characters. I am facing a problem with characters that have diacritics: each such character is replaced by 4 characters. As a consequence, if I enter a text of 500 characters that includes French characters, the actual length to be persisted in the database (field length 500) exceeds 500 characters and the transaction fails.
Some examples:
- `oubliée` is converted to `oubliÃƒÂ©e`
- `désiriez` is converted to `dÃƒÂ©siriez`
Can someone correct me if I am doing something wrong, or tell me how to fetch the actual text entered by the user in Java code? I am using the following snippets of code:
Form definition:

```html
<form id="contform" method="post" name="formcont" action="/wps/customforms/participationrequest" enctype="multipart/form-data">
```
Textarea definition:

```html
<div class="spec textarea small" id="inpspec">
    <label class="label" for="inp"><%=content.getLangMap().get(langCode)%> <span class="required">*</span> <span class="hint"></span> </label>
    <div class="value">
        <div class="control">
            <textarea cols="5" rows="3" id="<%=String.valueOf(content.getId())%>" name="<%=String.valueOf(content.getId())%>"></textarea>
        </div>
    </div>
</div>
```
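Servlet snippet:

```java
import java.util.List;

import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

// Inside the servlet's doPost(HttpServletRequest request, ...):
List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);
String description = null;
for (FileItem item : items) {
    if (item.isFormField()) {
        if (item.getFieldName().equalsIgnoreCase(String.valueOf(content.getId()))) {
            // Here I get invalid data for French characters.
            description = item.getString();
        }
    }
}
```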
Additional information:
- Application server used:
- Platform: Linux
- I tried setting the character encoding of the request to UTF-8, ISO-8859-1... but it doesn't seem to work.
All characters above 127 are encoded in UTF-8 as 2 or more bytes. You seem to have an encoding mismatch between the data writer and the data reader:
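To make the byte counts concrete, here is a small sketch (the class and method names are illustrative, not from the question) that prints the character count and the UTF-8 byte count of a few strings. It uses only the JDK. Whether the 500-length column in the question counts characters or bytes depends on the database and is not stated, so treat the byte column as one possible way a limit can be hit early.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {

    // Number of bytes the UTF-8 encoding of s occupies.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this example independent of the source-file encoding.
        String[] samples = {"e", "\u00e9", "\u20ac", "oubli\u00e9e"};
        // ASCII labels so the output does not depend on the console encoding either.
        String[] labels = {"e (U+0065)", "e-acute (U+00E9)", "euro sign (U+20AC)", "oubliee with e-acute"};

        for (int i = 0; i < samples.length; i++) {
            System.out.println(labels[i] + ": " + samples[i].length() + " char(s), "
                    + utf8Length(samples[i]) + " UTF-8 byte(s)");
        }
        // Prints 1/1, 1/2, 1/3 and 7/8 (chars/bytes) for the four samples.
    }
}
```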
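- A text showing `Ã©` is typical of French accented characters saved in UTF-8 encoding and later read as Latin-1 (aka ISO-8859-1): `é` (hex `E9` in Latin-1), when saved in UTF-8, becomes `C3 A9` (= `Ã©` if displayed as "raw" characters). But if `Ã©` is in turn saved as UTF-8, it becomes `C3 83 C2 A9`, which shows as `ÃƒÂ©` when displayed as raw characters in a Latin encoding (Windows-1252; in strict ISO-8859-1 the `0x83` byte is an invisible control character).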
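The double mis-decoding described above can be reproduced with the JDK alone. This sketch (class and helper names are made up for the illustration) applies "encode as UTF-8, decode as ISO-8859-1" once and then twice to `oubliée`, which shows a single `é` growing to 2 and then to 4 characters, matching the symptom in the question. The code is kept ASCII-only, with non-ASCII characters written as Unicode escapes, so the demo itself is not affected by the source-file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // One round of the mismatch: the writer encodes the text as UTF-8,
    // the reader decodes those bytes as ISO-8859-1 (Latin-1).
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "oubliee" with an e-acute (U+00E9).
        String word = "oubli\u00e9e";

        String once = utf8ReadAsLatin1(word);  // U+00E9 becomes U+00C3 U+00A9
        String twice = utf8ReadAsLatin1(once); // ...which in turn becomes four characters

        System.out.println(word.length() + " -> " + once.length() + " -> " + twice.length()); // 7 -> 8 -> 10

        // U+00C3 U+00A9 saved again as UTF-8:
        System.out.println(hex("\u00c3\u00a9".getBytes(StandardCharsets.UTF_8))); // C3 83 C2 A9

        // The same four bytes shown in Windows-1252 are U+00C3 U+0192 U+00C2 U+00A9.
        byte[] fourBytes = {(byte) 0xC3, (byte) 0x83, (byte) 0xC2, (byte) 0xA9};
        String shown = new String(fourBytes, Charset.forName("windows-1252"));
        System.out.println(shown.equals("\u00c3\u0192\u00c2\u00a9")); // true
    }
}
```

The `windows-1252` comparison corresponds to the `ÃƒÂ©` form; decoding the same bytes as strict ISO-8859-1 yields U+0083 in place of `ƒ`.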
I don't know your application server, but it seems the text is written as UTF-8 and read as plain ISO-8859 / Latin text.
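Assuming the page and form are submitted as UTF-8, the remedy on the reading side would be to decode the field bytes explicitly as UTF-8 instead of relying on a Latin-1 default. The sketch below uses only the JDK and simulates the bytes of a form field; `rawFieldBytes` is a hypothetical stand-in for Commons FileUpload's `FileItem.get()`, and the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class DecodeFormField {

    // HYPOTHETICAL stand-in for FileItem.get(): the raw bytes of the textarea,
    // assuming the browser submitted the form as UTF-8.
    static byte[] rawFieldBytes(String typedByUser) {
        return typedByUser.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding with a Latin-1 default reproduces the reported corruption.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decoding with the charset the writer actually used gives the text back.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "d\u00e9siriez"; // "desiriez" with an e-acute, 8 characters
        byte[] raw = rawFieldBytes(typed);

        String wrong = decodeAsLatin1(raw);
        String right = decodeAsUtf8(raw);

        System.out.println("Latin-1: " + wrong.length() + " chars, matches input: " + wrong.equals(typed)); // 9, false
        System.out.println("UTF-8:   " + right.length() + " chars, matches input: " + right.equals(typed)); // 8, true
    }
}
```

In the servlet from the question, the corresponding change would be `description = item.getString("UTF-8");` (`FileItem.getString(String encoding)` is part of Commons FileUpload and declares `UnsupportedEncodingException`). As far as I know, the no-argument `getString()` falls back to ISO-8859-1 when the part has no charset in its headers, which is typical for multipart form fields, so setting the request character encoding alone would not change its result. If the text is still garbled after this change, a second mis-decoding is presumably happening further along, for example on the way into the database, which this sketch does not cover.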