algorithm - How to Handle String Values While Performing Linear Regression Using Weka Java API


I am performing linear regression using the Weka Java API. My data set consists of a user ID, the URL visited by the user, and the time spent on the page. The URL is a string attribute, and I am facing a problem while performing linear regression on this dataset. I am ready to use any method in Weka that converts a string to an equivalent int value. I have seen similar functionality in Mahout but could not find it in Weka. I could write a function that outputs an int value for a string by summing the ASCII codes of its characters, but I want a more reliable and tested solution.

You are correct that linear regression operates on numeric values. However, it is not at all true that any old conversion of categorical values to numbers is fine. For example, hashing a string gives you a number, but it will give meaningless results as a feature for linear regression.

Numeric values are expected to have an ordering and a meaningful magnitude. What does it mean if "foo.com" is 135092 and "bar.com" is 985882? Linear regression will try to interpret "bar.com" as "something about 7 times larger than foo.com", which is nonsense.
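To make the problem concrete, here is a minimal sketch of the ASCII-sum idea from the question. The class name `AsciiSumDemo` and the sample domains are made up for illustration. It shows that two unrelated strings with the same characters get the same number, so the resulting value carries no usable information about the URL.

```java
public class AsciiSumDemo {

    // Sum of the character codes of a string (the conversion proposed in the question).
    static int asciiSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical domains: different sites, same characters in a different order.
        String first = "ab.com";
        String second = "ba.com";

        // Both strings map to the same number, so a regression cannot tell them apart.
        System.out.println(first + " -> " + asciiSum(first));
        System.out.println(second + " -> " + asciiSum(second));
        System.out.println("collide: " + (asciiSum(first) == asciiSum(second)));
    }
}
```

Even without collisions, the size of the number would reflect only string length and spelling, not anything about the page.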

You may be thinking of 1-of-n encoding, where you create a new 0/1 feature for every possible value (URL). This won't be feasible for full URLs. For domains, maybe.
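As a rough sketch of 1-of-n encoding on domains, the plain-Java example below builds one 0/1 column per distinct domain. The class name `DomainOneHot`, its helper methods, and the sample URLs are assumptions made for illustration, not part of Weka. Inside Weka itself, the `StringToNominal` and `NominalToBinary` filters in `weka.filters.unsupervised.attribute` are the usual route to a similar result; check their documentation for the exact options.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class DomainOneHot {

    // Extract the host part of a URL; fall back to the raw string if it cannot be parsed.
    static String domainOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host != null ? host : url;
        } catch (IllegalArgumentException e) {
            return url;
        }
    }

    // Collect the distinct domains in sorted order so column positions are stable.
    static List<String> vocabulary(List<String> urls) {
        TreeSet<String> domains = new TreeSet<>();
        for (String url : urls) {
            domains.add(domainOf(url));
        }
        return new ArrayList<>(domains);
    }

    // One 0/1 column per known domain; a domain not in the vocabulary gives all zeros.
    static double[] encode(String url, List<String> vocab) {
        double[] row = new double[vocab.size()];
        int idx = vocab.indexOf(domainOf(url));
        if (idx >= 0) {
            row[idx] = 1.0;
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> urls = Arrays.asList(
                "http://foo.com/a", "http://bar.com/b", "http://foo.com/c");
        List<String> vocab = vocabulary(urls);
        System.out.println("columns: " + vocab);
        for (String url : urls) {
            System.out.println(url + " -> " + Arrays.toString(encode(url, vocab)));
        }
    }
}
```

The number of columns grows with the number of distinct values, which is why this is plausible for a modest set of domains but not for every individual URL.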

