Bad performance compared to MySql with Neo4j -
i migrated mysql database neo4j , tested simple request. surprised see equivalent request 10 100 times longer in neo4j in mysql. i'm working on neo4j 2.0.1.
in original mysql schema have 3 following tables :
- countries : containing 'code', 'continent_id' , 'selected' boolean,
- cities : containing 'country_code', 'name' , 'status' boolean,
- theaters : containing 'city_id' , 'public' boolean,
with indexes on each property. want display theater count city given continent several conditions. request :
select count(*) nb, c.name `cities` c left join theaters t on c.id = t.city_id c.country_code in (select code countries selected true , continent_id = 4) , c.status=1 , t.public = 1 group c.name order nb desc
database schema in neo4j following :
(:continent)-[:include]->(:country{selected:bool})-[:include]->(:city{name:string, status:bool})-[:include]->(:theater{public:bool})
there's index defined on each property. cypher request :
match (:continent{code: 4})-[:include]->(:country{selected:true})-[:include]->(city:city{status:true})-[:include]->(:theater{public: true}) return city.name, count(*) nb order nb desc
there 70.000 cities , 140.000 theaters in each database.
on continent id 4 mysql request took 0.02s whereas neo4j took 0.4s. moreover, if introduce in cypher request variable relation length between country , city (...(:country{selected:true})-[:include*..3]->(city:city{status:true})...
) want able add intermediate levels regions, there request take more 2 seconds.
i'm aware in particular case there no benefit use neo4j instead of mysql, expect see approximatively equivalent performance between 2 technologies, , want take advantage of neo4j geographical hierarchy capabilities.
am missing or limitation of neo4j?
thank answers.
edit : first you'll find database dump files here. neo4j server configuration 1 out of box. work in ruby environment, , use neography gem. run neo4j server separatly since i'm not on jruby, sends cypher request through rest api.
the database contains 244 countries, 69000 cities , 138,000 theaters. continent_id 4, there 46,982 cities (37,210 have status boolean set true) , 74,420 theaters.
the request returned 2256 rows. @ third run, took 338 ms. here request output profiling infos :
profile match (:continent{code: 4})-[:include]->(country:country{selected:true})-[:include*..1]->(city:city{status:true})-[:include]->(theater:theater{public: true}) return city.name, count(*) nb order nb desc; ==> columnfilter(symkeys=["city.name", " internal_aggregate85ca19f3-9421-4c18-a449-1097e3deede2"], returnitemnames=["city.name", "nb"], _rows=2256, _db_hits=0) ==> sort(descr=["sortitem(cached( internal_aggregate85ca19f3-9421-4c18-a449-1097e3deede2 of type integer),false)"], _rows=2256, _db_hits=0) ==> eageraggregation(keys=["cached(city.name of type any)"], aggregates=["( internal_aggregate85ca19f3-9421-4c18-a449-1097e3deede2,countstar())"], _rows=2256, _db_hits=0) ==> extract(symkeys=["city", " unnamed27", " unnamed7", "country", " unnamed113", "theater", " unnamed72"], exprkeys=["city.name"], _rows=2257, _db_hits=2257) ==> filter(pred="(haslabel(theater:theater(3)) , property(theater,public(5)) == true)", _rows=2257, _db_hits=2257) ==> simplepatternmatcher(g="(city)-[' unnamed113']-(theater)", _rows=2257, _db_hits=4514) ==> filter(pred="(((haslabel(city:city(2)) , haslabel(city:city(2))) , property(city,status(4)) == true) , property(city,status(4)) == true)", _rows=2257, _db_hits=74420) ==> traversalmatcher(start={"label": "continent", "query": "literal(4)", "identifiers": [" unnamed7"], "property": "code", "producer": "schemaindex"}, trail="( unnamed7)-[ unnamed27:include (((haslabel(nodeidentifier():country(1)) , haslabel(nodeidentifier():country(1))) , property(nodeidentifier(),selected(3)) == true) , property(nodeidentifier(),selected(3)) == true) , true]->(country)-[:include*1..1]->(city)", _rows=37210, _db_hits=37432)
you're right, tried myself , got down 100ms query.
match (:continent{code: 4})-[:include]-> (country:country{selected:true})-[:include]-> (city:city{status:true})-[:include]-> (theater:theater{public: true}) return city.name, count(*) nb order nb desc; | "forbach" | 1 | | "stuttgart" | 1 | | "mirepoix" | 1 | | "bonnieux" | 1 | | "saint cyprien plage" | 1 | | "crissay sur manse" | 1 | +--------------------------------------+ 2256 rows **85 ms**
please note cypher of 2.0.x has not yet performance optimized, work started in neo4j 2.1 , continue until 2.3. there more performance work planned in kernel speed things up.
i implemented solution in java , got down 19ms. of course not pretty that's aiming cypher too:
class city { node city; int count = 1; public city(node city) { this.city = city; } public void inc() { count++; } @override public string tostring() { return string.format("city{city=%s, count=%d}", city.getproperty("name"), count); } } private list<?> queryjava3() { long start = system.currenttimemillis(); node continent = iteratorutil.single(db.findnodesbylabelandproperty(continent, "code", 4)); map<node,city> result = new hashmap<>(); (relationship rel1 : continent.getrelationships(direction.outgoing,include)) { node country = rel1.getendnode(); if (!(country.haslabel(country) && (boolean) country.getproperty("selected", false))) continue; (relationship rel2 : country.getrelationships(direction.outgoing, include)) { node city = rel2.getendnode(); if (!(city.haslabel(city) && (boolean) city.getproperty("status", false))) continue; (relationship rel3 : city.getrelationships(direction.outgoing, include)) { node theater = rel3.getendnode(); if (!(theater.haslabel(theater) && (boolean) theater.getproperty("public", false))) continue; city city1 = result.get(city); if (city1==null) result.put(city,new city(city)); else city1.inc(); } } } list<city> list = new arraylist<>(result.values()); collections.sort(list, new comparator<city>() { @override public int compare(city o1, city o2) { return integer.compare(o2.count,o1.count); } }); output("java", start, list.iterator()); return list; } java time = 19ms first = city{city=val de meuse, count=1} total-count 22561
Comments
Post a Comment