Neo4j and Memgraph Cypher implementation differences?

hi,

is there a list somewhere of known differences between the Neo4j and the Memgraph Cypher query language implementations? (Does Memgraph fully adhere to the technology compatibility kit available at https://www.opencypher.org/resources?)

I have an application using Neo4j as data storage backend, and I would like to also be able to use Memgraph as the backend, but Memgraph seems to behave differently. The code being run on the application side is the same, and I’m using the hasbolt library as the Bolt driver. No errors were thrown, and AFAIK I’m not using any fancy cypher extensions, no APOC, no stored procedures, etc…

I would like some pointers on where I should look for differences first, because debugging every query I make is not going to be easy or fast…

thank you,

Hi @odanoburu!

Sorry for the slow response.

Here https://docs.memgraph.com/memgraph/reference-overview/differences you can find the differences between Memgraph and Neo4j. On the protocol side, Memgraph is compatible with v1 of Bolt. Neo4j introduced significant changes in v4. Memgraph doesn’t support v4 yet, but we do have plans to support it in the future.

I’m not exactly sure what the problem is here, because it’s not Bolt v4. Could you please share some error logs?

Thank you!

Thank you, that link is useful! I’m sorry that I missed it.

It doesn’t seem like the problem (as far as I’ve looked into it) is an incompatibility between cypher-as-neo4j-implements-it and cypher-as-memgraph-implements-it.

I ran a minimal query manually and it worked as expected, while when my program runs it using the hasbolt library it does not fail, but the results are not as expected. Is there a way of logging the queries received by memgraph? The memgraph logs at /var/log/memgraph/memgraph.log only show connection information and snapshot creation information, nothing that seems useful (and verbosity seems to be at the maximum level, --min-log-level=0)

I have logged the queries I make and confirmed that they are the same ones I ran manually, but the hasbolt library might be adding something to them, and the transactions are not the same (when using hasbolt the queries are run as one transaction, while manually they run as separate transactions, but that shouldn’t change the output in this case).

For reference, the queries are:

CREATE (crown:Crown)-[:Meta]->(root:Rule:Root {label: $label})-[:DerivedBy {name: "root"}]->(g:Goal {hyps: [], discards: []})-[:Meta]->(crown)
CREATE (n:Formula {formula: $rootFormula})
SET n.root = id(root)
CREATE (g)-[:Derives]->(n)
RETURN id(root) AS rootId, id(g) AS goalId

// id(root) was 48

MERGE (l:Formula {formula: "(-> 1 2)", root: 48})
MERGE (r:Formula {formula: "(-> (-> 1 (-> 2 3)) (-> 1 3))", root: 48})
MERGE (n:Formula {formula: "(-> (-> 1 2) (-> (-> 1 (-> 2 3)) (-> 1 3)))", root: 48})
MERGE (l)-[:Builds{ant: true}]->(n)<-[:Builds{csq: true}]-(r)
RETURN n
UNION ALL
MERGE (l:Formula {formula: "1", root: 48})
MERGE (r:Formula {formula: "2", root: 48})
MERGE (n:Formula {formula: "(-> 1 2)", root: 48})
MERGE (l)-[:Builds{ant: true}]->(n)<-[:Builds{csq: true}]-(r)
RETURN n
UNION ALL
MERGE (l:Formula {formula: "(-> 1 (-> 2 3))", root: 48})
MERGE (r:Formula {formula: "(-> 1 3)", root: 48})
MERGE (n:Formula {formula: "(-> (-> 1 (-> 2 3)) (-> 1 3))", root: 48})
MERGE (l)-[:Builds{ant: true}]->(n)<-[:Builds{csq: true}]-(r)
RETURN n
UNION ALL
MERGE (l:Formula {formula: "1", root: 48})
MERGE (r:Formula {formula: "(-> 2 3)", root: 48})
MERGE (n:Formula {formula: "(-> 1 (-> 2 3))", root: 48})
MERGE (l)-[:Builds{ant: true}]->(n)<-[:Builds{csq: true}]-(r)
RETURN n
UNION ALL
MERGE (l:Formula {formula: "2", root: 48})
MERGE (r:Formula {formula: "3", root: 48})
MERGE (n:Formula {formula: "(-> 2 3)", root: 48})
MERGE (l)-[:Builds{ant: true}]->(n)<-[:Builds{csq: true}]-(r)
RETURN n
UNION ALL
MERGE (l:Formula {formula: "1", root: 48})
MERGE (r:Formula {formula: "3", root: 48})
MERGE (n:Formula {formula: "(-> 1 3)", root: 48})
MERGE (l)-[:Builds{ant: true}]->(n)<-[:Builds{csq: true}]-(r)
RETURN n

(The second one doesn’t use parameters and is dynamically built from the response from the first query, which is bad practice, I know!)

When running manually I see all nodes as expected:

memgraph> MATCH (n) RETURN n;
+-------------------------------------------------------------------------------+
| n                                                                             |
+-------------------------------------------------------------------------------+
| (:Crown)                                                                      |
| (:Rule:Root {label: ""})                                                      |
| (:Goal {discards: [], hyps: []})                                              |
| (:Formula {formula: "(-> (-> 1 2) (-> (-> 1 (-> 2 3)) (-> 1 3)))", root: 52}) |
| (:Formula {formula: "(-> 1 2)", root: 52})                                    |
| (:Formula {formula: "(-> (-> 1 (-> 2 3)) (-> 1 3))", root: 52})               |
| (:Formula {formula: "1", root: 52})                                           |
| (:Formula {formula: "2", root: 52})                                           |
| (:Formula {formula: "(-> 1 (-> 2 3))", root: 52})                             |
| (:Formula {formula: "(-> 1 3)", root: 52})                                    |
| (:Formula {formula: "(-> 2 3)", root: 52})                                    |
| (:Formula {formula: "3", root: 52})                                           |
+-------------------------------------------------------------------------------+
12 rows in set (0.001 sec)

when using my app (and hasbolt):

memgraph> MATCH (n) RETURN n;
+-------------------------------------------------------------------------------+
| n                                                                             |
+-------------------------------------------------------------------------------+
| (:Crown)                                                                      |
| (:Rule:Root {label: ""})                                                      |
| (:Goal {discards: [], hyps: []})                                              |
| (:Formula {formula: "(-> (-> 1 2) (-> (-> 1 (-> 2 3)) (-> 1 3)))", root: 48}) |
+-------------------------------------------------------------------------------+
4 rows in set (0.001 sec)

Hey, there!

I’ve tried to recreate your problem.

The issue I’m having is the following: you have two consecutive RETURN clauses that aren’t joined with a UNION operator. Even if they were, the query would fail, because RETURN clauses joined by UNION must return the same number of columns, with the same names.

The RETURN clauses I’m talking about are

RETURN id(root) AS rootId, id(g) AS goalId

and the first

RETURN n

after it.

My guess is that when you punch in this query manually, everything comes out as expected (although if you punch it in literally as it is, it should fail unless you add ‘;’ at the end of the first RETURN clause and of the last one).
However, when you run this query from your app, it might be that, due to the specifics of the hasbolt library, the query is partitioned in some way so that the top-level query runs after all, but the following subqueries silently fail.

Could you provide a minimal example query that recreates the problem you’re experiencing?

Thanks,
J.

hi!

thanks for looking at it!

> The issue I’m having is the following - you have two consecutive RETURN clauses that aren’t joined with a UNION operator.

I’m sorry, my previous message was a bit confusing. I put a comment between them to make them separate, but they still look like one query; they are actually two separate queries. The Haskell code looks more or less like this:

do
  (root, goal) <- createRoot -- first query
  _ <- populateFormula root -- second query
  return (root, goal)

and of course, it works fine with neo4j!

> Could you provide a minimal example query that recreates the problem you’re experiencing?

sure, but it’d involve the hasbolt library; would that (installing Haskell and some libraries) be a problem?

PS: I sent this message by mistake before completing it, hence the many edits!

Not a problem at all. Post an example, and I’ll look into it.

J.

alright, thank you! so I made a very minimal example and it still doesn’t work as expected, so I imagine I must be doing something wrong (but what? since Neo4j also works in this example).

anyway, here’s the example:

#!/usr/bin/env stack
{- stack
  script
  --resolver lts-14.16
  --package hasbolt
  --package connection
-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Exception (try)
import Control.Monad.IO.Class (liftIO)
import Database.Bolt
import Network.Connection (HostCannotConnect)

boltCfg :: BoltCfg
boltCfg
  = BoltCfg { magic         = 1616949271
            , version       = 1
            , userAgent     = "hasbolt/1.3"
            , maxChunkSize  = 65535
            , socketTimeout = 5
            , host          = "127.0.0.1"
            , port          = 7687
            , user          = ""
            , password      = ""
            , secure        = False
            }

main :: IO ()
main = runBolt . transact $ do
  query_ "CREATE (node:Test) RETURN true"
  query_ "CREATE (anotherNode:Test2) RETURN true"
  return ()

runBolt :: BoltActionT IO a -> IO a
runBolt ac = do
  pipeOrErr <- liftIO . try $ connect boltCfg
  case pipeOrErr of
    Right pipe
      -> do
      result <- liftIO $ run pipe ac
      liftIO $ close pipe
      return result
    Left (_ex :: HostCannotConnect) -> error "can't connect to database."

it’s easiest to run it with Stack: just run stack Memgraph.hs (assuming you name the script Memgraph.hs, as I did). I’m including these instructions even though you may have a Haskell environment set up, since someone reading this might not have one! Stack will install everything for you (compiler and libraries) and then run the script. be aware that Haskell has long compilation times!
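For readers wondering how the `magic` and `version = 1` fields in `boltCfg` relate to the Bolt v1 compatibility mentioned earlier: as I understand the Bolt v1 spec, a client opens the connection with the 4-byte preamble 0x6060B017 (1616949271 in decimal, the value used in the script) followed by four big-endian 32-bit protocol versions it is willing to speak, padded with zeros. The sketch below only computes those bytes using `base`; it doesn't open a socket, it is not hasbolt's code, and `handshake`/`be32` are illustrative names.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32, Word8)

-- Bolt preamble: 0x6060B017, i.e. 1616949271, the `magic` value in boltCfg.
boltMagic :: Word32
boltMagic = 0x6060B017

-- Split a 32-bit word into its 4 bytes, most significant first (big-endian).
be32 :: Word32 -> [Word8]
be32 w = [fromIntegral ((w `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0]]

-- Client handshake (illustrative): the preamble followed by exactly four
-- proposed versions, padded with 0 ("none"). handshake [1] proposes Bolt v1 only.
handshake :: [Word32] -> [Word8]
handshake versions =
  be32 boltMagic ++ concatMap be32 (take 4 (versions ++ repeat 0))
```

Loaded in GHCi, `handshake [1]` yields 20 bytes: `60 60 B0 17`, then `00 00 00 01`, then twelve zero bytes. The server answers with the 4-byte version it picked, or all zeros if it supports none of the proposals, which is where a client proposing only Bolt v4 and Memgraph 1.1 would fail to agree.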

w.r.t. the environment, I run memgraph in a container like so:

[odanoburu@cocanha]~% sudo podman run -it -p 7687:7687 \
  -v mg_lib:/var/lib/memgraph -v mg_log:/var/log/memgraph -v mg_etc:/etc/memgraph \
  --entrypoint bash memgraph
memgraph@9b2ef2c1a4c8:/usr/lib/memgraph$ ./memgraph --also-log-to-stderr=true --bolt-cert-file="" --bolt-key-file="" --telemetry-enabled=false
You are running Memgraph v1.1.0-community
I0827 19:25:44.864904     3] Starting snapshot recovery from "/var/lib/memgraph/snapshots/20200827192526854019_timestamp_104"
I0827 19:25:44.865149     3] Snapshot recovery successful!
I0827 19:25:44.866453     3] Loading module "/usr/lib/memgraph/query_modules/wcc.py" ...
I0827 19:25:45.034400     3] Loaded module "/usr/lib/memgraph/query_modules/wcc.py"
I0827 19:25:45.034440     3] Loading module "/usr/lib/memgraph/query_modules/py_example.py" ...
I0827 19:25:45.036274     3] Loaded module "/usr/lib/memgraph/query_modules/py_example.py"
I0827 19:25:45.036293     3] Loading module "/usr/lib/memgraph/query_modules/graph_analyzer.py" ...
I0827 19:25:45.040452     3] Loaded module "/usr/lib/memgraph/query_modules/graph_analyzer.py"
I0827 19:25:45.040473     3] Loading module "/usr/lib/memgraph/query_modules/pagerank.py" ...
I0827 19:25:45.042673     3] Loaded module "/usr/lib/memgraph/query_modules/pagerank.py"
I0827 19:25:45.042690     3] Loading module "/usr/lib/memgraph/query_modules/example.so" ...
I0827 19:25:45.043313     3] Loaded module "/usr/lib/memgraph/query_modules/example.so"
Starting 8 Bolt workers
Bolt server is fully armed and operational
Bolt listening on 0.0.0.0:7687
I0827 19:25:49.842890    17] Accepted a Bolt connection from 10.88.0.1:32974
I0827 19:25:49.843209    14] Client connected 'hasbolt/1.3'
I0827 19:25:49.848294    14] Bolt client 10.88.0.1:32974 closed the connection.
I0827 19:27:34.379678    17] Accepted a Bolt connection from 10.88.0.34:50654
I0827 19:27:34.380610     8] Client connected 'mg_client/1.1.0-community'
I0827 19:28:12.883147    17] Accepted a Bolt connection from 10.88.0.1:33030
I0827 19:28:12.883404    11] Client connected 'hasbolt/1.3'
I0827 19:28:12.884402    10] Bolt client 10.88.0.1:33030 closed the connection.
I0827 19:28:50.084679    17] Accepted a Bolt connection from 10.88.0.1:33032
I0827 19:28:50.085000    12] Client connected 'hasbolt/1.3'
I0827 19:28:50.087579     8] Bolt client 10.88.0.1:33032 closed the connection.
^CBolt shutting down...
I0827 19:29:30.110759     3] Closing module "/usr/lib/memgraph/query_modules/wcc.py" ...
I0827 19:29:30.111057     3] Closed module "/usr/lib/memgraph/query_modules/wcc.py"
I0827 19:29:30.111131     3] Closing module "/usr/lib/memgraph/query_modules/py_example.py" ...
I0827 19:29:30.111320     3] Closed module "/usr/lib/memgraph/query_modules/py_example.py"
I0827 19:29:30.111389     3] Closing module "/usr/lib/memgraph/query_modules/pagerank.py" ...
I0827 19:29:30.111500     3] Closed module "/usr/lib/memgraph/query_modules/pagerank.py"
I0827 19:29:30.111564     3] Closing module "/usr/lib/memgraph/query_modules/graph_analyzer.py" ...
I0827 19:29:30.111675     3] Closed module "/usr/lib/memgraph/query_modules/graph_analyzer.py"
I0827 19:29:30.111737     3] Closing module "/usr/lib/memgraph/query_modules/example.so" ...
I0827 19:29:30.112015     3] Closed module "/usr/lib/memgraph/query_modules/example.so"
I0827 19:29:30.114583     3] Starting snapshot creation to "/var/lib/memgraph/snapshots/20200827192930114534_timestamp_114"
I0827 19:29:30.118126     3] Snapshot creation successful!

as you can see connections were made, no error was reported, and in this case none of the nodes were created (whereas in my other example the first query seemed to work and the second one seemed to be ignored).

[odanoburu@cocanha]~/sites/gtp/memgraph-debug% sudo podman ps
[sudo] password for odanoburu: 
CONTAINER ID  IMAGE                      COMMAND  CREATED         STATUS             PORTS                   NAMES
e2825d0c456e  localhost/memgraph:latest           46 seconds ago  Up 45 seconds ago  0.0.0.0:7687->7687/tcp  condescending_khayyam
[odanoburu@cocanha]~/sites/gtp/memgraph-debug% sudo podman inspect -f "{{.NetworkSettings.IPAddress}}" e28 # get ip of container running memgraph
10.88.0.35
[odanoburu@cocanha]~/sites/gtp/memgraph-debug% sudo podman run -it \
  -v mg_lib:/var/lib/memgraph -v mg_log:/var/log/memgraph -v mg_etc:/etc/memgraph \
  --entrypoint bash memgraph
memgraph@5b3941375980:/usr/lib/memgraph$ mg_client --use-ssl=False --host="10.88.0.35"
mg_client 1.1.0-community
Type :help for shell usage
Quit the shell by typing Ctrl-D(eof) or :quit
Connected to 'memgraph://10.88.0.35:7687'
memgraph> MATCH (n) RETURN n;
Empty set (0.006 sec)

A very interesting problem. @odanoburu, thank you for the very detailed explanations.

In the recent logs, was MATCH (n) RETURN n; the only query sent by the client connected as 'mg_client/1.1.0-community'? Were the CREATE queries sent after that?

In the meantime, I’ve re-run the given code, and yes, the issue is there. Still not sure where exactly…

I’m not sure I understood exactly what your question is, but I should’ve been more careful sending the logs; they included more than what I reported, because I tried it more than once and probably deleted all data in between. These are the logs after running memgraph as before, then stack Memgraph.hs once, then mg_client (as before) with the query MATCH (n) RETURN n; and quitting:

You are running Memgraph v1.1.0-community
I0828 18:27:45.392627     3] Starting snapshot recovery from "/var/lib/memgraph/snapshots/20200827201508374854_timestamp_125"
I0828 18:27:45.394572     3] Snapshot recovery successful!
I0828 18:27:45.396034     3] Loading module "/usr/lib/memgraph/query_modules/wcc.py" ...
I0828 18:27:45.662467     3] Loaded module "/usr/lib/memgraph/query_modules/wcc.py"
I0828 18:27:45.662504     3] Loading module "/usr/lib/memgraph/query_modules/py_example.py" ...
I0828 18:27:45.664356     3] Loaded module "/usr/lib/memgraph/query_modules/py_example.py"
I0828 18:27:45.664373     3] Loading module "/usr/lib/memgraph/query_modules/graph_analyzer.py" ...
I0828 18:27:45.668781     3] Loaded module "/usr/lib/memgraph/query_modules/graph_analyzer.py"
I0828 18:27:45.668840     3] Loading module "/usr/lib/memgraph/query_modules/pagerank.py" ...
I0828 18:27:45.671274     3] Loaded module "/usr/lib/memgraph/query_modules/pagerank.py"
I0828 18:27:45.671291     3] Loading module "/usr/lib/memgraph/query_modules/example.so" ...
I0828 18:27:45.671958     3] Loaded module "/usr/lib/memgraph/query_modules/example.so"
Starting 8 Bolt workers
Bolt server is fully armed and operational
Bolt listening on 0.0.0.0:7687
I0828 18:27:58.541304    17] Accepted a Bolt connection from 10.88.0.1:36852
I0828 18:27:58.544466    11] Client connected 'hasbolt/1.3'
I0828 18:27:58.552783    11] Bolt client 10.88.0.1:36852 closed the connection.
I0828 18:30:24.397536    17] Accepted a Bolt connection from 10.88.0.38:44586
I0828 18:30:24.398619    11] Client connected 'mg_client/1.1.0-community'
I0828 18:30:53.222039     9] Bolt client 10.88.0.38:44586 closed the connection.

I hope this answers your question, if not I might need some more explanation to fully understand what you mean!

Hi @odanoburu! We figured out what’s going on. Instead of sending PULL_ALL (https://boltprotocol.org/v1/#message-pull-all), hasbolt sends DISCARD_ALL (https://boltprotocol.org/v1/#message-discard-all), presumably because the data isn’t used anywhere (that’s our assumption). Memgraph, on the other hand, creates vertices while the results are being pulled, and in this case pulling never happens.

Could you rewrite the code so that the client fetches the data for every query? Just read the data for each query somehow (even for CREATE queries). That way hasbolt has to send PULL_ALL, and the nodes will be created.
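A toy model may make this explanation easier to follow. It is not Memgraph’s or Neo4j’s code; it only encodes the behavior described in this thread. In Bolt v1 each query is sent with RUN and then finished with either PULL_ALL (stream the records) or DISCARD_ALL (drop them). The `Lazy` strategy models Memgraph 1.1 as described above, applying CREATE effects only while records are pulled, and `Eager` models what Neo4j appeared to do. Per the discussion, hasbolt’s `query_` ends up sending DISCARD_ALL, while reading the records (e.g. with `query`) makes it send PULL_ALL. `wireBytes` shows the two messages as I understand the v1 framing: a PackStream zero-field struct marker (0xB0) plus the signature byte (0x3F or 0x2F), in one chunk with a 2-byte length header and a 00 00 terminator.

```haskell
import Data.Word (Word8)

-- How the client finishes a RUN in Bolt v1.
data Finish = PullAll | DiscardAll
  deriving (Eq, Show)

-- Toy server strategies (simplified; not either database's real executor).
data Strategy = Lazy | Eager
  deriving (Eq, Show)

-- A query modeled only by the labels of the nodes it creates, e.g. ["Test"].
type Query = [String]

-- Run each (query, finish) pair against an empty graph and return the
-- labels of the nodes that end up stored.
runSession :: Strategy -> [(Query, Finish)] -> [String]
runSession strategy = concatMap step
  where
    step (creates, finish)
      | strategy == Eager = creates -- effects applied when the query executes
      | finish == PullAll = creates -- lazy: effects applied while pulling
      | otherwise = []              -- lazy + DISCARD_ALL: nothing pulled, nothing applied

-- Bolt v1 wire bytes for the zero-field message: chunk length (00 02),
-- struct marker B0, signature byte, then the 00 00 end-of-message marker.
wireBytes :: Finish -> [Word8]
wireBytes f = [0x00, 0x02, 0xB0, signature f, 0x00, 0x00]
  where
    signature PullAll = 0x3F
    signature DiscardAll = 0x2F
```

With the minimal script’s two `query_` calls, `runSession Lazy [(["Test"], DiscardAll), (["Test2"], DiscardAll)]` is empty, matching the empty `MATCH (n) RETURN n;`, while `runSession Eager` on the same input stores both labels; switching both to `PullAll` stores both nodes under either strategy.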

We have to discuss internally whether we should create nodes even if the results are discarded. I will keep you posted on the progress.

Thank you for an amazing bug report!

hello @buda, that makes sense! indeed, in the original example where I noticed this only one query seemed to have effects, and in this minimal example I gave none of the queries had effects, and this is because of the use of the query_ function, whose documentation states that it runs Cypher query and ignores response. so simply using the query function instead should work.

I see the query_ function (actually the way the DISCARD_ALL is used in these queries) as giving the cypher planner/executor an opportunity for optimizing a query that is performed only for side-effects; when we use it we are effectively telling the planner/executor that there’s no need to collect results, because we don’t care about them (but we do care about the effects, else why would we perform the query at all?)

thank you for finding out what the problem is, I’ll make the necessary changes and see how it goes!

I might have found a documentation bug; since it’s still on-topic, I’m reporting it here. I got this error message:

user error (code: "Memgraph.TransientError.MemgraphError.MemgraphError", message: "Not yet implemented: atom expression '[ixINrange(0,len)WHEREgoal.hyps[ix]=dedId|goal.discards[ix]]'")

but I can’t find anything in https://docs.memgraph.com/memgraph/reference-overview/differences that would explain the error, so maybe the page needs to be updated (or maybe it was I who missed something on the page?). From my testing, it seems that list comprehensions are not implemented yet.

Unfortunately, list comprehensions are not supported yet (under upcoming features https://docs.memgraph.com/memgraph/upcoming-features#list-comprehensions). There might be a way to quickly rewrite the query or write a Python query module to implement more advanced stuff (https://docs.memgraph.com/memgraph/reference-overview/query-modules). At least in the short term.

If you could share the whole query, I’m happy to help rewrite it :smiley:

it’s okay, I can implement it in Haskell, it’s not a performance-sensitive part of the app, just wanted to raise the issue on the documentation page!
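For anyone else hitting this error, the list comprehension from the message above can be moved to the client. A minimal Haskell sketch, assuming `goal.hyps` and `goal.discards` are parallel lists (same index, same entry) and that `len` in the original query is the last valid index, since Cypher’s `range` includes its upper bound. The name `discardsFor` is hypothetical.

```haskell
-- Client-side equivalent (hypothetical helper) of the Cypher expression
--   [ix IN range(0, len) WHERE goal.hyps[ix] = dedId | goal.discards[ix]]
-- assuming hyps and discards are parallel lists.
discardsFor :: Eq a => a -> [a] -> [b] -> [b]
discardsFor dedId hyps discards =
  [d | (h, d) <- zip hyps discards, h == dedId]
```

One difference to keep in mind: `zip` stops at the shorter list, whereas indexing past the end of a list in Cypher yields null, so the two only agree when the lists have the same length.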

another apparent problem is that queries may not begin with comments, either on mg_client or using hasbolt:


memgraph> CREATE (node:Test)
       -> // a comment
       -> RETURN true;
Empty set (0.001 sec)
memgraph> // a comment
       -> RETURN true;
Client received exception: line 1:25 no viable alternative at input '<EOF>'
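Until this is fixed, one client-side workaround is to drop whole-line `//` comments before a query is sent. A minimal sketch, assuming comments always occupy whole lines: it leaves trailing comments alone, and it would also drop a line inside a multi-line string literal that happens to start with `//`. `stripLineComments` is a hypothetical helper, not part of hasbolt; it works on `Text` because that is what hasbolt’s query functions take.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical helper: remove every line whose first non-blank characters
-- are "//", keeping the remaining lines separated by newlines.
stripLineComments :: Text -> Text
stripLineComments =
  T.intercalate "\n" . filter (not . isComment) . T.lines
  where
    isComment = T.isPrefixOf "//" . T.stripStart
```

In the app this would be used as `query (stripLineComments q)` in place of `query q`.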

here’s another difference between memgraph and neo4j:

memgraph> MATCH (n) DETACH DELETE n;
Empty set (0.001 sec)
memgraph> OPTIONAL MATCH (f:Test)
RETURN f IS NULL AND id(f) <> 3;
Client received exception: 'id' argument at position 1 must be either 'node' or 'edge'.
memgraph> CREATE (f:Test) RETURN f;
+---------+
| f       |
+---------+
| (:Test) |
+---------+
1 row in set (0.001 sec)
memgraph> OPTIONAL MATCH (f:Test)
RETURN f IS NULL AND id(f) <> 3;
+--------------------------+
| f IS NULL AND id(f) <> 3 |
+--------------------------+
| false                    |
+--------------------------+
1 row in set (0.000 sec)

whereas in Neo4j:

neo4j> MATCH (n) DETACH DELETE n;
0 rows available after 11 ms, consumed after another 0 ms
neo4j> OPTIONAL MATCH (f:Test)
RETURN f IS NULL AND id(f) <> 3;
+--------------------------+
| f IS NULL AND id(f) <> 3 |
+--------------------------+
| NULL                     |
+--------------------------+

1 row available after 1 ms, consumed after another 0 ms
neo4j> CREATE (f:Test) RETURN f;
+---------+
| f       |
+---------+
| (:Test) |
+---------+

1 row available after 27 ms, consumed after another 1 ms
Added 1 nodes, Added 1 labels
neo4j> OPTIONAL MATCH (f:Test)
RETURN f IS NULL AND id(f) <> 3;
+--------------------------+
| f IS NULL AND id(f) <> 3 |
+--------------------------+
| FALSE                    |
+--------------------------+

1 row available after 3 ms, consumed after another 0 ms

am I correct in assuming that you consider most (all?) Cypher implementation differences between neo4j and memgraph to be bugs?

@odanoburu The semantics are sometimes questionable. Memgraph tries to follow the same semantics as much as possible, but they may differ in places, mainly because the Memgraph behavior was (or is) more logical from the user’s perspective. In other words, if the semantics differ, it’s not necessarily a bug.

In the mg_client case, a space is used to concatenate multiline queries, and that is a bug. It will be fixed and included in the next release.
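The error position in the earlier transcript fits this explanation. If the two typed lines are joined with a space, `// a comment RETURN true;` is 25 characters long and the comment runs to its end, so the parser reaches end of input at column 25 with nothing to parse. In the first example, the same join comments out `RETURN true;`, which would explain the empty set. The sketch below models that behavior (it is not mg_client’s source) by joining the input both ways and removing `//` comments naively; the function names are illustrative.

```haskell
import Data.List (intercalate)

-- Join the lines typed at the prompt with the given separator:
-- " " models the reported bug, "\n" the intended behavior.
joinInput :: String -> [String] -> String
joinInput = intercalate

-- What is left for the parser once "//" comments (to end of line) are
-- removed. Naive: "//" inside string literals is not special-cased.
stripComments :: String -> String
stripComments = intercalate "\n" . map dropComment . lines
  where
    dropComment ('/' : '/' : _) = ""
    dropComment (c : rest) = c : dropComment rest
    dropComment [] = []
```

For example, `stripComments (joinInput " " ["// a comment", "RETURN true;"])` is the empty string, while joining with `"\n"` leaves `RETURN true;` intact.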

In the id(f) case, that’s also a bug. Memgraph’s id(f) should return null. We’ll fix that, and the fix will be included in the next release.
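Neo4j’s NULL in the comparison above follows from Cypher’s three-valued logic. With no `:Test` node, OPTIONAL MATCH binds `f` to null, so `f IS NULL` is true, `id(f)` is null, `null <> 3` is null, and `true AND null` is null. Once a node exists, `f IS NULL` is false, and `false AND x` is false for any `x`, which is why both databases agree on the second query. A small Haskell model with `Nothing` standing for null; the names are illustrative.

```haskell
import Data.Maybe (isNothing)

-- Cypher's three-valued booleans: Nothing stands for null.
type CypherBool = Maybe Bool

-- A matched node, identified only by its internal id.
newtype Node = Node Integer

-- AND: false wins even against null; otherwise null propagates.
cAnd :: CypherBool -> CypherBool -> CypherBool
cAnd (Just False) _ = Just False
cAnd _ (Just False) = Just False
cAnd (Just True) (Just True) = Just True
cAnd _ _ = Nothing

-- <> on integers: null if either side is null.
cNeq :: Maybe Integer -> Maybe Integer -> CypherBool
cNeq (Just a) (Just b) = Just (a /= b)
cNeq _ _ = Nothing

-- id(f) returning null for a null f, as described for the fix.
cypherId :: Maybe Node -> Maybe Integer
cypherId = fmap (\(Node i) -> i)

-- f IS NULL AND id(f) <> 3, with f bound by OPTIONAL MATCH (Nothing = no match).
expr :: Maybe Node -> CypherBool
expr f = cAnd (Just (isNothing f)) (cNeq (cypherId f) (Just 3))
```

`expr Nothing` is `Nothing` (Neo4j’s NULL), and `expr (Just (Node 0))` is `Just False`, matching both databases once the node exists.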

You are Memgraph’s MVP when it comes to finding bugs. Please keep going! :smiley:

I see! thank you for the information. I’ll keep an eye out for the next release!
