Query Times in Memgraph Lab vs other methods

I apologize if this has already been talked about somewhere.

I am working on a project where I need to provide a graph to the frontend as quickly as possible. (Memgraph has already been a tremendous help, since I originally started with the data in Postgres, thinking ‘How long could it take?’)

One of my sample queries returns a map containing 5.5k nodes and ~33k relationships. Running this query in Memgraph Lab takes about 0.3s. At least that’s what it tells me, but I believe it, since the visualization starts pretty much right away.

Unfortunately, when I try to run the same query in other ways, it takes much longer.

Using mgclient from Python, I get this from the profiler:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    3.736    3.736    3.736    3.736  {method 'execute' of 'mgclient.Cursor' objects}

Using mgconsole, I get this:
(round trip in 3.715 sec)

Using the mgclient library in a little C++ program based on one of the examples, I get:
time to connect: 0.10363
time to execute: 0.00786637
time to fetch: 3.7809
Fetched 1 row(s)
total time: 3.90357s

The total times for all the things I tried make sense, since they should all be using the same library, but I don’t really understand how Memgraph Lab seems to be so much faster. Is it just that Memgraph Lab is running on the same machine as Memgraph, or is there something else at play that I’m misunderstanding? Any thoughts on speeding up the query/fetch?

Thanks

Hi @Canso :slight_smile: Can you provide the query you are running (at least a mock version) and a part of the code you are running from Python and C++?

I would love to!

So, copying the query from the Lab input (and changing the names):

MATCH p=(n: MGPerson {match_name: "jonqpublic"}) - [l:CRAB *..1] - (m) - [r:CRAB|FISH *..1] - (s)
with project(p) as f
return f

The exact same query (with formatting changes) is used in the Python and the C++ code.

The C++ code looks like this:

#include <cstdlib>
#include <iostream>

#include <mgclient.hpp>
#include <chrono>

int main(void) {
  typedef std::chrono::high_resolution_clock Time;
  typedef std::chrono::seconds sec;
  typedef std::chrono::duration<float> fsec;

  auto st = Time::now();

  mg::Client::Init();

  mg::Client::Params params;
  params.host = "192.168.0.10";
  params.port = static_cast<uint16_t>(atoi("7687"));
  params.use_ssl = false;
  auto client = mg::Client::Connect(params);

  if (!client) {
    std::cerr << "Failed to connect!\n";
    return 1;
  }

  auto connect_time = Time::now();
  fsec ttc = connect_time - st;
  std::cout << "time to connect: " << ttc.count() << std::endl;

  std::string gql = "MATCH p=(n: MGPerson {match_name: \"jonqpublic\"}) - [l:CRAB *..1] - (m) - [r:CRAB|FISH *..1] - (s)\nwith project(p) as f\nreturn f";

  if (!client->Execute(gql)) {
    std::cerr << "Failed to execute query!";
    return 1;
  }

  auto execute_time = Time::now();
  fsec tte = execute_time - connect_time;
  std::cout << "time to execute: " << tte.count() << std::endl;

  int rows = 0;
  while (const auto maybeResult = client->FetchOne()) {
    ++rows;
  }

  auto fetch_time = Time::now();
  fsec ttf = fetch_time - execute_time;
  std::cout << "time to fetch: " << ttf.count() << std::endl;

  std::cout << "Fetched " << rows << " row(s)\n";

  // Deallocate the client because mg_finalize has to be called globally.
  client.reset(nullptr);

  mg::Client::Finalize();

  auto et = Time::now();

  fsec fs = et - st;
  std::cout << "total time: " << fs.count() << "s" << std::endl;

  return 0;
}

And the relevant bit of the Python code looks like this:

import cProfile
import io
import pstats
from pstats import SortKey

import mgclient

p1 = cProfile.Profile()  # profiler instance; defined elsewhere in the full script


def fetch_and_map_proc(inputs):
    tag = inputs[0]
    poi_mn = inputs[1]
    query = inputs[2]
    custom_logger = inputs[3]
    # connection = inputs[3]
    verbose = True
    if verbose:
        custom_logger(tag)

    p1.enable()

    connection = mgclient.connect(host="192.168.0.10", port=7687)

    cursor = connection.cursor()
    cursor.execute(query, {"poi_mn": poi_mn})
    res = cursor.fetchall()  # gdb.execute_and_fetch(query_funds)
    cursor.close()
    connection.close()
    res = list(res)[0][0]
    p1.disable()

    s = io.StringIO()
    sortby = SortKey.CUMULATIVE
    ps = pstats.Stats(p1, stream=s).sort_stats(sortby)
    ps.print_stats()
    custom_logger(s.getvalue())

    return tag, res
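
A rough sketch of how it gets invoked (the $poi_mn parameter name is inferred from the execute() call above; the other values are placeholders):

query = (
    'MATCH p=(n:MGPerson {match_name: $poi_mn}) - [l:CRAB *..1] - (m) '
    '- [r:CRAB|FISH *..1] - (s) WITH project(p) AS f RETURN f'
)
tag, res = fetch_and_map_proc(("sample-run", "jonqpublic", query, print))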


I haven’t had a chance to do more investigation yet, so I’m still at the same point. I did manage to update to Memgraph 2.5.2 today, though.

My last reply hasn’t been released by the anti-spam bot yet, but I did some more poking around.

Since it was in the last reply, here is the query again:

MATCH p=(n: MGPerson {match_name: "jonqpublic"}) - [l:CRAB *..1] - (m) - [r:CRAB|FISH *..1] - (s)
with project(p) as f
return f;

I tried profiling the query from a few different places: Memgraph Lab, mgconsole, and Python. The results were all about the same. This was the output from the Python test:

OPERATOR                      ACTUAL HITS  RELATIVE TIME  ABSOLUTE TIME
* Produce                     2             2.329103 %     2.754256 ms
* Produce                     2             2.953431 %     3.492548 ms
* Aggregate                   2            40.975408 %    48.455036 ms
* ConstructNamedPath          93605        38.071282 %    45.020792 ms
* EdgeUniquenessFilter        93605         4.953313 %     5.857488 ms
* ExpandVariable              93786        10.669720 %    12.617364 ms
* ExpandVariable              182           0.037080 %     0.043849 ms
* ScanAllByLabelPropertyValue 2             0.010545 %     0.012470 ms
* Once                        2             0.000118 %     0.000140 ms
total time was 118.25394299999999 ms
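
A rough sketch of running PROFILE from Python and totalling the absolute times (this assumes the ABSOLUTE TIME column comes back as a string such as "48.455036 ms", which is worth double-checking):

import mgclient

QUERY = (
    'MATCH p=(n:MGPerson {match_name: "jonqpublic"}) - [l:CRAB *..1] - (m) '
    '- [r:CRAB|FISH *..1] - (s) WITH project(p) AS f RETURN f'
)

connection = mgclient.connect(host="192.168.0.10", port=7687)
cursor = connection.cursor()
cursor.execute("PROFILE " + QUERY)

total_ms = 0.0
for operator, hits, rel_time, abs_time in cursor.fetchall():
    print("*", operator, hits, rel_time, abs_time)
    # assumes a value like "48.455036 ms"; adjust if it comes back differently
    total_ms += float(abs_time.split()[0])
print("total time was", total_ms, "ms")

connection.close()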

However, when I actually execute and fetch from Python and mgconsole, it still takes 3-4 seconds to get the results. I tried copying the database to a Docker instance on my local machine, but there was no change.

So now I’m guessing that the huge time difference is coming from some part of packing up the results and sending them out.
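
One way to test that would be to compare fetching the single projected map against streaming the same subgraph back as many small rows. A sketch of that comparison (the UNWIND variant is a hypothetical rewrite of the query, and I haven’t verified that it returns an equivalent result or that it is any faster):

import time
import mgclient

HOST, PORT = "192.168.0.10", 7687

# Original form: the whole subgraph is packed into one map and returned in a single row.
PROJECTED = (
    'MATCH p=(n:MGPerson {match_name: "jonqpublic"}) - [l:CRAB *..1] - (m) '
    '- [r:CRAB|FISH *..1] - (s) WITH project(p) AS f RETURN f'
)

# Hypothetical alternative: return the relationships (and their endpoints) as
# many small rows, so there is no single giant value to serialize.
STREAMED = (
    'MATCH p=(n:MGPerson {match_name: "jonqpublic"}) - [l:CRAB *..1] - (m) '
    '- [r:CRAB|FISH *..1] - (s) '
    'UNWIND relationships(p) AS rel '
    'RETURN DISTINCT startNode(rel), rel, endNode(rel)'
)

def timed_fetch(query):
    connection = mgclient.connect(host=HOST, port=PORT)
    cursor = connection.cursor()
    t0 = time.perf_counter()
    cursor.execute(query)
    rows = cursor.fetchall()
    elapsed = time.perf_counter() - t0
    connection.close()
    return elapsed, len(rows)

for name, query in (("projected", PROJECTED), ("streamed", STREAMED)):
    elapsed, n = timed_fetch(query)
    print(f"{name}: {elapsed:.3f}s, {n} row(s)")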

Hi @Canso, this is strange, and I am not sure why it is happening. If you still haven’t managed to resolve it and this is a blocker for you, we can jump on a quick call to see what’s going on.