Segfault in the Memgraph daemon

Hello,

We are facing another critical issue, a segfault. (It seems to be the only one we have.)
We have stress tested Memgraph, and without the issue described below it would be the most solid graph database :slight_smile:

We use:

  • Memgraph 1.4-ce on Ubuntu 20.04
  • Python 3.6 query module support with NetworkX

This segfault crashes the Memgraph daemon.
The issue has been reproduced:

  • with the Bolt PHP driver (protocol 4.1 or 4.0) → frequently (approx. 1 in 10 queries)
  • with pymgclient → rarely (approx. 1 in 1000 queries)

Here is the kern.log entry:
May 27 18:08:44 app1-staging kernel: [1378825.233608] Bolt worker 2[10265]: segfault at a0 ip 00007f617a1d3f48 sp 00007f60e6bb9e90 error 4 in libpython3.6m.so.1.0[7f6179fb2000+3d7000]

Here is the sample code:

import mgclient

# Connect to the local Memgraph instance over Bolt.
conn = mgclient.connect(host='127.0.0.1', port=7687)
cursor = conn.cursor()
# Call our custom query module procedure on a matched node.
cursor.execute("""MATCH (c:C {name: 'john'}) CALL myprodule.foo(c, 2) YIELD * RETURN ids""")
row = cursor.fetchone()
print(row[0])
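
For anyone who wants to reproduce this at the reported rates, here is a minimal stress sketch (not our exact harness; the thread count and query volume here are arbitrary). Running the same query from several threads in parallel mirrors the multiple Bolt workers:

import threading
import mgclient

QUERY = """MATCH (c:C {name: 'john'}) CALL myprodule.foo(c, 2) YIELD * RETURN ids"""

def worker(n_queries):
    # Each thread opens its own connection, i.e. its own Bolt session,
    # so the daemon serves the queries from several Bolt workers at once.
    conn = mgclient.connect(host='127.0.0.1', port=7687)
    cursor = conn.cursor()
    for _ in range(n_queries):
        cursor.execute(QUERY)
        cursor.fetchone()

# At roughly 1 crash per 1000 queries with pymgclient, a few thousand
# parallel iterations should be enough to hit the segfault.
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()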

Hope this will be useful; we are not experts in snakes ;(

Jeremie


This seems much more critical. We'll check tomorrow morning; it looks like Memgraph's Python integration has a memory-related issue. I think it's pretty clear how to recreate it. We'll let you know if we fail to reproduce it.

Hi @buda
Thank you for your fix last night regarding Python 3.8. Commenting out lines 624-642 in mgp.py fixed the issue, and Memgraph now starts without any Python error. Calling our custom Python library is now possible with Python 3.8 on Ubuntu 20.04.

I'm coming back to you about the segfault (signal 11) error.

With Memgraph 1.5, Python 3.8, NetworkX, and Ubuntu 20.04, the issue occurs again.
Here is the kern.log entry:

May 28 07:15:14 staging kernel: [46375.480760] Bolt worker 5[75782]: segfault at b8 ip 00007f51590205d6 sp 00007f5136db8c30 error 4 in libpython3.8.so.1.0[7f5158e21000+25b000]
May 28 07:15:14 staging kernel: [46375.480766] Code: ff ff 66 90 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f e9 0d bf fe ff 0f 1f 44 00 00 48 8b 05 a9 b6 27 00 4c 8b a0 58 05 00 00 <41> 8b 84 24 b8 00 00 00 83 f8 31 7f 15 83 c0 01 41 89 84 24 b8 00

Do you need any more information from us about this?
I can set up a temporary VM with SSH access if you want.

Thank you
Jeremie


Hi @jeremos,
I’m from the Core team, and I’m trying to figure out the issue here.

Could you provide the dataset you’re using and the exact module?

In any case, thank you very much for reporting these problems!

Antonio


Hello @toni,
Thank you for following up on this issue.
We are working on a test environment for you.
We first tried to configure a VM with 1 core / 2 threads, but we couldn't reproduce the error on it.
So we are setting up a test environment on a dedicated server with the same configuration as our staging server.
I will get back to you soon with SSH credentials.
Jeremie


So, we managed to locate the bug, and it will be fixed in the next release, which should be out soon!

If you want to know more:
Our embedded Python interpreter has a single entry point that can be called from multiple threads. To achieve thread safety we use the GIL.
The GIL was acquired a little too late, which left a single line of code callable from multiple threads without protection.
The fix is small, but it significantly solidified the stability of the query modules.
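
If you want a feel for the failure mode, here is a tiny illustrative model in plain Python (not the actual Memgraph code): a threading.Lock stands in for the GIL, a counter stands in for shared interpreter state, and time.sleep(0) forces a thread switch inside the unprotected window:

import threading
import time

gil = threading.Lock()   # stands in for the GIL
counter = 0              # stands in for shared interpreter state

def entry_point_buggy(n):
    global counter
    for _ in range(n):
        # BUG: shared state is read before the lock is taken, so another
        # thread can update it in between and that update gets lost.
        value = counter
        time.sleep(0)          # force a thread switch inside the race window
        counter = value + 1
        with gil:
            pass               # the rest of the call is properly protected

def entry_point_fixed(n):
    global counter
    for _ in range(n):
        # FIX: take the lock first, so the read-modify-write is atomic,
        # just like holding the GIL before touching the interpreter.
        with gil:
            counter += 1

threads = [threading.Thread(target=entry_point_buggy, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # far less than 40000: the unprotected line lost updates

The real fix has the same shape: move the lock acquisition up so that no shared state is touched before it is held.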
