Data import in Memgraph

Hello, I would like some advice on importing data.

I see you used to have the mg_import_csv tool for that purpose, but it is not present in the currently installed version or in the documentation.

In the documentation, you suggest “Importing Cypher Queries” instead. What does that really mean? Is it something like LOAD CSV FROM "file:///data.csv"; written inside a .txt file, which mg_client < file.txt then simply runs?

If so, what are the “--csv-doublequote” and “--csv-escapechar” options for, which are listed in the help printed by mg_client --help?

Hi @muzos07,

We’re currently reworking our CSV import tool to improve overall performance and ease of use. We have decided to remove CSV import support for now, but it’s going to be added back very soon.

As of now, the only way to import data is by executing Cypher queries. For example:

CREATE (node1)-[:edge_type]->(node2);

One option would be to write a simple Python script that generates Cypher queries from a CSV file and then execute them against Memgraph, either through the command-line tool mg_client or using Memgraph Lab.
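A minimal sketch of such a script, assuming a hypothetical CSV with `id` and `name` columns and a made-up `Person` label (none of these names come from Memgraph; adjust them to your data). It turns each row into one CREATE statement, leaves plain integers unquoted, and escapes everything else as a string. It also assumes the header names are valid property names and that every row has a value for every column:

```python
import csv
import io
import re

# Plain integers without leading zeros are emitted unquoted; everything else is a string.
INTEGER = re.compile(r"-?(0|[1-9][0-9]*)")


def cypher_literal(value):
    """Render one CSV field as a Cypher literal."""
    if INTEGER.fullmatch(value):
        return value
    # Escape backslashes first, then single quotes, before wrapping in quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def node_queries(csv_file, label):
    """Yield one CREATE statement per CSV row, using the header row as property names."""
    for row in csv.DictReader(csv_file):
        props = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in row.items())
        yield f"CREATE (:{label} {{{props}}});"


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real CSV file.
    sample = io.StringIO("id,name\n1,Ana\n2,O'Neil\n")
    for query in node_queries(sample, "Person"):
        print(query)
```

The printed statements could then be saved to a file and run through mg_client or pasted into Memgraph Lab.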

Sorry for the inconvenience. I’ll keep you posted on the release timeline as soon as we have more details.

If you would like some help with the Python script, please let me know and I’ll help the best I can.

Cheers!
Karim

Hi @Karim.T,

Thank you for your response. I understand your reasons for reworking the CSV tool, and I don’t think I’ll need help with the Python script for generating the Cypher queries.

However, I have one more question: do you think using Cypher queries will be significantly slower than using your reworked CSV loader?

We are talking about loading roughly 60 million nodes and 70 million edges.
I’m afraid loading that many will take forever using Cypher queries (I’m guessing that mg_client uses Bolt connections to run these queries).

If this turns out not to be a workable solution, I have no problem waiting for the new CSV import tool :)

Best Regards,
Petr

Hey Petr,

Thanks a lot for the additional info regarding your dataset. Roughly 60 million nodes and 70 million edges isn’t a big dataset, and loading it through Cypher queries shouldn’t be a problem at all.

Cypher queries are the fastest way to import data in general; the bottleneck here will be on the Python side, when running through a loop to generate the queries. Our importer is written in C/C++ and will help a lot on that front. We’re also working on ways to integrate the importer as tightly as possible with Memgraph (we don’t have benchmarks yet).

In the meantime, the Python route is the best option. The load should take less than an hour.
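To keep the Python side cheap for a dataset of this size, one option is to stream the CSV row by row instead of building all queries in memory. The sketch below assumes a hypothetical edge file with integer `src` and `dst` columns, nodes that already exist with an `id` property, and made-up `Person` and `KNOWS` names; treat it as an illustration, not a tuned importer:

```python
import csv
import io


def edge_queries(rows, label, rel_type):
    """Lazily yield one MATCH ... CREATE statement per (src, dst) pair of integer ids."""
    for row in rows:
        src, dst = int(row["src"]), int(row["dst"])
        yield (
            f"MATCH (a:{label} {{id: {src}}}), (b:{label} {{id: {dst}}}) "
            f"CREATE (a)-[:{rel_type}]->(b);"
        )


def write_queries(csv_file, out_file, label, rel_type):
    """Stream the CSV to the output file row by row; no list of queries is accumulated."""
    count = 0
    for query in edge_queries(csv.DictReader(csv_file), label, rel_type):
        out_file.write(query + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data; with real files, pass objects returned by open().
    edges = io.StringIO("src,dst\n1,2\n2,3\n")
    out = io.StringIO()
    written = write_queries(edges, out, "Person", "KNOWS")
    print(written, "queries written")
    print(out.getvalue(), end="")
```

Matching endpoints by `id` for tens of millions of edges will likely depend heavily on having an index on that property before the edge import starts.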

I hope this helps!

Cheers,
Karim

Hey @muzos07,

As promised, the CSV import tool is back! You can find the documentation here.

Make sure you have the latest version of Memgraph installed.

If you have any questions, please let me know :)

Cheers,
Karim