Empty Dataset after importing data from csv

Hi,

I am trying to test some queries on the loaded dataset, but I always get an “empty set” result, as shown below.

Dataset was loaded successfully

My test dataset looks like this:

nodes.csv

relationships.csv

What am I missing here?

Also, I would like to know if there is a good programmatic way to load and query the data using Python?

PS: I have read the Python example in the docs, but I am not sure how to load the data.

Any pointer in the right direction would be appreciated. :slight_smile:

Hi @red_devil!

Empty Dataset after importing data from CSV

I am not sure which command you are using to start Memgraph, but I am going to assume it’s the source of your problem because everything else seems alright.

A typical command for starting Memgraph would be:

docker run -p 7687:7687 memgraph

Even though Memgraph will be up and running, this command isn’t the right way of starting a database instance that intends to load stored data.

When you use the CSV Import Tool, a snapshot is created from the supplied CSV files in the directory /var/lib/memgraph inside the Docker container. This snapshot needs to be loaded when Memgraph starts next time to populate the database.
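
For reference, inside the container the import tool is invoked along these lines (a sketch; the file names are assumptions based on your dataset):

mg_import_csv \
  --nodes nodes.csv \
  --relationships relationships.csv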

The problem is that Docker containers don’t persist data on their own (all changes are lost when the container is removed), so we need to specify a local volume in which to store the data permanently.

From the official Docker documentation:

  • Volumes are stored in a part of the host filesystem which is managed by Docker ( /var/lib/docker/volumes/ on Linux). Volumes are the best way to persist data in Docker.

With this in mind, the correct command for starting Memgraph would be:

docker run -v mg_lib:/var/lib/memgraph -p 7687:7687 memgraph/memgraph --data-directory /var/lib/memgraph
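
If you want to double-check where Docker keeps that named volume on the host, the standard Docker CLI can show you:

docker volume ls
docker volume inspect mg_lib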

Also, I would like to know if there is a good programmatical way to load and query using Python?

The simplest way of querying the database programmatically would be using the pymgclient driver for Python. You can find the instructions on how to install it here.
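
For example, installation from PyPI typically looks like this:

pip install pymgclient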

If you, for example, needed to load the following CSV file (nodes.csv) into Memgraph:

0,General,Forum
1,Support,Forum
2,Music,Forum
3,Film,Forum
4,Programming,Forum

the Python code in example.py could look like this:

import csv
import mgclient

# Make a connection to the database
connection = mgclient.connect(host='172.17.0.2', port=7687)

# Create a cursor for query execution
cursor = connection.cursor()

# Load the CSV file and create a node for each row
with open('nodes.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        query = """CREATE (n:{label})
                   SET n.title = '{title}'""".format(title=row[1], label=row[2])
        cursor.execute(query)

# A query to return all nodes
query = "MATCH (n) RETURN n"

# Execute the query
cursor.execute(query)

# Fetch one row of query results
row = cursor.fetchone()

# Print the first member in row
print(row[0])

# Make database changes persistent
connection.commit()

Note that this example requires the CSV file to be located in the same directory as example.py.
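
One extra note: interpolating values with format() is fine for a demo, but it breaks if a title contains a quote. Since pymgclient follows the Python DB-API, a sketch using Cypher query parameters would look like this (the label still has to be interpolated because labels can’t be parameterized):

import csv
import mgclient

# Make a connection to the database
connection = mgclient.connect(host='172.17.0.2', port=7687)
cursor = connection.cursor()

with open('nodes.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        # The title travels as a query parameter instead of being formatted in
        cursor.execute(
            "CREATE (n:{label}) SET n.title = $title".format(label=row[2]),
            {'title': row[1]},
        )

# Make database changes persistent
connection.commit()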

I hope that this will help you with your work. :smile:

Hey @idespot :smiley:

Thank you for your response.

Basically, I am using docker-compose to run Memgraph as a service in a separate container. The aim of my project (thesis) is to run some queries across Neo4j, Memgraph, and RedisGraph on the same dataset and build a query system around it. Here’s my configuration:

docker-compose.yml

version: "3"
services:
  redis:
    image: redislabs/redisgraph
    container_name: redis
    volumes:
      - ./redis/data:/data
    restart: unless-stopped
    ports:
      - "6379:6379"

  memgraph:
    image: memgraph:latest
    container_name: memgraph
    volumes:
      - ./memgraph/modules:/usr/lib/memgraph/query_modules
      - ./memgraph/import-data:/usr/lib/memgraph/import-data
    networks:
      - test_network
    ports:
      - "7688:7687"

  neo4j:
    image: neo4j:latest
    container_name: neo4j
    volumes:
      - ./neo4j/data:/data
      - ./neo4j/plugins:/plugins
      - ./neo4j/import:/import
      - ./neo4j/logs:/logs
      - ./neo4j/conf:/conf
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_apoc_import_file.use_neo4j_config=false
      - NEO4J_dbms_security_procedures_unrestricted=apoc.*,algo.*
      - NEO4J_dbms_memory_heap_initial__size=512m
      - NEO4J_dbms_memory_heap_max__size=2G
      - NEO4J_apoc_uuid_enabled=true
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_allow__upgrade=true
      - NEO4J_dbms_default__database=neo4j
      - NEO4J_AUTH=neo4j/test
      - NEO4JLABS_PLUGINS=["apoc"]
    networks: ["Neo4j-test"]
    restart: unless-stopped

networks:
  Neo4j-test:
  test_network:
    driver: bridge

# use docker volume to persist data outside of a container.
volumes:
  Neo4j-test:

I simply run docker-compose up -d to bring all the containers up.

This loads correctly as per the logs.

I have to either use mgconsole or go inside the container to manually run queries.
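
(For reference, “going inside the container” currently means something like this for me, using the container name from my compose file and assuming mgconsole is available in the image:)

docker exec -it memgraph mgconsole
# or just a shell inside the container
docker exec -it memgraph bash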

I cannot use this approach in the long run, which is why I need a programmatic way to access the database and run my queries.

Regarding the CSV files, I have converted JSON to CSV from my Neo4j implementation and tried to format them according to the documentation.

However, I am not entirely sure about the format, but at least the mg_import_csv tool no longer complains as it did earlier.

I am still exploring pymgclient from the documentation. I tried to replicate one of the example implementations.

app.py

import mgclient

# Make a connection to the database
connection = mgclient.connect(host='172.19.0.2', port=7687)

# Create a cursor for query execution
cursor = connection.cursor()

# Delete all nodes and relationships for a cleaner start
query = "MATCH (n) DETACH DELETE n"

# Execute the query
cursor.execute(query)

# Create a Cow node connected to a Class node
query = ("""CREATE (n:Cow {name: 'Cow1'})-[e:TYPE]->
              (m:Class {name: 'ABOX'})
       RETURN n, e, m""")

# Execute the query
cursor.execute(query)

# Fetch one row of query results
row = cursor.fetchone()

# Print the first member in row
print(row)

Output:

(<mgclient.Node(id=33, labels={'Cow'}, properties={'name': 'Cow1'}) at 0x7fba3acecea0>, <mgclient.Relationship(start_id=33, end_id=34, type='TYPE', properties={}) at 0x7fba3ac61b30>, <mgclient.Node(id=34, labels={'Class'}, properties={'name': 'ABOX'}) at 0x7fba3acece70>)

I can see multiple entries for obvious reasons (I was testing), but I am not able to figure out how to inspect the rest of the IDs.

It’s very unclear which direction I am going in at the moment. Any advice for a beginner would be appreciated :slight_smile:

PS: Loving Memgraph’s idea and implementation from the start. Good job, guys!

Update!

I ran this in a separate container (without running the previous ones).

Voila! I got something out of my Python snippet:

import mgclient

# Make a connection to the database
connection = mgclient.connect(host='172.17.0.2', port=7687)

# Create a cursor for query execution
cursor = connection.cursor()

# A query to return all nodes with associated relationships
query = """MATCH(n)-[r]->(m)
        RETURN n, r, m
        """
# Execute the query
cursor.execute(query)

# Fetch all rows of query results
row = cursor.fetchall()

# Print the first row of the results
print(row[0])

# Make database changes persistent
connection.commit()

Output:

(<mgclient.Node(id=4, labels={'Abox'}, properties={'id': '4', 'properties__admin': '', 'properties__comment': '', 'properties__description': 'A Cow', 'properties__html_info': '', 'properties__identifier': 'http://example.org/abox/Cowebcc054f', 'properties__namespace': '', 'properties__ontology_level': '', 'properties__pl': '', 'properties__sing': '', 'properties__title': 'Cow1', 'properties__unique': '', 'properties__uri': '', 'properties__url': '', 'properties__version': '', 'properties__xsd_type': ''}) at 0x7fbdff2defc0>, <mgclient.Relationship(start_id=4, end_id=2, type='relationship', properties={'end__labels__001': 'Abox', 'end__labels__002': '', 'label': 'relation', 'properties__description': '', 'properties__level': '', 'properties__namespace': '', 'properties__title': '', 'properties__uri': '', 'start__labels': 'Abox'}) at 0x7fbdfee59770>, <mgclient.Node(id=2, labels={'Abox'}, properties={'id': '2', 'properties__admin': '', 'properties__comment': '', 'properties__description': 'A Sensor Collar', 'properties__html_info': '', 'properties__identifier': 'http://example.org/abox/SensorCollar8153e23d', 'properties__namespace': '', 'properties__ontology_level': '', 'properties__pl': '', 'properties__sing': '', 'properties__title': 'SensorCollar1', 'properties__unique': '', 'properties__uri': '', 'properties__url': '', 'properties__version': '', 'properties__xsd_type': ''}) at 0x7fbdff2def90>)

From what I understand:

  1. I was querying the wrong instance and data was not getting persisted.
  2. Port mapping is an issue, though I initially thought it shouldn’t be.

My questions:

  1. What is the output format? I am assuming it is a graph data structure (hard to understand).
  2. How do I run this in my docker-compose?
  3. How do I persist data after querying?

Other Results:

Thank you for your assistance.

Hi @red_devil!

I hope you are doing well with your thesis. :smile:

What is the output format? I am assuming it is a graph data structure. (hard to understand)

The detailed documentation for pymgclient can be found here.
There are three main types of objects that are returned:

  • mgclient.Node

  • mgclient.Relationship

  • mgclient.Path

Here is a small code snippet that demonstrates all of them:

import mgclient

# Make a connection to the database
connection = mgclient.connect(host='172.17.0.2', port=7687)

# Create a cursor for query execution
cursor = connection.cursor()

# Create two nodes and a relationship
query = """CREATE (:Movie {title: 'Avatar', year: 2009})-
           [:DIRECTED_BY {announced: '2006'}]->(:Person {name: 'James Cameron'})"""
cursor.execute(query)

# Execute a query to return every node
query = "MATCH (n) RETURN n"
cursor.execute(query)
row = cursor.fetchone()

print(row[0].id)
print(row[0].labels)
print(row[0].properties)
print(row[0].properties['title'])

# Execute a query to return every relationship
query = "MATCH ()-[r]-() RETURN r"
cursor.execute(query)
row = cursor.fetchone()

print(row[0].id)
print(row[0].start_id)
print(row[0].end_id)
print(row[0].type)
print(row[0].properties)
print(row[0].properties['announced'])

# Execute a query to return every path
query = "MATCH p=((n)-[r]-(m)) RETURN p"
cursor.execute(query)
row = cursor.fetchone()

print(row[0].relationships)
print(row[0].nodes)

# Make database changes persistent
connection.commit()

How do I run this in my docker-compose?

Ideally, you would create a Docker container that runs a Python script to query all of your databases. You can take a look at this post where I have utilized Memgraph with docker-compose to query the database programmatically.

The most important step is to supply the right address of the container where Memgraph is running. This can be done with environment variables. For example:

version: "3"
services:
  memgraph:
    image: "memgraph"
    ports:
      - "7687:7687"
  sng_demo:
    build: .
    volumes:
      - .:/app
    ports:
      - "5000:5000"
    environment:
      MG_HOST: memgraph
      MG_PORT: 7687
    depends_on:
      - memgraph

The variables MG_HOST and MG_PORT contain the address of the container where Memgraph will be started (inside the compose network, Docker resolves the service name to the container’s IP) and the port we have specified. They can be retrieved in a Python script with:

import os

# The second argument is the default value if the variable isn't found
MG_HOST = os.getenv('MG_HOST', '127.0.0.1')
MG_PORT = int(os.getenv('MG_PORT', '7687'))
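
A minimal sketch of how a script would then open its connection (same mgclient API as in the earlier examples):

import os
import mgclient

# Read the connection info that docker-compose injected (defaults for local runs)
MG_HOST = os.getenv('MG_HOST', '127.0.0.1')
MG_PORT = int(os.getenv('MG_PORT', '7687'))

# Inside the compose network, the service name "memgraph" resolves to the container
connection = mgclient.connect(host=MG_HOST, port=MG_PORT)
cursor = connection.cursor()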

How do I persist data after querying?

You can add the directories where database snapshots and logs are stored, just like you specified volumes for importing data and for query modules in your compose file. The database will be repopulated from these files when you start it again.

For example:

version: '3'
services:
  memgraph:
    image: "memgraph"
    ports:
      - "7687:7687"
    volumes:
      - ./mg_lib:/var/lib/memgraph
      - ./mg_log:/var/log/memgraph
      - ./mg_etc:/etc/memgraph

From the querying perspective, you need to commit the transaction (connection.commit()) so the database knows it is finished and can persist the changes.
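
A minimal sketch of this (the Marker label is just an example):

import mgclient

# Make a connection to the database
connection = mgclient.connect(host='172.17.0.2', port=7687)
cursor = connection.cursor()

# The CREATE runs inside an open transaction...
cursor.execute("CREATE (:Marker {name: 'persisted'})")

# ...and the change only becomes durable once it is committed
connection.commit()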

Hi @idespot

Thanks for your help. I hope you’re doing well. The thesis is going quite well, with a lot of pressure though.

I researched and tried a lot of ways to run Memgraph in docker-compose.yaml, but it seems like it does not work. The main purpose of this docker-compose is to instantiate three services (Neo4j, Memgraph, and RedisGraph) simultaneously in their respective containers. Neo4j works fine, but Memgraph only works with the docker run -v mg_lib:/var/lib/memgraph -p 7687:7687 memgraph/memgraph --data-directory /var/lib/memgraph command. I have tried writing a docker-compose config like this:

docker-compose.yaml:

version: "3"
services:
  redis:
    image: redislabs/redisgraph
    container_name: redis
    volumes:
      - ./redis/data:/tmp/redis
    restart: unless-stopped
    ports:
      - "6379:6379"

  memgraph:
    image: memgraph
    container_name: memgraph
    volumes:
      - ./memgraph/import-data:/usr/lib/memgraph/import-data
      - ./memgraph/mg_lib:/var/lib/memgraph
      - ./memgraph/mg_log:/var/log/memgraph
      - ./memgraph/mg_etc:/etc/memgraph
    ports:
      - "7687:7687"
    environment:
      MG_HOST: memgraph
      MG_PORT: 7687

  neo4j:
    image: neo4j:latest
    container_name: neo4j
    volumes:
      - ./neo4j/data:/data
      - ./neo4j/plugins:/plugins
      - ./neo4j/import:/import
      - ./neo4j/logs:/logs
      - ./neo4j/conf:/conf
    ports:
      - "7474:7474"
      - "7688:7687"
    environment:
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_apoc_import_file.use_neo4j_config=false
      - NEO4J_dbms_security_procedures_unrestricted=apoc.*,algo.*
      - NEO4J_dbms_memory_heap_initial__size=512m
      - NEO4J_dbms_memory_heap_max__size=2G
      - NEO4J_apoc_uuid_enabled=true
      - NEO4J_dbms_default__listen__address=0.0.0.0
      - NEO4J_dbms_allow__upgrade=true
      - NEO4J_dbms_default__database=neo4j
      - NEO4J_AUTH=neo4j/test
      - NEO4JLABS_PLUGINS=["apoc"]
    networks: [ "Neo4j-test" ]
    restart: unless-stopped

networks:
  Neo4j-test:
    driver: bridge

# use docker volume to persist data outside of a container.
volumes:
  Neo4j-test:

In short, my thesis: I am creating an API that takes plain/raw Cypher from the client, runs the queries over datasets loaded in the data stores (Memgraph, Neo4j, and RedisGraph), and returns a graph data structure as output.

System architecture:

Memgraph implementation looks something like this at the moment:

import utilities.memgraphConfig as mcfg
from persistence.connection import Connection
import mgclient


class MemgraphImpl(Connection):

    def __init__(self):
        self.connect(host=mcfg.host, port=mcfg.port)

    def connect(self, host, port):
        self.connection = mgclient.connect(host=host, port=port)
        print("Memgraph connection successful")

    def execute(self, query):
        cursor = self.connection.cursor()
        cursor.execute(query)
        row = cursor.fetchall()
        self.connection.commit()
        return row


    def disconnect(self):
        self.connection.close()
        print("Memgraph connection terminated")
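
A hypothetical usage sketch (MemgraphImpl as defined above; the query is only an example):

impl = MemgraphImpl()
rows = impl.execute("MATCH (n) RETURN n LIMIT 5")
for row in rows:
    print(row[0])
impl.disconnect()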

There is an orchestrator (custom driver) which runs on the system and takes care of datastore config and inputs.

It is an awesome library! :smiley: It really helped me speed things up.

Your input would be much appreciated here! :slight_smile:

Hi @red_devil

I am glad to hear that you are doing well with your thesis. :grinning_face_with_smiling_eyes:

To answer your question about running Memgraph with docker-compose: the problem you are experiencing most likely occurs because the Memgraph configuration file /etc/memgraph/memgraph.conf is missing from your container. This happens because you mount the empty ./memgraph/mg_etc/ volume from your host machine, and it overrides the directory /etc/memgraph in your container, thus deleting the configuration file. To mitigate this problem, first copy the Memgraph configuration file from your Docker container to your host machine.
For example:

  1. docker run memgraph
  2. Execute the command docker ps to find your CONTAINER ID.
  3. Execute the command docker cp <CONTAINER ID>:/etc/memgraph ./memgraph/mg_etc/

Move the memgraph.conf file and the ssl directory to ./memgraph/mg_etc/. This way, Memgraph will be started with a valid configuration file next time.

One more note, you may get an error similar to the following:

memgraph exited with code 139

This is because docker-compose messes up the ownership of directories and files that were added through volumes. You can simply bypass this problem by adding the user option to your docker-compose.yml file:

version: "3"
services:
  memgraph:
    image: memgraph
    user: root
    container_name: memgraph
    volumes:
      - ./memgraph/import-data:/usr/lib/memgraph/import-data
      - ./memgraph/mg_lib:/var/lib/memgraph
      - ./memgraph/mg_log:/var/log/memgraph
      - ./memgraph/mg_etc:/etc/memgraph
    ports:
      - "7687:7687"
    environment:
      MG_HOST: memgraph
      MG_PORT: 7687

I hope this will be of help to you. :slightly_smiling_face:

Hi @idespot

Thanks for your response. I tried the above solution and the respective files are now successfully mapped into the container. However, I cannot run the mg_import_csv command inside the container. From the documentation stated here:

It is also important to note that importing CSV data using the mg_import_csv command should be a one-time operation before running Memgraph. In other words, this tool should not be used to import data into an already running Memgraph instance.

Without creating the container from docker-compose.yml, I cannot access the dataset and view it in Memgraph Lab.

After running the docker run -v mg_lib:/var/lib/memgraph -p 7687:7687 memgraph/memgraph --data-directory /var/lib/memgraph command externally, it creates three folders, namely mg_etc, mg_import, and mg_lib, in /var/lib/docker/volumes/ on the host system.

I would like to imitate this behavior in docker-compose.

Log:

root@d97f05a0c726:/usr/lib/memgraph# mg_import_csv --nodes import-data/nodes.csv --relationships import-data/relationships.csv
[2021-05-10 21:23:50.475] [critical] Assertion failed with message: 'Couldn't acquire lock on the storage directory "/var/lib/memgraph"! Another Memgraph process is currently running with the same storage directory, please stop it first before starting this process!'
terminate called without an active exception
Aborted (core dumped)

The default way specified in the documentation says you have to create a separate container and import the CSVs, but in my case, the data should load when the container is instantiated. There seems to be no way to do that, so it again brings me back to my first question in this post. :slightly_smiling_face:

PS: docker-compose is making things hard to understand. Sorry.

Hi @red_devil!

I have created a small GitHub repository to demonstrate how you can import data into Memgraph using Docker Compose.

The main idea is to have an entrypoint script docker-entrypoint.sh that is executed when the container starts. The script will call the CSV import tool mg_import_csv which in turn will create a database snapshot from the specified CSV files.
The only other command in the script is ./memgraph which will start a database instance with the previously created snapshot.
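
In essence, the entrypoint boils down to something like this (a sketch; the exact script is in the repository):

#!/bin/bash
# Create a snapshot from the supplied CSV files...
mg_import_csv \
  --nodes ./import-data/nodes.csv \
  --relationships ./import-data/relationships.csv
# ...then start the database, which recovers from that snapshot
./memgraph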

The CSV files are inside the directory /memgraph/import-data which is mounted as a volume to the container directory /usr/lib/memgraph/import-data.

I hope this is of help to you. :grinning_face_with_smiling_eyes:

Thank you @idespot I will try this with my configuration. :slightly_smiling_face:

[Update]

After some changes, this repo worked partially, but there are still errors:

  • Change shell script permission: chmod +x docker-entrypoint.sh

  • Edited docker-entrypoint.sh as follows:

    #!/bin/bash
    mg_import_csv memgraph \
     --nodes ./import-data/nodes.csv 
     --relationships ./import-data/relationships.csv
    ./memgraph
    
  • Replaced nodes.csv and relationships.csv with the initial ones in the repo.

  • user: 1000:1000 in docker-compose.yml

Log:

$ docker-compose up
Creating memgraph ... done
Attaching to memgraph
memgraph    | [2021-05-24 15:02:05.561] [warning] Since Memgraph was not supposed to recover on startup and durability is enabled, your current durability files will likely be overridden. To prevent important data loss, Memgraph has stored those files into a .backup directory inside the storage directory.
memgraph    | [2021-05-24 15:02:05.561] [info] Loading ./import-data/nodes.csv
memgraph    | [2021-05-24 15:02:05.575] [info] Loaded all data in 0.014s
memgraph    | [2021-05-24 15:02:05.575] [info] Starting snapshot creation to "/var/lib/memgraph/snapshots/20210524150205575949_timestamp_202"
memgraph    | [2021-05-24 15:02:05.580] [info] Snapshot creation successful!
memgraph    | ./entrypoint/docker-entrypoint.sh: line 4: --relationships: command not found
memgraph    | terminate called after throwing an instance of 'spdlog::spdlog_ex'
memgraph    |   what():  Failed opening file /var/log/memgraph/memgraph_2021-05-24.log for writing: Permission denied
memgraph    | ./entrypoint/docker-entrypoint.sh: line 5:    12 Aborted                 (core dumped) ./memgraph
memgraph exited with code 134

I am not sure what the error ./entrypoint/docker-entrypoint.sh: line 4: --relationships: command not found means.

This one as well: terminate called after throwing an instance of 'spdlog::spdlog_ex' what(): Failed opening file /var/log/memgraph/memgraph_2021-05-24.log for writing: Permission denied

Thanks!

Hi @red_devil

I have updated the GitHub repository to address the first error, but sadly I wasn’t able to recreate the second one. (The --relationships: command not found error most likely comes from the missing line-continuation backslash after --nodes ./import-data/nodes.csv in your script, which makes the shell treat the --relationships line as a separate command.)

Have you tried running the command with sudo?

sudo docker-compose up

If that doesn’t work, make sure that your directories /memgraph/mg_log and /memgraph/mg_lib are empty before starting Memgraph.

Hope this helps. :slightly_smiling_face:

Hey @idespot :smiley:

Thanks for your help. I have successfully integrated this config into mine and there are no more errors as of yet. However, I had to make docker-entrypoint.sh executable with chmod +x docker-entrypoint.sh before running the compose command.

But I am wondering: will this make Memgraph reload the dataset every time I start the container, or will it persist the changes from the current session? (I am already using connection.commit() after every query inside my Python code.)

UPDATE: Yes, it reloads and refreshes the dataset again.

UPDATE 2: After digging into docker-compose a bit more, I have found a way to avoid re-running the import in docker-entrypoint.sh by adding a condition like this:

docker-entrypoint.sh :

#!/bin/bash
CONTAINER_ALREADY_STARTED="CONTAINER_ALREADY_STARTED_PLACEHOLDER"
if [ ! -e $CONTAINER_ALREADY_STARTED ]; then
    touch $CONTAINER_ALREADY_STARTED
    echo "-- First container startup --"
        mg_import_csv memgraph \
        --nodes ./import-data/nodes.csv \
        --relationships ./import-data/relationships.csv
    ./memgraph
else
    echo "-- Not first container startup --"
    ./memgraph
fi

Source: Stack Overflow

Could you verify whether this is the right way to do it?

Greetings :smiley:

Hi @red_devil!

Yes, if the idea is to only populate the database once, then you are right to use the posted script. Each time you start the database after the first time, the data will be recovered from the durability files persisted in the /memgraph/mg_lib volume.
