4.05.2021 - by Max Böttcher & Lydia Hertel
As the amount and complexity of data in business processes grow, understanding that data becomes more important. The blog post “Data Analysis Away from Excel – Getting Started” already presented the key points of data analysis with Python and pandas. This article goes a step further and explores a setup for data science applications using Jupyter Notebook running on a remote server.
Jupyter Notebook is a product of Project Jupyter, which “exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.”
The multilingual, interactive computing environment Jupyter Notebook promises to be a perfect playground for data science and AI applications. Its comprehensive functionality allows users to combine code, annotations, multimedia, and visualizations into one interactive document.
Since Jupyter Notebook runs in a web browser, the Notebook itself can easily be hosted on a remote server. This is a strong plus for use cases with special computational needs, such as more CPU cores, RAM or GPUs. Typical examples are complex data processing, big data management, or training large AI models such as neural networks.
Therefore, it is worth taking a critical look at the possibilities and limitations and understanding how to set up Jupyter Notebook with OpenStack.
➔ For exploration and playing around with data, Jupyter Notebook is a great tool with broad functionalities. The ability to run it on remote servers makes it an easy way to go when you have special compute requirements and only need some light scripting.
➔ For problems that require complex code development for production, Jupyter Notebook may not be the right tool because of the difficulty in enabling good code versioning, structuring code reasonably, packaging code into functions, and developing tests for them. For this reason, also be cautious about using Jupyter Notebook when collaborating in cross-functional or larger teams.
This part of the blog post shows a minimal example of how to set up a Jupyter Notebook server on our infrastructure and how to reach it from the outside.
$ ssh ubuntu@<floating-ip>
$ sudo apt update
$ sudo apt install -y python3-pip
$ pip3 install --upgrade pip
(recommended, but optional) set up an environment via venv/conda
$ pip3 install notebook
$ python3 -m notebook --generate-config
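The optional environment step can also be scripted from Python. The following is a minimal sketch using the standard-library `venv` module; the function name `create_environment` and any directory you pass to it are illustrative choices, not part of the setup described here. It assumes a Linux machine such as the Ubuntu VM used in this post, where the interpreter of an environment lives in its `bin/` directory.

```python
import venv
from pathlib import Path


def create_environment(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python binary."""
    # Equivalent to running `python3 -m venv <env_dir>` on the command line.
    venv.create(env_dir, with_pip=with_pip)
    # On Linux the environment's interpreter is placed in bin/.
    return env_dir / "bin" / "python"
```

Calling `create_environment(Path.home() / "notebook-env")` (a hypothetical location) returns the interpreter path, which can then be used to run `-m pip install notebook` inside the environment. On Ubuntu, creating an environment with pip typically requires the `python3-venv` package to be installed first.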
Open the file ~/.jupyter/jupyter_notebook_config.py in your favorite text editor, change the following parameters, remove the line comments in the edited lines and save the configuration:
#c.NotebookApp.ip = 'localhost' -> change to c.NotebookApp.ip = '*'
#c.NotebookApp.open_browser = True -> change to c.NotebookApp.open_browser = False
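If you prefer not to edit the file by hand, the same two changes can be applied with a few lines of Python. This is only a sketch: the helper `apply_settings` is a hypothetical name introduced here, and it assumes that each option sits on its own line in the generated file, commented out with a leading `#`.

```python
import re


def apply_settings(config_text: str, settings: dict) -> str:
    """Uncomment and overwrite the given options in a Jupyter config file."""
    for name, value in settings.items():
        # Match the option whether or not it is still commented out.
        pattern = re.compile(
            rf"^[ \t]*#?[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE
        )
        line = f"{name} = {value!r}"
        if pattern.search(config_text):
            # Replace only the first occurrence of the option.
            config_text = pattern.sub(lambda _match: line, config_text, count=1)
        else:
            # Option not present in the file: append it at the end.
            config_text = config_text.rstrip("\n") + "\n" + line + "\n"
    return config_text
```

To use it, read `~/.jupyter/jupyter_notebook_config.py` with `Path.read_text()`, pass the text together with `{"c.NotebookApp.ip": "*", "c.NotebookApp.open_browser": False}`, and write the result back with `Path.write_text()`. Keeping the function free of file access makes it easy to try on a copy of the file first.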
Change (cd) to the directory where the notebook should be executed (e.g. ~/user.directory/notebook-test)
$ python3 -m notebook
This command runs the notebook server in the current directory and creates a token for authentication that is used in the request.
Access Jupyter on the remote machine via a browser:
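The address combines the floating IP of the machine with the token printed by the server at startup. The following sketch shows how the two fit together. It assumes the server listens on Jupyter's default port 8888 and that the startup output contains a URL with a `token` query parameter; the function names and the example IP address (taken from a range reserved for documentation) are placeholders.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit


def extract_token(log_line: str) -> str:
    """Pull the token out of a URL printed by the notebook server at startup."""
    match = re.search(r"https?://\S+", log_line)
    if match is None:
        raise ValueError("no URL found in log line")
    query = parse_qs(urlsplit(match.group(0)).query)
    if "token" not in query:
        raise ValueError("URL carries no token parameter")
    return query["token"][0]


def remote_url(host: str, token: str, port: int = 8888) -> str:
    """Build the address to open in the local browser."""
    # Plain http, because this minimal setup does not encrypt traffic.
    return f"http://{host}:{port}/?{urlencode({'token': token})}"
```

For example, `remote_url("203.0.113.10", extract_token(line))` turns a startup line that points at `localhost` into an address that can be opened from your own machine, provided the port is reachable from the outside.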
Create a notebook and start adding Markdown or code cells.
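A useful first code cell is one that confirms the kernel really runs on the remote machine and shows which resources it offers. The following is a small sketch using only the standard library; the function name `describe_machine` is an illustrative choice, and the RAM lookup assumes a Linux host such as the Ubuntu VM used here.

```python
import os
import platform


def describe_machine() -> dict:
    """Collect basic facts about the machine the kernel is running on."""
    info = {
        "hostname": platform.node(),
        "python": platform.python_version(),
        "cpu_cores": os.cpu_count(),
    }
    try:
        # Available on Linux; other platforms may not expose these names.
        page_size = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
        info["ram_gib"] = round(page_size * pages / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        info["ram_gib"] = None
    return info


# As the last expression in a cell, the result is displayed below the cell.
describe_machine()
```

If the hostname and core count match the virtual machine rather than your laptop, the notebook is executing where you expect it to.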
Please note that the Notebook uses only the generated token for authentication. The traffic to the Notebook is NOT encrypted, so additional security mechanisms should be implemented. For example, you can set up and access the virtual machine and the notebook through a WireGuard VPN. Get in contact with us for further information on how to securely operate your infrastructure: firstname.lastname@example.org
This article described the pros and cons of running code inside a Jupyter Notebook. Additionally, we described how to set up a rudimentary Notebook server on the Cloud&Heat infrastructure.
As mentioned before, the Jupyter Notebook is useful in a range of applications. Hosting on a remote server allows different groups of users to use different features of the platform. Some companies are building entire business models around it, like https://www.kaggle.com/ or Google Colaboratory (https://colab.research.google.com). IBM even allows access to their quantum computer via Jupyter Notebooks (IBM Quantum Lab).
For provisioning a data science playground with appropriate dependencies, Project Jupyter also provides a list of pre-configured Docker images for different use cases. Find more information at https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html
The Docker images can also be deployed on the Cloud&Heat managed Kubernetes service to use features like monitoring, SSL encryption, load balancing and more. For more insights, feel free to contact us: email@example.com