Doing Science with Python
Python Working Group: Monday 18 April 2022
In this seminar, I will demonstrate our scientific workflow, using Python-related tools to create a small project for solving the Schrödinger equation. Prior to the seminar, please do the following:
About CoCalc
CoCalc is an online platform that allows you to run code in the cloud on fully provisioned virtual machines. These machines run Linux and come with a huge array of software, including the full Anaconda Python stack.
The unique advantage of CoCalc is collaboration: it uses a custom interface to Jupyter notebooks that allows users to collaborate in real time, executing the same code and sharing the same output. To my knowledge, no other system allows such interactivity.
Similar collaborative interfaces exist for LaTeX, R, Octave, Julia, Sage, and others.
Brand new is a collaborative Whiteboard, which allows you to run code and then annotate the output. CoCalc developed this feature for us to help with teaching, and it will soon have a PDF annotation feature so that we can use it for collaborative analysis and commenting on lecture notes, research papers, etc.
CoCalc is also a complete computing platform: you can install whatever software you need, and you can use SSH to connect to your project with a terminal, or edit files remotely with Tramp or similar tools. It also has the most amazing backup system via a Time Travel feature that allows you to roll back your changes by the minute, hour, day, week, month etc. This alone makes the platform worthwhile and has saved me several times from data loss.
Creating an account on CoCalc is free, but the free projects do not have internet access and will run slowly. Their business model is to charge for computing resources, and the cost is quite modest: ~$40/year for a simple license that you can use to run one project at a time. (You are free to use this on as many projects as you have, but only one can run at a time under a simple license, so you may have to shut others down.)
The software behind CoCalc is also open-source, and you can run a version on your own hardware with Docker if you need to, so there is no “lock-in”. The team is also small enough that they respond to requests – often within hours. This is extremely refreshing compared with large companies where you submit requests to a forum to have them ignored for years…
There are a couple of downsides to using CoCalc. Running everything remotely means that an unstable internet connection can make things difficult. Also, unless you purchase sufficient computing resources, your laptop/desktop is likely going to perform better. For this reason, I usually develop alone on my computer, then use my VCS to push everything to the cloud, and ultimately to CoCalc where I share my results with collaborators/students (and take advantage of the backup capabilities).
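The develop-locally, push, then fetch-on-CoCalc loop described above can be sketched with Git roughly as follows. This is only an illustration under stated assumptions: the directories are local stand-ins (a bare repository plays the hosted GitLab/GitHub repository, and a second clone plays the CoCalc project), and every name in it is hypothetical. The same flow should work with Mercurial.

```shell
#!/bin/sh
# Sketch: local development -> hosted repository -> CoCalc project.
# All names are hypothetical; local directories stand in for remote hosts.
set -e
demo=$(mktemp -d)

# 1. The "cloud" repository (stand-in for a GitLab/GitHub project).
git init --quiet --bare "$demo/cloud.git"
git -C "$demo/cloud.git" symbolic-ref HEAD refs/heads/main

# 2. Develop alone on your own computer.
git clone --quiet "$demo/cloud.git" "$demo/laptop" 2>/dev/null
cd "$demo/laptop"
git symbolic-ref HEAD refs/heads/main   # use the same branch name as the cloud
echo "print('hello')" > analysis.py
git add analysis.py
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add analysis script"

# 3. Push everything to the cloud.
git push --quiet origin main

# 4. On CoCalc (in a terminal or over SSH), clone once, then `git pull` later.
git clone --quiet "$demo/cloud.git" "$demo/cocalc"
ls "$demo/cocalc"
```

In real use, the bare repository would be replaced by the URL of your hosted project, and step 4 would be run inside the CoCalc project rather than on your own machine.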
Another “problem” is due to CoCalc’s huge feature set. A colleague likened it to getting behind the controls of a fighter jet: there are so many options that it can be confusing. For this reason, other tools like Overleaf might be preferable for collaborating on LaTeX documents with less tech-savvy colleagues. But CoCalc is so much more powerful that it is worth considering. (The advantages immediately become clear when working on a complex multi-part document.)
Despite these downsides, I highly recommend CoCalc. It provides an extremely fast way to explore ideas, and ultimately has become an integral part of our workflow. The collaborative opportunities are unprecedented, and there are none of the downsides of other collaborative tools which often lock you in to using their products.
Create an account on CoCalc and sign in. For the seminar, we will share a provisioned project that you can copy for your own use later. Note: the free account will allow you to work with your project, running code (slowly), etc., but will lack internet access (people were abusing this). You can have full access for a fairly modest cost (~$40/year).
Create an account on GitLab, GitHub, or similar. (I have both and often mirror repositories from one to the other to take advantage of different testing services.) This is optional, but highly recommended.
If you plan to follow along on your own computer, please make sure you have the following installed:
A working Python system with Miniconda or Anaconda (optionally with Mamba, which is a bit faster).
Working development tools (i.e. make) and a Linux-like shell. (Windows users will need to install something like the Windows Subsystem for Linux – I do not have experience with this, so others might need to help.)
(Optional) A working version control system like Mercurial or Git.
Enough disk space to create some Conda environments.
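A quick way to check the list above is a short script along the following lines. It is a minimal sketch, not part of the seminar materials: the helper names `have_tool` and `report_tool` are hypothetical, and it assumes that `conda` is on the `PATH` of a plain shell and that Conda environments will live under your home directory.

```shell
#!/bin/sh
# Sketch: report which of the prerequisite tools are on the PATH.

# have_tool NAME: succeed if NAME is an available command.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_tool NAME: print one status line for a tool.
report_tool() {
    if have_tool "$1"; then
        echo "ok:      $1"
    else
        echo "missing: $1"
    fi
}

# Required tools.
for tool in conda make; do
    report_tool "$tool"
done

# Optional: mamba, and at least one version control system.
report_tool mamba
if have_tool hg || have_tool git; then
    echo "ok:      version control (hg or git)"
else
    echo "missing: version control (hg or git), optional"
fi

# Free disk space where Conda environments usually live (assumption: $HOME).
df -h "${HOME:-.}"
```

Any line starting with `missing:` for a required tool is worth fixing before the seminar.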
To prepare, you might do the following, which will download all of the needed software. (The download might take some time, so doing it in advance will keep it from slowing you down during the seminar.)
    # Pure conda: Took about 1m40s on my Mac
    conda create -n pwg_2022_base -c conda-forge "python>=3.8" anaconda-project hg-git hg-evolve

    # Using mamba:
    conda create -n pwg_2022_base -c conda-forge mamba
    conda activate pwg_2022_base
    mamba install -c conda-forge anaconda-project hg-git hg-evolve
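To confirm that the base environment was actually created, something like the following check may help. It is a sketch under assumptions: `env_listed` is a hypothetical helper, and it relies on `conda env list` printing one environment per line with the name in the first column.

```shell
#!/bin/sh
# Sketch: check that a named Conda environment shows up in `conda env list`.

# env_listed NAME: read an environment listing on stdin and succeed if
# NAME appears in the first column. (Hypothetical helper.)
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    if conda env list | env_listed pwg_2022_base; then
        echo "pwg_2022_base is ready"
    else
        echo "pwg_2022_base not found; run one of the recipes above"
    fi
else
    echo "conda is not on the PATH"
fi
```

Reading the listing from standard input keeps the helper independent of Conda itself, so it can be tried on saved output as well.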
Once you have this base environment, we install the required version of cookiecutter. Then you can clone our existing project and run make init, which will download everything you need.

    conda activate pwg_2022_base
    pip install --upgrade --user git+https://github.com/cookiecutter/cookiecutter.git@2.0.2#egg=cookiecutter==2.0.2
    hg clone https://gitlab.com/mforbes/wsu-python-working-group-demo-2022.git
    # OR
    git clone https://gitlab.com/mforbes/wsu-python-working-group-demo-2022.git
    cd wsu-python-working-group-demo-2022
    make init