# Webis FAQ

If you are new to this document, please read "[How to use this FAQ?](#how-to-use-this-faq)"

[[_TOC_]]

## How to ask for help?
- Make sure you know [how to use this FAQ](#how-to-use-this-faq)
- See the section on [communication](https://webis.de/for-students.html#onboarding)
- If in doubt, write to our `webisstud@listserv.uni-weimar.de` mailing list

## How to do ...?
Category of questions on tasks one wants to accomplish. Not questions on [fixing things](#how-to-fix-) or on [using specific tools](#how-to-use-).

### How to do a demo/service?
Learn [how to use Docker](#how-to-use-docker) and see our notes on [web services setup](https://webis.de/facilities.html?q=web+services+setup).

Also learn about [permissions](#how-to-do-a-demoservice-permission-setup).

#### How to do a demo/service logger?
We are establishing [JSON Lines](https://jsonlines.org/) as the format for our logs. For interoperability of our software, use these field names as appropriate (add your own names if something you log is not covered here):
```ts
interface LogEntry {
  timestamp: string;        // ISO 8601 date, e.g. "2020-09-24T06:29:42Z"
  user: string;             // An identifier for the user (possibly the IP address) that triggered the log event
  url?: string;             // URL of the request that triggered the log event
  message?: unknown;        // The Protobuf request message of the request that triggered the log event (for gRPC services)
  query?: string;           // The plain text search query of the request that triggered the log event
  [field: string]: unknown; // Your own field names for anything not covered above
}
```
Note that each entry has to be written to your log as a single line of JSON.

Web services should log both `timestamp` and `user`, at least one of `url`, `message`, or `query`, and further service-specific fields.

[gRPC](https://grpc.io/docs/what-is-grpc/introduction/) services should log both the `url` and `message` fields. The `message` field should be set to the JSON representation of the [Protobuf](https://developers.google.com/protocol-buffers/docs/proto3) request message.
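
The following TypeScript is a minimal sketch of how a web service might assemble one such log line. The helper names (`isoTimestamp`, `isValidWebServiceEntry`, `toLogLine`) are hypothetical and not part of any of our libraries, and dropping the milliseconds from `Date.prototype.toISOString()` is an assumption made only to match the timestamp example above.

```ts
// Fields from the list above; the index signature allows service-specific extras.
interface LogEntry {
  timestamp: string;
  user: string;
  url?: string;
  message?: unknown;
  query?: string;
  [field: string]: unknown;
}

// ISO 8601 timestamp without milliseconds, e.g. "2020-09-24T06:29:42Z" (assumed format choice).
function isoTimestamp(date: Date = new Date()): string {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Rule for web services: timestamp and user, plus at least one of url, message, or query.
function isValidWebServiceEntry(entry: LogEntry): boolean {
  const hasRequestField = ["url", "message", "query"].some((key) => entry[key] !== undefined);
  return entry.timestamp.length > 0 && entry.user.length > 0 && hasRequestField;
}

// Serializes one entry as exactly one line of JSON Lines (hypothetical helper).
function toLogLine(entry: LogEntry): string {
  if (!isValidWebServiceEntry(entry)) {
    throw new Error("log entry needs timestamp, user, and one of url/message/query");
  }
  // Without an indentation argument, JSON.stringify emits no raw line breaks.
  return JSON.stringify(entry) + "\n";
}
```

Appending the returned strings to a file yields a JSON Lines log. Line breaks inside values (for example in a `query`) are escaped as `\n` by `JSON.stringify`, which is what keeps every entry on a single line.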

- TODO: How to code logging to Elasticsearch
- TODO: How to code logging to CephFS (and then send it to Elasticsearch)

#### How to do a demo/service permission setup?
Pick or create an authentication group `auth/auth-services/<name>` as a subgroup of [auth-services](https://git.webis.de/auth/auth-services); students in that group will be able to deploy the demo/service.
- If you created a new group, create the Kubernetes namespace:
  - Add it to the `kubernetes.podpriority.webisservices.extra_namespaces` (as `services-<name>`) and `kubernetes.group_namespaces` (named `services-<name>` with group `auth/auth-services/<name>`) in the [controller.sls](https://git.webis.de/code-generic/code-saltstack/-/blob/master/src/srv/salt/pillars/kubernetes/betaweb/controller.sls)
  - Run `salt "betaweb001.medien.uni-weimar.de" state.apply kubernetes.controller`
- Deploy your demo/service to the Kubernetes namespace `services-<name>` (set `metadata.namespace` in the `.yaml` to `services-<name>` for all entries (deployments, services, and so on))
- If you add someone to `auth/auth-services/<name>`, they might need to reset their Kubernetes token by removing the `id-token` line from their `~/.kube/config` for the changes to take effect

### How to do a presentation?
This depends a lot on the kind of presentation.

As a project student in particular, but sometimes also as a thesis student or HiWi, you have to (or want to) [present your week's work](#how-to-do-a-presentation-for-my-weeks-work). Sometimes you also have to [present a scientific publication of someone else](#how-to-do-a-presentation-for-a-scientific-publication-of-someone-else).

On some occasions you may also [present your own scientific material](#how-to-do-a-presentation-for-a-scientific-publication-of-me).

#### How to do a presentation for a scientific publication of me?
See our [oral presentations unit](https://webis.de/lecturenotes.html#unit-en-oral-presentations). You might also be interested in [how to promote your publication](#how-to-do-promotion-for-my-publication).

#### How to do a presentation for a scientific publication of someone else?
The presentation should give answers to these questions:
- What is the problem?
- Why should I care? (e.g., why is this relevant to our current project?)
- What are the solutions/results?
Everything you put on the slides should help answer these questions. It is usually a good idea to order your presentation along the questions above.

Furthermore, here are some hints:
- Clearly state the name, authors, publication year, and venue of the publication (best already on the title slide)
- Do not put content on the slides that you do not understand! If you feel the content you do not understand is important: ask your supervisor whether they can help you to understand it (tell them what you do understand and what you feel is missing)
- You can use figures and tables from the original publication

#### How to do a presentation for my week's work?
Slides
- A 5-minute presentation is usually enough for a week; sometimes a single slide is sufficient
- You can use [this template](weekly-presentation-template/weekly-presentation-template.pdf): just replace the text in "()".
- Provide context at the start: how is your work related to the bigger goal of the project/thesis/student assistant task?
- Use bullet points for what you accomplished and problems you solved
- Use bullet points for what you could do next; think of what is needed and what is possible
- When you think you are done, look at your slides as if you were a colleague of yours: Would you understand everything? Do you have more questions?
- If you need to explain complicated things, have a look at our answer for [scientific publications](#how-to-do-a-presentation-for-a-scientific-publication-of-me)

Presentation
- Mention how you checked (or double-checked) your accomplishments

### How to do a scientific paper?
See our [scientific writing unit](https://webis.de/lecturenotes.html#unit-en-scientific-writing).

### How to do a thesis?
Be sure to check the generic [how to do work](#how-to-do-work) first.

Then check our [thesis notes](https://webis.de/facilities.html?q=thesis) for advice on writing.

### How to do an answer for this FAQ?
Placing the answer:
- Questions should be sorted alphabetically: I know it sounds nice to order them by topic, but we cannot maintain such an ordering for long
- Instead, the FAQ should guide people to other questions that are relevant: use links!
- Moreover, the FAQ is structured hierarchically, and you need to place your new question into this hierarchy to achieve the best effect.
- See the top level questions to check whether your question belongs there.

Writing the answer:
- Be short. Give an overview rather than going into details.
- Use links. Everything you write is in danger of being outdated soon.
- Consider splitting the question. Ask yourself: Could there be people who are only interested in one part of your answer? If so, split the question but let the answers link to each other.

### How to do data annotation?
Depending on your task, you can use:
- [Doccano](https://doccano.webis.de/) for text annotation. (Here's [how to use it](https://git.webis.de/code-generic/code-admin-knowledge-base/-/blob/master/services/doccano/README.md).)
  Doccano supports annotation for:
  - Text classification (e.g., relevance judgements, [sentiment analysis](https://doccano.webis.de/demo/sentiment-analysis))
  - Sequence labelling (e.g., [named entity recognition](https://doccano.webis.de/demo/named-entity-recognition))
  - Sequence to sequence (e.g., [translation](https://doccano.webis.de/demo/translation))

### How to do filenames?
Use **only** lowercase English letters, digits, and hyphens in filenames.

Among other things, this means:
- For author names with umlauts, replace each umlaut by its vowel followed by e (ä -> ae, ö -> oe).
- For author names with accents (e.g., à, ç), use the corresponding English letter (a, c).
- For special characters or Greek letters in the paper's title (e.g., a\*, χ2), spell them out (astar, chi-square).

Publications are named as `<last-name-first-author><two-digits-year>-<title>.[pdf, ...]`

For example, `daume06-bayesian-query-focused-summarization.pdf` for this publication:
- Title: Bayesian Query-Focused Summarization
- Authors: Hal Daumé III and Daniel Marcu
- Year: 2006

When in doubt, ask someone before committing the file.
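
To illustrate the rules above, here is a hypothetical TypeScript helper (`toFilenamePart` and `publicationFilename` are not existing tools of ours) that applies the umlaut and accent rules automatically. It is only a sketch: it cannot spell out symbols or Greek letters (a\* would silently become `a`), so check such titles by hand. It also assumes that multi-word last names keep a hyphen between their parts.

```ts
// Umlaut replacements from the rule above (ä -> ae, ö -> oe, ...).
const UMLAUTS: Record<string, string> = {
  "ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
};

// Reduces text to lowercase English letters, digits, and hyphens.
function toFilenamePart(text: string): string {
  return text
    .normalize("NFC") // compose "o" + combining diaeresis into "ö" before replacing umlauts
    .replace(/[äöüÄÖÜ]/g, (c) => UMLAUTS[c] ?? c)
    .normalize("NFD") // split accented letters into base letter + combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the accents (à -> a, ç -> c)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // no leading or trailing hyphens
}

// <last-name-first-author><two-digits-year>-<title>.<extension>
function publicationFilename(firstAuthorLastName: string, year: number, title: string, extension = "pdf"): string {
  const twoDigitYear = String(year % 100).padStart(2, "0");
  return `${toFilenamePart(firstAuthorLastName)}${twoDigitYear}-${toFilenamePart(title)}.${extension}`;
}
```

For the example above, `publicationFilename("Daumé", 2006, "Bayesian Query-Focused Summarization")` gives `daume06-bayesian-query-focused-summarization.pdf`. Umlauts are replaced before the accent stripping because Unicode decomposition would otherwise turn ä into a plain a.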

### How to do first steps at Webis?
- Make sure you are on our `webisstud@listserv.uni-weimar.de` mailing list (if not: ask your supervisor; maybe you wonder then "[Why do I get all these mails?](#why-do-i-get-all-these-mails)")
- Make sure you have an account in our [GitLab](https://git.webis.de) (if not: ask your supervisor)
- Make sure you know [how to ask for help](#how-to-ask-for-help), [how to do work](#how-to-do-work), and [how to do meeting preparations](#how-to-do-meeting-preparations)

### How to do literature research?
See our [literature research unit](https://webis.de/lecturenotes.html#unit-en-literature-research).

### How to do meeting preparations?
- Check [how to do a presentation?](#how-to-do-a-presentation)
- If you have insights, problems, or ideas that will probably need a longer discussion, tell your group about them (a day) in advance so that they can prepare
- Especially for online meetings, be ready a few minutes early to make sure your equipment works
- Bring something to take notes with

### How to do onboarding?
See below.

#### How to do onboarding for an undergrad student?
We are [restructuring the webis command](https://git.webis.de/code-generic/code-admin-knowledge-base/-/issues/248) to do, amongst other things, most of this in one go. Until then:
- Git
  - Create [a new user](https://git.webis.de/admin/users/new) as External (use the same login name as for their university)
  - Use `webis git onboard` to add them to all groups mentioned there
- Add them to the [webisstud](https://listserv.uni-weimar.de/mailman/admin/webisstud/members/add) mailing list (you have to remove `^.*$` from the [ban list](https://listserv.uni-weimar.de/mailman/admin/webisstud/?VARHELP=privacy/subscribing/ban_list) beforehand and add it back afterwards; this prevents subscription request spam). The password is in the password file next to the [webis-organization-notes.txt](https://webis.de/facilities.html?q=webis-organization-notes): search for `mailman`
- [Invite](https://support.discord.com/hc/en-us/articles/208866998-Invites-101) them to our Discord server and give them the role of their university

#### How to do onboarding for a staff member?
We are [restructuring the webis command](https://git.webis.de/code-generic/code-admin-knowledge-base/-/issues/248) to do, amongst other things, most of this in one go. Until then:
- CVS
  - Use `webis cvs onboard <same-login-as-for-git> "<first-name> <last-name>"` and give the account a random password (e.g., generated with KeePassXC; they have to change it on first login)
  - Send them the password and tell them to use `ssh <same-login-as-for-git>@webis.uni-weimar.de` once to change the password.
- Git
  - Create [a new user](https://git.webis.de/admin/users/new) as Admin and External (use the same login name as for their university)
  - Use `webis git onboard` to add them to all groups mentioned there
  - Add them to [auth-webis](https://git.webis.de/groups/auth/auth-webis/-/group_members)
- GitHub: Add them to these organizations as appropriate (usually as owner): [webis-de](https://github.com/orgs/webis-de/people), [netspeak](https://github.com/orgs/netspeak/people), [pan-webis-de](https://github.com/orgs/pan-webis-de/people), [tira-io](https://github.com/orgs/tira-io/people)
- Add them to the [webisstud](https://listserv.uni-weimar.de/mailman/admin/webisstud/members/add) and [webis](https://listserv.uni-weimar.de/mailman/admin/webis/members/add) mailing lists (you have to remove `^.*$` from the ban list [[webisstud](https://listserv.uni-weimar.de/mailman/admin/webisstud/?VARHELP=privacy/subscribing/ban_list), [webis](https://listserv.uni-weimar.de/mailman/admin/webis/?VARHELP=privacy/subscribing/ban_list)] beforehand and add it back afterwards; this prevents subscription request spam). The password is in the password file next to the [webis-organization-notes.txt](https://webis.de/facilities.html?q=webis-organization-notes): search for `mailman`
- [Invite](https://support.discord.com/hc/en-us/articles/208866998-Invites-101) them to our Discord server and give them the role of their university, `staff`, and `<their-university>-staff`
- Make sure they are aware of [facilities](https://webis.de/facilities.html), [FAQ](https://faq.webis.de), [for students](https://webis.de/for-students.html#onboarding)

### How to do promotion for my publication?
See the research-generic-notes in the [CVS](https://webis.de/facilities.html#cvs) (staff members only).

### How to do the name shuffle in BigBlueButton?
Add a [bookmarklet](https://en.wikipedia.org/wiki/Bookmarklet) to the browser containing the following JavaScript in the URL field: `javascript:(function(){var u = []; document.querySelectorAll("[class^=userNameMain]").forEach(i => u.push(i.textContent)); for (let l = u.length - 1; l > 0; l--) { const s = Math.floor(Math.random() * (l + 1)); [u[l], u[s]] = [u[s], u[l]]; }; document.getElementById('message-input').value = u.join("\n");})();`
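
For readability, here is the shuffle from the bookmarklet written out as TypeScript. It is the same Fisher-Yates shuffle; the function names `shuffleNames` and `formatForChat` and the optional `random` parameter (which makes results reproducible in tests) are illustrative additions, not part of the bookmarklet.

```ts
// Fisher-Yates shuffle of participant names, as in the bookmarklet above.
function shuffleNames(names: readonly string[], random: () => number = Math.random): string[] {
  const result = [...names]; // copy so the caller's array is not modified
  for (let last = result.length - 1; last > 0; last--) {
    // Pick a position in 0..last (inclusive) and swap it into position `last`.
    const pick = Math.floor(random() * (last + 1));
    const tmp = result[last]!;
    result[last] = result[pick]!;
    result[pick] = tmp;
  }
  return result;
}

// The bookmarklet pastes one shuffled name per line into the chat input.
function formatForChat(names: readonly string[]): string {
  return shuffleNames(names).join("\n");
}
```

Because each position is swapped with a uniformly chosen position at or before it, every ordering of the names is equally likely, provided `random` is uniform on [0, 1).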

### How to do work?
- As soon as you get or choose a new task, look out for problems. Your supervisor may not see problems that occur to you immediately. It is best to discuss problems as soon as you notice them.
- Always think about what you are doing. If it does not make sense to you: contact your supervisor. If you just continue, the risk is very high that what you do is indeed useless. Do not waste your time!
- If you encounter problems, do *not* wait for the next meeting! Ask yourself: 1) Who can help me to find a solution? (often Google is a good first guess); 2) Which information do they need to solve the problem? Contact them. Find a solution.
- Do not be ashamed of encountering problems. Problems are inevitable. If you encounter new problems, you have made progress.

If you use a machine (workstation or server) of ours, check the generic [how to do work with ...?](#how-to-do-work-with-).

Also check the more specific questions below.

#### How to do work at home/some remote place?
Be sure to check the generic [how to do work](#how-to-do-work) first.

You have to know [how to use SSH](#how-to-use-ssh) to log in to your remote machine.
- You probably want to know [how to use Screen or tmux](#how-to-use-screen-or-tmux), so that:
  - Your program does not fail when your SSH connection is closed or interrupted
  - You can start a command line session while you are at the university and continue it at home (or vice versa)
  - You need just one SSH connection to have several terminal *windows* open
- You may want to know [how to use an SSH-tunnel](#how-to-use-ssh-tunnel) to access a web service on your remote machine.

You probably want to know [how to use VPN](#how-to-use-vpn) to access services that are available within the webis network only.

#### How to do work at the lab?
Be sure to check the generic [how to do work](#how-to-do-work) first. You can find floor plans of our labs [here](https://webis.de/facilities.html#lab-space): click on the room numbers.
- Keep the labs clean:
  - Remove bottles/snack packages
  - Write your notes, but don’t leave a mess of papers lying around
  - Wipe your desk
  - Leave your desk as you would like to find it
- The last one to leave:
  - **Close** all the windows
  - **Switch off** the lights
- Weimar: The lab doors open with your Thoska. Ask your supervisor to register your Thoska for them


#### How to do work on code?
Be sure to check the generic [how to do work](#how-to-do-work) first.

See our [project templates](https://webis.de/facilities.html?q=project+templates) for the usual project structure. In addition, you might add these directories:
```
data/       Input/result data (put intermediate data into .gitignore); see [how to do work on data](#how-to-do-work-on-data)
doc/        Documentation files, including presentations
material/   Papers, books, links, ... 
```

Check [how to do filenames](#how-to-do-filenames).

#### How to do work on data?
Be sure to check the generic [how to do work](#how-to-do-work) first.

- Only use the GitLab repository for small example data, or for resources up to 10 MB (like word lists).
- We use Ceph for everything else. You should be able to access the CephFS at `/mnt/ceph/storage` on your workstation [in the lab](#how-to-do-work-at-the-lab) or [remotely](#how-to-do-work-at-homesome-remote-place).
- Put data into `/mnt/ceph/storage/data-in-progress/`. The location inside `data-in-progress` should reflect the path of the project in GitLab. For example, the data for a repository `code-research/conversational-search/conversational-news` should be in `/mnt/ceph/storage/data-in-progress/data-research/conversational-search/conversational-news` (note that it should be `data-research` instead of `code-research`).
- Also see our [data page](https://webis.de/data.html) for an overview of the datasets we have, as they might be useful for you. You should be able to access the data at `/mnt/ceph/storage/corpora`.
- If you download a new dataset, ask your supervisor where to put it and to add an entry to the [data page](https://webis.de/data.html).
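
The path rule above can be written as a small TypeScript function. This is a sketch: `dataPathForRepository` is a hypothetical helper, and it assumes that the rule shown in the example (replace the `code-` prefix of the top-level group with `data-` and keep the rest of the path) applies to every repository. If your repository is not in a `code-*` group, ask your supervisor instead.

```ts
const DATA_IN_PROGRESS = "/mnt/ceph/storage/data-in-progress";

// Maps a GitLab repository path to its data directory on CephFS (hypothetical helper).
function dataPathForRepository(repositoryPath: string): string {
  const segments = repositoryPath.split("/").filter((segment) => segment.length > 0);
  const group = segments[0];
  if (group === undefined || !group.startsWith("code-")) {
    throw new Error(`expected a top-level group starting with "code-": ${repositoryPath}`);
  }
  // Assumption: only the prefix of the top-level group changes (code-research -> data-research).
  segments[0] = "data-" + group.slice("code-".length);
  return [DATA_IN_PROGRESS, ...segments].join("/");
}
```

For the repository from the example, `dataPathForRepository("code-research/conversational-search/conversational-news")` returns `/mnt/ceph/storage/data-in-progress/data-research/conversational-search/conversational-news`.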

#### How to do work on web-archive data?
- Log into the [webis JupyterLab](https://jupyter2.webis.de/) with your GitLab credentials
- Launch a new terminal and check out the [aitools4-aq-cluster-computing repository](https://git.webis.de/code-lib/aitools/aitools4-aq-cluster-computing)
- Ensure that you have a user directory in HDFS (ask your supervisor to run the following on ssh.webis.de):
  ```
  HADOOP_USER_NAME=hdfs hdfs dfs -mkdir /user/<username>
  HADOOP_USER_NAME=hdfs hdfs dfs -chown -R <username>:<username> /user/<username>
  ```
- Ask your supervisor to put the S3 credentials of the `internet-archive-ro` user into `~/.aws/config`:
  ```
  [DEFAULT]
  host_base = s3.dw.webis.de:7480
  access_key=<TODO>
  secret_key=<TODO>
  ```
- Launch the notebook [web-archive-tutorial.ipynb](https://git.webis.de/code-lib/aitools/aitools4-aq-cluster-computing/-/blob/master/src/main/ipynb/web-archive-tutorial.ipynb) at [/aitools4-aq-cluster-computing/src/main/ipynb/](https://git.webis.de/code-lib/aitools/aitools4-aq-cluster-computing/-/tree/master/src/main/ipynb) from the file browser
- Run the cells to finish the tutorial

#### How to do work with ...?
Be sure to check the generic [how to do work](#how-to-do-work) first.

- **Do not** shut down the machine
- **Do not** use the machine for sharing files
- You are responsible for keeping the machine running and healthy
- Pay attention to the mails from the monitoring system
- If you see an error: fix it if possible and report it on the mailing list or in the Discord admin channel
- Check current load before starting your process (especially if you share the machine with others):
  - `htop` or `top` for CPU + main memory
  - `nvidia-smi` for GPU

Also check the more specific questions below.

##### How to do work with Betaweb?
Be sure to check the generic [how to do work with ...?](#how-to-do-work-with-) first.

What do you want to do?
- Large batch tasks? Use [Hadoop](https://hadoop.apache.org/) and see our [cluster notes](https://webis.de/facilities.html#cluster-notes)
- Large-scale data science? Use [Spark](https://spark.apache.org/) and see our [cluster notes](https://webis.de/facilities.html#cluster-notes)
- Set up a distributed service? Learn [how to do a demo/service?](#how-to-do-a-demoservice)

##### How to do work with Gammaweb / How to access GPUs?
Be sure to check the generic [how to do work with ...?](#how-to-do-work-with-) first.

- add a user entry to the Gammaweb salt file (https://git.webis.de/code-generic/code-saltstack/-/blob/master/src/srv/salt/pillars/gammaweb-users.sls):
>  \- login: USERNAME  
>  &ensp;fullname: "FIRSTNAME LASTNAME"  
>  &ensp;email: your@email-address.here
- download passdb.kdbx (https://git.webis.de/code-admin/passwords)
- obtain the password from an admin and install KeePassXC
- get Betaweb password from passdb
- connect to Betaweb `ssh webis@betaweb020.medien.uni-weimar.de` using the Betaweb password
- navigate to `/srv/salt/pillars/` and pull the most recent repository updates with `git pull`
- add an entry to `/srv/salt/pillars/gammaweb-users-passwords.sls` with the USERNAME as added to the Gammaweb salt file and a temporary password
- run `salt gammaweb\* state.apply user-accounts.gammaweb` to generate the account and a data directory in `/mnt/raid/data/`
- access Gammaweb `ssh USERNAME@gammaweb02.medien.uni-weimar.de` using your temporary password
- change temporary password with `passwd` (do not remove temporary password from Gammaweb salt file)

## How to fix ...?
Category of questions on fixing things that break regularly.

### How to fix Docker zombie processes?
Use the `--init` flag with `docker run` to prevent your containers from leaving zombie processes behind. Zombie processes can only be *prevented*: once they exist, you have to restart the machine to get rid of them. Since we do not want to restart our servers, *always* use `--init`. This is especially important if your container uses the GPU, as zombie processes still block GPU resources. If needed, here is the [background](https://stackoverflow.com/questions/49162358/docker-init-zombies-why-does-it-matter) and [documentation](https://docs.docker.com/engine/reference/run/#specify-an-init-process).

An example of zombie processes (for GPU 0) on Gammaweb, as shown by `nvidia-smi`:
<pre>
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   1969683      C   -                                            903MiB |
|    0   2410464      C   -                                              8MiB |
|    1   2958983      C   /opt/conda/bin/python                        111MiB |
</pre>

### How to fix Eclipse ...?
See below.

#### How to fix Eclipse confusing CVS for Git?
Newer versions of Eclipse might associate CVS resources with Git. During checkout, Eclipse will "Auto share git projects"; right-clicking resources in the Project Explorer and selecting "Team" will then only offer the Git options ("Commit...", "Push to origin...", "Pull", etc.) instead of the CVS options ("Update", etc.).

To solve this issue, go to Window -> Preferences, search for "Git", go to Team -> Git -> Projects and untick "Automatically share projects located in Git Repository".

#### How to fix Eclipse not recognizing a CVS project?
Newer versions of Eclipse might not recognize a directory as a CVS project:
- Right-clicking on the project -> "Team" does not show the regular CVS options ("Commit", "Update", etc.)
- No server name tag beside the project (e.g., literature instead of literature [webis.uni-weimar.de])

To resolve this, right-click -> _Delete_ (only from workspace and _not_ from disk), then undo the deletion with _Ctrl+Z_.
The server name tag should now be displayed beside the directory, and the Team context menu has the usual CVS options.

Maybe related: [How to fix Eclipse confusing CVS for Git?](#how-to-fix-eclipse-confusing-cvs-for-git)

### How to fix our servers at the Weimar library (how to get there)?
You need:
  - Thoska access (Audimax Vorbereitungsraum): Mail to C. Mohr ([Liegenschaften](https://www.uni-weimar.de/de/universitaet/struktur/zentrale-einrichtungen/servicezentrum-liegenschaften/) / Betriebstechnik / Sachgebiet Heizungs-, Lüftungs- und Sanitäranlagen)
  - Key from the Weimar keyring

Then go:
  - [Library (Audimax) backdoor](https://www.google.com/maps/dir//50.977826,11.3273931/@50.9778234,11.3268248,102) (stairs downwards between SCC and library; Thoska reader): DO NOT BLOCK IT (causes an alarm)
  - Door on the right (not the one into the Audimax)
  - Door directly in front of you (not the one upstairs; key; lock again when you leave)
  - Through the corridor (lots of heating stuff here)
  - Door on the left
  - Door on the left (key; you probably need to press yourself against the door to open it; lock again when you leave)

### How to shut down everything in an emergency?

In extreme circumstances (e.g. aircon failure), our machines may need to be powered down quickly. Please refer to [this guide](https://git.webis.de/code-generic/code-admin-knowledge-base/-/blob/master/procedures/shutdown-procedure.md).


## How to use ...?
Category of questions on one specific command, tool, service, cluster, and so on. It is not uncommon to have a question on the generic task under "[how to do ...?](#how-to-do-)" that links to one or more "how to use" questions of the respective tools.

### How to use Ceph/CephFS/S3?
See our [cephfs documentation](https://webis.de/facilities.html?q=cephfs) and [S3 documentation](https://webis.de/facilities.html?q=s3).

For administration see the [admin knowledgebase](https://git.webis.de/code-generic/code-admin-knowledge-base/-/tree/master/services/ceph). 

### How to use CVS?
See our [CVS documentation](https://webis.de/facilities.html#cvs) and the sub-questions below.

#### How to use CVS in Eclipse?
Install the *Eclipse CVS Client* to use CVS in Eclipse:
![eclipse-cvs-install](img/eclipse-cvs-install.png)

**Important:** set the workspace encoding to ISO-8859-1
- Menu: Window -> Preferences
- Search for "encoding", go to Workspace
- Set "Text file encoding" to ISO-8859-1 and "New text file line delimiter" to Unix

**Add CVS repository to Eclipse**
- Menu: Window -> Show View -> Other...
- Select CVS -> CVS Repositories
- In the CVS Repositories view, click "Add CVS Repository"
- Host: webis.uni-weimar.de
- Repository path: /srv/cvsroot
- User and Password as per Webis login
- Connection Type: extssh
- Use port: 22
- Validate connection on finish
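
If you prefer the `cvs` command line, the settings above roughly translate into the environment variables below. This is a sketch, not an official Webis setup: `WEBIS_USER` is a placeholder for your Webis login, and treating Eclipse's "extssh" as CVS's `:ext:` method over `ssh` (port 22 is the ssh default) is an assumption about how the two clients line up.

```shell
# Command-line counterpart of the Eclipse repository settings (a sketch).
# WEBIS_USER is a placeholder: set it to your Webis login name.
WEBIS_USER="${WEBIS_USER:-${USER:-your-login}}"
# :ext: makes cvs start an external remote shell; CVS_RSH selects ssh for that
export CVS_RSH=ssh
# Host and repository path as in the Eclipse dialog
export CVSROOT=":ext:${WEBIS_USER}@webis.uni-weimar.de:/srv/cvsroot"
echo "Using CVSROOT=$CVSROOT"
```

With these variables set, `cvs checkout literature` would check out the `literature` project from the example above, assuming the module name matches the project name.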
 
**Import from within Eclipse**
- Check out folders by expanding HEAD in the CVS Repositories view and selecting "Check Out" from the right-click context menu

**Import existing local CVS repository**
- (Especially useful for Windows users trying to use the `webis cvs update` command)
- File -> Import -> Existing Projects into Workspace -> Next
- Click Browse..., navigate to the local folder you want to import into the workspace, and click Finish (each folder must be imported separately)

If you run into trouble, check [how to fix Eclipse](#how-to-fix-eclipse-).

### How to use Doccano?
See the [usage instructions](https://git.webis.de/code-generic/code-admin-knowledge-base/-/blob/master/services/doccano/README.md).

### How to use Docker?
See our [docker repository and documentation](https://webis.de/facilities.html?q=docker) and the [official tutorial](https://docs.docker.com/get-started/).

### How to use Git?
See our [GitLab notes](https://webis.de/facilities.html?q=gitlab).

For cloning via SSH (recommended), [generate an SSH public key](https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key) and add it to your GitLab profile [here](https://git.webis.de/profile/keys).

Further reading: [Pro Git](https://git-scm.com/book/en/v2).

### How to use Screen or tmux?
Both programs are very similar. Tmux is probably a bit easier to use for beginners. Read the *getting started* sections for [screen](https://www.gnu.org/software/screen/manual/screen.html#Getting-Started) and [tmux](https://github.com/tmux/tmux/wiki/Getting-Started) for more information on how to use them.
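
A typical use is keeping a long-running job alive after you log out of a remote machine. The sketch below uses tmux; the session name and the `sleep 600` stand-in job are placeholders, not a prescribed setup.

```shell
# A sketch: run a long job in a detached tmux session so it survives logging out.
SESSION=experiment   # placeholder: any session name you like
JOB='sleep 600'      # placeholder: replace with your long-running command
# -d: start detached, -s: name the session; the job runs inside it
tmux new-session -d -s "$SESSION" "$JOB"
# List running sessions; yours should show up here
tmux ls
```

After logging in again, `tmux attach -t experiment` brings the session back to your terminal, and pressing Ctrl-b followed by d detaches it again without stopping the job.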

### How to use SSH?
The [Secure Shell](https://en.wikipedia.org/wiki/Secure_Shell) (SSH) allows you to log into remote machines over the network, access the command line, transfer files, and set up port forwarding. 

If you're running Linux, macOS, or a recently updated Windows 10, you can simply open a terminal and type

```
ssh user@hostname
```

to log into `hostname`, if you have access to an account named `user` there. On older Windows versions, you can install [PuTTY](https://www.putty.org/) to use SSH.

Our workstations require a public key for SSH access from outside the university network: learn how to create a key in [this tutorial](https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys). Ask your supervisor to place your public key on your machine or (preferably) do it yourself while you are at the university (both explained in the tutorial).
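
For orientation, creating a key pair on Linux or macOS boils down to a few commands. This is a minimal sketch: the default key file `~/.ssh/id_ed25519` and the Ed25519 key type are common choices, not a Webis requirement, and the tutorial above remains the reference.

```shell
# A sketch: create an Ed25519 key pair unless one already exists.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519}"   # assumed default location
mkdir -p "$(dirname "$KEY_FILE")"
chmod 700 "$(dirname "$KEY_FILE")"              # ssh expects ~/.ssh to be private
# Never overwrite an existing key; ssh-keygen asks for a passphrase (recommended)
if [ ! -f "$KEY_FILE" ]; then
  ssh-keygen -t ed25519 -f "$KEY_FILE"
fi
# The .pub file is the public half you hand out; keep the other file secret
cat "$KEY_FILE.pub"
```

While you are inside the university network, `ssh-copy-id -i ~/.ssh/id_ed25519.pub user@hostname` appends the public key to the account on `hostname`.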

Further reading: [The Linux Command Line, Chapter 16](http://linuxcommand.org/tlcl.php); [How does SSH Work](https://www.hostinger.com/tutorials/ssh-tutorial-how-does-ssh-work)


### How to use SSH-tunnel?
There are three different possibilities: local port forwarding, remote port forwarding, and dynamic port forwarding, all of which are explained [here](https://help.ubuntu.com/community/SSH/OpenSSH/PortForwarding). For most simple use cases you will encounter, local port forwarding is usually the right choice. Dynamic port forwarding is more flexible, but if you really need it, you are probably better off using [VPN](#how-to-use-vpn) instead.
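
Local port forwarding can be wrapped in a small shell function. `forward_port` below is a hypothetical helper, not part of any Webis tooling; it only wraps the standard `ssh -N -L` options and assumes the service you want to reach listens on the remote machine itself.

```shell
# forward_port LOCAL_PORT REMOTE_PORT USER@HOST  (hypothetical helper)
# Local port forwarding: connections to localhost:LOCAL_PORT on your machine
# are tunnelled through SSH to port REMOTE_PORT on HOST ("localhost" as seen from HOST).
forward_port() {
  if [ "$#" -ne 3 ]; then
    echo "usage: forward_port LOCAL_PORT REMOTE_PORT USER@HOST" >&2
    return 1
  fi
  # -N: run no remote command, just keep the tunnel open until Ctrl-C
  ssh -N -L "$1:localhost:$2" "$3"
}
```

For example, `forward_port 8888 8888 user@hostname` makes a web service running on port 8888 of `hostname` reachable at `http://localhost:8888` in your local browser for as long as the command runs.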

Further reading: [connection illustrations on Unix & Linux Stack Exchange](https://unix.stackexchange.com/a/115906/319554)

### How to use this FAQ?
If you are new to Webis, check [how to do first steps at Webis?](#how-to-do-first-steps-at-webis). Otherwise look at the [questions](#questions).

If you cannot find your question: [ask](#how-to-ask-for-help), then [write a question and answer](#how-to-do-an-answer-for-this-faq) if you can. Open an issue otherwise.

If you find the answer to a question is outdated: fix it if you can. Open an issue otherwise.

### How to use VPN?
See our notes on [VPN](https://webis.de/facilities.html?q=vpn).


## Why ...?
Frequently needed justifications.

### Why do I get all these mails?
Webis as a research network spans several universities. Our group alone has a lot of [hardware](https://webis.de/facilities.html#hardware) and many students, who switch tasks regularly. Simply put: we cannot organize things so that each of you gets only the mails you need.

Usually you only need to pay attention to mails regarding your workstation. Isn't it nice to get a mail when your hard disk is full, so that you do not need to check on this yourself while a program that takes a week to complete is running? This might seem like an edge case, but you might suddenly find yourself working on one of our big computation servers: we *regularly* get mails for these machines saying that the hard disk is full, or that some process makes the machine unresponsive or overheat, and so on.