# Webis FAQ

If you are new to this document, please read "[How to use this FAQ?](#how-to-use-this-faq)"

[[_TOC_]]

## How to ask for help?
- Make sure you know [how to use this FAQ](#how-to-use-this-faq)
- See the section on [communication](https://webis.de/for-students.html#onboarding)
- When in doubt, write to our `webisstud@listserv.uni-weimar.de` mailing list


## How to do ...?
Category of questions on tasks one wants to accomplish. Not for questions on [fixing things](#how-to-fix-) or on [using specific tools](#how-to-use-).

### How to do a demo/service?
Learn [how to use Docker](#how-to-use-docker) and see our notes on [web services setup](https://webis.de/facilities.html?q=web+services+setup).

Also learn about [permissions](#how-to-do-a-demoservice-permission-setup).

#### How to do a demo/service logger?
We are establishing [JSON Lines](https://jsonlines.org/) as the format for our logs. For interoperability of our software, use these field names as appropriate (add your own names if something you log is not covered here):
```ts
{
  "timestamp": string, // ISO 8601 Date, e.g. "2020-09-24T06:29:42Z"
  "user": string,      // An identifier for the user (possibly the IP Address) that triggered the log event
  "url": string,       // URL of the request that triggered the log event
  "message": unknown,  // The Protobuf request message of the request that triggered the log event (for gRPC services)
  "query": string      // The plain text search query of the request that triggered the log event
}
```
Note that you have to write the JSON above as a single line to your log.

Web services should log both `timestamp` and `user`, at least one of `url`, `message`, or `query`, and further service-specific fields.

[gRPC](https://grpc.io/docs/what-is-grpc/introduction/) services should log both the `url` and `message` fields. The `message` field should be set to the JSON representation of the [Protobuf](https://developers.google.com/protocol-buffers/docs/proto3) request message.
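
The logging rules above can be sketched in Python as follows. This is a minimal illustration and not an established Webis library: the function names `make_log_line` and `append_log` are hypothetical, and using the current UTC time as the timestamp is an assumption.

```python
import json
from datetime import datetime, timezone


def make_log_line(user, url=None, message=None, query=None, **extra):
    """Build one JSON Lines record using the field names listed above."""
    if url is None and message is None and query is None:
        raise ValueError("log at least one of url, message, or query")
    entry = {
        # ISO 8601 in UTC, e.g. "2020-09-24T06:29:42Z"
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user": user,
    }
    for key, value in (("url", url), ("message", message), ("query", query)):
        if value is not None:
            entry[key] = value
    entry.update(extra)  # further service-specific fields
    # Without "indent", json.dumps emits no line breaks (newlines inside
    # strings are escaped), so the record stays on a single line.
    return json.dumps(entry, ensure_ascii=False)


def append_log(path, line):
    """Append one record to the log file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
```

For a gRPC service, pass the JSON representation of the request message as `message` together with `url`.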

- TODO: How to code logging to Elasticsearch
- TODO: How to code logging to CephFS (and then send it to Elasticsearch)

#### How to do a demo/service permission setup?
Pick or create an authentication group `auth/auth-services/<name>` as a subgroup of [auth-services](https://git.webis.de/auth/auth-services): students in that group will be able to deploy the demo/service.
- If you created a new group, create the Kubernetes namespace:
  - Add it to `kubernetes.podpriority.webisservices.extra_namespaces` (as `services-<name>`) and to `kubernetes.group_namespaces` (named `services-<name>` with group `auth/auth-services/<name>`) in the [controller.sls](https://git.webis.de/code-generic/code-saltstack/-/blob/master/src/srv/salt/pillars/kubernetes/betaweb/controller.sls)
  - Run `salt "betaweb001.medien.uni-weimar.de" state.apply kubernetes.controller`
- Deploy your demo/service to the Kubernetes namespace `services-<name>` (set `metadata.namespace` in the `.yaml` to `services-<name>` for all entries: deployments, services, and so on)
- If you add someone to `auth/auth-services/<name>`, they might need to reset their Kubernetes token by removing the `id-token` line from their `~/.kube/config` for the changes to take effect
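
Forgetting `metadata.namespace` on a single entry is easy, so a small check before deploying may help. The sketch below assumes the manifest entries were already loaded into Python dictionaries (for example with a YAML library); the function name `entries_outside_namespace` is hypothetical.

```python
def entries_outside_namespace(manifests, name):
    """Return (kind, name) for every entry that is not in services-<name>."""
    expected = f"services-{name}"
    wrong = []
    for entry in manifests:
        metadata = entry.get("metadata", {})
        if metadata.get("namespace") != expected:
            wrong.append((entry.get("kind"), metadata.get("name")))
    return wrong
```

An empty result means that every deployment, service, and so on targets the expected namespace.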

### How to do a mail address change in GitLab?
If the account is still associated with the Weimar university account, first see [how to do a Weimar university account/LDAP account unlink in GitLab](#how-to-do-a-weimar-university-accountldap-account-unlink-in-gitlab).

On `webis.uni-weimar.de`, log in and *then* change to `root` (the password is in the webis-organization password files).
```
# Based on https://stackoverflow.com/a/47135095
gitlab-rails console # takes some time to load
myuser = User.find_by_username('<username>')
myuser.email = "<email>"
myuser.skip_reconfirmation!
myuser.save
```

### How to do a presentation?
This depends a lot on the kind of presentation.

Especially as a project student, but sometimes also as a thesis student or HiWi, you have to or want to [present your week's work](#how-to-do-a-presentation-for-my-weeks-work). Sometimes you also have to [present a scientific publication of someone else](#how-to-do-a-presentation-for-a-scientific-publication-of-someone-else).

On some occasions you may also [present your own scientific material](#how-to-do-a-presentation-for-a-scientific-publication-of-me).

#### How to do a presentation for a scientific publication of me?
See our [oral presentations unit](https://webis.de/lecturenotes.html#unit-en-oral-presentations). You might also be interested in [how to promote your publication](#how-to-do-promotion-for-my-publication).

#### How to do a presentation for a scientific publication of someone else?
The presentation should give answers to these questions:
- What is the problem?
- Why should I care? (e.g., why is this relevant to our current project?)
- What are the solutions/results?
All content you put on the slides should help to answer these questions. It is usually a good idea to order your presentation like the questions above.

Furthermore, here are some hints:
- Clearly state the name, authors, publication year, and venue of the publication (best already on the title slide)
- Do not put content on the slides that you do not understand! If you feel the content you do not understand is important: ask your supervisor whether they can help you to understand it (tell them what you do understand and what you feel is missing)
- You can use figures and tables from the original publication

#### How to do a presentation for my week's work?
Slides
- A 5-minute presentation is usually enough for a week; sometimes a single slide is sufficient
- You can use [this template](weekly-presentation-template/weekly-presentation-template.pdf): just replace the text in "()".
- Provide context at the start: how is your work related to the bigger goal of the project/thesis/student assistant task?
- Use bullet points for what you accomplished and problems you solved
- Use bullet points for what you could do next; think of what is needed and what is possible
- When you think you are done, look at your slides as if you were a colleague of yours: Would you understand everything? Do you have more questions?
- If you need to explain complicated things, have a look at our answer for [scientific publications](#how-to-do-a-presentation-for-a-scientific-publication-of-me)

Presentation
- Mention how you checked (or double-checked) your accomplishments

### How to do a scientific paper?
See our [scientific writing unit](https://webis.de/lecturenotes.html#unit-en-scientific-writing).

### How to do a thesis?
Be sure to check the generic [how to do work](#how-to-do-work) first.

Then check our [thesis notes](https://webis.de/facilities.html?q=thesis) for advice on writing.

### How to do a Weimar university account/LDAP account unlink in GitLab?
As an admin, go to the account's page (admin panel, "view latest users", search for the account). Under "Identities", click "Delete".
![delete-gitlab-identity](img/delete-gitlab-identity.png)

### How to do an answer for this FAQ?
Placing the answer:
- Questions should be sorted alphabetically: I know it sounds nice to order them by topic, but we cannot maintain such an ordering for long
- Instead, the FAQ should guide people to other questions that are relevant: use links!
- Moreover, the FAQ is structured hierarchically, and you need to place your new question into this hierarchy to achieve the best effect.
- See the top level questions to check whether your question belongs there.

Writing the answer:
- Be short. Give an overview rather than going into details.
- Use links. Everything you write is in danger of being outdated soon.
- Consider splitting the question. Ask yourself: Could there be people who are interested in just one part of your answer? If so, split the question but let the answers link to each other.

### How to do an emergency shut down?
In extreme circumstances (e.g. aircon failure), our machines may need to be powered down quickly. Please refer to [this guide](https://kb.webis.de/procedures/shutdown-procedure.html).

### How to do an upload of a publication to https://webis.de/publications?
See the webis-publication-notes in the [CVS](/literature/webis-publications/webis-publications-notes.txt) (staff members only).

### How to do data annotation?
Depending on your task, you can use:
- [Doccano](https://doccano.webis.de/) for text annotation. (Here's [how to use it](https://kb.webis.de/services/doccano/index.html).)
  Doccano supports annotation for:
  - Text classification (e.g., relevance judgements, [sentiment analysis](https://doccano.webis.de/demo/sentiment-analysis))
  - Sequence labelling (e.g., [named entity recognition](https://doccano.webis.de/demo/named-entity-recognition))
  - Sequence to sequence (e.g., [translation](https://doccano.webis.de/demo/translation))
- [WAT-SL](https://webis.de/research.html#wat-sl) for labeling of pre-defined text segments.

### How to do filenames?
Use **only** lowercase English letters, digits, and hyphens in filenames.

Among others, this means:
- For author names with umlauts, decompose them to vowel+e (ä -> ae, ö -> oe).
- For author names with accents (e.g., à, ç), use the corresponding English letter (a, c).
- For special characters or Greek letters in the paper's title (a\*, χ2), make them explicit (astar, chi-square).

Publications are named as `<last-name-first-author><two-digits-year>-<title>.[pdf, ...]`

For example: `daume06-bayesian-query-focused-summarization.pdf`
- Title: Bayesian Query-Focused Summarization
- Authors: Hal Daumé III and Daniel Marcu
- Year: 2006

When in doubt, ask someone before committing the file.
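
The naming rules for publications can be sketched in Python as follows. This is an illustration, not an official Webis tool: the function names are hypothetical, and the mappings for ü and ß are assumptions that extend the ä/ö rule above. Special characters such as a\* or χ2 still have to be spelled out by hand (astar, chi-square) before calling it, because the sketch would otherwise just drop them.

```python
import re
import unicodedata

# ä and ö are from the rules above; ü and ß are assumed to follow the same idea.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def slugify(text):
    """Reduce text to lowercase English letters, digits, and hyphens."""
    text = unicodedata.normalize("NFC", text.lower())
    for umlaut, replacement in UMLAUTS.items():
        text = text.replace(umlaut, replacement)
    # Split remaining accented letters into base letter + combining mark,
    # then drop the marks (à -> a, ç -> c).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Every run of other characters becomes a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def publication_filename(first_author_last_name, year, title, extension="pdf"):
    """Build <last-name-first-author><two-digits-year>-<title>.<extension>."""
    author = slugify(first_author_last_name)
    return f"{author}{year % 100:02d}-{slugify(title)}.{extension}"
```

Calling `publication_filename("Daumé", 2006, "Bayesian Query-Focused Summarization")` reproduces the example filename above.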

### How to do first steps at Webis?
- Make sure you are on our `webisstud@listserv.uni-weimar.de` mailing list (if not: ask your supervisor; maybe you wonder then "[Why do I get all these mails?](#why-do-i-get-all-these-mails)")
- Make sure you have an account in our [GitLab](https://git.webis.de) (if not: ask your supervisor)
- Make sure you know [how to ask for help](#how-to-ask-for-help), [how to do work](#how-to-do-work), and [how to do meeting preparations](#how-to-do-meeting-preparations)

### How to do literature research?
- See our [literature research unit](https://webis.de/lecturenotes.html#unit-en-literature-research).
- See [example](https://docs.google.com/presentation/d/1BdeZpW_StXmxG6l6nXL4WWhyL87AhknGjZkSw4nqMC4/edit?usp=sharing) for collecting, naming, and using the bibkey to cite a resource.

### How to do meeting preparations?
- Check [how to do a presentation?](#how-to-do-a-presentation)
- If you have insights, problems, or ideas that will probably need longer discussion, tell your group about them (a day) in advance so that they can prepare
- Especially for online meetings you should be ready a few minutes in advance to make sure your equipment works
- Take equipment for taking notes with you

### How to do onboarding?
See below.

#### How to do onboarding for a undergrad student?
We are [restructuring the webis command](https://git.webis.de/code-generic/code-admin-knowledge-base/-/issues/248) to do, amongst other things, most of this in one go. Until then:
- Git
  - Create [a new user](https://git.webis.de/admin/users/new) as External (use the same login name as for their university)
  - Use `webis git onboard` to add them to all groups mentioned there
- Add them to the [webisstud](https://listserv.uni-weimar.de/mailman/admin/webisstud/members/add) mailing list (you have to remove the `^.*$` from the [ban list](https://listserv.uni-weimar.de/mailman/admin/webisstud/?VARHELP=privacy/subscribing/ban_list) beforehand and add it back afterwards--this is to avoid subscription request spam). The password is in the password file next to the [webis-organization-notes.txt](https://webis.de/facilities.html?q=webis-organization-notes): search for `mailman`
- [Invite](https://support.discord.com/hc/en-us/articles/208866998-Invites-101) them to our Discord server and give them the role of their university

#### How to do onboarding for a staff member?
We are [restructuring the webis command](https://git.webis.de/code-generic/code-admin-knowledge-base/-/issues/248) to do, amongst other things, most of this in one go. Until then:
- CVS
  - Use `webis cvs onboard <same-login-as-for-git> "<first-name> <last-name>"` and give the account a random password (e.g., using the generator of KeePassXC; they have to change the password on first login)
  - Send them the password and tell them to use `ssh <same-login-as-for-git>@webis.uni-weimar.de` once to change the password.
- Git
  - Create [a new user](https://git.webis.de/admin/users/new) as Admin and External (use the same login name as for their university)
  - Use `webis git onboard` to add them to all groups mentioned there
  - Add them to [auth-webis](https://git.webis.de/groups/auth/auth-webis/-/group_members)
- GitHub: Add them to these organizations as appropriate (usually as owner): [webis-de](https://github.com/orgs/webis-de/people), [netspeak](https://github.com/orgs/netspeak/people), [pan-webis-de](https://github.com/orgs/pan-webis-de/people), [tira-io](https://github.com/orgs/tira-io/people)
- Add them to the [webisstud](https://listserv.uni-weimar.de/mailman/admin/webisstud/members/add) and [webis](https://listserv.uni-weimar.de/mailman/admin/webis/members/add) mailing lists (you have to remove the `^.*$` from the ban list [[webisstud](https://listserv.uni-weimar.de/mailman/admin/webisstud/?VARHELP=privacy/subscribing/ban_list), [webis](https://listserv.uni-weimar.de/mailman/admin/webis/?VARHELP=privacy/subscribing/ban_list)] beforehand and add it back afterwards--this is to avoid subscription request spam). The password is in the password file next to the [webis-organization-notes.txt](https://webis.de/facilities.html?q=webis-organization-notes): search for `mailman`
- [Invite](https://support.discord.com/hc/en-us/articles/208866998-Invites-101) them to our Discord server and give them the role of their university, `staff`, and `<their-university>-staff`
- Make sure they are aware of [facilities](https://webis.de/facilities.html), [FAQ](https://faq.webis.de), [for students](https://webis.de/for-students.html#onboarding)

### How to do promotion for my publication?
See the research-generic-notes in the [CVS](https://webis.de/facilities.html#cvs) (staff members only).

### How to do the name shuffle in BigBlueButton?
Add a [bookmarklet](https://en.wikipedia.org/wiki/Bookmarklet) to the browser containing the following JavaScript in the URL field: `javascript:(function(){var u = []; document.querySelectorAll("[class^=userNameMain]").forEach(i => u.push(i.textContent)); for (let l = u.length - 1; l > 0; l--) { const s = Math.floor(Math.random() * (l + 1)); [u[l], u[s]] = [u[s], u[l]]; }; document.getElementById('message-input').value = u.join("\n");})();`
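
The bookmarklet collects the displayed user names, shuffles them with a Fisher-Yates shuffle, and writes the result into the chat input. For readers who want to follow the shuffling step, here is the same logic as a Python sketch (the function name `shuffle_names` is hypothetical; in practice `random.shuffle` does the same job):

```python
import random


def shuffle_names(names, rng=random):
    """Return a shuffled copy of names using a Fisher-Yates shuffle."""
    shuffled = list(names)
    # Walk from the last position down to the second one and swap each
    # element with a randomly chosen element at or before it.
    for last in range(len(shuffled) - 1, 0, -1):
        pick = rng.randrange(last + 1)  # 0 <= pick <= last
        shuffled[last], shuffled[pick] = shuffled[pick], shuffled[last]
    return shuffled
```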

### How to do work?
- As soon as you get or choose a new task, look out for problems. Your supervisor may not see problems that occur to you immediately. It is best to discuss problems as soon as you notice them.
- Always think about what you are doing. If it does not make sense to you: contact your supervisor. If you just continue, the risk is very high that what you do is indeed useless. Do not waste your time!
- If you encounter problems, do *not* wait for the next meeting! Ask yourself: 1) Who can help me to find a solution? (often Google is a good first guess); 2) Which information do they need to solve the problem? Contact them. Find a solution.
- Do not be ashamed of encountering problems. Problems are inevitable. If you encounter new problems, you have made progress.

If you use a machine (workstation or server) of ours, check the generic [how to do work with ...?](#how-to-do-work-with-).

Also check the more specific questions below.

#### How to do work at home/some remote place?
Be sure to check the generic [how to do work](#how-to-do-work) first.

You have to know [how to use SSH](#how-to-use-ssh) to log in to your remote machine.
- You probably want to know [how to use Screen or tmux](#how-to-use-screen-or-tmux).
  - Your program does not fail when your SSH connection is closed or interrupted
  - You can start a command line session while you are at the university and continue it at home (or vice versa)
  - You need just one SSH connection to have several terminal *windows* open
- You may want to know [how to use an SSH-tunnel](#how-to-use-ssh-tunnel) to access a web service on your remote machine.

You probably want to know [how to use VPN](#how-to-use-vpn) to access services that are available within the webis network only.

#### How to do work at the lab?
Be sure to check the generic [how to do work](#how-to-do-work) first. You can find floor plans of our labs [here](https://webis.de/facilities.html#lab-space): click on the room numbers.
- Keep the labs clean:
  - Remove bottles/snack packages
  - Write your notes, but don’t leave a mess of papers lying around
  - Wipe your desk
  - Leave your desk as you would like to find it
- The last one to leave:
  - **Close** all the windows
  - **Switch off** the lights
- Weimar: The labs open with your Thoska. Ask your supervisor to register your Thoska


#### How to do work on code?
Be sure to check the generic [how to do work](#how-to-do-work) first.

See our [project templates](https://webis.de/facilities.html?q=project+templates) for the usual project structure. In addition, you might add these directories:
```
data/       Input/result data (put intermediate data into .gitignore); see [how to do work on data](#how-to-do-work-on-data)
doc/        Documentation files, including presentations
material/   Papers, books, links, ... 
```

Check [how to do filenames](#how-to-do-filenames).

#### How to do work on data?
Be sure to check the generic [how to do work](#how-to-do-work) first.

- Only use the GitLab repository for small example data, or for resources up to 10 MB (like word lists).
- We use Ceph for everything else. You should be able to access the CephFS at `/mnt/ceph/storage` on your workstation [in the lab](#how-to-do-work-at-the-lab) or [remotely](#how-to-do-work-at-homesome-remote-place).
- Put data into `/mnt/ceph/storage/data-in-progress/`. The location inside `data-in-progress` should reflect the path of the project in GitLab. For example, the data for a repository `code-research/conversational-search/conversational-news` should be in `/mnt/ceph/storage/data-in-progress/data-research/conversational-search/conversational-news` (note that it should be `data-research` instead of `code-research`).
- Also see our [data page](https://webis.de/data.html) for an overview of the datasets we have, as they might be useful for you. You should be able to access the data at `/mnt/ceph/storage/corpora`.
- If you download a new dataset, ask your supervisor where to put it and to add an entry to the [data page](https://webis.de/data.html).
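
The mapping from a GitLab project path to its `data-in-progress` directory can be sketched in Python as follows. The function name `data_dir_for` is hypothetical, and the rule "replace a leading `code-` with `data-`" is an assumption generalized from the single example above.

```python
from pathlib import PurePosixPath

DATA_IN_PROGRESS = PurePosixPath("/mnt/ceph/storage/data-in-progress")


def data_dir_for(repository):
    """Map a GitLab project path to its directory in data-in-progress."""
    parts = PurePosixPath(repository.strip("/")).parts
    if not parts:
        raise ValueError("empty repository path")
    top = parts[0]
    if top.startswith("code-"):
        # e.g. code-research -> data-research
        top = "data-" + top[len("code-"):]
    return DATA_IN_PROGRESS.joinpath(top, *parts[1:])
```

For `code-research/conversational-search/conversational-news`, this yields the path given in the example above.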

#### How to do work on web archive data?
- Log in to the [Webis JupyterLab](https://jupyter2.webis.de/) with your GitLab credentials
- Launch a new terminal and check out the [aitools4-aq-cluster-computing repository](https://git.webis.de/code-lib/aitools/aitools4-aq-cluster-computing)
- Ensure that you have a user directory in the HDFS (ask your supervisor to run the following on `ssh.webis.de`):
  ```
  HADOOP_USER_NAME=hdfs hdfs dfs -mkdir /user/<username>
  HADOOP_USER_NAME=hdfs hdfs dfs -chown -R <username>:<username> /user/<username>
  ```
- Ask your supervisor to put the S3 credentials of the `internet-archive-ro` user into `~/.aws/config`:
  ```
  [DEFAULT]
  host_base = s3.dw.webis.de:7480
  access_key=<TODO>
  secret_key=<TODO>
  ```
- Launch the notebook [web-archive-tutorial-beta1.ipynb](https://git.webis.de/code-lib/aitools/aitools4-aq-cluster-computing/-/blob/master/src/main/ipynb/web-archive-tutorial-beta1.ipynb) at [/aitools4-aq-cluster-computing/src/main/ipynb/](https://git.webis.de/code-lib/aitools/aitools4-aq-cluster-computing/-/tree/master/src/main/ipynb) from the file browser
- Run the cells to finish the tutorial

#### How to do work with Windows?
If you use Windows, we suggest you set up your work environment in WSL 2. Please ensure the following:

- OpenVPN: [[docs]](https://kb.webis.de/services/openvpn/index.html#windows) (_required to access internal services_)
- Windows Subsystem for Linux, version 2: [[docs]](https://docs.microsoft.com/de-de/windows/wsl/install), [[tutorial 1]](https://altis.com.au/installing-ubuntu-bash-for-windows-10-wsl2-setup/), [[tutorial 2]](https://petri.com/how-to-install-ubuntu-in-windows-10-with-wsl-2)
  - version "**2**" (_virtualized Linux kernel_), requires **Windows 10, Version 1903, Build 18362** or newer
  - you may need to check whether "Virtualization" is enabled (see Task Manager > "Performance" tab > "Virtual Machine") and enable it in BIOS if required
- Ceph: check out [how to use Ceph](#how-to-use-cephcephfss3)
  - install the custom WSL 2 kernel: [[docs]](https://kb.webis.de/services/ceph/cephfs-usage.html#kernel-compilation-for-cephfs-on-wsl2)
- Docker: [[docs]](https://docs.docker.com/desktop/windows/install/) (to use `docker`, `docker-compose`, `kubectl`; webis services, e.g. [kubernetes](https://kb.webis.de/k8s-manual/kubernetes-tutorial/index.html))
- be sure to check out the [onboarding steps](https://webis.de/for-students.html#onboarding)

#### How to do work with ...?
Be sure to check the generic [how to do work](#how-to-do-work) first.

- **Do not** shut down the machine
- **Do not** use the machine for sharing files
- You are responsible for keeping the machine running and healthy
- Pay attention to the mails from the monitoring system
- If you see an error: fix it if possible and report it on the mailing list or in the Discord admin channel
- Check current load before starting your process (especially if you share the machine with others):
  - `htop` or `top` for CPU + main memory
  - `nvidia-smi` for GPU

Also check the more specific questions below.

##### How to do work with Betaweb?
Be sure to check the generic [how to do work with ...?](#how-to-do-work-with-) first.

What do you want to do?
- Large batch tasks? Use [Hadoop](https://hadoop.apache.org/) and see our [cluster notes](https://webis.de/facilities.html#cluster-notes)
- Large-scale data science? Use [Spark](https://spark.apache.org/) and see our [cluster notes](https://webis.de/facilities.html#cluster-notes)
- Set up a distributed service? Learn [how to do a demo/service?](#how-to-do-a-demoservice)

##### How to do work with Gammaweb / How to access GPUs (Slurm)?

Please see [How to use Slurm?](#how-to-use-slurm).

## How to fix ...?
Category of questions on fixing things that break regularly.

### How to fix Docker zombie processes?
Use the `--init` flag with `docker run` to prevent your containers from spawning zombie processes. Zombie processes can only be *prevented*: once they exist, you have to restart the machine to get rid of them. Since we do not want to restart our servers, it is best to *always* use `--init`. This is especially important if your container uses the GPU, as zombie processes keep blocking the GPU resources. If needed, here is the [background](https://stackoverflow.com/questions/49162358/docker-init-zombies-why-does-it-matter) and [documentation](https://docs.docker.com/engine/reference/run/#specify-an-init-process).

An example of zombie processes (for GPU 0) on gammaweb:
<pre>
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   1969683      C   -                                            903MiB |
|    0   2410464      C   -                                              8MiB |
|    1   2958983      C   /opt/conda/bin/python                        111MiB |
</pre>

### How to fix Eclipse ...?
See below.

#### How to fix Eclipse confusing CVS for Git?
Newer versions of Eclipse might associate CVS resources with Git. During checkout, Eclipse will "Auto share git projects"; right-clicking resources in the Project Explorer and selecting "Team" will then only give you the Git ("Commit...", "Push to origin...", "Pull", etc.) instead of the CVS options ("Update", etc.).

To solve this issue, go to Window -> Preferences, search for "Git", go to Team -> Git -> Projects and untick "Automatically share projects located in Git Repository".

#### How to fix Eclipse not recognizing a CVS project?
Newer versions of Eclipse might not recognize a directory as a CVS project:
- Right-clicking on the project -> "Team" does not show the regular CVS options ("Commit", "Update", etc.)
- No server name tag beside the project (e.g. `literature` instead of `literature [webis.uni-weimar.de]`)

To resolve this, right-click -> _Delete_ (only from the workspace, _not_ from disk), then undo the delete with _Ctrl+Z_.
The server name tag should now be displayed beside the directory, and the Team context menu should show the usual CVS options.

Maybe related: [How to fix Eclipse confusing CVS for Git?](#how-to-fix-eclipse-confusing-cvs-for-git)

### How to fix network/Internet/DNS on a webislab machine?
For still unknown reasons, the resolver daemon (`systemd-resolved`) sometimes stops working: you can then only ping by IP (e.g., `ping 142.250.181.227` works, but `ping google.de` does not).
- Restart the resolver daemon and, if needed, set the DNS server
  ```
  sudo systemctl restart systemd-resolved.service # restart daemon
  # Check again... maybe that already did it. If not:
  sudo systemd-resolve --interface enp1s0f0 --set-dns 141.54.100.129 --set-domain medien.uni-weimar.de
  ```
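
To check whether name resolution works before and after the restart, a small test function can help. This is a sketch under assumptions: `check_dns` is a hypothetical helper name, and it relies on `getent`, which looks up the host through the name service switch configured on the machine.

```shell
# Hypothetical helper: reports whether a hostname resolves through the
# system's configured name lookup (queried via getent).
check_dns() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "DNS ok: $1 resolves"
  else
    echo "DNS broken: $1 does not resolve"
  fi
}

# Usage: check_dns google.de
```

If the check still fails after restarting the daemon, continue with the `--set-dns` command from the steps above.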

### How to fix our servers at the Weimar library (how to get there)?
You need:
  - Thoska access (Audimax Vorbereitungsraum): Mail to C. Mohr ([Liegenschaften](https://www.uni-weimar.de/de/universitaet/struktur/zentrale-einrichtungen/servicezentrum-liegenschaften/) / Betriebstechnik / Sachgebiet Heizungs-, Lüftungs- und Sanitäranlagen)
  - Key from the Weimar keyring

Then go:
  - [Library (Audimax) backdoor](https://www.google.com/maps/dir//50.977826,11.3273931/@50.9778234,11.3268248,102) (stairs downwards between SCC and library; Thoska reader): DO NOT BLOCK IT (causes an alarm)
  - Door on the right (not the one into the Audimax)
  - Door directly in front of you (not the one upstairs; key; lock again when you leave)
  - Through the corridor (lots of heating stuff here)
  - Door on the left
  - Door on the left (key; you probably need to press yourself against the door to open it; lock again when you leave)

### How to fix totp.webis.de?
See [the repository](https://git.webis.de/code-admin/totp-service#troubleshooting) (requires GitLab admin rights).


## How to use ...?
Category of questions on one specific command, tool, service, cluster, and so on. It is not uncommon to have a question on the generic task under "[how to do ...?](#how-to-do-)" that links to one or more "how to use" questions of the respective tools.

### How to use Ceph/CephFS/S3?
See our [cephfs documentation](https://webis.de/facilities.html?q=cephfs) and [S3 documentation](https://webis.de/facilities.html?q=s3).

For administration see the [admin knowledgebase](https://kb.webis.de/services/ceph/index.html).

### How to use CVS?
See our [CVS documentation](https://webis.de/facilities.html#cvs) and the sub-questions below.

#### How to use CVS in Eclipse?
Install the *Eclipse CVS Client* to use CVS in Eclipse:
![eclipse-cvs-install](img/eclipse-cvs-install.png)

**Important:** set the workspace encoding to ISO-8859-1
- Menu: Window -> Preferences
- Search for "encoding", go to Workspace
- Set "Text file encoding" to ISO-8859-1 and "New text file line delimiter" to Unix
- Other recommended options:
    ```
    eclipse.ini > -Xms1024m
    eclipse.ini > -Xmx2048m
    eclipse.ini > -Xverify:none
    General > Show heap status
    General > Startup and Shutdown > Plug-ins activated on startup > (Disable almost everything)
    General > Startup and Shutdown > Workspaces > Prompt for workspace on startup
    General > Workspace > Refresh using native hooks or polling
    General > Workspace > Text file encoding > Other > ISO-8859-1
    General > Workspace > New text file line delimiter > Other > Unix
    General > Workspace > Local History > Limit history size > Days to keep file > 1
    General > Workspace > Local History > Limit history size > Maximum entries per file > 1
    General > Workspace > Local History > Limit history size > Maximum file size > 1
    Install/Update > Automatic Updates > Automatically find new updates and notify me
    Run/Debug > Console > Console buffer size > 50000000
    Team > CVS > Connection > Quietness level > Somewhat quiet
    Team > CVS > Connection > Compression > 5
    Team > CVS > Console > Console buffer size > 50000000
    Team > CVS > Console > Show CVS console automatically when command is run
    Team > CVS > Update/Merge > When performing an update > Update all non-conflicting changes and then preview remaining conflicts
    Team > SVN > SVN interface > Client > SVNKit
    Team > SVN > Console > Console buffer size > 50000000
    Team > SVN > Console > Show SVN console automatically when command is run
    Validation > Disable all
    Validation > Suspend all validators
    ```

**Add CVS repository to Eclipse**
- Menu: Window -> Show View -> Other...
- Select CVS -> CVS Repositories
- In the CVS View click Add CVS Repository
- Host: webis.uni-weimar.de
- Repository path: /srv/cvsroot
- User and Password as per Webis login
- Connection Type: extssh
- Use port: 22
- Validate connection on finish
 
**Import from within Eclipse**
- Check out folders by expanding HEAD and selecting Check Out in the right-click option menu

**Import existing local CVS repository**
- (Especially useful for Windows users trying to use the `webis cvs update` command)
- File -> Import -> Existing Projects into Workspace -> Next
- Select the local folder to import into the workspace by clicking Browse..., navigating to the folder, and clicking Finish (each folder must be imported manually)

If you run into trouble, check [how to fix Eclipse](#how-to-fix-eclipse-).

### How to use Doccano?
See the [usage instructions](https://kb.webis.de/services/doccano/index.html).

### How to use Docker?
See our [docker repository and documentation](https://webis.de/facilities.html?q=docker) and the [official tutorial](https://docs.docker.com/get-started/).

### How to use GIT?
See our [gitlab notes](https://webis.de/facilities.html?q=gitlab)

For cloning via SSH (recommended), [generate an SSH public key](https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key) and add it to your GitLab profile [here](https://git.webis.de/profile/keys).
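
Generating such a key pair could look like the following sketch, assuming OpenSSH's `ssh-keygen` is installed. The helper name `make_ssh_key`, the key file name, and the mail address are placeholders chosen for illustration.

```shell
# Hypothetical helper: creates an Ed25519 key pair at the given path.
# ssh-keygen asks for a passphrase; the public key is written to <path>.pub.
make_ssh_key() {
  key_file="$1"   # e.g. "$HOME/.ssh/id_ed25519_webis" (placeholder name)
  comment="$2"    # free-text label, e.g. your mail address
  ssh-keygen -t ed25519 -f "$key_file" -C "$comment"
}

# Usage:
#   make_ssh_key "$HOME/.ssh/id_ed25519_webis" "you@example.org"
#   cat "$HOME/.ssh/id_ed25519_webis.pub"   # paste this line into GitLab
```

Only the content of the `.pub` file belongs in your profile; the private key file stays on your machine.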

Further reading: [Pro Git](https://git-scm.com/book/en/v2).

### How to use Screen or tmux?
Both programs are very similar. Tmux is probably a bit easier to use for beginners. Read the *getting started* sections for [screen](https://www.gnu.org/software/screen/manual/screen.html#Getting-Started) and [tmux](https://github.com/tmux/tmux/wiki/Getting-Started) for more information on how to use them.
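
A typical workflow is to keep one named session per task and reattach to it after reconnecting. Here is a minimal tmux sketch: `work_session` and the default session name `work` are hypothetical, while `new-session -A -s` is tmux's own way to attach to a session if it exists and create it otherwise.

```shell
# Hypothetical helper: attaches to the named tmux session, or creates it
# if it does not exist yet (-A makes new-session behave like attach).
work_session() {
  tmux new-session -A -s "${1:-work}"
}

# Usage: work_session training
# Detach with Ctrl-b d; the session and its programs keep running.
```

With Screen, `screen -S <name>` starts a named session and `screen -r <name>` reattaches to it.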

### How to use Slurm?

You need a GitLab account with [enabled SSH key authentication](https://git.webis.de/-/profile/keys) to run Slurm jobs.
Access is only possible via SSH key(s), so please add the public key(s) that you want to use to your [GitLab account](https://git.webis.de/-/profile/keys), if not already done.
Within the [webis vpn](https://webis.de/facilities.html?q=VPN), SSH into ssh.webis.de with `ssh <YOUR-USER-NAME>@ssh.webis.de` to run Slurm jobs. Make sure to use your private key (not the public key) for connecting, e.g. using `ssh -i <YOUR-PRIVATE-KEY-FILE> <YOUR-USER-NAME>@ssh.webis.de`.

Your home directory is mounted on all Gammawebs and located in the CephFS at `/mnt/ceph/storage/data-tmp/<CURRENT-YEAR>/<YOUR-USER-NAME>`. You can access your home directory like any CephFS directory. **Attention: to maintain clean home directories, we will switch to new and empty home directories in `/mnt/ceph/storage/data-tmp/<NEXT-YEAR>/` at the turn of the year.** This switch will be announced by mail, and the old home directory remains in `data-tmp` for a while. Still, only store data in your home directory that you can afford to lose: as the name `data-tmp` indicates, there is no snapshotting, etc.
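
Because the path changes every year, it can be handy to compute it instead of typing it. This is a sketch only: `slurm_home` is a hypothetical helper, and it assumes that your login name on the machine matches `<YOUR-USER-NAME>` in the path pattern described above.

```shell
# Hypothetical helper: prints the home directory path described above
# for a given user and year (defaults: current user, current year).
slurm_home() {
  user="${1:-$USER}"
  year="${2:-$(date +%Y)}"
  printf '/mnt/ceph/storage/data-tmp/%s/%s\n' "$year" "$user"
}

# Usage: list what is still in last year's home directory
#   ls "$(slurm_home "$USER" "$(( $(date +%Y) - 1 ))")"
```

After the yearly switch, this makes it easy to find the old directory and copy over anything you still need.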

After login, you can run slurm jobs. E.g.: `srun hostname`.

Afterwards, please follow the [webis slurm user guide](https://kb.webis.de/services/slurm/user-guide.html) which provides examples on how to run jobs on Gammaweb with slurm.

### How to use SSH?
The [Secure Shell](https://en.wikipedia.org/wiki/Secure_Shell) (SSH) allows you to log into remote machines over the network, access the command line, transfer files, and set up port forwarding. 

If you're running Linux, MacOS, or a recently updated Windows 10, you can simply open a terminal and type

```
ssh user@hostname
```

to log into `hostname`, if you have access to an account named `user` there. On older Windows versions, you can install [PuTTY](https://www.putty.org/) to use SSH.

Our workstations require the use of a public key for SSH access from outside the university network: learn how to create yourself a key in [this tutorial](https://www.digitalocean.com/community/tutorials/ssh-essentials-working-with-ssh-servers-clients-and-keys). Ask your supervisor to place your public key on your machine or (preferably) do it yourself while you are at the university (both explained in the tutorial).

Further reading: [The Linux Command Line, Chapter 16](http://linuxcommand.org/tlcl.php); [How does SSH Work](https://www.hostinger.com/tutorials/ssh-tutorial-how-does-ssh-work)


### How to use SSH-tunnel?
There are three different possibilities: local port forwarding, remote port forwarding, and dynamic port forwarding, all of which are explained [here](https://help.ubuntu.com/community/SSH/OpenSSH/PortForwarding). For most simple use cases you will encounter, local port forwarding is usually the right choice. Dynamic port forwarding is more flexible, but if you really need it, you are probably better off using [VPN](#how-to-use-vpn) instead.
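
Local port forwarding, the common case, can be sketched as follows. The helper name `local_forward`, the port numbers, and `user@hostname` are placeholders; `-N` and `-L` are standard OpenSSH client options.

```shell
# Hypothetical helper: forwards a local port to a port on the remote host.
#   -N  do not run a remote command (tunnel only)
#   -L  local_port:target_host:target_port, resolved on the remote side
local_forward() {
  local_port="$1"
  remote_port="$2"
  target="$3"     # e.g. user@hostname
  ssh -N -L "${local_port}:localhost:${remote_port}" "$target"
}

# Usage: make a service on port 8080 of the remote machine reachable
# at http://localhost:9000 on your own machine (stop with Ctrl-C):
#   local_forward 9000 8080 user@hostname
```

Here `localhost` is interpreted on the remote side, so the tunnel ends at the machine you log into.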

Further reading: [connection illustrations on StackOverflow](https://unix.stackexchange.com/a/115906/319554)

### How to use this FAQ?
If you are new to Webis, check [how to do first steps at Webis?](#how-to-do-first-steps-at-webis). Otherwise look at the [questions](#questions).

If you cannot find your question: [ask](#how-to-ask-for-help), then [write a question and answer](#how-to-do-an-answer-for-this-faq) if you can. Open an issue otherwise.

If you find the answer to a question is outdated: fix it if you can. Open an issue otherwise.

### How to use VPN?
See our notes on [vpn](https://webis.de/facilities.html?q=vpn)


## Why ...?
Frequently needed justifications.

### Why do I get all these mails?
Webis as a research network spans several universities. Our group has a lot of [hardware](https://webis.de/facilities.html#hardware) and many students who switch tasks regularly. Simply put: we cannot administer who gets which mail so precisely that you receive only the mails you need.

Usually you just need to pay attention to mails regarding your workstation. Isn't it nice that you get a mail in case your hard disk is full, so that you do not need to check on this while you run a program that takes a week to complete? This might seem like an edge case, but you might suddenly find yourself working on one of our big computation servers: we *regularly* get mails for these machines that the hard disk is full, or that some process causes a machine to become unresponsive or to overheat, and so on.