[01:13:40] hi, could someone make me a sysop on commons.wmflabs plz?
[01:16:56] i'm trying to configure https://phabricator.wikimedia.org/T152553
[01:17:02] so that "data" namespace is CC0
[01:17:06] just like Help NS on mw.org
[01:17:30] godog, ^?
[01:24:11] Labs, Tool-Labs, User-bd808: BUB 503: AttributeError: 'module' object has no attribute 'python_2_unicode_compatible' - https://phabricator.wikimedia.org/T144554#2853129 (scfc) Open>Resolved a:valhallasw>bd808 This works now thanks to T147350: ``` tools.bub@tools-bastion-03:~$ time we...
[01:28:22] Labs, Labs-Kubernetes, Tool-Labs, Community-Tech-Tool-Labs: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions - https://phabricator.wikimedia.org/T136265#2853162 (bd808) * FLOSS project with OSI/FSF/DFSG approved/compatible license * Public project roadmaps * Active...
[01:28:33] Labs, Labs-Kubernetes, Tool-Labs, Community-Tech-Tool-Labs: Develop evaluation criteria for comparing Platform as a Service (PaaS) solutions - https://phabricator.wikimedia.org/T136265#2853164 (bd808) * Release packaging that is compatible with Wikimedia workflows (signed debs or easy to build lo...
[01:39:37] godog, hi, still around?
[01:45:10] Labs, Tool-Labs: Expand the Tool Labs definition of "free license" to include FSF-approved and DFSG-compatible licenses - https://phabricator.wikimedia.org/T152581#2853188 (bd808)
[01:51:52] yurik: hey, I don't know if my account has the right privileges but I can try, how do I do that?
[02:01:43] Labs, Tool-Labs, Community-Tech-Tool-Labs: Expand the Tool Labs definition of "free license" to include FSF-approved and DFSG-compatible licenses - https://phabricator.wikimedia.org/T152581#2853207 (bd808) I'm not generally concerned that the FSF and DFSG lists are not reputable, but I don't know if...
[02:12:42] Labs, Tool-Labs, Community-Tech-Tool-Labs, Software-Licensing: Expand the Tool Labs definition of "free license" to include FSF-approved and DFSG-compatible licenses - https://phabricator.wikimedia.org/T152581#2853216 (Legoktm)
[02:35:00] godog, hmm, no idea tbh :) But it's ok, i will try to do it from code... still experimenting
[02:36:10] yurik: hehe ok, good luck
[02:36:16] sigh :)
[02:36:21] i'll need that :)
[04:22:27] yurik: did you get your rights?
[04:22:57] bd808, only responsibilities, no rights :( It's ok, i'm actually doing it the "proper" way now :) Thanks for asking!
[04:32:41] Labs, Tool-Labs: Warnings/errors in /var/lib/gridengine/spool/qmaster/messages - https://phabricator.wikimedia.org/T152477#2853309 (scfc) I have asked on [[http://serverfault.com/questions/819195/how-to-make-grid-master-accept-gone-hosts|serverfault.com]] regarding gone hosts, and will do so for "unable...
[04:35:04] Labs, Tool-Labs: Warnings/errors in /var/lib/gridengine/spool/qmaster/messages - https://phabricator.wikimedia.org/T152477#2853310 (scfc) (IIRC, restarting the master (process, that is) is no problem, as it does not keep state information in memory (in contrast to `execd`s). But I wouldn't consider thi...
[04:39:03] Tool-Labs-tools-Pageviews: Parse protocol of Massviews external links - https://phabricator.wikimedia.org/T151463#2853311 (MusikAnimal) Open>Resolved a:MusikAnimal @Samwalton9 Should be all set (see results for [[ http://tools.wmflabs.org/massviews/?platform=all-access&agent=user&source=external-...
[05:55:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:30:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:48:21] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:28:23] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:33:00] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Glorian WD was created, changed by Glorian WD link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Glorian_WD edit summary: Created page with "{{Tools Access Request |Justification=I want to connect to Wikidata database replicas |Completed=false |User Name=Glorian WD }}"
[07:35:14] Labs, Labs-Infrastructure, DBA, Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2853488 (Marostegui) The 3 new labsdb hosts and sanitarium2 have now gtid_domain_id variable deployed and enabled.
[09:18:59] Labs, Labs-Infrastructure, Tool-Labs, DBA, Wikimedia-Developer-Summit (2017): Labsdbs for WMF tools and contributors: get more data, faster - https://phabricator.wikimedia.org/T149624#2853591 (Qgil) ... on the other hand this basically looks like a proposal for a presentation/training session...
[09:23:14] Labs, Labs-Infrastructure, DBA, Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2853613 (Marostegui) I have seen that: `modules/role/manifests/labs/db/replica.pp` already includes the firewall class: ``...
[09:29:20] !log tools clush -g k8s-worker -g k8s-master -g webproxy -b 'sudo puppet agent --disable "Deploying k8s change with alex"'
[09:29:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:29:26] akosiaris: going to !log here
[09:30:08] Labs, Labs-Infrastructure, DBA, Patch-For-Review: Provision sanitized data on labsdb1009, labsdb1010, labsdb1011 with from db1095 - https://phabricator.wikimedia.org/T152194#2853637 (jcrespo) We probably need custom rules, rather than `role::mariadb::ferm`.
[09:30:37] YuviPanda: just to be clear. we are talking about https://gerrit.wikimedia.org/r/324210 and https://gerrit.wikimedia.org/r/324211 only, right
[09:30:38] ?
[09:30:49] akosiaris: yes
[09:30:53] ok
[09:32:09] !log tools cherry-pick https://gerrit.wikimedia.org/r/324210 and https://gerrit.wikimedia.org/r/324211
[09:32:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:32:21] akosiaris: am going to start with tools-proxy-02, which is our standby proxy. should help see if kube-proxy continues to work
[09:36:08] akosiaris: I've run into unrelated issues that I'm going to now look at
[09:36:22] ok. lemme know if you need anything
[09:36:36] akosiaris: ok
[09:43:13] akosiaris: interesting, our redis slaves are totally out of date
[09:43:40] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:44:06] hmm
[09:45:20] !log tools restart redis on tools-proxy-02
[09:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[09:45:28] hmmmmm
[09:45:30] still outta date
[09:45:52] hahaha ofc
[09:45:58] tools-proxy-02 can't connect to -01
[09:47:01] because tools-redis is listening on localhost, not 0.0.0.0
[09:47:04] even though we do have iptables rules
[09:49:21] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:58:00] akosiaris: aaah. Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter template on File[/etc/default/kube-proxy] at /etc/puppet/modules/k8s/manifests/proxy.pp:26 on node tools-proxy-02.tools.eqiad.wmflabs
[09:58:16] akosiaris: can you take a look at that while I fix the redis stuff?
[09:59:04] hmm looking
[09:59:51] damn.. idiotic mistake. amending patch
[10:01:22] YuviPanda: https://gerrit.wikimedia.org/r/324211 amended
[10:01:33] akosiaris: ok!
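
The replication failure between 09:45 and 09:47 comes down to the bind address: a server listening on 127.0.0.1 only accepts connections that originate on the same host, so iptables rules admitting tools-proxy-02 cannot help. A minimal Python sketch of that reasoning with the standard `ipaddress` module; the function names are hypothetical, and the actual fix (Gerrit change 325751, mentioned at 10:02) is not reproduced here.

```python
import ipaddress


def listen_scope(bind_addr: str) -> str:
    """Classify the address a server (e.g. redis's `bind` setting) listens on.

    Returns "local-only" for loopback binds such as 127.0.0.1 or ::1,
    "all-interfaces" for the wildcard 0.0.0.0 or ::, and
    "specific-interface" for any other address.
    """
    ip = ipaddress.ip_address(bind_addr)
    if ip.is_loopback:
        # Only processes on the same host can connect; firewall rules
        # that admit remote peers make no difference.
        return "local-only"
    if ip.is_unspecified:
        # Wildcard bind: reachable on every interface, subject to iptables.
        return "all-interfaces"
    return "specific-interface"


def remote_replica_can_connect(bind_addr: str) -> bool:
    """A replica on another host can only connect if the primary is not loopback-bound."""
    return listen_scope(bind_addr) != "local-only"
```

With a loopback bind, no amount of firewall opening lets the standby proxy replicate; the bind address itself has to change.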
[10:02:42] Labs, Tool-Labs: dplbot webservice on Tools Labs fails repeatedly - https://phabricator.wikimedia.org/T115231#2853660 (yuvipanda) (I've fixed the replication between the proxies with https://gerrit.wikimedia.org/r/#/c/325751/)
[10:03:45] PROBLEM - Puppet run on tools-proxy-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[10:05:14] akosiaris: hmm, isn't looking good
[10:05:15] Dec 07 10:05:02 tools-proxy-02 kube-proxy[8951]: E1207 10:05:02.354276 8951 reflector.go:205] pkg/proxy/config/api.go:30: Failed to list *api.Service: the server could not find the requested resource (get services)
[10:07:46] hmm, I see /usr/bin/kube-proxy --master=127.0.0.1:8080 $DAEMON_ARGS --masquerade-all=true running
[10:08:01] akosiaris: strings /proc/9622/environ
[10:08:04] akosiaris: DAEMON_ARGS=$DAEMON_ARGS --masquerade-all=true
[10:08:09] I don't think that's right
[10:08:44] RECOVERY - Puppet run on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:09:15] akosiaris: oh, I missed the master as well - it never is 127.0.0.1
[10:09:15] at lal
[10:09:16] *all
[10:09:20] kube proxy never runs on the master
[10:09:27] yes, it should have been more like DAEMON_ARGS=--kubeconfig=/etc/kubernetes/kubeconfig --proxy-mode='iptables' --masquerade-all=true
[10:09:35] I see the correct config in the default file so we should get rid of that
[10:09:40] no that's the "sane" default.. it's meant to be overridden
[10:09:45] akosiaris: aaah, I know what's going on
[10:09:57] akosiaris: systemd treats defaults files completely differently :D
[10:10:00] akosiaris: as key value pairs
[10:10:02] not as BASH
[10:10:11] no environ expansion there at all :)
[10:10:23] akosiaris: hmm, I think it's an insane value :D we should just get rid of it.
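
The key point between 10:09 and 10:10 is that systemd reads an environment file such as `/etc/default/kube-proxy` as plain key/value assignments rather than sourcing it with a shell, so `DAEMON_ARGS=$DAEMON_ARGS --masquerade-all=true` is stored literally, exactly as `strings /proc/<pid>/environ` showed. A minimal Python model of that behaviour as observed on these hosts; it is a deliberate simplification (real systemd parsing also handles escapes and line continuations, see systemd.exec(5)), and the file contents used below are hypothetical reconstructions from the values pasted in the log.

```python
import re

# KEY=VALUE lines; keys follow shell variable-name rules.
_ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_env_file(text: str) -> dict:
    """Simplified model of systemd reading an EnvironmentFile.

    Each line is a plain KEY=VALUE pair; one pair of enclosing double or
    single quotes is stripped, "$VAR" references are NOT expanded, and a
    later assignment to the same key replaces the earlier one instead of
    appending to it.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments are skipped
        m = _ASSIGN.match(line)
        if not m:
            continue  # not an assignment; ignored in this sketch
        key, value = m.group(1), m.group(2).strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of enclosing quotes
        env[key] = value  # last assignment wins, no expansion
    return env


if __name__ == "__main__":
    # Hypothetical two-line defaults file written as if it were bash.
    broken = (
        "DAEMON_ARGS=--kubeconfig=/etc/kubernetes/kubeconfig\n"
        "DAEMON_ARGS=$DAEMON_ARGS --masquerade-all=true\n"
    )
    print(parse_env_file(broken)["DAEMON_ARGS"])
    # prints: $DAEMON_ARGS --masquerade-all=true
```

This is why the fix discussed next collapses everything into a single `DAEMON_ARGS="..."` line: with no expansion and last-wins semantics, only one complete assignment survives intact.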
[10:10:31] since it'll never be true
[10:10:38] akosiaris: so you need to amend to make it be just one line
[10:11:46] sigh, yeah you are right
[10:11:53] ok amending
[10:12:44] gonna take me a while, the templates have some logic embedded
[10:13:59] akosiaris: :D ok
[10:14:00] np
[10:23:41] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:24:23] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:28:58] Labs, Tool-Labs: Redis replication from tools-proxy-01 to tools-proxy-02 broken - https://phabricator.wikimedia.org/T152356#2853679 (yuvipanda) Open>Resolved a:yuvipanda This was fixed with https://phabricator.wikimedia.org/T152356 - thanks :D
[10:44:16] YuviPanda: so, changes amended, taking one last look
[10:44:16] btw
[10:44:19] ssh tools-puppetmaster-01.tools.eqiad.wmflabs
[10:44:19] channel 0: open failed: administratively prohibited: open failed
[10:44:23] some firewall in place ?
[10:44:25] akosiaris: -02
[10:44:29] ah
[10:44:37] hmm then why do the hosts still reference -01 ?
[10:44:52] server = tools-puppetmaster-01.tools.eqiad.wmflabs
[10:44:52] in tools-proxy-02
[10:45:13] oh?
[10:45:20] do they? let me look
[10:45:26] well at least one
[10:45:33] hahahahahaha
[10:45:41] akosiaris: magic of role::puppet::self and our funky puppet code
[10:45:47] akosiaris: there shall be two server = lines probably
[10:47:14] arg
[10:47:23] 2 [agent] ...
[10:47:26] what ?
[10:47:30] ok that's messed up
[10:47:53] 2 [main] sections, 2 [agent] sections
[10:47:56] akosiaris: yeah, look at the code that constructs puppet.conf
[10:48:03] just concatenates
[10:48:08] doesn't really update
[10:48:17] which we are capable of doing now IIRC
[10:48:21] not just in labs, this is the same code for prod too. just not as visible in prod
[10:48:21] akosiaris: yup
[10:48:28] anyway this is not for now
[10:49:09] so, the logic now is in the ERB in https://gerrit.wikimedia.org/r/#/c/324210/5/modules/k8s/templates/kubelet.default.erb
[10:49:19] akosiaris: indeed
[10:49:23] and https://gerrit.wikimedia.org/r/#/c/324211/5/modules/k8s/templates/kube-proxy.default.erb
[10:49:30] so, that should make systemd happy
[10:49:32] akosiaris: ok let me re-try
[10:49:36] cool
[10:49:48] akosiaris: we should get rid of Environment=KUBE_MASTER=--master=127.0.0.1:8080
[10:50:53] ah yes I forgot that
[10:50:56] lemme remove it
[10:51:45] akosiaris: ok!
[10:52:37] amended, patch uploaded.
[10:53:00] akosiaris: ok, testing now
[10:56:15] akosiaris: hmm still fucked
[10:56:46] arg.. the missing doublequote ?
[10:56:53] sigh
[10:56:55] akosiaris: even better
[10:56:55] strings /proc/12614/environ
[10:56:59] there's no DAEMON_FLAGS at all
[10:57:04] :D
[10:57:06] DAEMON_ARGS="--kubeconfig=/etc/kubernetes/kubeconfig --proxy-mode='iptables' --masquarade-all=true
[10:57:12] Just PATH and LANG
[10:57:21] akosiaris: not being picked up
[10:57:23] I so.. the missing double quote
[10:57:28] lemme check it
[10:57:47] strings /proc/12614/environ
[10:57:49] err
[10:57:52] DAEMON_ARGS="--kubeconfig=/etc/kubernetes/kubeconfig --proxy-mode='iptables' --masquarade-all=true"
[10:57:54] I do see double quotes
[10:58:04] I just added it manually
[10:58:44] Neither --kubeconfig nor --master was specified. Using default API client. This might not work.
[10:58:49] akosiaris: ah, right. ok
[10:58:50] that's better
[10:58:54] akosiaris: sorry I ran puppet again lol
[10:59:00] lol
[10:59:03] re-added manually
[10:59:03] no worries
[10:59:16] akosiaris: even better, it fails to start now
[10:59:28] unknown flag: --masquarade-all
[10:59:35] sigh
[10:59:40] damn.. not my day
[11:01:09] YuviPanda: ok success
[11:01:18] it seems to be working now
[11:01:28] amending the changes to reflect my manual patches
[11:01:37] so the proxy seems to be working
[11:01:40] that's something
[11:01:51] akosiaris: yup, I see the rules now
[11:01:52] how do we test it's actually working as we want it to ?
[11:01:57] I had manually flushed iptables earlier
[11:01:58] with iptables -t nat -F
[11:02:01] now I see 'em
[11:02:02] ah ok
[11:02:06] even better
[11:02:58] akosiaris: ok, I deem the kube-proxy stuff a success \o/
[11:03:03] :-)
[11:03:11] akosiaris: let me know when you push the changes, and we can now move on to kubelet
[11:03:11] perfect.. let's kill a kubelet now :P
[11:03:24] changes pushed
[11:03:33] akosiaris: :D ok, let me pick 'em up
[11:03:40] akosiaris: am going to do tools-worker-1001
[11:07:52] akosiaris: btw, am not draining it or whatever - pods should continue running when kubelet's down
[11:08:27] let's see if that should hold true :P
[11:10:09] akosiaris: ok, first step that I missed is that it's using http:// for the k8s master url, should be https
[11:10:14] I fixed manually, let me look at next step
[11:10:25] akosiaris: now it isn't getting its credentials
[11:11:35] akosiaris: ok, the exact same daemon_args issue :)
[11:12:05] ? I did add a double quote there for sure
[11:12:19] hmmm
[11:13:24] akosiaris: systemd just doesn't support multiple assignments nor env expansion
[11:13:25] in default files
[11:13:55] er, I've changed this for sure
[11:14:32] maybe the new changes aren't cherry-picked on the puppetmaster ?
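
The puppet.conf problem discussed between 10:44 and 10:48 (the file is built by concatenation, so hosts end up with two [main] and two [agent] sections and two `server =` lines) can be spotted mechanically. A minimal Python sketch; the function names and the sample file are hypothetical, and which of the duplicated values Puppet actually honours is not established in this log.

```python
import re
from collections import Counter

# An INI-style section header such as "[main]" or "[agent]".
_SECTION = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def duplicate_sections(conf_text: str) -> list:
    """Return section names that appear more than once, in first-seen order."""
    names = [m.group(1) for m in map(_SECTION.match, conf_text.splitlines()) if m]
    counts = Counter(names)
    dupes = []
    for name in names:
        if counts[name] > 1 and name not in dupes:
            dupes.append(name)
    return dupes


def values_for(conf_text: str, section: str, key: str) -> list:
    """Collect every value assigned to `key` inside any copy of `section`."""
    values, current = [], None
    for line in conf_text.splitlines():
        header = _SECTION.match(line)
        if header:
            current = header.group(1)
            continue
        if current == section and "=" in line:
            k, v = line.split("=", 1)
            if k.strip() == key:
                values.append(v.strip())
    return values


if __name__ == "__main__":
    # Hypothetical concatenated puppet.conf, for illustration only.
    sample = (
        "[main]\nserver = tools-puppetmaster-01.tools.eqiad.wmflabs\n"
        "[agent]\nreport = true\n"
        "[main]\nserver = tools-puppetmaster-02.tools.eqiad.wmflabs\n"
        "[agent]\nreport = true\n"
    )
    print(duplicate_sections(sample))
    print(values_for(sample, "main", "server"))
```

Python's `configparser.ConfigParser`, which is strict by default, would instead raise `DuplicateSectionError` on a file like the sample, which is another quick way to flag it.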
[11:15:11] akosiaris: ah, yes, I didn't update kubelet
[11:15:12] https://gerrit.wikimedia.org/r/#/c/324210/5/modules/k8s/templates/kubelet.default.erb,unified
[11:15:17] akosiaris: let me do that
[11:20:21] akosiaris: ok, other than the http -> https it works fine
[11:23:07] :-)
[11:23:12] ok amending that and uploading
[11:23:35] \o/
[11:25:32] and change amended
[11:26:15] akosiaris: ok, testing
[11:27:12] akosiaris: ok, wanna merge?
[11:27:24] yes, doing so now
[11:27:31] akosiaris: cool. thanks :D
[11:27:33] thanks for the help !
[11:27:40] :-)
[11:27:40] akosiaris: np!
[11:31:03] Hi there
[11:31:19] I'm maintaining a tool for a project
[11:31:25] It's called algo-news
[11:33:00] I need a long running task runner. I'm using Celery because migrating to open grid is not worth it until the project proves itself
[11:33:24] So I'm using jstart to run a celery worker
[11:33:53] However I have to specify -mem 2048 to make it work, which is outrageous for the memory it needs
[11:34:26] hi fako
[11:34:45] If I don't specify -mem 2048 I'm pretty sure that OS sends SIGTERM and Celery starts to shut down
[11:34:48] two things - gridengine's memory accounting is bizarre, so I wouldn't count that as an indictment of celery's memory use :)
[11:35:06] Haha, ok, that's good to know :)
[11:35:09] second, celery is probably the right thing to do anyway, rather than shelling out to gridengine
[11:35:38] Is there a celery running that I can use, instead of rolling my own on the grid?
[11:36:00] fako: nope, celery isn't multi-tenant so you've to run your own anyway
[11:36:16] And is it then ok to specify -mem 2048 for the task?
[11:36:29] 2048m that is
[11:36:31] fako: yup
[11:37:09] Alright. Then let me know if it does turn out to be a problem.
[11:37:12] Thanks :)
[11:37:40] fako: \o/ cool
[12:24:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:44:42] Labs, Phlogiston (Technical Debt): phlogiston-2 hangs every week - https://phabricator.wikimedia.org/T129891#2853793 (hashar) Open>Resolved a:hashar CI had the same issue with jbd2/vda blocking (T138281) and I am pretty sure it was due to a kernel soft lock T138281#2395843 then from a quote:...
[12:45:24] Labs, Labs-Kubernetes, Tool-Labs: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2458257 (hashar) Open>Resolved CI had the same issue with jbd2/vda blocking (T138281) and I am pretty sure it was due to a kernel soft lock T138281#2395843 . I have closed {T1298...
[12:50:40] Labs, Phlogiston (Technical Debt): phlogiston-2 hangs every week - https://phabricator.wikimedia.org/T129891#2853832 (hashar)
[12:50:43] Labs, Labs-Infrastructure: Instance deadlocking in June 2016 - https://phabricator.wikimedia.org/T152599#2853829 (hashar)
[12:52:44] Labs, Labs-Infrastructure, Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2853841 (hashar)
[12:52:47] Labs, Labs-Infrastructure, Tracking: Instance deadlocking in June 2016 - https://phabricator.wikimedia.org/T152599#2853834 (hashar) Open>Resolved a:hashar The actual issue is fixed. I filed this task as an umbrella #tracking task for several related issues (T129891 T138281 T140256). It...
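
At 11:34 YuviPanda notes that gridengine's memory accounting is bizarre, so needing a large `-mem` for fako's Celery worker is not necessarily a sign of Celery using that much memory. One plausible reading, assumed here rather than stated in the log, is that the grid limit applies to virtual memory, which for a Python process is typically far larger than its resident set. A minimal Linux-only Python sketch that reads both figures from `/proc/<pid>/status` so the gap can be checked directly:

```python
import os
from typing import Optional


def memory_kib(pid: Optional[int] = None) -> dict:
    """Read VmSize (virtual) and VmRSS (resident) in KiB from /proc/<pid>/status.

    Linux-only. If a scheduler limits virtual memory, VmSize is the figure
    that counts against the limit, and it is usually much larger than VmRSS.
    """
    pid = os.getpid() if pid is None else pid
    wanted = {"VmSize", "VmRSS"}
    result = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in wanted:
                # Lines look like "VmRSS:     12345 kB".
                result[key] = int(rest.split()[0])
    return result


if __name__ == "__main__":
    usage = memory_kib()
    print(f"virtual: {usage['VmSize'] / 1024:.0f} MiB, "
          f"resident: {usage['VmRSS'] / 1024:.0f} MiB")
```

Calling `memory_kib()` with the worker's PID on the exec host would show whether its virtual size, rather than its resident memory, is what approaches the 2048m request.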
[13:04:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:59:00] bd808_: https://paws-public.wmflabs.org/paws-public/46618563/Latest%20Population.ipynb#Preface:-pywikibot-update is a wonderful bit of documentation/code
[14:32:21] Tool-Labs-tools-Pageviews: Allow Langviews tool to track multiple articles, with reference to its Wikidata Q number - https://phabricator.wikimedia.org/T151888#2854066 (Wittylama) @MusikAnimal yes, that is a specific case, but it's just a case that I know of. The hashtag function is a potentially good one f...
[14:34:31] hi chasemp, the login problem was solved thanks to your help
[14:34:49] great, glad to hear it :)
[15:05:01] Labs, Labs-Infrastructure, Operations, netops, wikitech.wikimedia.org: Move novaobserver (and novaadmin) users out of ldap - https://phabricator.wikimedia.org/T152215#2854130 (Krenair)
[15:08:25] Labs, Labs-Infrastructure, Operations, netops, and 3 others: Provide read-only access to OpenStack APIs from WMF IP space - https://phabricator.wikimedia.org/T150092#2854135 (Andrew)
[15:08:29] Labs, Labs-Infrastructure, Operations, netops, wikitech.wikimedia.org: Move novaobserver (and novaadmin) users out of ldap - https://phabricator.wikimedia.org/T152215#2854133 (Andrew) Open>Resolved This should be resolved by https://gerrit.wikimedia.org/r/#/c/325371/
[15:24:08] !log tools.stewardbots Update access list. M7 resigned :-(
[15:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[15:24:10] M7: Login Screen mockup - https://phabricator.wikimedia.org/M7
[15:51:45] Labs, Labs-Infrastructure, Tool-Labs, DBA, Wikimedia-Developer-Summit (2017): Labsdbs for WMF tools and contributors: get more data, faster - https://phabricator.wikimedia.org/T149624#2854197 (bd808) >>! In T149624#2853591, @Qgil wrote: > ... on the other hand this basically looks like a prop...
[15:54:08] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 16%, RTA = 1.85 ms
[15:57:47] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[16:16:41] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 2.67 ms
[16:22:32] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[16:23:26] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 11.42 ms
[16:25:28] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[16:36:40] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:44:30] hey YuviPanda! i'm trying to login to paws internal using my wikitech login to test it out and keep getting an invalid username or password error... any ideas how i can troubleshoot?
[17:10:13] Labs, Tool-Labs, labs-sprint-116, labs-sprint-117, labs-sprint-118: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#2854466 (bd808) The 2016 Tool Labs survey included responses from several people pointing out how difficult moving files in and out of the Tool Lab...
[17:19:21] YuviPanda: finally read that notebook. It is a great writeup for sure. I love how it shows the fully normal "meh I'll just run it again" parts of debugging.
[17:20:04] (for those without lots of backscroll -- https://paws-public.wmflabs.org/paws-public/46618563/Latest%20Population.ipynb )
[17:36:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:42:51] Labs, Tool-Labs: tools.suggestbot web requests fail after a period of time - https://phabricator.wikimedia.org/T133090#2854665 (Nettrom) I just want the record to show my appreciation for your work on this @scfc, thank you so much for figuring this out! I'll look into switching the `qsub` calls with `jsu...
[17:45:08] zareen: hey! sorry was away
[17:46:59] hey YuviPanda, no worries, but i would love to test paws internal if i could log in :)
[17:47:14] zareen: yeah, let's debug it together.
[17:47:20] zareen: gimme a min, let me ssh in
[17:47:59] bd808: yeah, that's one of the reasons I love notebooks! None of that 'this code was formed in my head perfect' stuff :D
[17:48:15] zareen: I see > User Zareen not in researchers group
[17:48:40] zareen: hmm, try logging in with 'zareen' (small z)?
[17:48:54] YuviPanda: i think that's from stat1003 server and a separate issue
[17:49:07] zareen: nope, it's using the same infrastructure
[17:49:27] zareen: paws-internal is set up to correspond to stat1003 in most cases
[17:49:41] (with a notebook rather than ssh being primary interface)
[17:50:09] YuviPanda: ah, i see. well that was for a separate project on my end so i'll close that terminal and try ssh-ing to paws internal again
[17:50:45] zareen: you can use stat1003 and notebook at the same time :D no issues there
[17:50:54] zareen: also this is your first time trying to use notebooks right?
[17:51:10] YuviPanda: paws internal notebooks, yeah
[17:51:26] YuviPanda : i can get to http://localhost:8000 fine
[17:51:27] zareen: ok! should we switch to #wikimedia-research channel maybe, to not flood here?
[17:51:43] sure
[18:12:55] (PS2) BryanDavis: Add parsley javascript library [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/313152 (https://phabricator.wikimedia.org/T144710)
[18:13:08] (CR) BryanDavis: Add parsley javascript library [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/313152 (https://phabricator.wikimedia.org/T144710) (owner: BryanDavis)
[18:15:11] (CR) BryanDavis: [C: 2] Add parsley javascript library [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/313152 (https://phabricator.wikimedia.org/T144710) (owner: BryanDavis)
[18:15:17] (Merged) jenkins-bot: Add parsley javascript library [labs/striker/staticfiles] - https://gerrit.wikimedia.org/r/313152 (https://phabricator.wikimedia.org/T144710) (owner: BryanDavis)
[18:25:51] (PS1) BryanDavis: [WIP] Bump striker, static, and wheels submodules [labs/striker/deploy] - https://gerrit.wikimedia.org/r/325814
[18:50:22] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[18:55:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:06:48] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:10:12] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[19:14:41] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[19:38:06] PAWS: Support embedding WDQS results in PAWS - https://phabricator.wikimedia.org/T152623#2855137 (WikidataFacts)
[21:00:05] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:01:59] (PS1) BryanDavis: Add labs root key for bd808 [labs/private] - https://gerrit.wikimedia.org/r/325824 (https://phabricator.wikimedia.org/T152520)
[21:05:40] PROBLEM - Puppet staleness on tools-worker-1025 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[21:08:36] PROBLEM - Puppet staleness on tools-worker-1013 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[21:08:40] PROBLEM - Puppet staleness on tools-worker-1007 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[21:09:02] PROBLEM - Puppet staleness on tools-worker-1002 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [43200.0]
[21:09:24] PROBLEM - Puppet staleness on tools-worker-1023 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[21:09:51] ^ I'm sorting these out
[21:11:47] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:12:19] PROBLEM - Puppet staleness on tools-worker-1016 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[21:14:51] PROBLEM - Puppet staleness on tools-worker-1009 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [43200.0]
[21:15:13] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[21:16:01] PROBLEM - Puppet staleness on tools-worker-1017 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0]
[21:16:14] PROBLEM - Puppet staleness on tools-worker-1011 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[21:18:42] PROBLEM - Puppet staleness on tools-worker-1018 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0]
[21:18:56] PROBLEM - Puppet staleness on tools-worker-1006 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [43200.0]
[21:19:12] PROBLEM - Puppet staleness on tools-worker-1012 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[21:19:28] PROBLEM - Puppet staleness on tools-worker-1020 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[21:19:34] (CR) Alex Monk: "test" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/321335 (owner: Alex Monk)
[21:20:02] PROBLEM - Puppet staleness on tools-worker-1021 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[21:24:09] PROBLEM - Puppet staleness on tools-worker-1005 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[21:24:33] PROBLEM - Puppet staleness on tools-worker-1010 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0]
[21:24:43] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:24:49] PROBLEM - Puppet staleness on tools-worker-1008 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[21:25:23] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:27:27] PROBLEM - Puppet staleness on tools-worker-1022 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[21:27:43] PROBLEM - Puppet staleness on tools-worker-1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[21:29:40] PROBLEM - Puppet staleness on tools-worker-1019 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [43200.0]
[21:30:46] PROBLEM - Puppet staleness on tools-worker-1015 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [43200.0]
[21:34:25] Did shinken get over being mad about me removing the precise grid nodes?
[21:35:13] bd808: looks like
[21:37:58] there are a bunch of ancient High iowait alerts for tools-exec-12*
[21:38:06] (and other hosts)
[21:38:39] Labs, Operations: Explore hosting the multimedia commons use case - https://phabricator.wikimedia.org/T152632#2855573 (chasemp)
[21:38:51] bd808: I just set such hosts as 'expected down' until the year 3000
[21:39:03] if shinken is still there by then, we have bigger problems
[21:39:10] RECOVERY - Puppet staleness on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:10] RECOVERY - Puppet staleness on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:19] (PS1) Alex Monk: Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839
[21:39:24] RECOVERY - Puppet staleness on tools-worker-1023 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:28] RECOVERY - Puppet staleness on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:32] RECOVERY - Puppet staleness on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:40] RECOVERY - Puppet staleness on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:50] RECOVERY - Puppet staleness on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:39:51] bd808: sometimes https://grafana.wikimedia.org/dashboard/db/labvirt-node-disk-stats gives clues
[21:39:52] RECOVERY - Puppet staleness on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:40:02] RECOVERY - Puppet staleness on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:40:07] (CR) Alex Monk: [C: -2] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:11] (CR) Alex Monk: [C: -1] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:15] (CR) Alex Monk: Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:19] (CR) Alex Monk: [C: 1] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:19] labvirt1003 had high io for a while though
[21:40:24] (CR) Alex Monk: [C: 2] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:28] (CR) Alex Monk: [V: -1] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:34] (CR) Alex Monk: Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:41] (CR) Alex Monk: [V: 1] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:42] RECOVERY - Puppet staleness on tools-worker-1025 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:40:46] (CR) Alex Monk: [V: 2] Don't show approvals as 0 with new Gerrit version [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:40:46] RECOVERY - Puppet staleness on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:40:52] (CR) Alex Monk: "Okay, this works." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/325839 (owner: Alex Monk)
[21:41:05] RECOVERY - Puppet staleness on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:41:15] RECOVERY - Puppet staleness on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:42:17] RECOVERY - Puppet staleness on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:42:18] (CR) Andrew Bogott: [C: 1] Add labs root key for bd808 [labs/private] - https://gerrit.wikimedia.org/r/325824 (https://phabricator.wikimedia.org/T152520) (owner: BryanDavis)
[21:42:27] RECOVERY - Puppet staleness on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:42:45] RECOVERY - Puppet staleness on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:42:53] (CR) Yuvipanda: [C: 1] Add labs root key for bd808 [labs/private] - https://gerrit.wikimedia.org/r/325824 (https://phabricator.wikimedia.org/T152520) (owner: BryanDavis)
[21:43:35] RECOVERY - Puppet staleness on tools-worker-1013 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:43:43] RECOVERY - Puppet staleness on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:43:43] RECOVERY - Puppet staleness on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:43:55] RECOVERY - Puppet staleness on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:44:01] RECOVERY - Puppet staleness on tools-worker-1002 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:54:05] Labs, Operations: Explore hosting the multimedia commons use case - https://phabricator.wikimedia.org/T152632#2855625 (chasemp)
[22:03:23] Labs, Tool-Labs, Epic: Phase out precise instances from toollabs - https://phabricator.wikimedia.org/T94790#2855652 (bd808)
[22:03:26] Labs, Tool-Labs, User-bd808: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980#2855649 (bd808) Open>Resolved Shinken seems to be happy now and the last patch has been merged.
[22:11:19] (PS1) BryanDavis: Add wheels for formtools, parsley, and mwclient [labs/striker/wheels] - https://gerrit.wikimedia.org/r/325850 (https://phabricator.wikimedia.org/T144710)
[22:13:35] (CR) BryanDavis: [C: 2] Add wheels for formtools, parsley, and mwclient [labs/striker/wheels] - https://gerrit.wikimedia.org/r/325850 (https://phabricator.wikimedia.org/T144710) (owner: BryanDavis)
[22:13:43] (Merged) jenkins-bot: Add wheels for formtools, parsley, and mwclient [labs/striker/wheels] - https://gerrit.wikimedia.org/r/325850 (https://phabricator.wikimedia.org/T144710) (owner: BryanDavis)
[22:20:34] (PS2) BryanDavis: Bump static, striker, and wheels submodules [labs/striker/deploy] - https://gerrit.wikimedia.org/r/325814
[22:20:49] PAWS: Support embedding WDQS results in PAWS - https://phabricator.wikimedia.org/T152623#2855720 (WikidataFacts) Actually, @yuvipanda suggested to use an `IPython.display.IFrame` directly instead of an `ipywidgets.HTML` with an `
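
The last PAWS notification (T152623) mentions using `IPython.display.IFrame` directly to embed Wikidata Query Service results in a notebook. A minimal sketch of that idea, assuming the WDQS embed page at `https://query.wikidata.org/embed.html` takes the URL-encoded SPARQL query as its fragment; the helper names and the sample query are illustrative, not taken from the task.

```python
from urllib.parse import quote

# Assumed base URL of the Wikidata Query Service embed page.
WDQS_EMBED = "https://query.wikidata.org/embed.html#"


def wdqs_embed_url(sparql: str) -> str:
    """Build an embed URL carrying the SPARQL query in the URL fragment."""
    # safe="" percent-encodes every reserved character, similar to
    # JavaScript's encodeURIComponent.
    return WDQS_EMBED + quote(sparql, safe="")


def show_wdqs(sparql: str, width: int = 900, height: int = 500):
    """Return an IPython IFrame for the query; only useful inside a notebook."""
    # Imported lazily so this module still loads where IPython is absent.
    from IPython.display import IFrame
    return IFrame(wdqs_embed_url(sparql), width=width, height=height)
```

In a PAWS cell, evaluating `show_wdqs("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 }")` as the last expression would render the embedded result view; whether this matches what was eventually implemented for the task is not known from this log.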