[00:00:30] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404379 (owner: 10L10n-bot) [09:51:29] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10ORES, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Special:RecentChanges broken on Jenkins slaves - https://phabricator.wikimedia.org/T184938#3902160 (10zeljkofilipin) p:05Triage>03Low a:03zeljkofilipin [12:11:20] 10Scoring-platform-team (Current), 10editquality-modeling, 10User-Ladsgroup, 10artificial-intelligence: Train/test reverted model for Catalan Wikipedia - https://phabricator.wikimedia.org/T182611#3902414 (10Ladsgroup) https://github.com/wiki-ai/editquality/pull/113 [12:14:39] wiki-ai/editquality#41 (cawiki_reverted - 35e77bd : Amir Sarabadani): The build passed. https://travis-ci.org/wiki-ai/editquality/builds/329436803 [14:29:22] o/ [14:29:57] 10Scoring-platform-team, 10MediaWiki-Vagrant, 10Patch-For-Review: Clean up ORES vagrant role - https://phabricator.wikimedia.org/T181850#3902712 (10awight) p:05Normal>03Low [14:32:09] * halfak starts digging through review work :) [14:48:43] Amir1: fwiw, I think that our class-level @covers annotations are defeating the function-level annotations. [14:50:32] For example, we never declare that a test covers getDamagingStructuredFiltersOnChangesList, but see https://doc.wikimedia.org/cover-extensions/ORES/includes/Hooks/ChangesListHooksHandler.php.html#260 [14:51:01] Seems to be due to the class-level @covers ChangesListHooksHandler [14:53:56] (03CR) 10Awight: [C: 032] "This is sweet, but I kinda prefer the explicit expectations. 
It's very hard to see what the expected results are, now that they're buried" (033 comments) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404290 (https://phabricator.wikimedia.org/T184142) (owner: 10Ladsgroup) [14:54:44] Amir1: I want to rework and finish https://gerrit.wikimedia.org/r/#/c/403870/, but maybe we should discuss the best approach, first? ^ [14:55:42] (03Merged) 10jenkins-bot: Tests for ScoreFetcher [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404290 (https://phabricator.wikimedia.org/T184142) (owner: 10Ladsgroup) [14:56:34] awight: hey, sorry was meeting, It's Wikidata day, can I look into this tomororw? [14:56:43] Amir1: works for me! [14:57:08] I might go ahead and do a quick static expectations patch just for proof of concept. [15:00:02] Amir1: Oh, and I’m losing my mind. It’s not the expectations I’ve been furrowing my brow at, it’s the mock object, which is great just as it is. [15:02:50] (03CR) 10jenkins-bot: Tests for ScoreFetcher [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404290 (https://phabricator.wikimedia.org/T184142) (owner: 10Ladsgroup) [15:25:56] 10Scoring-platform-team, 10Wikilabels, 10editquality-modeling, 10User-Tgr, 10artificial-intelligence: Complete edit quality campaign for Hungarian Wikipedia - https://phabricator.wikimedia.org/T167968#3902886 (10Halfak) I've found the problem! huwiki is one of the datasets where we mixed edits that seem... 
[15:43:08] awight: no you don't, this whole thing is very complex :/ [15:45:22] (03PS4) 10Awight: Tests for PopulateDatabase [extensions/ORES] - 10https://gerrit.wikimedia.org/r/403870 (https://phabricator.wikimedia.org/T184140) [15:45:24] (03PS1) 10Awight: Move mock ORESService into a new helper class [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404473 (https://phabricator.wikimedia.org/T184142) [15:45:26] (03PS1) 10Awight: Minor test clean up [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404474 [15:46:55] Amir1: Thanks for writing that mock class, it cut the test size by exactly 1/2, leaving almost exclusively fixture data. [15:47:15] (03CR) 10jenkins-bot: [V: 04-1] Tests for PopulateDatabase [extensions/ORES] - 10https://gerrit.wikimedia.org/r/403870 (https://phabricator.wikimedia.org/T184140) (owner: 10Awight) [15:47:27] (03CR) 10Ladsgroup: Tests for PopulateDatabase (031 comment) [extensions/ORES] - 10https://gerrit.wikimedia.org/r/403870 (https://phabricator.wikimedia.org/T184140) (owner: 10Awight) [15:47:56] * Amir1 dances a little [15:48:18] (03PS5) 10Awight: Tests for PopulateDatabase [extensions/ORES] - 10https://gerrit.wikimedia.org/r/403870 (https://phabricator.wikimedia.org/T184140) [15:48:23] (03CR) 10Ladsgroup: [C: 032] Move mock ORESService into a new helper class [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404473 (https://phabricator.wikimedia.org/T184142) (owner: 10Awight) [15:48:24] lmao [15:49:20] Time for a little community service / tribute [15:50:24] (03Merged) 10jenkins-bot: Move mock ORESService into a new helper class [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404473 (https://phabricator.wikimedia.org/T184142) (owner: 10Awight) [15:53:45] (03CR) 10Ladsgroup: [C: 032] Minor test clean up [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404474 (owner: 10Awight) [15:54:26] (03CR) 10jenkins-bot: Move mock ORESService into a new helper class [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404473 
(https://phabricator.wikimedia.org/T184142) (owner: 10Awight) [15:56:16] (03Merged) 10jenkins-bot: Minor test clean up [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404474 (owner: 10Awight) [15:59:20] (03CR) 10jenkins-bot: Minor test clean up [extensions/ORES] - 10https://gerrit.wikimedia.org/r/404474 (owner: 10Awight) [16:31:22] back in 1-2hr [17:12:55] PROBLEM - ssh on ORES-worker06.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:13:08] PROBLEM - ping4 on ORES-worker06.experimental is CRITICAL: PING CRITICAL - Packet loss = 100% [17:13:46] PROBLEM - check disk on ORES-worker06.experimental is UNKNOWN: [17:13:55] PROBLEM - Host ORES-worker06.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-06.ores.eqiad.wmflabs) [17:14:13] ^ Probably a rolling restart [17:15:36] halfak: what exactly is a stretch instance? [17:15:53] Debian Stretch (As opposed to Debian Jessie) :) [17:23:09] RECOVERY - Host ORES-worker06.experimental is UP: PING OK - Packet loss = 0%, RTA = 2.48 ms [17:23:14] RECOVERY - ping4 on ORES-worker06.experimental is OK: PING OK - Packet loss = 0%, RTA = 3.94 ms [17:23:41] RECOVERY - ssh on ORES-worker06.experimental is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [17:23:43] RECOVERY - check disk on ORES-worker06.experimental is OK: DISK OK [17:32:26] PROBLEM - ssh on ORES-worker09.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:32:34] halfak: I intend to plot the ROC_AUC curve, how can I retrieve datapoints for TPR vs FPR from model info? 
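The TPR vs FPR question above can be illustrated with a minimal sketch. It assumes you already have, for each label, a list of predicted scores and the matching true/false outcomes; it does not use the ORES or revscoring model-info API, and the names `roc_points`, `auc`, and `macro_and_micro_auc` are hypothetical helpers written only for this illustration.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs for one label, one point per distinct score.

    scores: predicted probabilities for the label (assumed input).
    labels: matching booleans, True where the label really applies.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Walk the examples from highest to lowest score, lowering the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_positive) in enumerate(ranked):
        if is_positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def macro_and_micro_auc(per_label):
    """per_label maps a label name to its (scores, labels) pair.

    Macro: average the per-label AUCs. Micro: pool every (score, outcome)
    pair from all labels into one list and compute a single curve.
    """
    macro = sum(auc(roc_points(s, y)) for s, y in per_label.values()) / len(per_label)
    pooled_scores = [v for s, _ in per_label.values() for v in s]
    pooled_labels = [v for _, y in per_label.values() for v in y]
    micro = auc(roc_points(pooled_scores, pooled_labels))
    return macro, micro
```

Plotting the output of `roc_points` for every label on one set of axes gives the single combined graph; the macro and micro numbers summarize all labels without averaging the curves themselves point by point.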
[17:32:44] PROBLEM - ping4 on ORES-worker09.experimental is CRITICAL: PING CRITICAL - Packet loss = 100% [17:33:40] PROBLEM - Host ORES-worker09.experimental is DOWN: PING CRITICAL - Packet loss = 100% [17:43:41] RECOVERY - Host ORES-worker09.experimental is UP: PING OK - Packet loss = 0%, RTA = 4.04 ms [17:43:58] PROBLEM - puppet on ORES-worker09.experimental is CRITICAL: connect to address 10.68.19.144 port 5666: No route to hostconnect to host ores-worker-09.ores.eqiad.wmflabs port 5666: No route to host [17:44:00] RECOVERY - ping4 on ORES-worker09.experimental is OK: PING OK - Packet loss = 0%, RTA = 3.85 ms [17:44:03] RECOVERY - puppet on ORES-worker09.experimental is OK: OK: Puppet is currently enabled, last run 20 minutes ago with 0 failures [17:44:24] Ok, I see that the stats are per label, I'll need to create and aggregate TPR, FPR array of values across all labels [17:44:32] for a generalized ROC_AUC curve [17:44:33] RECOVERY - ssh on ORES-worker09.experimental is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [17:53:41] codezee, I don't think you want to aggregate TPR or FPR across different labels, [17:53:49] Have you seen someone else do that? [17:55:02] halfak: not till now, but how do we provide info per label when we have 40 labels? [17:55:15] we can't make 40 graphs [17:55:28] macro and micro-averages [17:56:49] halfak: so we don't report the actual ROC curve for any label, just the micro/macro average graph obtained by varying hyperparameters? [17:57:48] codezee, we should only report on the final parameters [17:57:58] But simply state that we performed an optimization. [18:03:30] codezee, I think that plotting all of the ROC curves on the same graph would be informative. [18:03:41] You don't need to label every class. Instead, just label a few outliers. [18:04:00] I'd rather see a PR-AUC graph honestly. 
:) [18:04:13] halfak: if the alerts get spammy you always have the access to shut em up just fyi, I kinda wish they would give more of a heads up before rolling restarts though :/ [18:04:26] heh it's OK. [18:04:41] Not too much of a distraction [18:11:31] PROBLEM - ping4 on ORES-worker08.experimental is CRITICAL: PING CRITICAL - Packet loss = 100% [18:12:03] halfak: I see, if its just one graph, no problem then, let me check how it turns out to look [18:12:05] PROBLEM - ssh on ORES-worker08.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:13] with all the labels [18:12:21] PROBLEM - Host ORES-worker08.experimental is DOWN: PING CRITICAL - Packet loss = 100% [18:25:50] RECOVERY - Host ORES-worker08.experimental is UP: PING OK - Packet loss = 0%, RTA = 5.26 ms [18:25:52] RECOVERY - ping4 on ORES-worker08.experimental is OK: PING OK - Packet loss = 0%, RTA = 2.06 ms [18:25:53] PROBLEM - puppet on ORES-worker08.experimental is CRITICAL: connect to address 10.68.16.80 port 5666: Connection refusedconnect to host ores-worker-08.ores.eqiad.wmflabs port 5666: Connection refused [18:26:23] RECOVERY - ssh on ORES-worker08.experimental is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [18:26:32] RECOVERY - puppet on ORES-worker08.experimental is OK: OK: Puppet is currently enabled, last run 19 minutes ago with 0 failures [19:34:54] 10Scoring-platform-team, 10ORES, 10Operations: [Epic] Deploy ORES in kubernetes cluster - https://phabricator.wikimedia.org/T182331#3903803 (10Ottomata) p:05Triage>03Low [19:35:02] 10Scoring-platform-team, 10ORES, 10Operations: Tuning profile::ores::celery parameters should cause a Celery service restart - https://phabricator.wikimedia.org/T182203#3903804 (10Ottomata) p:05Triage>03Normal [19:39:09] 10Scoring-platform-team, 10ORES, 10Operations, 10Scap, 10Release-Engineering-Team (Next): scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3903847 (10Ottomata) p:05Triage>03Normal Just curious, 
why not use git fat? We have a git-fat store available already, and it can be used by... [19:44:17] halfak: Have a few minutes to talk about word2vec? [19:44:30] shilad, sure! [19:44:37] codezee, ^ [19:44:42] Great! Who else is working with you on it? I can't remember? [19:44:49] codezee: I see. Thanks! [19:44:52] :D [19:45:01] Let me setup a hangout... [19:45:59] https://hangouts.google.com/hangouts/_/macalester.edu/word2vec?hl=en&authuser=0 [19:47:42] PROBLEM - ssh on ORES-lb02.Experimental is CRITICAL: connect to address ores-lb-02.ores.eqiad.wmflabs and port 22: No route to host [19:47:49] PROBLEM - ssh on ORES-redis01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:48:13] PROBLEM - Host ORES-lb02.Experimental is DOWN: CRITICAL - Host Unreachable (ores-lb-02.ores.eqiad.wmflabs) [19:48:23] PROBLEM - check disk on ORES-web03.experimental is CRITICAL: connect to address 10.68.18.196 port 5666: No route to hostconnect to host ores-web-03.ores.eqiad.wmflabs port 5666: No route to host [19:48:23] PROBLEM - check load on ORES-web03.experimental is CRITICAL: connect to address 10.68.18.196 port 5666: No route to hostconnect to host ores-web-03.ores.eqiad.wmflabs port 5666: No route to host [19:48:23] PROBLEM - puppet on ORES-web03.experimental is CRITICAL: connect to address 10.68.18.196 port 5666: No route to hostconnect to host ores-web-03.ores.eqiad.wmflabs port 5666: No route to host [19:48:26] PROBLEM - check users on ORES-web03.experimental is CRITICAL: connect to address 10.68.18.196 port 5666: No route to hostconnect to host ores-web-03.ores.eqiad.wmflabs port 5666: No route to host [19:48:29] PROBLEM - ping4 on ORES-redis01.experimental is CRITICAL: PING CRITICAL - Packet loss = 100% [19:48:36] shilad: yes, what do you want to know? 
[19:48:40] PROBLEM - check disk on ORES-redis01.experimental is UNKNOWN: [19:48:45] PROBLEM - ssh on ORES-web03.experimental is CRITICAL: connect to address ores-web-03.ores.eqiad.wmflabs and port 22: No route to host [19:48:54] PROBLEM - Host ORES-web03.experimental is DOWN: CRITICAL - Host Unreachable (ores-web-03.ores.eqiad.wmflabs) [19:48:57] ^ rolling restarts. nothing to see here [19:49:02] codezee: Would you like to join halfak and me in the hangout: https://hangouts.google.com/hangouts/_/macalester.edu/word2vec?hl=en&authuser=0 [19:49:23] PROBLEM - Host ORES-redis01.experimental is DOWN: PING CRITICAL - Packet loss = 100% [19:49:36] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:49:47] PROBLEM - ORES home page on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:50:17] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:50:17] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:24] halfak: awight These alarms are expected [19:51:33] they are rebooting all VPS across the cloud [19:55:41] Amir1: ah, thanks for the heads-up. 
I had almost run out of the will to ignore the alerts ;-) [19:55:56] :)))) [19:56:23] It actually started to become stable, we should care about them at least a little :D [19:58:43] halfak: If you want to jump in, https://phabricator.wikimedia.org/T181855#3903847 [19:59:39] RECOVERY - Host ORES-redis01.experimental is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [19:59:47] RECOVERY - ORES home page on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 420 bytes in 0.010 second response time [19:59:57] RECOVERY - Host ORES-web03.experimental is UP: PING OK - Packet loss = 0%, RTA = 4.32 ms [20:00:14] RECOVERY - ssh on ORES-redis01.experimental is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [20:00:15] 10Scoring-platform-team, 10ORES, 10Operations, 10Scap, 10Release-Engineering-Team (Next): scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3903991 (10demon) >>! In T181855#3903847, @Ottomata wrote: > Just curious, why not use git fat? We have a git-fat store available already, and it... 
[20:00:17] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 456 bytes in 0.571 second response time [20:00:18] RECOVERY - Host ORES-lb02.Experimental is UP: PING OK - Packet loss = 0%, RTA = 2.63 ms [20:00:25] RECOVERY - check load on ORES-web03.experimental is OK: OK - load average: 1.13, 0.30, 0.10 [20:00:26] RECOVERY - puppet on ORES-web03.experimental is OK: OK: Puppet is currently enabled, last run 36 minutes ago with 0 failures [20:00:26] RECOVERY - check disk on ORES-web03.experimental is OK: DISK OK [20:00:28] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 1.063 second response time [20:00:29] RECOVERY - check disk on ORES-redis01.experimental is OK: DISK OK [20:00:29] RECOVERY - check users on ORES-web03.experimental is OK: USERS OK - 0 users currently logged in [20:00:39] RECOVERY - ssh on ORES-web03.experimental is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [20:01:18] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 1.198 second response time [20:01:58] nvm, RainbowSprinkles handled it... [20:02:47] 10Scoring-platform-team, 10ORES, 10Operations, 10Scap, 10Release-Engineering-Team (Next): scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3904000 (10Ottomata) K cool, sounds good :) [20:05:32] Welcome to #wikimedia-botspam how can i take your order... 
[20:07:32] 10Scoring-platform-team, 10Operations, 10Wikimedia-Incident: Celery manager implodes horribly if Redis goes down - https://phabricator.wikimedia.org/T181632#3904031 (10Ottomata) p:05Triage>03Normal [20:07:59] 10Scoring-platform-team, 10Operations, 10Wikimedia-Logstash, 10monitoring, 10Wikimedia-Incident: Send celery and wsgi service logs to logstash - https://phabricator.wikimedia.org/T181630#3904032 (10Ottomata) p:05Triage>03Normal [20:08:18] 10Scoring-platform-team, 10Operations, 10Wikimedia-Incident: What is causing ORES celery workers to suddenly require more CPU? - https://phabricator.wikimedia.org/T181621#3904033 (10Ottomata) p:05Triage>03Normal [20:08:33] 10Scoring-platform-team, 10Operations, 10Wikimedia-Incident: Investigate redis-cluster or other techniques for making Redis not a single point of failure. - https://phabricator.wikimedia.org/T181559#3904034 (10Ottomata) p:05Triage>03Normal [20:08:48] 10Scoring-platform-team, 10Operations: Let the ORES application set log severity, not uWSGI - https://phabricator.wikimedia.org/T181546#3904035 (10Ottomata) p:05Triage>03Normal [20:13:13] Zppix: hey gimme the popcorn back [20:13:49] awight: do you like it in Gigabyte or Kilobyte? [20:14:27] toothlessbyte is all i’ve got :shrug: [20:17:59] :/ [20:18:50] shilad: if I understand correctly, are the navigation vectors primarily for providing recommendations? [20:19:19] In the past that was true with Ellery's work. [20:19:34] and what is the current scope? [20:19:44] codezee: I think they are (as you have found) a tremendously powerful algorithmic foundation for any NLP work. [20:20:26] codezee: I don't have a specific end-application in mind now except for my own research projects. That's part of why I wanted to talk with you! 
[20:21:20] PROBLEM - puppet on ORES-worker07.experimental is CRITICAL: connect to address 10.68.17.230 port 5666: No route to hostconnect to host ores-worker-07.ores.eqiad.wmflabs port 5666: No route to host [20:21:20] PROBLEM - check disk on ORES-worker07.experimental is CRITICAL: connect to address 10.68.17.230 port 5666: No route to hostconnect to host ores-worker-07.ores.eqiad.wmflabs port 5666: No route to host [20:21:20] PROBLEM - check load on ORES-worker07.experimental is CRITICAL: connect to address 10.68.17.230 port 5666: No route to hostconnect to host ores-worker-07.ores.eqiad.wmflabs port 5666: No route to host [20:21:21] codezee: I do feel that people (industry & research) should not have to rely on places like Google and Facebook to publish embeddings. [20:21:28] PROBLEM - check users on ORES-worker07.experimental is CRITICAL: connect to address 10.68.17.230 port 5666: No route to hostconnect to host ores-worker-07.ores.eqiad.wmflabs port 5666: No route to host [20:21:31] PROBLEM - ssh on ORES-worker07.experimental is CRITICAL: connect to address ores-worker-07.ores.eqiad.wmflabs and port 22: No route to host [20:21:31] PROBLEM - Host ORES-worker07.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-07.ores.eqiad.wmflabs) [20:21:34] PROBLEM - ping4 on Experimental ORES Website is CRITICAL: PING CRITICAL - Packet loss = 100% [20:21:36] PROBLEM - check http on Experimental ORES Website is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:40] Shut up you worthless bot [20:21:48] lol [20:21:54] PING PING PING PING PING :| [20:21:58] Good bot [20:22:08] PROBLEM - Host Experimental ORES Website is DOWN: CRITICAL - Host Unreachable (ores.wmflabs.org) [20:22:10] :] [20:22:17] Well it worked for a second 🤷‍♂️ [20:23:07] halfak: btw, Google-News word2vec is Apache License 2.0 [20:23:56] codezee: My hope is that the WMF navigation + content embeddings can enable software and research by individuals and small companies that would be very 
difficult without ongoing support by, for example, Google and Facebook. [20:24:13] :))))))) [20:24:23] +1 much better to have it based on free knowledge artifacts. [20:24:44] I would ack the alarms... if i could access the interface [20:24:53] yeah i see, thats nice... [20:25:37] so you aim to create a combined navigation + content embeddings and provide as a resource... [20:25:48] I'm leaving for the day, tomorrow will work on ORES stuff, my plan is to work on SqlScoreStorageLookup in the extension (will be needed for rewriting onRCSaveHookHandler) and then work on the make file automation, if there is anything I should do beforehand let me know [20:26:02] Exactly! It would be awesome to have an internal WMF consumer of said resource, but not a requirement. [20:26:19] Amir1: Nice. I’ll be around until 19:00 UTC tomorrow. [20:26:20] Great, Amir1 :) Have a good evening. [20:26:33] shilad: do you have some metric in mind as to how you'll measure the effectiveness of these new vectors? [20:26:51] Cool, See you then [20:26:51] o/ [20:27:27] I think evaluating the quality of embeddings is application-specific. There are standard ways for text embeddings (analogies, word relatedness). [20:27:45] There are not standard ways for text + article embeddings, but named-entity recognition is an obvious approach. 
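The analogy-style evaluation mentioned above for text embeddings can be sketched as follows. This is a toy illustration with hand-made three-dimensional vectors, not the WMF navigation vectors or any published embedding; `cosine`, `solve_analogy`, and `analogy_accuracy` are hypothetical helper names, and the nearest-neighbour rule on `b - a + c` is the common convention for such tests, assumed here rather than taken from the discussion.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the item nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question items are excluded from the candidate answers.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(1 for a, b, c, expected in questions
                  if solve_analogy(vectors, a, b, c) == expected)
    return correct / len(questions)


# Toy vectors (made up for illustration): axes are royalty, male, female.
toy_vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}
toy_questions = [
    ("man", "king", "woman", "queen"),
    ("king", "man", "queen", "woman"),
]
score = analogy_accuracy(toy_vectors, toy_questions)
```

For article embeddings the same accuracy loop would apply with article identifiers in place of words, provided a question set relating articles can be built.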
[20:29:35] RECOVERY - Host ORES-worker07.experimental is UP: PING OK - Packet loss = 0%, RTA = 1237.21 ms [20:30:11] RECOVERY - Host Experimental ORES Website is UP: PING OK - Packet loss = 0%, RTA = 1.66 ms [20:30:17] RECOVERY - check users on ORES-worker07.experimental is OK: USERS OK - 0 users currently logged in [20:30:17] RECOVERY - puppet on ORES-worker07.experimental is OK: OK: Puppet is currently enabled, last run 15 minutes ago with 0 failures [20:30:19] RECOVERY - check load on ORES-worker07.experimental is OK: OK - load average: 0.58, 0.16, 0.05 [20:30:19] RECOVERY - check disk on ORES-worker07.experimental is OK: DISK OK [20:30:24] RECOVERY - check http on Experimental ORES Website is OK: OK - Certificate '*.wmflabs.org' will expire on Fri 16 Nov 2018 03:41:05 PM GMT +0000. [20:30:33] RECOVERY - ssh on ORES-worker07.experimental is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0) [20:32:44] shilad: word2vec - https://arxiv.org/pdf/1301.3781.pdf creates a hybrid test set consisting of syntactic and semantic questions to measure the effectiveness(sec 4,1), you can probably do sth similar [20:33:00] although the basic elements in your case would be articles rather than words I think [20:34:34] shilad: and I think coming up with a metric for text+article embeddings would be a useful contribution as you already mentioned that there's no standard way to do it :) [20:35:12] PROBLEM - ping4 on ORES-web05.experimental is CRITICAL: PING CRITICAL - Packet loss = 100% [20:35:21] PROBLEM - ssh on ORES-web05.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:35:42] PROBLEM - puppet on ORES-worker05.experimental is UNKNOWN: [20:35:48] PROBLEM - Host ORES-worker05.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-05.ores.eqiad.wmflabs) [20:35:50] PROBLEM - Host ORES-web05.experimental is DOWN: CRITICAL - Host Unreachable (ores-web-05.ores.eqiad.wmflabs) [20:35:50] PROBLEM - check load on ORES-web05.experimental is CRITICAL: 
connect to address 10.68.23.52 port 5666: No route to hostconnect to host ores-web-05.ores.eqiad.wmflabs port 5666: No route to host [20:36:57] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:42:44] RECOVERY - Host ORES-worker05.experimental is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms [20:42:47] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.526 second response time [20:43:02] RECOVERY - Host ORES-web05.experimental is UP: PING OK - Packet loss = 0%, RTA = 2.98 ms [20:43:33] halfak: well we know the alerts work I guess :P [20:43:34] RECOVERY - puppet on ORES-worker05.experimental is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [20:47:47] heh right. This has been a test of the ORES missile defense system [20:47:49] too soon? [20:48:39] Wait it has missles... [20:48:46] Whats the launch codes? [20:49:00] Go long! [20:50:17] who owns the button? :D [20:54:16] codezee: probably legal [21:00:01] making some food. back in a bit. 
[21:01:58] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 3525 bytes in 4.786 second response time [21:01:58] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 3606 bytes in 0.478 second response time [21:01:59] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 3525 bytes in 3.402 second response time [21:09:58] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 443 bytes in 0.540 second response time [21:09:59] RECOVERY - ORES web node labs ores-web-03 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.533 second response time [21:09:59] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 457 bytes in 0.539 second response time [21:23:01] Wow, all I’m learning from battling PHPUnit is that PHP is no Python. [21:24:22] http://james-iry.blogspot.in/2009/05/brief-incomplete-and-mostly-wrong.html [21:25:05] 1995 - At a neighborhood Italian restaurant Rasmus Lerdorf realizes that his plate of spaghetti is an excellent model for understanding the World Wide Web and that web applications should mimic their medium. On the back of his napkin he designs Programmable Hyperlinked Pasta (PHP). PHP documentation remains on that napkin to this day. :D [21:26:40] codezee: Thanks for the support! I’m getting badly burned under this steaming heap of spaghetti. [21:26:47] All I want is unittest.mock [21:27:35] > In spite of the lack of evidence that any significant Ada program is ever completed historians believe Ada to be a successful public works project that keeps several thousand roving defense contractors out of gangs. [21:30:08] awight: mediawiki tests might help? [21:30:43] codezee: Hehe: https://phabricator.wikimedia.org/T184775. 
That happens to be the exact whale I’m trying to cut my way out of. [21:40:34] back. [21:40:44] I'ma continue with converting ores.wmflabs.org to stretch [21:41:40] 💯 [21:42:20] halfak what hosts? Ores-web? I can silence the alarms for ya [21:42:32] Now you tell us :p [21:43:05] awight: you can too you know [21:43:14] :) [21:43:16] You and halfak and amir all have access to the interface [21:43:44] Well if they dont kill the proxies [21:44:45] Expect more alarms. :) [21:44:51] I figure this'll just be alarm day. [21:44:54] Happy Alarm day! [21:46:26] Gerrit-icinga.wmflabs.org *cough* [21:49:56] Hmm.. I can't figure out why we have ores-redis-02 [21:51:16] +1 redis is tiny and can fit on the scoring node [21:51:48] We already have ores-redis-91 [21:51:50] *01 [21:52:07] * halfak tries to figure out what everything is pointing to. [21:56:41] Oh! It looks like the workers are pointing to -02 and the cache is pointing to -01 [21:58:21] Wait... wtf... everything should be pointing to -02 [21:58:30] but ores-web-03 has -01 in its configuration [21:59:03] halfak: Is this the staging cluster? [21:59:15] nope. regular ores.wmflabs.org [21:59:24] OH [21:59:34] Someone ran scap on ores-web-03 and it made a deployment dir [21:59:38] And it's super old [21:59:45] * halfak comes back from crazy pants mountain [22:00:06] ok. ores-redis-01 is dead. [22:00:12] lol [22:00:17] Rude [22:01:10] PROBLEM - ping4 on ORES-redis01.experimental is CRITICAL: PING CRITICAL - Packet loss = 100% [22:01:37] So redis01 is no longer needed? [22:01:53] PROBLEM - ssh on ORES-redis01.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:01:55] PROBLEM - Host ORES-redis01.experimental is DOWN: CRITICAL - Host Unreachable (ores-redis-01.ores.eqiad.wmflabs) [22:02:17] lul [22:02:23] Zppix, ^ wanna squash that? [22:02:33] Yes [22:02:43] Ill downtime it until i get around to ridding it [22:03:06] I'll have a new set of hostnames for you soon. 
[22:03:23] Ok
[22:04:14] halfak: any other hosts going away?
[22:04:23] Yes. Basically all of them ;)
[22:04:30] But in a rolling fashion
[22:04:41] Oh so you're completely migrating?
[22:04:48] Oh wait
[22:04:57] I forgot cloud is weird with hostnames
[22:06:04] heh
[22:10:53] halfak: if you want, you can disable the hosts that you're getting rid of there: https://gerrit.wikimedia.org/r/404584
[22:15:24] OK here we go. I'm switching over to the two new web nodes.
[22:22:41] \o/ works!
[22:22:51] Time to kill ores-web-03 and ores-web-05
[22:23:24] Could you update the patch with the new info once you're done and then ping me and I'll merge it, halfak?
[22:23:39] Zppix, sure.
[22:23:52] Hmm... Never updated someone else's patch in gerrit.
[22:24:03] PROBLEM - Host ORES-web05.experimental is DOWN: CRITICAL - Host Unreachable (ores-web-05.ores.eqiad.wmflabs)
[22:24:08] loool
[22:24:11] Here we go
[22:24:12] PROBLEM - check disk on ORES-web03.experimental is CRITICAL: connect to address 10.68.18.196 port 5666: No route to host; connect to host ores-web-03.ores.eqiad.wmflabs port 5666: No route to host
[22:24:12] PROBLEM - ssh on ORES-web03.experimental is CRITICAL: connect to address ores-web-03.ores.eqiad.wmflabs and port 22: No route to host
[22:24:21] PROBLEM - Host ORES-web03.experimental is DOWN: CRITICAL - Host Unreachable (ores-web-03.ores.eqiad.wmflabs)
[22:28:08] * halfak deleted ores-worker-05/06/07
[22:28:27] PROBLEM - check users on ORES-worker05.experimental is CRITICAL: connect to address 10.68.23.15 port 5666: No route to host; connect to host ores-worker-05.ores.eqiad.wmflabs port 5666: No route to host
[22:28:27] PROBLEM - ssh on ORES-worker05.experimental is CRITICAL: connect to address ores-worker-05.ores.eqiad.wmflabs and port 22: No route to host
[22:28:38] PROBLEM - puppet on ORES-worker05.experimental is CRITICAL: connect to address 10.68.23.15 port 5666: No route to host; connect to host ores-worker-05.ores.eqiad.wmflabs port 5666: No route to host
[22:28:39] PROBLEM - ping4 on ORES-worker07.experimental is CRITICAL: CRITICAL - Host Unreachable (ores-worker-07.ores.eqiad.wmflabs)
[22:28:45] PROBLEM - Host ORES-worker05.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-05.ores.eqiad.wmflabs)
[22:28:45] PROBLEM - check load on ORES-worker06.experimental is CRITICAL: connect to address 10.68.22.146 port 5666: No route to host; connect to host ores-worker-06.ores.eqiad.wmflabs port 5666: No route to host
[22:28:45] PROBLEM - puppet on ORES-worker06.experimental is CRITICAL: connect to address 10.68.22.146 port 5666: No route to host; connect to host ores-worker-06.ores.eqiad.wmflabs port 5666: No route to host
[22:28:48] PROBLEM - check users on ORES-worker06.experimental is CRITICAL: connect to address 10.68.22.146 port 5666: No route to host; connect to host ores-worker-06.ores.eqiad.wmflabs port 5666: No route to host
[22:28:49] icinga2-wm: are you sure?
[22:28:51] PROBLEM - check disk on ORES-worker06.experimental is CRITICAL: connect to address 10.68.22.146 port 5666: No route to host; connect to host ores-worker-06.ores.eqiad.wmflabs port 5666: No route to host
[22:28:52] PROBLEM - Host ORES-worker06.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-06.ores.eqiad.wmflabs)
[22:29:01] ACKNOWLEDGEMENT - Host ORES-worker07.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-07.ores.eqiad.wmflabs) zppix https://gerrit.wikimedia.org/r/404584
[22:29:02] PROBLEM - check users on ORES-worker07.experimental is CRITICAL: connect to address 10.68.17.230 port 5666: No route to host; connect to host ores-worker-07.ores.eqiad.wmflabs port 5666: No route to host
[22:29:16] ACKNOWLEDGEMENT - Host ORES-worker06.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-06.ores.eqiad.wmflabs) zppix https://gerrit.wikimedia.org/r/404584
[22:29:28] ACKNOWLEDGEMENT - Host ORES-worker05.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-05.ores.eqiad.wmflabs) zppix https://gerrit.wikimedia.org/r/404584
[22:29:33] ALARM PARTY!
[22:31:17] I need a button that does mass action on icinga
[22:31:34] Besides me just shutting off the bot completely
[23:07:45] CUSTOM - ping4 on ORES-worker07.experimental is UNKNOWN: check_ping: Invalid hostname/address - ores-worker-07.ores.eqiad.wmflabs Usage: check_ping -H -w ,% -c ,% [-p packets] [-t timeout] [-4 paladox Zppix
[23:13:37] Looks like the new workers are ... working.
[23:16:52] :)
[23:19:58] PROBLEM - ssh on ORES-worker10.experimental is CRITICAL: connect to address ores-worker-10.ores.eqiad.wmflabs and port 22: No route to host
[23:20:07] PROBLEM - Host ORES-worker10.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-10.ores.eqiad.wmflabs)
[23:20:07] PROBLEM - puppet on ORES-worker10.experimental is CRITICAL: connect to address 10.68.19.136 port 5666: No route to host; connect to host ores-worker-10.ores.eqiad.wmflabs port 5666: No route to host
[23:20:13] PROBLEM - ping4 on ORES-worker09.experimental is CRITICAL: PING CRITICAL - Packet loss = 100%
[23:20:27] PROBLEM - ssh on ORES-worker08.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:20:33] PROBLEM - Host ORES-worker08.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-08.ores.eqiad.wmflabs)
[23:20:33] PROBLEM - ping4 on ORES-worker08.experimental is CRITICAL: CRITICAL - Host Unreachable (ores-worker-08.ores.eqiad.wmflabs)
[23:20:51] PROBLEM - ssh on ORES-worker09.experimental is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:20:59] PROBLEM - Host ORES-worker09.experimental is DOWN: CRITICAL - Host Unreachable (ores-worker-09.ores.eqiad.wmflabs)
[23:31:22] CUSTOM - Host ORES-worker08.experimental is DOWN: check_ping: Invalid hostname/address - ores-worker-08.ores.eqiad.wmflabs Usage: check_ping -H -w ,% -c ,% [-p packets] [-t timeout] [-4 paladox test
[23:41:44] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.029 second response time
[23:42:34] PROBLEM - ORES web node labs ores-web-03 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 3.187 second response time
[23:43:26] good luck! my head is PHP pasta
[23:50:08] DOWNTIMEEND - check users on ORES-worker06.experimental is OK: USERS OK - 1 users currently logged in zppix https://gerrit.wikimedia.org/r/404584
[23:50:22] DOWNTIMEEND - Host ORES-worker06.experimental is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms zppix https://gerrit.wikimedia.org/r/404584
[23:50:30] DOWNTIMEEND - check load on ORES-worker06.experimental is OK: OK - load average: 0.89, 0.31, 0.23 zppix https://gerrit.wikimedia.org/r/404584
[23:50:32] DOWNTIMEEND - ping4 on ORES-worker06.experimental is OK: PING OK - Packet loss = 0%, RTA = 1.31 ms zppix https://gerrit.wikimedia.org/r/404584
[23:50:34] DOWNTIMEEND - puppet on ORES-worker06.experimental is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures zppix https://gerrit.wikimedia.org/r/404584
[23:50:36] DOWNTIMEEND - ssh on ORES-worker06.experimental is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0) zppix https://gerrit.wikimedia.org/r/404584
[23:50:39] DOWNTIMEEND - check disk on ORES-worker06.experimental is OK: DISK OK zppix https://gerrit.wikimedia.org/r/404584
[23:50:48] DOWNTIMEEND - Host ORES-worker05.experimental is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms zppix https://gerrit.wikimedia.org/r/404584
[23:50:52] DOWNTIMEEND - check disk on ORES-worker05.experimental is OK: DISK OK zppix https://gerrit.wikimedia.org/r/404584
[23:50:54] DOWNTIMEEND - check load on ORES-worker05.experimental is OK: OK - load average: 0.22, 0.19, 0.20 zppix https://gerrit.wikimedia.org/r/404584
[23:50:55] DOWNTIMEEND - check users on ORES-worker05.experimental is OK: USERS OK - 0 users currently logged in zppix https://gerrit.wikimedia.org/r/404584
[23:50:57] DOWNTIMEEND - ping4 on ORES-worker05.experimental is OK: PING OK - Packet loss = 0%, RTA = 0.41 ms zppix https://gerrit.wikimedia.org/r/404584
[23:50:59] DOWNTIMEEND - puppet on ORES-worker05.experimental is OK: OK: Puppet is currently enabled, last run 9 minutes ago with 0 failures zppix https://gerrit.wikimedia.org/r/404584
[23:51:04] DOWNTIMEEND - ssh on ORES-worker05.experimental is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0) zppix https://gerrit.wikimedia.org/r/404584