[00:20:41] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): When indexing new users, identify identical email addresses and merge identities accordingly in the DB - https://phabricator.wikimedia.org/T151634#3366034 (Aklapper) @albertinisg: Indeed! Thanks a lot! The `"source": "gerrit"`/`"git"`...
[00:23:52] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): Automatically sync mediawiki-identities/wikimedia-affiliations.json DB dump file with the data available on wikimedia.biterg.io - https://phabricator.wikimedia.org/T157898#3366037 (Aklapper)
[00:41:01] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun 2017): When indexing new users, identify identical email addresses and merge identities accordingly in the DB - https://phabricator.wikimedia.org/T151634#3366116 (Aklapper) Open>Resolved
[01:41:46] Analytics: Extraneous whitelist items for WikimediaBlogVisit schema - https://phabricator.wikimedia.org/T168475#3366220 (Tbayer) PS: it looks like these two fields came from an ancient [[https://meta.wikimedia.org/w/index.php?title=Schema:EventCapsule&diff=10981547&oldid=8326736 | (pre-2015)]] version of the...
[04:32:43] (CR) Nuria: "Do we have a corresponding job for deleting this data?" [analytics/refinery] - https://gerrit.wikimedia.org/r/355598 (https://phabricator.wikimedia.org/T166967) (owner: Joal)
[04:35:19] Analytics: Alarm on HDFS space used and related script failures - https://phabricator.wikimedia.org/T168390#3362956 (Nuria) Should we put this on kanban? seems important enough.
[04:42:09] Analytics, Operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#3366282 (Nuria) >These metrics are 429s emitted from RESTBase, and not Varnish. Right, that is why we should continue to see throttling on the rest base end. Do take a second...
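The T151634 task (00:20:41 and 00:41:01) asks that identities sharing an identical email address be merged when new users are indexed. The following is a minimal Python sketch of that grouping step, not the code used on wikimedia.biterg.io. The record layout (`uuid`, `source`, `email`), the function name, and the choice of the smallest uuid as the surviving profile are hypothetical. It also assumes emails are compared case-insensitively after trimming whitespace, which is a simplification, since the local part of an address is technically case-sensitive.

```python
from collections import defaultdict


def merge_identities_by_email(identities):
    """Map each identity uuid to the uuid of the profile it is merged into.

    Hypothetical helper: `identities` is an iterable of dicts with the
    assumed keys 'uuid', 'source' and 'email'.
    """
    identities = list(identities)
    groups = defaultdict(list)
    for ident in identities:
        # Assumption: normalize by trimming whitespace and lowercasing.
        email = (ident.get("email") or "").strip().lower()
        # Identities without an email get a unique key, so they never merge.
        key = email if email else ("no-email", ident["uuid"])
        groups[key].append(ident["uuid"])

    merged_into = {}
    for uuids in groups.values():
        target = min(uuids)  # deterministic choice of the surviving profile
        for uuid in uuids:
            merged_into[uuid] = target
    return merged_into
```

For example, a gerrit record and a git record both carrying `Jane@example.org` would map to the same uuid, while a record with no email maps only to itself.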
[05:05:39] (CR) Nuria: UDF to tag requests (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/353287 (https://phabricator.wikimedia.org/T164021) (owner: Nuria)
[05:29:15] Analytics: Data Lake queries abort with HDFS write fail - https://phabricator.wikimedia.org/T168497#3366377 (Tbayer)
[05:39:19] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3366392 (Niharika) @kaldari, are we also sure that `PageContentInsertComplete` is *not* called during w...
[06:00:44] Hello people!
[06:00:54] As planned I am rebooting stat100[2,3,4]
[06:17:50] and thorium
[06:22:16] Analytics, DBA: Purge all old data from master - https://phabricator.wikimedia.org/T168414#3366436 (elukey) eventlogging_cleaner.py should be capable of purging db1046 (simply not doing any whitelisting/update-with-nulls) but we can have another script if simpler.
[06:48:57] Analytics-Kanban, User-Elukey: Reboot all the Analytics hosts for kernel upgrades - https://phabricator.wikimedia.org/T168381#3366485 (elukey)
[06:56:26] * elukey runs errand for ~1h
[09:58:25] so up to now I rebooted kafka1012 (again, since yesterday the kernel wasn't there, my bad)
[09:58:32] kafka2001 (eventbus)
[09:58:38] aqs100[45]
[09:58:44] stat* and thorium
[09:58:50] and all the hadoop workers
[09:59:11] my plan would be to do an1003 today with either Andrew or joal
[09:59:19] and the hadoop masters
[10:00:11] Hi elukey
[10:00:29] Works for - Need to leave now for half an hour, here after
[10:00:35] sure!
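The 06:22:16 comment on T168414 contrasts two modes: whitelisting with update-with-nulls, and a plain purge of db1046 without any whitelisting. The Python sketch below illustrates both modes against SQLite under stated assumptions; it is not the actual eventlogging_cleaner.py. The assumptions are that old rows are selected by a fixed-width 14-digit `timestamp` column, that whitelisting means keeping the listed columns and setting every other column to NULL, and that the plain purge deletes the old rows. The function name, table, and columns are hypothetical.

```python
import sqlite3


def purge_old_rows(conn, table, cutoff, whitelist=None, ts_column="timestamp"):
    """Sanitize or delete rows older than `cutoff` (hypothetical helper).

    whitelist=None -> delete old rows outright (the "no whitelisting" purge).
    whitelist=set  -> keep whitelisted columns (plus ts_column), NULL the rest.
    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    if whitelist is None:
        conn.execute(f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff,))
    else:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        keep = set(whitelist) | {ts_column}
        to_null = [c for c in columns if c not in keep]
        if to_null:
            assignments = ", ".join(f"{c} = NULL" for c in to_null)
            conn.execute(
                f"UPDATE {table} SET {assignments} WHERE {ts_column} < ?", (cutoff,)
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE example_events (id INTEGER, timestamp TEXT, wiki TEXT, userAgent TEXT)"
    )
    conn.executemany(
        "INSERT INTO example_events VALUES (?, ?, ?, ?)",
        [(1, "20170101000000", "enwiki", "Mozilla/5.0"),
         (2, "20170620000000", "enwiki", "Mozilla/5.0")],
    )
    purge_old_rows(conn, "example_events", "20170301000000", whitelist={"id", "wiki"})
    print(conn.execute("SELECT * FROM example_events ORDER BY id").fetchall())
```

Building the UPDATE from the table's current column list means that any column not explicitly whitelisted is nulled by default, which seems the safer failure mode for a privacy-driven purge.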
[10:00:44] we can do it in the afternoon no problem
[10:00:54] I'll do an1002 now
[10:00:55] elukey: You let me know when you want, I'll be working most of the afternoon
[10:01:03] great
[10:01:10] See you in 1/2 hour
[10:10:35] Analytics-Kanban, User-Elukey: Reboot all the Analytics hosts for kernel upgrades - https://phabricator.wikimedia.org/T168381#3366827 (elukey)
[10:13:40] an1002 back in service
[10:42:32] yay
[10:42:42] elukey: please go ahead with 1001 if you want
[10:43:28] joal: all right doing it now
[10:45:57] an1002 is now the hdfs master
[10:46:23] and the yarn master
[10:47:14] rebooting 1001
[10:54:37] joal: everything done
[10:54:40] an1001 back to primary
[10:54:47] elukey: no oozie email, everything looks good so far
[10:55:00] elukey: Will re-launch other jobs and check
[10:55:06] personal record of 10 minutes in total to reboot an1001
[10:55:08] ahhahah
[10:55:41] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#3366955 (Samwalton9)
[11:05:46] joal: if you are ready we can do an1003 too
[11:05:52] elukey: I am !
[11:06:02] elukey: stop camus first I guess
[11:06:14] then load bundle in hue
[11:06:20] and wait for jobs to drain
[11:06:23] right?
[11:06:26] then wait for jobs draining (we're at beginning of hour, we'll have to wait some
[11:06:36] stopping camus is enough
[11:06:49] ah okok
[11:08:11] !log stop camus on an1003
[11:08:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:08:50] elukey: every camus, right - webrequest, mediawiki, eventlogging, eventbus
[11:09:13] yep yep
[11:09:47] oki thanks (just double checking)
[11:10:28] please always do it, safe approach to avoid pebkacs
[11:12:16] ok elukey - As expected, beginning of hour, so refinement job started not long ago - We'll need to wait for the jobs to drain
[11:13:53] in the meantime I'll reboot a aqs node :D
[11:14:06] yay :)
[11:18:45] elukey: I'd also like to deploy refinery today if you agree
[11:37:09] joal: sure!
[11:37:28] elukey: one at a time :) First, reboot :)
[11:37:50] really weird what happened now.. I stopped aqs1006-{a,b} and aqs1004/8 restbase endpoints went nuts timing out
[11:37:57] and causing alarms for mobile apps
[11:38:12] self recovered
[11:38:55] weird elukey indeed !
[11:39:00] Analytics-Kanban, User-Elukey: Reboot all the Analytics hosts for kernel upgrades - https://phabricator.wikimedia.org/T168381#3367040 (elukey)
[11:39:10] Maybe there were some requests waiting for data from there?
[11:39:20] Analytics-Kanban, User-Elukey: Reboot all the Analytics hosts for kernel upgrades - https://phabricator.wikimedia.org/T168381#3362750 (elukey)
[11:39:47] could be, but really weird.. maybe I could try to drain first
[11:39:52] as precautionary measure
[11:40:13] elukey: would be safe, but it's still weird
[11:40:54] yep
[11:41:07] it is also weird that mobile apps alarms if aqs goes down
[11:42:45] joal: if you agree I'd go for a quick lunch + a small errand, and be back in ~1h
[11:43:13] good for me elukey :)
[11:43:20] all right, ttl!
[11:43:25] cluster should be ready for 1003 reboot
[11:43:30] later
[12:09:22] elukey: when you're back, cluster is drained :)
[12:41:54] (PS3) Joal: Rename unique devices project-wide [analytics/refinery] - https://gerrit.wikimedia.org/r/360327 (https://phabricator.wikimedia.org/T168402)
[13:04:13] joal: here I am!
[13:04:23] sorry it took a bit more to the dentist :(
[13:04:29] Wow ! Just got your message - weird
[13:04:32] :)
[13:04:50] ahahah
[13:05:05] let's reboot an1003 then
[13:05:22] I can see your username in RUNNING though
[13:05:52] don't worry :)
[13:06:06] hopefully it'll still work (or not)
[13:09:01] host will reboot in 1 min
[13:09:05] stopped all the daemons
[13:13:10] we are back online
[13:15:12] !log reboot analytics1003 for kernel update
[13:15:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:21:19] Analytics-Kanban, User-Elukey: Reboot all the Analytics hosts for kernel upgrades - https://phabricator.wikimedia.org/T168381#3367251 (elukey)
[13:39:11] aqs1007 rebooted
[13:39:36] Yay !
[13:40:47] should be able to do 100[89] today, and some kafka brokers
[13:40:51] mforns: holaaaaa
[13:40:54] you there?
[13:41:01] I'd need to reboot eventlog1001 :)
[13:42:11] I think that stopping it should be enough, but wanted to double check
[13:48:38] elukey: I guess so - But I'm very much not knowledgeable in EL :(
[13:54:15] joal: Would you be willing to review changes in GitHub for the recommendation project?
[13:54:30] schana: feasible :)
[13:54:38] thanks :)
[13:54:40] !log stop eventlogging and reboot eventlog1001
[13:54:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:57:53] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3367371 (Ottomata) In the one double rev_parent_id 0 page I checked, both revisions were 'redirect' rev...
[13:59:54] !log eventlogging restarted after reboot
[13:59:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:00:34] Analytics: Data Lake queries abort with HDFS write fail - https://phabricator.wikimedia.org/T168497#3366377 (JAllemandou) Hi @Tbayer, I wonder about the reason ... One solution is to use the not-yet-productionized version of mediawiki_history that has cumulative edit counts (and timestamps in JDBC format YYYY-...
[14:02:55] schana: While being able to review, it would be better if you ask me to review specific patches - I guess the one you submitted is one)
[14:03:37] that was my plan, joal
[14:03:56] great schana :)
[14:07:49] elukey, just arrived, hello team :]
[14:07:58] Hi mforns
[14:08:11] hi joal!
[14:08:28] mforns: just rebooted eventlog1001, should be good :)
[14:08:56] ok elukey
[14:09:12] fdans: ok, I pushed a version with no double tests, no ajv warnings, no fs/net errors
[14:09:19] success
[14:09:26] niiiiiice
[14:09:36] I'm gonna work on a phab repo now
[14:09:38] what happened milimetric /
[14:10:22] double tests: files were being watched twice, so I changed karma.conf and removed test/index.js which was adding a layer of indirection that made the problem harder to fix
[14:11:06] ajv: ignored the warnings with the webpack.IgnorePlugin, so they're still there but they shouldn't be much of a problem. The bad side effect is that it makes the test output less verbose. The real fix is waiting for webpack to use ajv > 5.0
[14:12:03] fs/net errors: added a "node" section under "devServer" in the webpack config, it needed some tuning there
[14:12:36] i see
[14:12:39] cool
[14:12:39] fdans: let me know if you had grander plans for test/index.js though
[14:12:59] naw I thought it was something we needed at some point
[14:13:29] Analytics-Kanban, Analytics-Wikistats: Deploy new Wikistats to stats.wikimedia.org/v2 - https://phabricator.wikimedia.org/T167684#3340773 (Milimetric) a:Milimetric
[14:13:37] I got a bit desperate in Prague and was trying a few different approaches, and that sneaked through
[14:14:19] no it seemed more flexible, but so far we can make do with just karma.conf
[14:14:33] k, good to know, victory is mine!
[14:15:40] ottomata: organizational question about wikistats repo
[14:15:48] so we have wikistats in gerrit
[14:15:55] but we want to make wikistats 2, I was going to do it in phab
[14:15:59] (diffusion)
[14:16:14] milimetric: uhhh ask me
[14:16:16] should I call that one wikistats as well...
[14:16:24] ah
[14:16:27] hmmmmmm
[14:16:31] hm.
[14:16:47] i guess why not?
[14:16:47] eh?
[14:16:48] :)
[14:16:55] ottomata: Hi !
[14:16:58] is it just called 'wikistats'?
[14:16:58] so then we'd have two wikistatseses?
[14:17:09] milimetric: it is analytics/wikistats in gerrit
[14:17:13] ah
[14:17:20] also hahha
[14:17:22] we're not doing that in diffusion?
[14:17:23] https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/wikistats
[14:17:24] yeah right
[14:18:00] or, milimetric does anyone still make commits to wikistats? you could make a branch there and wipe master
[14:18:04] for your new wokr
[14:18:06] work
[14:18:07] dunno
[14:18:12] joal: hiiii
[14:18:27] DUDES I forgot to go to the managers meeting yesterday!
[14:18:45] it's ok, we should've reminded you
[14:18:49] ottomata: Arrrf
[14:18:50] I even thought to do it, sorry
[14:18:52] it wasn't on my calendar! Tuesday Andrew didn't even have a thought about remembering that Monday Andrew said yes
[14:19:29] wait you're split into 7 people like that? Like Tuesday Andrew exists on all Tuesdays?!
[14:19:33] ottomata: I'm sorry I didn't think to remind you :(
[14:19:41] nono, not on all tuesdays
[14:19:44] every tuesday is a different person
[14:19:55] but, still, i am a time worm
[14:20:00] milimetric: That'd be an interesting time intrication :)
[14:20:06] seriously
[14:20:19] I could see a whole tv show based on just that
[14:20:27] hehe
[14:20:29] https://en.wikipedia.org/wiki/Perdurantism
[14:20:43] ottomata: I have a weird thing on puppet / refinery code base
[14:20:52] ottomata: do you have minute triple checking this with me?
[14:21:28] I buy perdurantism, if electrons do it, so do we
[14:21:33] Analytics, Patch-For-Review: Eventstreams graphite disk usage - https://phabricator.wikimedia.org/T160644#3367463 (fgiunchedi) Resolved>Open Reopening, beginning at around 6/6 eventstreams has been creating a lot of metrics consuming ~20% of graphite disk space in 8 days and it is now at around 4...
[14:21:49] Analytics: Eventstreams graphite disk usage - https://phabricator.wikimedia.org/T160644#3367468 (fgiunchedi) p:Triage>High a:fgiunchedi>None
[14:21:50] joal: ya, but ummmm, lemme run back home real quick cause meetings are starting, sorry
[14:22:00] sure ottomata
[14:26:17] joal: bc
[14:26:24] yes !