[09:10:54] good morning elukey
[09:11:13] elukey: Quick request: would you mind restarting pivot please?
[09:11:35] elukey: related to https://phabricator.wikimedia.org/T161824
[09:12:06] elukey: patch for adding a dimension to druid schema have been submitted, jobs restarted, data is present in druid indices, but not in Pivot
[09:12:16] I wonder if a restart would do the trick
[09:13:02] joal: done :)
[09:13:08] Thanks elukey :)
[09:14:01] elukey: not enough unfortunately
[09:14:05] hm
[09:14:07] checking the task
[09:14:22] ah joal, today at 13:30 UTC (IIRC) we'd need to reimage an1003
[09:14:31] so oozie/hive/etc.. stopped
[09:14:36] k elukey
[09:15:00] I'll add my eyes to yours :)
[09:16:02] joal: mmm is druid returning correctly the new field?
[09:16:37] elukey: field is visible in fiewld list
[09:17:16] weird
[09:26:33] joal: https://wikitech.wikimedia.org/wiki/Incident_documentation/20170419-restbase is the outage that I mentioned last week
[09:26:58] part of it was a very familiar auth problem.. :(
[09:27:24] repair didn't do the job and only restoring credentials fixed the auth errors
[10:06:57] elukey: for pivot, could we restart with a manual config removing the pageview-hourly ection, then restart again with prod config (to force autofill check)?
[10:08:35] from the logs I can see Got the latest time for 'pageviews-hourly' (2017-04-23T23:00:00.000Z)
[10:08:49] Got the latest time for 'pageviews-daily' (2017-03-31T00:00:00.000Z)
[10:09:15] is it saying that from its point of view the last changes were yesterday?
[10:09:46] elukey: data was added this morning I think
[10:10:02] elukey: but schema change happened on the 20th
[10:10:29] I can try to remove/re-add pageview-hourly, feels a bit of dramatic but let's do it :)
[10:10:51] yeah, I don't have better ideas :(
[10:11:16] joal: do the idea is to remove the datacube, restart, re-add it?
[10:12:01] elukey: yes - remove datacube config, (having pivot finding itr by itself) - check the unregistered datacube for new field, then readd and restart
[10:12:53] joal: done the datacube removal
[10:13:17] forcing puppet to re-add it
[10:13:49] elukey: not even needed: the datacube found by pivot wuith no name doesn't have the wished fields
[10:14:01] grmbl grmbl
[10:15:15] elukey: I wonder if pivot fills its schema with first-present-data instead of latest-added
[10:16:05] it should query druid periodically though..
[10:16:31] elukey: it must come from which point in time it uses to build its schema
[11:38:05] created https://etherpad.wikimedia.org/p/analytics1003-reinstall
[11:38:25] I think that we could proceed as we did for an100[12] formatting only the root partition
[11:44:39] elukey: please let me know if I can help
[11:45:40] joal: I was reviewing things to save etc.. but it shouldn't be super hard, maybe we could stop oozie et all around 13:00 UTC?
[11:46:19] that might simply be just stopping oozie bundles
[11:46:23] and coords if needed
[11:46:42] elukey: Stopping shouldn't be needed - suspending is enough
[11:46:49] and elukey, I think you should be ab
[11:46:58] le to suspend only webrequest-load
[11:47:00] ah yes sorry I meant suspending
[11:47:14] okok
[11:47:19] gooooood
[11:47:23] going to lunch then :)
[11:47:23] and if you wait a bit, everything will drain (except for search tem jobs)
[11:47:24] ttl
[11:47:33] later, taking a break as well
[12:12:15] Analytics-Tech-community-metrics: Git code repository is listed but not all recent activity in it is shown on wikimedia.biterg.io - https://phabricator.wikimedia.org/T161211#3205675 (Aklapper) Open>stalled p:Triage>Low @aklapper to check after T157898 / T161235 / T157709 have seen progress. M...
[12:13:08] Analytics-Tech-community-metrics: Clarify differences between similar widgets - https://phabricator.wikimedia.org/T160576#3205683 (Aklapper) a:Aklapper>None
[13:07:14] !log suspended webrequest-load-bundle
[13:07:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:08:44] !log suspended transfer_to_es bundle
[13:08:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:09:53] goodmorrniiing
[13:10:41] ottomata: o/
[13:11:03] analytics1003 in a bit, ya?
[13:11:12] yep :)
[13:11:20] I stopped some bundles in the meantime
[13:11:32] oook :)
[13:16:07] elukey: starting https://etherpad.wikimedia.org/p/analytics-analytics1003-jessie-upgrade, unless you already got something
[13:16:24] ottomata: I did! https://etherpad.wikimedia.org/p/analytics1003-reinstall
[13:16:42] great!
[13:17:02] NIice, we need to add druid to db backup list
[13:17:37] ah right
[13:20:42] we should also stop druid
[13:21:05] yep yep I completely forgot about it
[13:21:39] ya me too almost :)
[13:21:53] ah yes and camus
[13:21:58] let me stop it now
[13:22:07] !log disable camus cron on an1003
[13:22:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:22:36] coo
[13:24:34] ottomata: should I stop the root crontabs too?
[13:25:51] whoa the backup cron is in there twice??!
[13:26:12] i just deleted the first one
[13:26:18] ya elukey that sounds fine
[13:26:44] done
[13:27:07] https://yarn.wikimedia.org/cluster/apps/RUNNING seems empty now
[13:27:39] I suspended webrequest_load and transfer_to_es in https://hue.wikimedia.org/oozie/list_oozie_bundles/
[13:27:43] not sure if we need more
[13:27:55] (before stopping our dear oozie)
[13:28:37] elukey: if oozie is stopped and there are no jobs running, it is the same as suspending everything
[13:28:49] oozie will just pick up from where it left off by reading its mysql data whne it starts back up
[13:29:02] okok
[13:30:48] and also sent an email to the an list
[13:30:56] to alert that we are almost ready
[13:31:07] ottomata: I am wondering if researchers etc.. will read it
[13:31:25] great
[13:31:28] i think that's good elukey
[13:32:47] elukey: should we start? i think we can do on IRC, ja? you driving or me?
[13:33:22] let's do it! ottomata you can drive if you want, I'll set up downtime ok?
[13:33:38] ok great
[13:34:46] downtime set!
[13:35:14] ok
[13:35:23] stopping services
[13:35:55] !log stopping druid, oozie and hive to upgrade analytics1003 to jessie
[13:36:43] elukey: did you mark druid downtime on druid hosts too?
[13:37:08] nope doing it
[13:37:11] danke
[13:37:41] done
[13:37:55] the sal seems not working -.-
[13:38:21] oh hm
[13:38:24] ok
[13:40:36] elukey: lemme know when druid downtime set
[13:40:43] already done!
[13:40:45] danke
[13:40:47] oh
[13:40:48] you did!
[13:40:51] sorry missed it
[13:41:01] stopping druid
[13:41:19] super
[13:42:09] stopping mysql
[13:44:15] in the ops channel they are checking why wikitech/etc.. is down. it seems due to a db failure
[13:47:56] back again, logged in https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:54] bohrium dump is backing up now
[13:50:58] copying to stat1002
[13:51:47] super
[13:51:51] ooook
[13:51:51] done
[13:51:55] time for reimage
[13:52:27] ottomata: do you want to share for sanity check or it is ok?
[13:53:43] gah my internet has been so annoying lately!
[13:53:55] ok, starting reimage
[13:54:09] (ok I guess the answer is no :D)
[13:54:37] i think i misse the question
[13:54:44] elukey: what was the q?
[13:55:05] ahhh okok! " ottomata: do you want to share for sanity check or it is ok?"
[13:55:10] oh
[13:55:14] naw its ok i think
[13:55:32] okok.. let me know when you are reimaging so I'll clear puppet/salt
[13:56:06] reimaging now!
[13:57:09] all right clearing puppet/salt :)
[13:57:55] ahh elukey the partman seems to be taking over!
[13:57:55] done
[13:58:05] snap
[13:58:19] don't seem to be able to stop it...oohHHhhhh ok
[13:58:28] welp, i guess we'll have to reload mysql from back up :/
[13:58:40] yep :/
[13:58:49] kinda wish i had just copied the datadir
[13:58:52] kinda thought about it
[13:59:01] we have the datadir lvm backups on an02
[13:59:17] probably should have just run the lvm backup script manually once services stopped
[13:59:18] and just used that
[13:59:41] i thought i'd have an option to stop partman from happening
[13:59:42] but it never let me
[14:00:52] I didn't think about it too :(
[14:12:09] ottomata1: all good?
[14:13:15] ya installer still going
[14:13:17] almost done i think
[14:14:14] hm, if druid is still giving us problems maybe we should upgrade, might skip past some needless troubleshooting. Then if it's still broken we'll know it's not a bug
[14:14:21] or at least not a common one
[14:17:22] elukey: i think w got that mdadm rootdelay probably again
[14:17:45] i have forgotten how to fix this...
[14:17:50] do you have this on wikitech?
[14:18:52] ottomata1: nope but I can fix it now if you logoff from the console
[14:19:13] rootdelay=30 after editing the Debian boot menu
[14:22:16] ok elukey i'm out of console
[14:27:52] ottomata1: running puppet via install_console atm
[14:28:41] elukey: ok great
[14:38:41] elukey: puppet still running?
[14:38:50] yep!
[14:52:48] oo, i just logged in elukey
[14:52:53] can i try to fix thigns up?
[14:53:43] ottomata: need to run puppet a second time to install some stuff that failed
[14:53:49] after that it is all yours
[14:54:04] ok, it looks like partitions aren't quite right either, it allocated everyting in the vg for /srv
[14:55:06] should be easy to fix no?
[14:55:22] yea
[14:55:54] Analytics: upgrade druid to 0.9.2 - https://phabricator.wikimedia.org/T157977#3206261 (Nuria) p:Normal>High
[14:58:25] hmmm, I can't run "hive" on 1002 oe 1004, is something happening that I missed a mail about or?
[14:58:57] you did addshore!
[14:58:59] check analytics list emails
[14:59:07] we are upgrading the server that runs hive and some other thigns
[14:59:11] ottomata: the only broken thing is Error: Could not start Service[mysql]: Execution of '/usr/sbin/service mysql start' returned 1: Job for mysql.service failed. See 'systemctl status mysql.service' and 'journalctl -xn' for details.
[14:59:29] ok elukey that's fine, we'll need to restore stuff anyway
[14:59:30] ack!
[14:59:32] so i'll work on that
[14:59:38] elukey: may i proceed?
[15:00:06] sure!
[15:00:28] grr camus is running :p
[15:00:43] ping milimetric
[15:01:32] ping ottomata
[15:01:35] OH!
[15:04:50] gonna wait for these camus to finish
[15:10:43] (PS1) Joal: Add pt.wikimedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/349969
[15:14:51] (PS2) Joal: Add pt.wikimedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/349969
[15:16:17] (CR) Milimetric: [V: 2 C: 2] Add pt.wikimedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/349969 (owner: Joal)
[15:21:42] (CR) MarkTraceur: [C: 2] Fix report name [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/349379 (owner: Matthias Mullie)
[15:21:44] Analytics-Kanban: Check how pivot updates schema (or maybe make schema explicit on pivot) - https://phabricator.wikimedia.org/T163697#3206394 (Nuria)
[15:24:27] urandom: thanks a lot for the patience :)
[15:25:16] elukey: not at all; thanks for the continued interest!
[15:25:21] and for keeping me honest :)
[15:28:14] Analytics, Pageviews-API: Track page views by page ID rather than title (handles moved pages) - https://phabricator.wikimedia.org/T159046#3206440 (Halfak)
[15:34:10] Analytics: Label mediawiki_history snapshots for the last month they include - https://phabricator.wikimedia.org/T163483#3206471 (Nuria) p:Triage>High
[15:36:08] fdans: tasking/grooming?
[15:42:09] Analytics: Add templating support to reportupdater scripts - https://phabricator.wikimedia.org/T163252#3206549 (Nuria) p:Triage>Low
[15:42:17] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163702#3206563 (Jseddon)
[15:42:19] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163701#3206550 (Jseddon)
[15:42:21] Analytics: Add templating support to reportupdater scripts - https://phabricator.wikimedia.org/T163252#3191334 (Nuria) p:Low>Normal
[15:44:02] urandom: the report looks awesome, thanks :)
[15:45:13] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163702#3206563 (Nuria) There is an outage today as we migrate software. Please subscribe to analytics@ where notifications get sent about things like this.
[15:45:22] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163702#3206613 (Nuria) Open>Invalid
[15:45:40] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163701#3206550 (Nuria) There is an outage today as we migrate software. Please subscribe to analytics@ where notifications get sent about things like this.
[15:45:46] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163701#3206621 (Nuria) Open>Invalid
[15:45:54] Analytics: Pivot not loading any data - https://phabricator.wikimedia.org/T163702#3206622 (Jseddon) Thanks @Nuria! Seddon
[15:51:37] Analytics, Analytics-Cluster, Patch-For-Review: can't compile numpy on stat1004 - https://phabricator.wikimedia.org/T163177#3206689 (Nuria) p:Triage>Normal
[15:52:10] Analytics, Analytics-Cluster, Patch-For-Review: can't compile numpy on stat1004 - https://phabricator.wikimedia.org/T163177#3188807 (Nuria) Once we migrate to jassy, we will probably need to take a look at the puppet statistic classes.
[15:52:22] ottomata: I think that the downtime will expire in ~30 mins or less, do you want me to extend it ?
[15:53:19] nope, its all coming up now
[15:53:24] analytics1003 is good
[15:53:27] druid is loading segments
[15:53:32] niceeee
[15:53:35] \o/
[15:53:44] you can go ahead and unsuspend those oozei jobs
[15:53:54] all right doing it now
[15:54:51] !log re-enabled oozie bundles webrequest-load and transwer_to_es
[15:54:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:03:42] elukey: we done. the bohrium backup is on stat1002 in /a/backups/bohrium
[16:06:13] great work ottomata :)
[16:07:55] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#3206804 (Nuria) Ping @LilyOfTheWest please let me know if you can provide more info. We can certainly compile a dataset of banners/c...
[16:08:03] ottomata: puppet is still disabled on stat1002 ?
[16:29:03] Analytics: Better publishing of Annotations about Data Issues - https://phabricator.wikimedia.org/T142408#2533952 (Nuria) We feel that wiki annotations per metric that are machine readable would work for this use case . Also, we can probably make use of a generic page with annotations that affect all metri...
[17:01:06] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#3206992 (Nuria) >Pitching in from WMDE Fundraising. Our tracking requirements are two-fold: @gabriel-wmde :Let's please not mix use...
[17:07:57] going afk people!
[17:08:37] a-team: tomorrow is public holiday for Italy (Liberation Day), so I'll be afk but I'll check for things exploding :)
[17:09:01] o/
[17:21:56] laters!
[17:30:28] Analytics, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#3207100 (DStrine) @Nuria I don't think WMDE has access to pivot or any LDAP related systems. That may be very complicated to setup.
[17:53:48] nuria: time for uniques?
[18:05:36] Analytics: Enable nested on-wiki config pages in mediawiki-storage - https://phabricator.wikimedia.org/T163725#3207314 (mforns)
[18:06:09] ottomata: here?
[18:06:23] ya
[18:06:48] ottomata: you said you wanted some scala talk - I can provide 1/2 hour now if you want
[18:08:08] joal: mostly would like a look over https://gerrit.wikimedia.org/r/#/c/346291/
[18:08:13] the StructExtensions and the Test class
[18:08:18] JsonToHive not ready
[18:08:31] if you want to look it over, you can leave comments there, or we can jump in hangout to discuss
[18:08:52] will read ottomata, then we'll see :)
[18:08:52] want to hear your opinions on if you think that's a good idea, implicits, how i structured it, better ways to do things, etc.
[18:08:54] k
[18:12:32] ottomata: For me discussion is in the value of defining the classes as implicits instead of not implicits
[18:13:03] ottomata: for very generic aspects, like comparison functions when sorting for instance
[18:13:11] ottomata: implicits are interesting
[18:13:15] joal: back, i have time
[18:13:59] joal:let me know when you are free on your end
[18:14:04] ottomata: When in specific environement, the value of implicit is less important compared to the downside of stuff being hidden
[18:14:10] joal ya? maybe its not worht it here? It was really nice not to have to pass arguments around all over the place
[18:14:15] nuria: depends on ottomata :)
[18:14:31] nuria: gimme joal for 5ish+ mins
[18:14:36] joal: wanna hangout?
[18:14:41] ottomata: yesssirrr
[18:14:43] ottomata: we can do that
[18:22:48] Hey nuria !
[18:22:49] batcave?
[18:26:06] joal: yes
[18:31:26] Analytics: Enable nested on-wiki config pages in mediawiki-storage - https://phabricator.wikimedia.org/T163725#3207384 (mforns)
[18:55:32] (CR) Milimetric: [V: 2] Fix report name [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/349379 (owner: Matthias Mullie)
[18:55:45] (CR) Milimetric: "my bad for not seeing this earlier" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/349379 (owner: Matthias Mullie)
[20:37:43] (PS2) Milimetric: Add --generate-jar and --jar-file options [analytics/refinery] - https://gerrit.wikimedia.org/r/349723 (https://phabricator.wikimedia.org/T143119)
[22:27:08] (CR) Ottomata: "Some nits :)" (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/349723 (https://phabricator.wikimedia.org/T143119) (owner: Milimetric)