[06:30:38] Analytics, DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3671151 (Marostegui)
[07:17:39] Analytics-Kanban, Operations, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671178 (akosiaris) We've had part of this discussion in #wikimedia-netops IRC channel. I can post a backlog (we don't have a bot yet archiving that channel) but some first (and partial) consensus se...
[08:01:37] Analytics-Kanban, Operations, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671196 (elukey) >>! In T177511#3671178, @akosiaris wrote: > We've had part of this discussion in #wikimedia-netops IRC channel. I can post a backlog (we don't have a bot yet archiving that channel)...
[08:20:30] Analytics-Kanban, Operations, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671213 (elukey) @akosiaris what are the steps to take to move druid100[345] outside the analytics vlan(s)? From what I know we'd need to: 1) Properly remove the hosts from service (already done by A...
[08:24:33] Analytics-Kanban, Operations, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671215 (akosiaris) >>! In T177511#3671213, @elukey wrote: > @akosiaris what are the steps to take to move druid100[345] outside the analytics vlan(s)? From what I know we'd need to: > > 1) Properly...
[08:49:53] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671257 (akosiaris) > 5. Remove druid100[456] from any router ACL entry (Garbage collection). > > I'll do 5, the rest all LGTM Done (I had nothing to do actually, there are no...
[08:51:39] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671259 (elukey) >>! In T177511#3671257, @akosiaris wrote: >> 5.
Remove druid100[456] from any router ACL entry (Garbage collection). >> >> I'll do 5, the rest all LGTM > > Do...
[08:52:42] Bonjour joal :)
[08:56:02] I checked the Druid coordinator UI from druid1002 and it seems to me that druid100[456] are officially out of the cluster
[08:56:28] I am asking since if I find somebody that helps me I'll try to move those hosts out of the Analytics VLAN
[10:15:30] Analytics-Kanban, Operations, netops, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671561 (elukey)
[10:23:45] I am pretty sure that I can proceed since nothing is basically running on druid100[456] :D
[10:24:20] so I prepared the dns changes, I'd only need to get some help in changing the vlan ids on the druid100[456] ports (network switches side)
[10:24:49] then we'll be able to reimage and finally get the public cluster ready to be deployed :)
[10:44:55] done, reimaging druid100[456]
[11:18:48] Analytics-Kanban, Operations, netops, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3661358 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1004.eqiad.wmnet'] ``` The log can be found...
[11:18:58] Analytics-Kanban, Operations, netops, Patch-For-Review, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671633 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1004.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['druid1004.eqiad.wmnet'] ```
[11:23:03] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671656 (elukey)
[11:28:23] * elukey lunch!
[11:58:11] Analytics, Proton, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3671720 (mforns) Hey all!
Speaking with the team, we agreed that Schema:Print's skin field is a tricky case, and that we need...
[11:59:39] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671722 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1004.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/201710101159_elukey...
[11:59:48] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671723 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1004.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['druid1004.eqiad.wmnet'] ```
[12:44:49] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671830 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1004.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/201710101244_elukey...
[12:45:01] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671831 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1004.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['druid1004.eqiad.wmnet'] ```
[12:47:25] elukey: o/ i liiiike it :) thanks for ^++
[12:47:53] ottomata: hiiiiiiii I am trying to reimage those hosts but wmf-auto-reimage has issues, testing it with Riccardo
[12:48:20] ayye cool
[12:51:35] (PS3) Fdans: Add stub of new contributing and content metrics [analytics/wikistats2] - https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268)
[12:59:47] (PS4) Fdans: Add stub of new contributing and content metrics [analytics/wikistats2] - https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268)
[13:00:35] just applied the changes you suggested yesterday joal
[13:26:45] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3671956 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1004.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/201710101326_elukey...
[13:27:47] all right reimagining druid1004 finally
[13:27:54] *imaging
[13:28:08] if it works fine I'll do the same with druid100[56]
[13:28:30] in the meantime, I need to add the ips to the analytics firewall on cr1/cr2
[13:35:26] (CR) Ottomata: "I might have missed some discussion on this, but didn't we decide to use underscores for datasource names so that they could match the nam" [analytics/refinery] - https://gerrit.wikimedia.org/r/383172 (https://phabricator.wikimedia.org/T175162) (owner: Joal)
[13:49:53] elukey: q about kafka alerts and profiles. where do you think role specific alerts should go, if role classes should only include profiles?
[13:50:22] the other alerts I need to port over are for specific topics, which only exist in the jumbo (/analytics) kafka cluster
[13:51:19] hmmm actually, no, this is easy! sorry, those alerts are not broker specific, so they live either in role::graphite or elsewhere
[13:51:33] :)
[13:51:38] haha, thank you for your help!
[13:52:12] ahahahha
[14:03:53] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3672061 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1004.eqiad.wmnet'] ``` and were **ALL** successful.
[14:05:31] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3672064 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1005.eqiad.wmnet', 'druid1006.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-re...
[14:09:29] ottomata: what ports should be allowed in the cr1/cr2 firewalls?
[14:09:32] for druid
[14:09:57] the overlord I suppose
[14:10:13] elukey: source ports? from druid -> hadoop?
[14:10:20] i think it's going to be a big range
[14:10:44] nope from hadoop to druid
[14:10:50] oh right right
[14:11:08] I am writing the firewall exception :)
[14:11:11] hm, i'm really not sure, def broker
[14:11:20] let's look at oozie...
[14:11:56] ya overlord for sure
[14:11:58] IIUC oozie pokes the overlord to start the indexing right? (then middlemanagers/peons/etc..)
[14:11:58] 8090
[14:12:01] all right
[14:12:04] yeah
[14:12:12] but you should allow broker too
[14:12:12] and do we need the broker too?
[14:12:15] all right
[14:12:18] so we can query it from stat1004, etc.
[14:12:30] 8082
[14:12:49] ya
[14:13:32] mforns: tell me how you really feel about https://gerrit.wikimedia.org/r/#/c/382636/3/src/store/index.js :)
[14:14:09] I agree that the state store is so simple right now we shouldn't worry too much.
But the next change that goes in there should make sure it stays somewhat sane
[14:14:17] ungh, i hate this no profile hiera default rule. it basically means i have to use self hosted puppet to set up kafka in labs, because I need to make new role classes if i want to test kafka
[14:14:18] milimetric, hehe, it's the truth! I didn't lie! :]
[14:14:32] milimetric, yea, that's my view too
[14:14:36] I'm making the questions enabled/disabled based on what metrics exist.
[14:14:39] hmmm, unless i can include the profile directly... from the UI?
[14:14:40] hmmm
[14:14:41] and I'll push after that
[14:15:49] milimetric, "I'm making the questions enabled/disabled based on what metrics exist." sounds good!
[14:16:04] milimetric, "hmmm, unless i can include the profile directly... from the UI?" don't get it
[14:16:27] mforns: that second line was from Andrew :)
[14:16:34] oh! xD sorry
[14:17:01] I just mean I'm going to configure the search to only search for metrics that are enabled
[14:17:12] yea yea, sounds perfect
[14:21:04] ottomata: done!
[14:21:42] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3672112 (elukey) Updated cr1/cr2 eqiad with the following: ``` elukey@re0.cr2-eqiad> show system rollback compare 1 0 [edit firewall family inet filter analytics-in4] term default { ... } + term druid {...
[14:24:40] (PS4) Milimetric: Implement Topic Selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/382636 (https://phabricator.wikimedia.org/T167676)
[14:24:46] (CR) jerkins-bot: [V: -1] Implement Topic Selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/382636 (https://phabricator.wikimedia.org/T167676) (owner: Milimetric)
[14:25:02] lol, does that say "jerkins", hahahaha, what?
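(Editor's sketch, not part of the conversation.) The two ports named above, 8090 for the overlord and 8082 for the broker, are the ones the hadoop-to-druid firewall exception has to let through. A minimal way to probe them from a client host might look like the following. The helper names `port_open` and `check_hosts` are made up for illustration, and the service-to-port mapping simply repeats what was said in the channel rather than anything read from a Druid configuration.

```python
import socket

# Ports quoted in the discussion; the mapping itself is an assumption
# taken from the chat, not from any Druid config file.
DRUID_PORTS = {"overlord": 8090, "broker": 8082}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts, ports=DRUID_PORTS, timeout=3.0):
    """Map (host, service name) -> reachable for every host/port pair."""
    return {
        (host, name): port_open(host, port, timeout)
        for host in hosts
        for name, port in ports.items()
    }
```

Run from a client such as stat1004, `check_hosts(["druid1004.eqiad.wmnet"])` would return one boolean per service. A `False` only says the TCP connect failed; it does not tell a firewall drop apart from a daemon that is not running.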
[14:29:58] nope the firewall rules seem not to be working
[14:44:23] and now they work :)
[14:45:19] Analytics-Kanban, User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3672199 (elukey) Also added the following: ``` elukey@re0.cr2-eqiad# show | compare [edit firewall family inet filter analytics-in4] term puppet { ... } ! term druid { ... } [edit firewall family inet fil...
[14:47:23] elukey: are the nodes back up too?!
[14:47:27] reinstalled with new IPs?
[14:48:15] (PS6) Milimetric: Create Oozie job for interlanguage nav table [analytics/refinery] - https://gerrit.wikimedia.org/r/365517 (https://phabricator.wikimedia.org/T170764) (owner: Amire80)
[14:48:25] ottomata: druid1006 is the only one left
[14:49:29] ah k great
[14:49:54] but cr1/cr2 firewall rules are ok now, just tested a telnet from stat1004 to druid1004
[14:50:06] cool
[15:01:29] ping joal
[15:11:13] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie - https://phabricator.wikimedia.org/T174640#3672320 (Nuria)
[16:01:10] mforns / elukey: I can deploy refinery now if you like, so we can merge that cron
[16:01:23] milimetric, want me to pair?
[16:01:52] sure
[16:02:23] mforns: no problem, easy deploy
[16:07:21] joal: superset 0.20.3 up and running :)
[16:07:26] i think it also let me keep and upgrade the previous local db
[16:07:36] i think saved stuff should have remained
[16:07:43] can you check it out and let me know if that is what you want?
[16:07:46] oh waiiit, no joal today
[16:07:46] :)
[16:25:00] ok mforns / elukey: deploy is done and synced to hdfs
[16:25:05] gone to make lunch
[16:25:07] super
[16:25:08] thanks milimetric
[16:27:46] https://store.dftba.com/products/human-era-calendar
[16:27:53] love these guys
[16:33:26] Hey folks!
I'll be late to the analytics/research geek out because of tech management
[16:33:33] Should be able to join in ~15 minutes.
[16:37:38] ^ ottomata et al.
[16:43:02] is anybody doing anything to eventlogging?
[16:52:38] seemed transient
[16:55:48] nuria_: sorry for the +2 removal on that patch, wouldn't normally do that, except to make sure it won't auto-merge.
[16:56:06] Krinkle: ahahah, that EXPLINS It
[16:56:11] *EXPLAINS it
[16:56:19] Krinkle: np
[16:56:29] Arg. Tech mgmt is running over
[16:56:42] Krinkle: can talk in a bit
[17:03:04] Krinkle: can talk now
[17:04:08] nuria_: https://gerrit.wikimedia.org/r/#/c/382750/2/WikimediaEventsHooks.php
[17:04:30] Krinkle: got it, do you want me to remove EL code in this changeset and abandon the other?
[17:04:44] Krinkle: hook can keep on existing w/o sending EL events right?
[17:04:50] nuria_: Yep
[17:05:01] nuria_: I don't mind rebasing after yours, just as long as your patch doesn't remove this code.
[17:05:41] Krinkle: nah, no worries, i will abandon and redo cc ottomata who CR other patch
[17:17:20] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3672818 (Ottomata) From Aaron: https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info=score_schema
[17:19:17] Analytics, Proton, Readers-Web-Backlog, Patch-For-Review, Readers-Web-Kanban-Board: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3672821 (ovasileva) p:Normal>High
[17:38:14] o/ ottomata, I've been working on my bike-packing setup.
[17:38:15] https://photos.app.goo.gl/WCDmRRHNlzs3TV4r1
[17:38:44] ^ 35 liters of space, 44lbs total on the bike.
[17:38:54] I'm excited for some fall/winter trips :D
[17:39:07] NIIIICE
[17:39:10] man what a pretty bike
[17:39:15] Ti?
[17:39:24] Got my super big heavy quilts in that front bag. Can get down to -40F
[17:39:36] Yup!
It's my gravel racing bike :D
[17:39:44] wow that thing is pretty
[17:40:02] :DDDD
[17:40:07] i see we like the same saddle!
[17:40:09] is that a pure?
[17:40:33] * halfak checks
[17:40:59] WTB Speed Comp
[17:41:11] I like it but I think I might switch back to my Selle SMP
[17:41:13] ah
[17:43:12] We should get our pretty bikes together for a ride some time :D. Maybe analytics wants to come have an offsite in MN :D
[17:43:37] I'd be down for hosting.
[17:43:40] that would be fun!
[17:43:41] i'm into it
[17:45:17] * elukey off!!
[18:01:25] halfak: another name from another article
[18:01:25] https://www.confluent.io/blog/build-services-backbone-events/
[18:01:28] 'event driven service'
[18:03:56] ping mforns
[18:04:41] nuria_, yes?
[18:05:03] mforns: reading team is about to start their new popups test
[18:05:12] aha
[18:05:35] mforns: this puppet change makes events only available on mysql: https://gerrit.wikimedia.org/r/#/c/383389/
[18:05:44] looking
[18:06:09] right nuria_
[18:06:15] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3672988 (Nuria) @Tbayer: in the light of your new te...
[18:06:20] LGTM
[18:06:24] mforns: sorry, only available on HADOOP, argh
[18:06:34] yea yea
[18:06:48] mforns: which means that for them to use them as data comes in you will need to refine it
[18:07:03] ok
[18:07:09] mforns: it will take a bit for data to flow but just an fyi
[18:07:24] mforns: we should probably think about what db we will refine that data to, cc ottomata
[18:08:17] nuria_, are we going to do it by hand, once the experiment is complete? or do we want to have something in place that refines it as it comes in?
[18:09:03] mforns: ideally we want to have a cron in place as data comes in, they will need to check it looks ok before experiment is done
[18:09:38] nuria_, so we want to use scala jsonRefine right?
[18:09:45] mforns: right
[18:09:57] k
[18:10:17] mforns: this is how you run it by hand:
[18:10:20] https://www.irccloud.com/pastebin/Ac7go22S/
[18:10:42] probably worth trying to run and refine to your user db before setting cron to make sure things are working
[18:10:51] cc ottomata for confirmation of plan
[18:10:57] of course
[18:11:26] nuria_, do joseph's comments need to be taken care of before doing that or are they a parallel thing?
[18:11:42] mforns: i do not know, probably yes, cc ottomata
[18:11:49] k
[18:23:14] Krinkle: let me know if this is the way you expect changes to stack up on gerrit: https://gerrit.wikimedia.org/r/#/c/383392/
[18:24:25] +1 confirmation of plan, agree we should figure out db to write to for official events
[18:24:29] let's talk post standup tomorrow
[18:24:31] i gotta run
[18:24:32] see yall!
[18:47:20] Krinkle: i let you +2 at your convenience for those two changes, no urgency on our end
[18:51:03] (CR) Nuria: [V: 2 C: 2] "In this case we wanted not to break people's existing bookmarks and links thus moving "_" to "-" which is how we have named most of our da" [analytics/refinery] - https://gerrit.wikimedia.org/r/383172 (https://phabricator.wikimedia.org/T175162) (owner: Joal)
[18:58:41] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3673233 (Tbayer) >>! In T174815#3665368, @Nuria wrot...
[19:01:17] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3673274 (Nuria) >Hm, it would be really great to avo...
[19:18:44] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3673335 (Tbayer) >>! In T174815#3672988, @Nuria wrot...
[19:26:16] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review, Readers-Web-Backlog (Tracking): Schema:Popups suddenly stopped logging events in MariaDB, but they are still being sent according to Grafana - https://phabricator.wikimedia.org/T174815#3673350 (Nuria) Ok, let us know then if we can delet...
[19:42:59] Analytics, Analytics-Wikistats, Research: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3673462 (Nemo_bis) Thanks for work on this, I may try to run the scripts again on translatewiki.net dumps and tell you whether I found it easier. :)
[22:13:01] Analytics, Operations, Ops-Access-Requests: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3664354 (Dzahn) @Jgreen Do you just need Hive/Hadoop or do you additionally need sampled webrequest logs and stat boxes with private data? Asking this way because...
[22:14:47] Analytics, Operations, Ops-Access-Requests: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3674030 (Dzahn) Yea, aware Jeff has root on the mentioned stat boxes anyways, heh.
[22:29:02] Analytics-Tech-community-metrics, Developer-Relations: Understand difference between author_name and name in gerrit - https://phabricator.wikimedia.org/T177890#3674048 (Aklapper)
[23:33:55] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), Patch-For-Review: Stop collecting Data for outdated schemas PageCreation, PageDeletion, PageMove, PageRestoration. Archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3674250 (Nuria) Ping @elukey more tabl...
[23:36:07] Analytics-Kanban, MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), Patch-For-Review: Stop collecting Data for outdated schemas PageCreation, PageDeletion, PageMove, PageRestoration. Archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3674254 (Nuria) Also, @elukey we prob...