[00:34:50] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3529686 (Dzahn) There is some cronspam from stat1006: Cron Daemon root@stat1006.eqiad.wmnet via wikimedia.org 4:30 PM (1 hour ago) to stats **Error: Value 5431...
[05:06:44] (CR) Dzahn: "i think this is outdated since analytics/wikistats is being replaced, so it's not the original Perl scripts by Erik Zachte anymore. please" [analytics/wikistats] - https://gerrit.wikimedia.org/r/316289 (https://phabricator.wikimedia.org/T64570) (owner: Paladox)
[06:24:27] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3530003 (elukey) >>! In T152712#3529686, @Dzahn wrote: > There is some cronspam from stat1006: > > Cron Daemon root@stat1006.eqiad.wmnet via wikimedia.org > > 4:...
[08:10:37] Analytics, User-Elukey: Geowiki check_web_page.sh alerts for data quality - https://phabricator.wikimedia.org/T173486#3530059 (elukey)
[08:12:45] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3530071 (elukey) For the records I created https://phabricator.wikimedia.org/T173486 and moved the cron alert to analytics-alerts@.
[10:05:50] Analytics, Operations: Tune Varnishkafka delivery errors to be more sensitive - https://phabricator.wikimedia.org/T173492#3530259 (elukey)
[10:06:18] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3505470 (elukey) Created https://phabricator.wikimedia.org/T173492 for the varnishkafka alarms
[10:09:40] Analytics, Operations: Tune Kafka logs to register clients connected - https://phabricator.wikimedia.org/T173493#3530279 (elukey)
[10:09:55] Analytics-Kanban, Operations, Patch-For-Review, User-Elukey: Analytics Kafka cluster causing timeouts to Varnishkafka since July 28th - https://phabricator.wikimedia.org/T172681#3505470 (elukey) And finally https://phabricator.wikimedia.org/T173493 to tune alarms.
[10:45:36] * elukey lunch!
[13:06:03] mforns: o/ - are you there?
[13:06:32] I made some calculations about MobileWebUIClickTracking_10742159_15423246 that are a bit disturbing :D
[13:06:51] so we have 1475415359 records in there
[13:07:15] that are ~14754 batches of 100k updates
[13:07:37] now a batch on dbstore1002 takes ~380s to complete
[13:08:26] so if I made my calculations correctly it would take 64 days to complete
[13:08:39] * elukey cries in a corner
[13:22:17] elukey, back from lunch
[13:22:19] mmmmmmm
[13:24:55] mforns: this is why on dbstore1002 it is taking ages to complete, probably it will not finish by the end of the quarter :D
[13:25:06] omg
[13:25:17] so I'd say to stop it and wait to drop MobileWebUIClickTracking_10742159_15423246
[13:25:35] elukey, sure
[13:25:43] or maybe we can add a feature like --blacklist table1,table2,table3,etc..
[13:25:54] yes, blacklist is a good idea
[13:26:14] blacklist == do-not-touch right?
[13:26:27] yes yes, but not in a file, just as csv parameter
[13:26:33] sure
[13:29:59] elukey, why did we continue using the IN clause, if we have the last_ts? wouldn't that be more computational cost?
[13:32:19] because we're checking the index 100K times for each batch no?
[13:33:58] lemme check, don't remember
[13:34:05] elukey, batcave?
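The 64-day figure quoted above checks out; a quick sanity check in plain Python, using only the numbers mentioned in the conversation:

```python
# Back-of-the-envelope check of the purge estimate discussed above.
rows = 1475415359        # records in MobileWebUIClickTracking_10742159_15423246
batch_size = 100000      # rows updated per batch
seconds_per_batch = 380  # observed batch time on dbstore1002

batches = rows / batch_size                       # ~14754 batches
total_days = batches * seconds_per_batch / 86400  # seconds -> days
print(round(batches), round(total_days, 1))       # 14754 64.9
```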
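The batching and blacklist ideas being discussed map roughly onto the sketch below. This is a minimal illustration rather than the actual EventLogging purging script: the column names, the sanitization rule and the CLI flag are assumptions. The point it shows is that selecting exactly one batch of uuids and then sanitizing only those uuids through an IN clause keeps every statement bounded to batch_size rows (which a plain last_ts range cannot promise), and that a --blacklist can be taken as a comma-separated value rather than a file.

```python
# Sketch of the batching scheme: grab exactly one batch of uuids, sanitize only
# those uuids with an IN clause, and skip tables named in --blacklist.
# Illustrative only; table/column names and the CLI are assumptions, not the
# real EventLogging purging script.
import argparse

BATCH_SIZE = 100000


def parse_blacklist(csv_value):
    """--blacklist table1,table2,table3 -> set of table names to skip."""
    return {name for name in csv_value.split(",") if name}


def purge_table(cursor, table, cutoff_ts, blacklist):
    if table in blacklist:
        return  # do-not-touch, e.g. MobileWebUIClickTracking_10742159_15423246
    while True:
        # Fetch at most BATCH_SIZE ids that still need sanitization.
        cursor.execute(
            "SELECT uuid FROM {} WHERE timestamp < %s "
            "AND clientIp IS NOT NULL LIMIT %s".format(table),
            (cutoff_ts, BATCH_SIZE),
        )
        uuids = [row[0] for row in cursor.fetchall()]
        if not uuids:
            break
        # The IN clause is what guarantees the statement never touches more
        # than BATCH_SIZE rows; a "WHERE timestamp > last_ts" range could not.
        placeholders = ",".join(["%s"] * len(uuids))
        cursor.execute(
            "UPDATE {} SET clientIp = NULL WHERE uuid IN ({})".format(
                table, placeholders),
            uuids,
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Proposed above: a csv parameter, not a file.
    parser.add_argument("--blacklist", default="",
                        help="comma-separated list of tables to leave untouched")
    args = parser.parse_args()
    blacklist = parse_blacklist(args.blacklist)
    # ... open a DB connection and call purge_table() per table here ...
```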
[13:35:09] mforns: last_ts would not respect the batch_size no? We don't know exactly how many elements are there
[13:35:16] IIRC this is why we used the IN clause
[13:35:23] to specify exactly the batch uuids
[13:35:48] elukey, aha, but this is a very small detail compared to a potential performance increase no?
[13:37:58] mforns: probably buuuut we committed not to hit the database with more than batch size elements :(
[13:38:19] I can join the cave yes, grabbing the headphones
[13:38:42] elukey, ok :]
[13:48:54] mforns: I'm reviewing isCurrent :)
[13:48:59] you absolute SAVAGE :D
[13:49:10] fdans, what?? xD
[13:49:17] is that good or bad?
[13:49:23] fdans: don't disturb Marcel :P
[13:49:28] xD
[13:49:42] I remember, isCurrent is the logic puzzle
[13:49:46] haha it's fantastic, but it's also the work of a madman
[13:49:47] sorry for that
[13:49:56] xD
[13:50:44] not too sure of what to do about its cyclomatic complexity mforns, I'm sure you've thought a lot about it
[13:51:01] fortunately the way you've written it and documented it makes it quite clear
[13:52:19] elukey: LUCA ಠ_ಠ
[13:52:33] fdans: we are working in the cave, this is why I said that :P
[13:52:58] I know I know :)
[15:02:44] sorry fdans elukey google 2factor auth kicked me out of hangouts!!!
[15:18:48] Analytics, Research, cloud-services-team: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (Halfak)
[15:22:35] Analytics, Research, cloud-services-team: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530873 (Halfak)
[15:22:38] Analytics, Project-Admins, Research, cloud-services-team: Create a phabricator project called "wikireplica-datasets" - https://phabricator.wikimedia.org/T173512#3530877 (Halfak)
[15:22:55] Analytics, Research, cloud-services-team: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (Halfak)
[15:22:57] Analytics, User-Elukey: Investigate Geowiki check_web_page.sh alerts for data quality - https://phabricator.wikimedia.org/T173486#3530894 (mforns)
[15:25:28] Analytics, Research, cloud-services-team: Create a database on the wikireplica servers called "datasets_p" - https://phabricator.wikimedia.org/T173513#3530910 (Halfak)
[15:26:00] Analytics, Research, cloud-services-team: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (Halfak)
[15:26:24] Analytics: Handle long project names in Wikiselector - https://phabricator.wikimedia.org/T173373#3530930 (mforns)
[15:27:01] Analytics, Project-Admins, Research, cloud-services-team: Create a phabricator project called "wikireplica-datasets" - https://phabricator.wikimedia.org/T173512#3530933 (Aklapper)
[15:28:55] Analytics-Kanban: Handle long project names in Wikiselector - https://phabricator.wikimedia.org/T173373#3525907 (mforns)
[15:30:21] Analytics, Research, cloud-services-team: Document the process for importing a new "datasets_p" table - https://phabricator.wikimedia.org/T173514#3530940 (Halfak)
[15:30:22] Analytics-Kanban, Analytics-Wikistats: Use daily granularity for 1-month time ranges - https://phabricator.wikimedia.org/T173372#3530956 (mforns)
"datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (10Halfak) [15:32:34] 10Analytics, 10Research, 10cloud-services-team: Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530963 (10Halfak) [15:37:16] 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis, 10Patch-For-Review: Reportupdater outputs files with restricted permissions - https://phabricator.wikimedia.org/T173333#3530979 (10mforns) a:03mforns [15:42:22] 10Analytics, 10Research, 10cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3531049 (10chasemp) [15:44:38] 10Analytics, 10Research, 10cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3531058 (10chasemp) p:05Triage>03Normal [15:52:34] 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3531074 (10elukey) So today I did a bit of calculations for the `MobileWebUIClickTracking_10742159_15423246` (500GB in size) table update timings and th... [15:52:47] mforns: --^ [16:10:51] thanks elukey :] [16:42:37] going offline people! talk with you tomorrow :) [16:43:44] bye elukey ! [17:13:17] mforns: nuria_: everyone safe in Barcelona? [17:13:57] fdans: you, too? [17:15:26] leila: thank you for asking! I live far from bcn but no one I know seems to have been in the area at the time of the attack [17:16:13] ok, good, fdans. :) [17:19:56] leila, seems that my family is safe, thanks for asking! [17:22:13] glad to hear it, mforns. [18:15:47] (03PS7) 10Mforns: Add script to purge old mediawiki data snapshots [analytics/refinery] - 10https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) [18:18:43] (03CR) 10Mforns: "Thanks for the comments Joseph, I had forgotten them! I had to change some other things and I took the opportunity to take care of them. S" (035 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: 10Mforns) [18:20:32] (03CR) 10Mforns: "Patch 7 has been tested in hadoop with no issues :]" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034) (owner: 10Mforns) [18:29:07] Looking at the mediawiki_page_history table in Hive, it flags page moves but does not appear to contain info about the move (e.g. say page_namespace is the namespace moved to, then page_from_namespace could be the namespace moved from)? [18:29:55] Tried to look for a task in Phabricator, but didn’t find anything there either. So maybe I’m the only one interested in this sort of thing? [19:02:49] hi all [19:03:17] o/ [19:03:21] I'm trying to login in to the stats machine, but having some problems [19:04:53] elukey: ^ [19:05:24] bassically, the machine is asking for a password, but I'm already setup the keys, and didn't put any password [19:05:31] any ideas on this? [19:05:46] robh: you may also be able to help dsaezt given that you helped with the access itself? ^ [19:06:12] can you login to a bastion? 
[19:06:19] it is likely an ssh config issue ;]
[19:06:30] we dont allow agent forwarding, so you have to setup proxy in your ssh config
[19:06:37] let me see if i can find the wikitech page with the info
[19:06:52] https://wikitech.wikimedia.org/wiki/Production_shell_access#SSH_configuration
[19:07:17] also please note that config doesn't work if you mix in fundraising hosts, it is just for normal shell access.
[19:07:21] cool, I'll check
[19:07:30] dsaezt: so right off, try to login to bast1001.wikimediaorg
[19:07:32] dsaezt: so right off, try to login to bast1001.wikimedia.org
[19:07:46] if you can, then we know its ssh config for items in the private vlans
[19:08:00] i also tend to set the ssh key and login name in config so i dont have to specify it.
[19:08:28] nopes, it's also asking for password
[19:08:35] hrmm, ok, checking on bast1001 for your key
[19:09:17] robh, I was telling dsaezt I’ve started having the same issue
[19:09:33] despite the fact that my local config and key haven’t changed
[19:09:57] I might reuse whatever you discover here to troubleshoot my own problem
[19:10:15] dsaezt: So, we should troubleshoot your logging into a bastion first
[19:10:20] and then work on the systems behind said bastion
[19:10:41] ok, the response that I get in the bastion is longer than in the stat
[19:11:03] so i'd try ssh -vvv diego@bast1001.wikimedia.org
[19:11:09] and then paste that output into a pastebin
[19:11:13] we have htem on phab
[19:11:23] https://phabricator.wikimedia.org/paste/
[19:11:40] you can set it private if it has private info, but typically just ssh output is fine.
[19:11:57] or just paste in the errors it gives here if its not htat long, but -vvv is pretty damned verbose.
[19:12:30] dsaezt: you’re in good hands with robh, ping me later if there’s anything else help I can help with :)
[19:12:43] ok, I'm doing, the stat machine gives a shorter output than the bastion
[19:12:50] Thanks Dartar
[19:12:51] well, stat is behind bastion
[19:12:58] so its not as useful until we figure out why you cannot get in th bastion
[19:13:04] isnt it?
[19:13:11] maybe stat is public and im just silly... checking
[19:13:16] which stat host?
[19:13:55] https://phabricator.wikimedia.org/P5892
[19:13:55] stat machines are behind bastion
[19:14:18] stat1005
[19:14:29] dsaezt: and silly question
[19:14:37] but ssh-key -L ?
[19:14:41] shows your key is loaded right?
[19:14:57] (the same key we are using for production access)
[19:15:09] I should have started there first ;]
[19:15:20] ensure same key is loaded, then troubleshoot end systems.
[19:15:29] ssh-key ?
[19:15:37] ssh-keygen?
[19:15:54] So, you generated an SSH key for production access and you gave us the public copy
[19:16:03] you have to load the private copy on your system to access the cluster
[19:16:13] yes
[19:16:29] so that is typically done with: ssh-key add
[19:16:55] ssh-keygen -t rsa this what i use
[19:17:11] so that generated the key
[19:17:14] but now you have to add it
[19:17:18] sorry, i used wrong commend
[19:17:21] ssh-add
[19:18:23] once you have the key loaded locally into your shell, you'll no longer be prompted for a passwork
[19:18:25] password
[19:18:35] added, same problem
[19:18:58] so ssh-add -L shows the key?
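For anyone following along, the setup robh is pointing at amounts to a ~/.ssh/config that proxies the stat hosts through the bastion, plus the matching private key loaded into the agent. The sketch below only illustrates that shape: the key filename is made up, the authoritative recipe is the Production_shell_access page linked above, and (as comes up further down in the log) the production key must be separate from any labs key.

```
# Minimal ~/.ssh/config sketch, assuming a dedicated production key at
# ~/.ssh/id_rsa_wmf_prod (any name works, as long as it is NOT the labs key)
# and the shell username dsaeztrumper. Agent forwarding stays off.
#
# Create and load the key with something like:
#   ssh-keygen -t rsa -f ~/.ssh/id_rsa_wmf_prod
#   ssh-add ~/.ssh/id_rsa_wmf_prod   # ssh-add -L should then list it

Host bast1001.wikimedia.org
    User dsaeztrumper
    IdentityFile ~/.ssh/id_rsa_wmf_prod
    ForwardAgent no

Host *.eqiad.wmnet
    User dsaeztrumper
    IdentityFile ~/.ssh/id_rsa_wmf_prod
    ForwardAgent no
    ProxyCommand ssh -W %h:%p bast1001.wikimedia.org
```

With something like that in place, ssh -vvv stat1005.eqiad.wmnet should show the dedicated production key being offered rather than a generic ~/.ssh/id_rsa.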
[19:19:18] and its the private version of the public key you provided =]
[19:19:18] yes
[19:19:27] yes
[19:20:34] hrmm, reading your output and comapring to my own when accessing the system
[19:21:05] ok
[19:22:07] debug1: Offering RSA public key: /home/dsaeztrumper/.ssh/id_rsa
[19:22:15] so that shows your system giving its loaded ssh key
[19:22:25] if id_rsa is your wmf key then its odd.
[19:22:27] that it has error.
[19:22:40] it shows its using ssh config for option *
[19:22:46] can you paste your ssh config somewhere?
[19:23:02] or is it just what i linked on wikitech?
[19:23:43] (mine does the same thing, just confirming what your ssh config options are for *)
[19:25:00] ok
[19:29:46] I've pasted in the phabricator task
[19:30:31] uh
[19:30:37] i take it phab ripped out some ## ?
[19:30:52] can you paste it into a paste bin so it shows the proper formatting (cuz much of ssh woes is indeed format based)
[19:31:14] cuz it has lines like 'Short names' which i think would alone break ssh config right?
[19:31:42] dsaezt: wait
[19:31:48] are you using the smae key for labs and production?
[19:32:00] looks like it according to your config
[19:32:00] yes
[19:32:01] wups
[19:32:02] that is not ok.
[19:32:04] at all
[19:32:09] im removing your shell access right now ;]
[19:32:15] you have to provide a new public key
[19:32:22] the L3 agreement i think reviews that this is not ok...
[19:32:39] * leila prays for dsaezt :D
[19:32:44] hehe..ok
[19:32:45] dsaezt: So i'm ripping out your access, you have to provide a different ssh key for production
[19:32:48] not mad, just policy
[19:33:31] I need to leave now, I'll send a new key during the night, and ping you back later to check how is going, is that ok?
[19:34:34] no worries, ive reopened the task so its not forgotten
[19:34:43] and im happy to merge tomorrow and we can continue ssh debugging =]
[19:34:48] sorry for all the confusion =[
[19:34:49] thanks! and sorry for the confusion
[19:34:52] heh
[20:27:30] robh, if you're still around, I have another quick stat machine access Q.
[20:27:43] ?
[20:27:49] I think that https://phabricator.wikimedia.org/T171988 is ready for LDAP work.
[20:28:04] Was hoping to have a quick review in case there's any missing information I can gather.
[20:28:08] i wasnt aware this project existed, heh
[20:28:12] (or missing tags, etc.)
[20:29:04] I wasn't quite sure how to look up shilad's shell account, so I just linked to his wikitech user. I hope that works.
[20:29:12] He does have shell access in labs.
[20:29:21] labs doesnt matter
[20:29:26] shell refers to producion =]
[20:29:33] or in this case, ldap only
[20:29:40] so its tied to wikitech uid anyhow though
[20:29:44] so you did right if thats how you did it
[20:29:46] ^ right
[20:29:50] perfect
[20:30:26] so they need ldap and then granted access to the NDA group, correct?
[20:30:51] ohh, i joined this project, i just forgot it existed afterwards
[20:31:02] halfak: so the only thing i think we need is a manager sign off for it?
[20:31:07] wmf manager handling their access that is
[20:31:14] which is suppose is you?
[20:31:34] Right. I can sign off. Shall I do so in the task?
[20:31:51] so yeah, approve on task and also mention the end date you want listed
[20:31:56] and who you want emailed when end date is reached
[20:32:13] i assume you since you filed task but otherwise just a hard end date and i can create the task. the actual nda signature we need legal to confirm
[20:33:26] They said it's in the spreadsheet y'all use to coordinate
[20:34:09] ill double check said sheet, it was very old and outdated when i saw it before
[20:34:25] but i havent looked in awhile
[20:35:15] thanks for picking this up robh. I really appreciate it :D
[20:35:24] halfak: ahh, found it
[20:35:28] \o/
[20:36:12] ok, so im not sure if ldap users need to have a 3 day wait, i assume so since its in the same file... but not sure
[20:36:23] is it something where they need it today, or is a 3 day ok as long as we know its happening?
[20:36:55] 3 days is OK if that's important.
[20:37:01] also the L3 document doesnt seem like it applies...
[20:37:06] since this isnt shell, heh
[20:37:22] i'll ask in our ops meeting on monday, but for now lets trat it the same as a normal request, so no L3 since its not shell
[20:37:26] but otherwise 3 day
[20:37:30] ill update task with all of this
[20:38:42] robh, I do expect that shilad will be sshing to stat1005/1006.
[20:38:52] Is that what you mean by "shell"?
[20:38:52] oh, so this is a shell request, not ldap then.
[20:38:57] ssh is shell, not ldap
[20:39:03] so this isnt an ldap rquest
[20:39:03] Sorry. I'm confused about the terms. :/
[20:39:05] this is a shell request
[20:39:09] ldap is ONLY via web pages
[20:39:19] Gotcha! I see.
[20:39:21] if they need to login, and need usergroups, that is a shell request
[20:39:21] =]
[20:39:26] So shilad will want ldap too for yarn.
[20:39:28] no worries, its confusing
[20:39:38] ldap only is when they dont have shell
[20:39:39] :D
[20:39:46] gotcha!
[20:39:46] so this should change to an access request
[20:40:03] and you should include the groups they need on stat100[56]
[20:40:25] * halfak gets that
[20:40:55] hrmm,...
[20:41:03] there is a wiktiech page that denotes which groups use what
[20:41:08] but yeah, we need to get their shell access first
[20:41:13] and then we can tag them into NDA afterwards
[20:41:27] goign in reverse means having to undo a lot of data.yaml changes along the way for the user
[20:42:02] robh, I see. Is getting the NDA signed before the shell access "in reverse"? (for future reference)
[20:42:17] sorry, confusion terminology
[20:42:32] so, in data.yaml there are shell accounts
[20:42:34] and ldap only accounts
[20:42:47] using ldap-access-requests, as far as i know, is for users who will NOT get shell
[20:42:54] but will need ldap flags set to use nda rights
[20:43:01] * halfak takes notes.
[20:43:03] but i could be wrong
[20:43:05] its a new group
[20:43:13] actually, im likely wrong
[20:43:31] but 'This task is done when: we have a MOU and signed NDA for @Shilad and he has access to the analytics cluster hosts.'
[20:43:39] access to cluster hosts is #ops-access-requests
[20:43:46] and if they are going to have shell, then we want to handle that
[20:43:47] I just added that tag ^_^
[20:43:51] cool
[20:43:57] dropped the ldap one.
[20:44:15] I'll add a checklist for the user access
[20:49:12] halfak: updated task with new checklist
[20:49:27] so just need the unchecked items met and we'll get it merged next tuesday (3 day wait)
[20:49:39] its now on my radar as well ;]
[20:50:32] but yeah, nda confirmed by me on task, so as far as ops goes that part is cool.
[20:50:55] the ldap access requests is not for NON shell, i was wrong, its only whe you want us to tag in a ldap grop though to a user
[20:51:27] which isnt the case here, i dont think this person needs ldap flags set
[20:51:41] they get access via usergroups and shell.
[20:54:33] Analytics, Operations, Ops-Access-Requests, Research, Research-collaborations: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3532166 (Halfak)
[20:58:30] Analytics, Operations, Ops-Access-Requests, Research, Research-collaborations: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3532176 (RobH) a:Shilad Assigned to @shilad for them to sign L3, and provide preferred shell username, wikitech u...
[20:58:46] robh, what's the preferred way to share a public ssh key these days?
[20:59:04] since they have phab account, their comment with is is easiest
[20:59:12] used to be that wiki userpage was easiest way
[20:59:23] or they can prepare their own patchset, but thats the most confusing for new user
[20:59:43] their comment with their phab account is easiest i mean.
[21:00:33] sorry for any confusion on my part, im on cold meds ;D
[21:00:53] was out most of this week sick and now sudafed is my frenemy.
[21:01:39] You were very helpful. Thank you very much :)
[21:02:22] quite welcome, glad to help
[21:06:53] fwiw you do need ldap if you are accessing any of the web uis like yarn.wikimedia.org or hue.wikimedia.org. Probably not in this case though
[21:08:16] indeed, need ldap flags assigned fo rthat
[21:08:41] but if it was ONLY that, then no shell access hoops, the end of data.yaml has a lot fo those.
[21:08:43] of those even
[21:08:53] thats what i thought this was at first =]
[21:36:16] Analytics, Operations, Ops-Access-Requests, Research, Research-collaborations: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3532244 (Shilad) @RobH, I've signed the L3. My wikitech username is "Shilad Sen" and my preferred shell username is s...
[21:37:04] Analytics, Operations, Ops-Access-Requests, Research, Research-collaborations: NDA, MOU and LDAP (analytics cluster) for Shilad Sen - https://phabricator.wikimedia.org/T171988#3532247 (Shilad) a:Shilad>RobH
[21:53:27] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3532300 (Krinkle)
[21:53:32] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1589603 (Krinkle)
[21:53:34] Analytics-Kanban: Operational improvements and maintenance in EventLogging in Q4 {oryx} - https://phabricator.wikimedia.org/T130247#3532302 (Krinkle)
[21:53:39] Analytics, Analytics-EventLogging, Performance-Team, Scap (Scap3-Adoption-Phase1): Use scap3 to deploy eventlogging/eventlogging - https://phabricator.wikimedia.org/T118772#3532301 (Krinkle) Open>declined
[21:53:43] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1589603 (Krinkle) stalled>Open
[23:38:28] new key added :)
[23:52:44] Analytics, Analytics-EventLogging, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3532549 (Krinkle)