[03:13:17] Analytics, Discovery-Search (Current work), Patch-For-Review: Use kafka for communication from analytics cluster to elasticsearch - https://phabricator.wikimedia.org/T198490 (EBernhardson) For the bulk daemon deployment, and switching the old transfer off we need to complete the following: # Create...
[09:05:29] Analytics: Piwik user account for Wikimedia.org.il - https://phabricator.wikimedia.org/T199046 (Itzike) @Milimetric, I don't think traffic should be something we should worry about. I don't think that in general, the chapters' websites get a lot of traffic. This is why I don't think it is valuable that each one...
[10:27:28] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (mobrovac) I assume the task description implies the topic would get multiple messages every week...
[10:32:19] * elukey lunch + errand!
[12:50:13] (PS1) Reedy: Add new wikis [analytics/refinery] - https://gerrit.wikimedia.org/r/450015
[12:51:15] (PS2) Reedy: Add new wikis [analytics/refinery] - https://gerrit.wikimedia.org/r/450015
[12:58:00] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) > so the producer can simply send plain messages and they would be compressed on the f...
[13:06:36] yo milimetric yt?
[13:10:50] heya team :]
[13:10:55] yoyo
[13:12:56] hola :)
[13:17:27] hey ottomata what's up
[13:19:27] got some mins to brain bounce some event platform next steps?
[13:19:32] milimetric: ^?
[13:22:55] ottomata: yea, give me like 10 minutes to grab some food and finish up email
[13:23:13] sho
[13:30:48] milimetric: about geowiki jobs, I think I'm just going to delete the crons and whatever references them in puppet, and not all occurrences of geowiki
[13:31:55] there's a bunch of stuff here related to the static files that I don't feel totally comfortable deleting, and also the task is just about getting rid of the jobs
[13:32:57] fdans: the task should be more generically getting rid of everything geowiki. The static files should be backed up and references from puppet should be deleted
[13:33:10] ottomata: omw to the cave
[13:33:36] k
[13:33:59] milimetric: ok then I'll push the change I got now getting rid of everything and let's cr
[13:39:04] yep please let's delete all :)
[13:39:32] elukey: patch sent, I tagged you as reviewer if that's ok :)
[13:39:56] fdans: of course it ise!
[13:39:58] *is
[13:40:23] <3
[13:43:06] fdans: one trick that we could use to avoid losing track of files to delete is doing a two pass change - in the first one, all the file resources in puppet will get the ensure => absent parameter
[13:43:32] because puppet is sneaky, it doesn't clean up anything if you remove a file resource
[13:43:44] the second pass would be to remove the last reference to the absented files
[13:43:55] this is to avoid manual rms basically
[13:45:34] (so just to be clear - your patch looks good, but I'd only leave the "file" resource in there now with "ensure=>absent")
[13:48:21] ok elukey so i defer merging this patch, and create a new one where the file resources in it are set to "ensure=>absent"?
[13:51:15] fdans: or we can keep this one with this variation
[13:51:27] it is not a big deal, but it saves you from chasing all the files afterwards
[13:51:40] because puppet in this way will clean them up
[13:54:54] elukey: so instead of removing lines 36-53 we add ensure=>absent?
[13:54:59] (sorry for being thick)
[14:13:10] fdans: my bad sorry, I am probably not explaining it well.. So I think that we can backtrack and do what you have proposed initially, so just add ensure => absent to all the file { etc.. things related to geowiki
[14:13:38] then we do another pass and we clean up all (as you did in the code change)
[14:13:54] cool beans!
[14:21:59] elukey: aw <3 https://usercontent.irccloud-cdn.com/file/jAYsMUif/Screen%20Shot%202018-08-02%20at%209.21.25%20AM.png
[14:23:58] ahahah
[14:24:21] ottomata: archiva 2.2.3 running on archiva.eqiad.wmflabs!
[14:30:17] even if the UI doesn't seem to load
[14:30:20] lovely
[14:30:45] elukey: wooaah
[14:31:03] from deb elukey?
[14:31:41] fdans: but did you back up those files somewhere first?
[14:32:23] milimetric: which files?
[14:33:23] fdans: i can merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/450025/
[14:33:28] but we'll need some manual cleanup after too, ya?
[14:33:31] cron jobs for sure
[14:33:35] what else, do you know?
[14:33:37] files?
[14:33:39] any apache stuff?
[14:34:23] ottomata: I suggested to Fran to first create a patch to put "ensure => absent" on all files etc..
[14:34:27] ottomata: yea see elukey's suggestion above. I'm submitting a new patch that sets file resources to ensure => absent
[14:36:39] milimetric, I'm having difficulties with Wikistats' dimensionalData
[14:36:50] mforns: oh cool, welcome to the party :)
[14:36:57] batcave?
[14:36:59] I'm trying to add a method to it that filters the last N methods
[14:37:01] xD
[14:37:02] yea
[14:37:07] gimme 1 min
[14:37:35] ah ok
[14:37:35] cool
[14:37:42] yall got it, thanks
[14:45:40] ah now archiva works!
It was the CSRF validation thing that they added
[14:45:51] (I am tunneling via ssh)
[15:07:33] ottomata: standup
[15:08:21] AHCK
[15:10:31] (PS3) Reedy: Add new wikis [analytics/refinery] - https://gerrit.wikimedia.org/r/450015 (https://phabricator.wikimedia.org/T198400)
[15:14:01] Analytics, Analytics-Kanban, Patch-For-Review: Turn off old geowiki jobs - https://phabricator.wikimedia.org/T190059 (Milimetric)
[15:17:30] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (mobrovac) >>! In T200215#4472284, @Ottomata wrote: > Eric can correct me if I'm wrong, but I bel...
[15:25:50] Analytics, Analytics-Data-Quality, Analytics-Kanban, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Data set review for the Wiktionary Cognate Dashboard - https://phabricator.wikimedia.org/T199851 (Milimetric) p:Triage>Normal a:Milimetric
[15:26:34] Analytics: Use Snakebite instead of subprocess.Popen in HdfsUtils - https://phabricator.wikimedia.org/T200904 (Milimetric) p:Triage>Normal
[15:26:57] Analytics, Analytics-Cluster: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (Milimetric) p:Triage>Normal
[15:29:53] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Milimetric) p:Triage>High
[15:30:27] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Milimetric) ping @mobrovac have you any tools that do this already? Or should we work together?
[15:35:10] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (mobrovac) We have been using [nsp](https://www.npmjs.com/package/nsp) for a while now for Node.JS services, but it will be discontinued [at the end of September](https://blog.npmjs.org/post/175511531085/the-node...
[15:35:12] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Milimetric)
[15:35:17] milimetric, do you have 5 mins to see this weird problem with dimensionalData?
[15:35:22] still not working!
[15:35:39] I believe now that it's sth totally unrelated to dimensionalData...
[15:35:53] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Milimetric) k, great, do you want to setup anything more standard in jenkins to run npm audit as part of Verify +2?
[15:41:22] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (mobrovac) We currently have `nsp` [run as part of `npm test`](https://github.com/wikimedia/service-template-node/blob/1784fd19fcab699cda44b429ea31fa47e5e82793/package.json#L8) which automatically makes Jenkins r...
[15:43:58] hm, a-team looks like we have a regression in mediawiki sqoop jobs, the cron for the private sqoop (geoeditors) failed
[15:44:08] mmmmmmm
[15:44:42] milimetric: ah snap yes I saw it this morning and forgot to mention at standup :(
[15:45:02] it was this: https://github.com/wikimedia/analytics-refinery/commit/b8e7de986c93731e255d8bb0124bef758fbbeb9e#diff-d98e097b5babe4a1821bd6527cb45ab3
[15:45:10] but I think that NameError: name 'DEVNULL' is not defined is probably easy to fix
[15:45:11] just needs to import DEVNULL from (sys?)
[15:45:13] ?
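Editor's note on the `NameError` above: in Python 3 (3.3 and later), `DEVNULL` is provided by the `subprocess` module rather than `sys`. The following is a minimal sketch of that kind of import fix, not the actual refinery patch; the helper name `run_quietly` and the command it runs are hypothetical and only illustrate the import.

```python
import subprocess
import sys
from subprocess import DEVNULL  # Python 3.3+; the sys module has no DEVNULL


def run_quietly(args):
    """Run a command, discarding its stdout and stderr; return the exit code."""
    return subprocess.call(args, stdout=DEVNULL, stderr=DEVNULL)


# Hypothetical usage: run the current interpreter with a trivial script.
exit_code = run_quietly([sys.executable, "-c", "print('hidden')"])
```

Without the `from subprocess import DEVNULL` line, any bare reference to `DEVNULL` raises exactly the `NameError` quoted in the log.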
[15:45:18] I got it
[15:48:11] (PS1) Milimetric: Fix lack of import [analytics/refinery] - https://gerrit.wikimedia.org/r/450048
[15:48:25] elukey: if you +2 that I can deploy and re-launch the command in a screen
[15:50:38] milimetric: qq - are we using python3 right to run sqoop?
[15:50:51] elukey: yes
[15:50:58] all right
[15:51:05] (CR) Elukey: [C: 2] Fix lack of import [analytics/refinery] - https://gerrit.wikimedia.org/r/450048 (owner: Milimetric)
[15:51:24] ack for the deploy too
[15:51:49] (CR) Milimetric: [V: 2] Fix lack of import [analytics/refinery] - https://gerrit.wikimedia.org/r/450048 (owner: Milimetric)
[15:53:19] (deploying now)
[15:53:33] (afk for 10 mins)
[15:56:58] (PS1) Cicalese: Filter out erroneous pingback data caused by T200864. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/450050 (https://phabricator.wikimedia.org/T200864)
[16:21:45] hm, I'm starting to hate crossfilter...
[16:37:18] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey) Ok I think I have finally got something :) So I left tcpdump to capture ipv6 traffic excluding some "known" IPs like puppetmas...
[16:50:49] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey) Tried to find all the occurrences of webproxy and added the related https configuration, let's see if things will change!
[17:00:48] I may have found the last things calling https without the proxy --^
[17:10:22] (PS1) Mforns: Fix interval bugs in time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/450063 (https://phabricator.wikimedia.org/T200497)
[17:13:04] (PS2) Mforns: Fix interval bugs in time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/450063 (https://phabricator.wikimedia.org/T200497)
[17:20:43] (PS3) Mforns: Fix interval bugs in time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/450063 (https://phabricator.wikimedia.org/T200497)
[17:33:57] !log deployed refinery, relaunching geoeditors sqoop
[17:33:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:39:06] Do we know the story behind the big spike in mobile traffic after May 19 as shown on page 16
[17:39:09] https://upload.wikimedia.org/wikipedia/commons/6/62/Wikimedia_Foundation_Audiences_metrics_Q3_2017-18_%28Jan-Mar_2018%29.pdf
[17:39:15] Singapore slide
[17:39:42] Anyone: just checking to be super sure about the following: the `user_id` and `event_user_id` fields in the wmf.mediawiki_history table are local (i.e. wiki-specific), not global user ids? I would say local. Thanks.
[17:48:56] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: [Wikistats 2] Bug in time-range selector on detail page - https://phabricator.wikimedia.org/T200497 (mforns) You can also reproduce the error just by selecting a monthly time range and then changing to a daily time range. The sel...
[17:51:08] milimetric: I guess you know everything there is to know about this table - the `user_id` and `event_user_id` fields in the wmf.mediawiki_history table are local (i.e. wiki-specific), not global user ids? I would say local. Thanks!
[17:57:38] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) I mentioned this to Marko in IRC, but I'm not sure if his previous statement is quite...
[17:58:32] GoranSM: yep, local
[17:58:55] GoranSM: I've got your other requests open in tabs, I'll try to get to them by end of today
[18:01:52] milimetric: Thanks a lot!
[18:12:37] Analytics, Product-Analytics: Load change_tag tables in Analytics Data Lake daily - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF)
[18:13:01] Analytics, Product-Analytics: Load change_tag tables in Analytics Data Lake daily - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF)
[18:16:35] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (mobrovac) Indeed, the discussion is probably out of the scope of this ticket. That said, it wou...
[18:18:05] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) {meme, src=votecat}
[18:18:39] Analytics, Product-Analytics: Load change_tag tables in Analytics Data Lake daily - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF)
[18:18:58] Analytics, Product-Analytics: Load change_tag tables in Analytics Data Lake daily - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF)
[18:19:13] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201063 (Ottomata) p:Triage>Normal
[18:37:26] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Ottomata) p:Triage>Normal
[18:38:55] ottomata: ok for me to merge https://gerrit.wikimedia.org/r/#/c/operations/debs/archiva/+/449755/ ?
[18:38:55] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform - https://phabricator.wikimedia.org/T185233 (Ottomata)
[18:39:23] then I'll merge+push to the debian branch, and send the code review for the debian stuff
[18:42:06] elukey: +1
[18:47:26] Analytics, Product-Analytics: Load change_tag tables in Analytics Data Lake daily - https://phabricator.wikimedia.org/T201062 (chelsyx) A related ask (but probably should sit in a new ticket): Is it possible to backfill the tag? We started to include tags `ios app edit` and `android app edit` from June 2...
[18:55:40] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) >>! In T200215#4471773, @mobrovac wrote: > I assume the task description implies t...
[18:59:29] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) @EBernhardson, do you have an idea of how large your individual messages will be? I k...
[19:01:54] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) >>! In T200215#4474031, @Ottomata wrote: > @EBernhardson, do you have an idea of h...
[19:06:57] Analytics: Scan npm dependencies for vulnerabilities - https://phabricator.wikimedia.org/T200717 (Legoktm) CI already has support for running `npm audit` with npm v6, please file a bug in #continuous-integration-config if you want it added to your repositories.
[19:20:19] * elukey off!
[19:22:25] Analytics, Product-Analytics: Load change_tag tables into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF)
[19:27:33] Analytics, Wikimedia-Incident, cloud-services-team (Kanban): Alarms on throughput on refined data - https://phabricator.wikimedia.org/T198908 (Ottomata) Alright, I have an idea! So, a long time ago, @JAllemandou wrote the CamusPartitionChecker that we use to mark webrequest partitions as imported....
[19:31:57] NeilPatelQuinn[m: yt?
[20:19:38] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) I ran a slightly longer test using 5M records, this allows it to run long enough t...
[20:22:56] Analytics, EventBus, Operations, Discovery-Search (Current work), Services (watching): Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) Hm ok. We can probably handle that in main-eqiad, but it would be very bursty and dom...
[20:30:40] Anyone around today that can help us with Piwik login? :)
[20:34:09] varnent: unless mforns is still here (and i'm not sure if he can)
[20:42:13] ottomata: do you remember how to fix this problem we always have when repairing hive table partitions doesn't work?
[20:42:33] you set mapred.mode to nonstrict, and then go:
[20:42:33] milimetric: what's up?
[20:42:33] msck repair table wmf_raw.mediawiki_private_cu_changes;
[20:42:37] sounds right
[20:42:57] but because it's multiple nested partitions like month=x/wiki=y, it fails
[20:43:01] hm
[20:43:06] that shouldn't be a problem tho
[20:43:07] no?
[20:43:14] SET hive.mapred.mode = nonstrict;
[20:43:15] right?
[20:43:19] with this error: Caused by: MetaException(message:Expected 1 components, got 2 (month=2018-07/wiki_db=abwiki))
[20:43:30] yeah, it shouldn't but it always happens and I always forget why
[20:44:13] that's not one i remember seeing before
[20:44:45] ok :( yeah, I definitely saw this at least a half dozen times
[20:44:50] and I always forget why 'cause it makes no sense
[20:44:52] set hive.msck.path.validation=ignore;
[20:44:53] ?
[20:45:06] hm, worth a shot...
[20:45:19] could be dangerous be careful
[20:49:48] ok, that worked, looks like maybe we're hitting https://issues.apache.org/jira/browse/HIVE-12859 which is unresolved, so I'll file a task to find a better work-around
[20:50:00] worst case we might have to set that ignore on the repair partition workflow
[20:55:58] hm
[20:56:15] milimetric: the oozie workflow does repair partitions? instead of manually adding?
[20:56:30] yeah, it does repair
[20:56:38] oh, I guess it could add them manually
[20:56:42] not sure how long that would take
[20:57:42] milimetric: how many partitions does it need to add each run?
[20:57:49] it shouldn't be a lot, right?
[20:57:53] just one per wiki per month?
[20:58:02] there's a job per wiki, no?
[20:58:08] for this, yeah, not sure if there are others, but yeah, for this like 780
[20:58:24] one per all wikis per month
[20:58:28] (job)
[20:58:36] right
[20:58:56] milimetric: is this sqoop? or something else?
[20:59:36] well that workflow is called from all over oozie
[20:59:43] but in this case, the sqoop private cu_changes job failed
[20:59:52] (I'm rerunning now after doing the ignore trick)
[21:01:12] is it one oozie workflow per wiki?
[21:01:15] or per month?
[21:01:22] per month
[21:01:29] so the job works with all wikis
[21:01:48] i guess the job could just know about what wikis were imported and add the partitions
[21:05:22] seems like a lot of work to do this case by case, would be nice if we could make a general partition checker thing that just works
[21:06:10] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform - https://phabricator.wikimedia.org/T185233 (Pchelolo)
[21:08:12] hm ya
[21:11:11] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform - https://phabricator.wikimedia.org/T185233 (Pchelolo)
[21:12:02] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform - https://phabricator.wikimedia.org/T185233 (Pchelolo)
[21:14:51] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Pchelolo) - As an **engineer**, I want to be able to guarantee production of the event and be able to retry until the...
[21:15:00] Analytics, EventBus, Services (later): Reliable (atomic) MediaWiki event production - https://phabricator.wikimedia.org/T120242 (Pchelolo)
[21:15:02] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Pchelolo)
[21:29:23] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201063 (Pchelolo) - As an **engineer** I want each schema/(schema revision) to have a unique ID in a form of a publicly accessible...
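Editor's note on the "add the partitions manually" idea discussed above: one way it could look is for the job to emit an explicit `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement per imported wiki instead of relying on `msck repair table`. This is a rough sketch under stated assumptions, not the refinery or oozie implementation: the helper name `add_partition_statements` and the wiki list are hypothetical, the `month`/`wiki_db` partition keys are taken from the error message quoted in the log, and the data is assumed to sit at the table's default partition paths.

```python
def add_partition_statements(table, month, wiki_dbs):
    """Build one ALTER TABLE statement per wiki for the given month."""
    statements = []
    for wiki_db in sorted(wiki_dbs):
        statements.append(
            "ALTER TABLE {table} ADD IF NOT EXISTS "
            "PARTITION (month='{month}', wiki_db='{wiki_db}');".format(
                table=table, month=month, wiki_db=wiki_db
            )
        )
    return statements


# Hypothetical list of wikis that the sqoop run reported as imported.
imported = ["abwiki", "aawiki"]
for statement in add_partition_statements(
    "wmf_raw.mediawiki_private_cu_changes", "2018-07", imported
):
    print(statement)
```

With roughly 780 wikis per month, as mentioned in the log, this yields one statement per wiki; whether that is fast enough in practice is the open question raised in the conversation.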
[21:33:42] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform - https://phabricator.wikimedia.org/T185233 (CCicalese_WMF)
[22:05:48] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) I've added some rate limiting and tested it set to 1k messages/s: >>! In T200215#4474426, @...
[23:02:58] Quarry: Support queries against Quarry's own database and ToolsDB - https://phabricator.wikimedia.org/T151158 (bd808)