[03:23:18] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Definition of not text content metrics for tunning session (rich media,: images and the linke) - https://phabricator.wikimedia.org/T247417 (10Nuria) @Nuria to submit reportupdater change to get these metrics every quarter
[03:52:09] 10Analytics, 10Product-Analytics, 10Epic: SQL definition for wikidata metrics for tunning session - https://phabricator.wikimedia.org/T247099 (10Nuria) @Nuria to setup reportupdater monthly calculations for this
[04:35:24] (03PS1) 10Nuria: Automate calculations for number of pages using wikidata items [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099)
[04:35:45] (03CR) 10Nuria: [C: 04-1] "Still needs testing" [analytics/reportupdater-queries] - 10https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099) (owner: 10Nuria)
[06:14:17] (03PS4) 10Fdans: Replace numeral with numbro and fix bytes formatting [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/585725 (https://phabricator.wikimedia.org/T199386)
[06:20:25] (03PS5) 10Fdans: Replace numeral with numbro and fix bytes formatting [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/585725 (https://phabricator.wikimedia.org/T199386)
[06:55:46] (03PS6) 10Fdans: Replace numeral with numbro and fix bytes formatting [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/585725 (https://phabricator.wikimedia.org/T199386)
[07:00:28] (03PS1) 10Fdans: Add "automated" as a new dimension for total page views [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/593175 (https://phabricator.wikimedia.org/T251170)
[07:01:09] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Wikistats UI offers splits with agent_type; spider, user and automated - https://phabricator.wikimedia.org/T251170 (10fdans) a:03fdans
[07:01:21] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Wikistats UI offers splits with agent_type; spider, user and automated - https://phabricator.wikimedia.org/T251170 (10fdans) p:05Triage→03High
[07:01:47] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add "automated" dimension to Total Page Views metric on Wikistats - https://phabricator.wikimedia.org/T251170 (10fdans)
[07:03:14] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review, 10good first task: Wikistats - move from numeral to numbro for better localization support - https://phabricator.wikimedia.org/T199386 (10fdans) p:05Low→03High a:03fdans
[07:09:50] fdans: holaaa
[07:10:00] helloooo
[07:10:04] when you have a moment I'd need your JS powa
[07:10:07] for superset
[07:10:27] haha that was so dramatic
[07:10:35] ...for superset
[07:10:57] elukey: let me finish documenting one thing 2 min and I'll ping you
[07:11:08] yeah even this afternoon
[07:14:27] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10elukey) The 0.36.0 release of Superset adds a specific setting to allow Druid datasources (as we use them) to be visible/usable. Upstream is clearly in favor of u...
[08:05:00] addshore: o/
[08:05:09] I know today I am annoying, sorry
[08:05:16] if you have time, I'd like to discuss https://phabricator.wikimedia.org/T119070
[08:05:25] is it still needed?
[08:05:32] I am refactoring all puppet code on stat boxes
[08:05:50] and currently to allow that use case we push logs via rsync from labstore100x to stat1007
[08:06:23] ideally, if still needed, we could push the logs to hdfs and modify consumers to pull from it, rather than localhost on stat1007
[08:06:35] otherwise we could simply remove the rsync :D
[08:26:09] 10Analytics, 10Analytics-Kanban: Support language variations on Wikistats - https://phabricator.wikimedia.org/T251091 (10fdans) a:03fdans
[08:26:24] 10Analytics, 10Analytics-Kanban: Support language variations on Wikistats - https://phabricator.wikimedia.org/T251091 (10fdans) p:05Triage→03High
[08:33:03] 10Analytics, 10Analytics-Kanban: Support language variations on Wikistats - https://phabricator.wikimedia.org/T251091 (10fdans) As discussed in the Gerrit patch, this will be solved by maintaining a list of languages manually within Wikistats. The languages will be arriving in a very gradual way and we want fi...
[08:40:37] 10Analytics: Change Wikistats UI language without reloading the page - https://phabricator.wikimedia.org/T251375 (10fdans)
[08:43:41] 10Analytics: Support right-to-left languages - https://phabricator.wikimedia.org/T251376 (10fdans)
[08:47:27] !log roll restart zookeeper on an-conf* to pick up new openjdk11 updates (affects hadoop)
[08:47:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:48:59] 10Analytics, 10Event-Platform, 10Inuka-Team (Kanban), 10KaiOS-Wikipedia-app (MVP): Capture and send back client-side errors - https://phabricator.wikimedia.org/T248615 (10hueitan) Thanks @Ottomata one more question, I tried to hit the error and the response returns success 201, but I can't see it on the da...
[09:13:09] 10Analytics: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (10elukey) @Nuria I am wondering if we could prioritize this during May, I could also work on it if needed with some advice about sqoop :)
[09:26:04] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10elukey) Verified in staging that a user without admin permissions can add tables from Druid, so it should work as Druid datasources. I'll follow up over email wi...
[10:00:15] 10Analytics-Kanban, 10Better Use Of Data, 10Product-Analytics: Experiment with Druid and SqlAlchemy - https://phabricator.wikimedia.org/T249681 (10elukey) Added https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Druid_datasources_vs_Druid_tables
[10:00:26] * elukey brb
[10:49:46] * elukey lunch!
[11:47:19] Hi team - I'm gonna deploy soon - Is there anything not yet merged that you'd like me to?
[12:28:16] 10Analytics, 10Dumps-Generation: Document missing project types in pagecount dumps - https://phabricator.wikimedia.org/T249984 (10fdans) Checking https://dumps.wikimedia.org/other/pagecounts-ez/merged (April 2020), these are the correct codes: ` # Project-code is # # b:wikibooks, ✅ # d:wiktionary, ==> NOT K...
[12:30:22] 10Analytics: Add TLS to Kafka Mirror Maker - https://phabricator.wikimedia.org/T250250 (10elukey) Disabled puppet on jumbo1001, and added the following bits to consumer/producer properties: ` security.protocol=SSL ssl.truststore.location=/etc/kafka/ssl/truststore.jks ssl.truststore.password=XXXXXXX ssl.enabled....
[12:40:53] 10Analytics, 10Analytics-Kanban: Enable TLS encryption from Eventgate to Kafka - https://phabricator.wikimedia.org/T250149 (10Ottomata) All done here!
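The SSL client settings elukey quotes in T250250 above are standard Kafka consumer/producer properties. A minimal sketch of rendering them as a `.properties` snippet; the truststore path is the one from the task, and the password is a placeholder, not a real value:

```python
# Sketch: render the Kafka mirror-maker SSL client settings quoted in
# T250250 as Java-style key=value properties lines. The truststore path
# matches the task; the password is a placeholder only.
ssl_settings = {
    "security.protocol": "SSL",
    "ssl.truststore.location": "/etc/kafka/ssl/truststore.jks",
    "ssl.truststore.password": "XXXXXXX",  # placeholder, never commit real secrets
}

def to_properties(settings):
    """Serialize a dict into key=value lines, as consumer/producer .properties expect."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))

print(to_properties(ssl_settings))
```

The same three keys go into both the consumer and the producer properties files, which is why generating them from one source is convenient.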
[12:50:17] \o/
[12:57:40] 10Analytics, 10Event-Platform, 10Inuka-Team (Kanban), 10KaiOS-Wikipedia-app (MVP), 10Patch-For-Review: Capture and send back client-side errors - https://phabricator.wikimedia.org/T248615 (10Ottomata) I see your events in Kafka! Logstash is not yet ingesting this topic. Here's a patch for @fgiunchedi t...
[12:59:26] folks I am going out buying groceries, should take ~1h
[12:59:58] I have mirror maker running with TLS on jumbo1001, I didn't realize that we already have the puppet code to do it and that kafka main already uses TLS for mirror maker!
[13:00:06] I'll file a code change when I am back :)
[13:01:18] NICE! i don't know if I remembered that either
[13:01:24] makes sense though since main is cross DC
[13:03:51] Hey ottomata - anything to merge before deploy?
[13:05:52] hello!
[13:05:52] hm
[13:06:05] this might be nice
[13:06:05] https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/592739
[13:07:10] joal: ^
[13:11:47] reading
[13:12:56] ottomata: blacklist if both blacklisted and whitelisted, blacklist wins - Should we document that?
[13:13:14] right, i'd expect that though
[13:13:18] but i love documenting!
[13:13:19] so ok!
[13:13:20] :)
[13:13:44] ottomata: I let you comment in code, and I don't know where else we should do it (if any)
[13:13:51] except from that, all good
[13:14:02] adding comment
[13:14:48] (03PS2) 10Ottomata: RefineTarget shouldRefine should consider both table whitelist and blacklist [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/592739 (https://phabricator.wikimedia.org/T238230)
[13:15:45] (03CR) 10Joal: [C: 03+2] "LGTM !" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/592739 (https://phabricator.wikimedia.org/T238230) (owner: 10Ottomata)
[13:18:18] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging before deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/591956 (https://phabricator.wikimedia.org/T249759) (owner: 10Nuria)
[13:19:45] (03PS5) 10Joal: Correcting examples in README for data quality jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/591956 (https://phabricator.wikimedia.org/T249759) (owner: 10Nuria)
[13:20:16] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging after rebase" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/591956 (https://phabricator.wikimedia.org/T249759) (owner: 10Nuria)
[13:21:18] (03Merged) 10jenkins-bot: RefineTarget shouldRefine should consider both table whitelist and blacklist [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/592739 (https://phabricator.wikimedia.org/T238230) (owner: 10Ottomata)
[13:21:31] ty!
[13:23:02] fdans: hello :)
[13:23:14] fdans: is ally, AQS will return file names in the style that the user
[13:23:18] oops
[13:23:32] fdans: is https://gerrit.wikimedia.org/r/#/c/analytics/aqs/+/571968/ a candidate to be merged for deploy?
[13:24:19] also ottomata - could you CR https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/585295/ please?
[13:24:21] joal: nonono
[13:24:26] ok fdans :)
[13:24:34] all good! thank you joal
[13:25:17] (03CR) 10Ottomata: [V: 03+2 C: 03+2] Parameterize sqoop to run in a specific yarn queue [analytics/refinery] - 10https://gerrit.wikimedia.org/r/585295 (https://phabricator.wikimedia.org/T249155) (owner: 10Joal)
[13:25:22] merged joal
[13:25:23] Thanks andrew :)
[13:26:12] (03PS9) 10Joal: Add automated agent-type to pageview_hourly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/578373 (https://phabricator.wikimedia.org/T238363)
[13:26:35] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/578373 (https://phabricator.wikimedia.org/T238363) (owner: 10Joal)
[13:26:45] git up
[13:26:47] oops
[13:32:56] 10Analytics: Statistics on a CN banner - https://phabricator.wikimedia.org/T251177 (10Jseddon) It looks like Wiki Techstorm wasn't the landing page for both years: https://tools.wmflabs.org/pageviews/?project=mediawiki.org&platform=all-access&agent=user&redirects=0&start=2018-08-07&end=2018-10-04&pages=Wiki_Tec...
[13:39:03] 10Analytics: page_restrictions field incomplete in Data Lake mediawiki_wikitext_current table - https://phabricator.wikimedia.org/T251411 (10Tgr)
[13:40:28] (03PS1) 10Joal: Update changelog for version 0.0.123 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/593232
[13:41:05] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/593232 (owner: 10Joal)
[13:47:59] Starting build #41 for job analytics-refinery-maven-release-docker
[13:48:28] !log Releasing refinery 0.0.123 onto archiva with Jenkins
[13:48:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:50:26] 10Analytics, 10Dumps-Generation: Document missing project types in pagecount dumps - https://phabricator.wikimedia.org/T249984 (10Tgr) Thanks @fdans! That's a fascinating piece of software archeology :) > I find y a little more puzzling. The only site associated to it is test.y. @Tgr perhaps you can think of...
[14:01:33] Project analytics-refinery-maven-release-docker build #41: 09SUCCESS in 13 min: https://integration.wikimedia.org/ci/job/analytics-refinery-maven-release-docker/41/
[14:01:43] \o/ :)
[14:03:48] ebernhardson: Hi! there are 2 jobs running concurrently eating a lot of resources on the cluster now - could you try to make them run at different time please :)
[14:15:52] 10Analytics: Statistics on a CN banner - https://phabricator.wikimedia.org/T251177 (10Ciell) Thank you Seddon!
[14:16:54] 10Analytics: Statistics on a CN banner - https://phabricator.wikimedia.org/T251177 (10Ciell)
[14:20:54] Hi hashar!
[14:21:52] hashar: I have just deployed refinery-source using the dockerized job (awesome) and wondered if you wanted me to test the refinery-update-jars one (IIRC there are merges to happen before)
[14:25:31] joal: ahh cool
[14:25:47] joal: there is a docker version of the job now
[14:25:54] \o/
[14:26:00] though I have changed it to run the script in dry run or review mode
[14:26:01] anything to be merged before me testing?
[14:26:19] let me check
[14:26:27] !log enable TLS consumer/producers for kafka main -> jumbo mirror maker - T250250
[14:26:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:26:31] T250250: Add TLS to Kafka Mirror Maker - https://phabricator.wikimedia.org/T250250
[14:26:39] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Druid access for view on event.editeventattempt - https://phabricator.wikimedia.org/T249945 (10Nuria) ping @fdans that we need to reindex since the beginning of data in events database, currently data spans only a week
[14:26:44] hashar: I ask that cause there are a bunch of open CRs from you ;)
[14:26:45] I have two trivial patches which should not be blockers
[14:26:52] 10Analytics, 10Patch-For-Review: Add TLS to Kafka Mirror Maker - https://phabricator.wikimedia.org/T250250 (10elukey) Kafka main is already done, all good!
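The behavior ottomata and joal settled on earlier for RefineTarget.shouldRefine ("if both blacklisted and whitelisted, blacklist wins") is a simple precedence rule. A minimal sketch of that logic in Python; the real implementation is Scala in refinery-source, and the table names here are hypothetical:

```python
# Sketch of the "blacklist wins" rule discussed for RefineTarget.shouldRefine:
# a table is refined only if it matches the whitelist (when one is set)
# AND does not match the blacklist. Table names below are hypothetical.
import re

def should_refine(table, whitelist=None, blacklist=None):
    """Return True if `table` should be refined, given optional regex filters."""
    if blacklist and re.fullmatch(blacklist, table):
        return False  # blacklist always wins, even over a whitelist match
    if whitelist:
        return re.fullmatch(whitelist, table) is not None
    return True

# A table matching both filters is skipped, because the blacklist takes precedence.
print(should_refine("navigationtiming", whitelist=".*timing", blacklist="navigation.*"))  # False
```

Documenting the precedence in a code comment, as joal suggested, is enough because the rule matches what most readers would expect anyway.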
[14:26:56] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/589609/ to show "git diff --stats" when running in dry run mode
[14:27:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add TLS to Kafka Mirror Maker - https://phabricator.wikimedia.org/T250250 (10elukey) p:05Triage→03Medium
[14:27:11] and one to switch from wget to curl https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/589610/ , but I think both are in the container already
[14:27:29] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Add TLS to Kafka Mirror Maker - https://phabricator.wikimedia.org/T250250 (10elukey)
[14:27:55] joal: https://gerrit.wikimedia.org/r/#/c/integration/config/+/589589/6/jjb/analytics.yaml has a few questions for you :]
[14:28:10] joal: notably I have made the Docker based job to send a patch for review instead of a straight push
[14:28:25] and the archiva credentials, I guess I can remove them
[14:28:47] hashar: super :)
[14:31:35] hashar: just left a bunch of comments
[14:32:48] elukey: do you have 5 minutes to batcave? I have a moral question
[14:33:03] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/589609 (owner: 10Hashar)
[14:34:14] hashar: I'm not good enough in curl - but I trust you :)
[14:34:37] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/589610 (owner: 10Hashar)
[14:35:50] hashar: I assume we need to merge https://gerrit.wikimedia.org/r/#/c/integration/config/+/589589/ before I can test it right?
[14:36:15] mforns: ahhaha sure coming
[14:36:26] elukey: ok :] omw
[14:36:45] joal: I have deployed the job already ;)
[14:37:12] Ah! ok :)
[14:37:36] hashar: Shall I test and then we merge, and merge and then we test?
[14:37:49] joal: I guess we should test it
[14:38:05] then I can amend the job as needed and we repeat until it is all perfect
[14:38:18] works for me hashar
[14:38:43] hashar: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/ this is the place I guess
[14:39:10] No change in parameter, except for RELEASE_VERSION as usual, right hashar ?
[14:40:01] joal: yup
[14:40:15] Testing, hashar!
[14:40:18] potentially we could have the script to figure out the latest release version by parsing tags from the source repo
[14:40:22] but hmm. I was too lazy
[14:40:29] So am I :)
[14:40:36] the archiva credentials aren't need are they?
[14:40:44] I don't think they are
[14:41:07] failure :(
[14:41:16] ;D
[14:42:01] hashar: curl - command not found :)
[14:42:06] MEHHHH
[14:42:54] grbmblbl
[14:43:04] 10Analytics, 10Analytics-EventLogging, 10Event-Platform, 10CPT Initiatives (Modern Event Platform (TEC2)), 10MW-1.34-notes (1.34.0-wmf.20; 2019-08-27): Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863 (10Ottomata) I just ran into [[ https://phabricator.wikimedia.org/T...
[14:43:21] let me refresh the job
[14:44:40] joal: try again? ;)
[14:44:55] sure hashar
[14:44:58] seems like I forgot to update the job and it used some obsolete container that indeed lacked curl
[14:44:58] bah
[14:45:20] Starting build #9 for job analytics-refinery-update-jars-docker
[14:45:36] Project analytics-refinery-update-jars-docker build #9: 04STILL FAILING in 15 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/9/
[14:46:22] hashar: also, it seems that curl also shows bizarre stuff in the log :S
[14:46:44] https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/9/console
[14:48:03] 10Analytics: page_restrictions field incomplete in Data Lake mediawiki_wikitext_current table - https://phabricator.wikimedia.org/T251411 (10JAllemandou) I checked raw data and 'page_restrictions' is empty for the given example (it is as well in the historical dumps). @ArielGlenn, I'm moving this to your realm!
[14:48:16] (03PS1) 10Lex Nasser: Fix project field of geoeditors public monthly and semantics [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597)
[14:48:26] 10Analytics, 10Dumps-Generation: page_restrictions field incomplete in Data Lake mediawiki_wikitext_current table - https://phabricator.wikimedia.org/T251411 (10JAllemandou)
[14:49:28] 10Analytics, 10Dumps-Generation: page_restrictions field incomplete in current and historical dumps - https://phabricator.wikimedia.org/T251411 (10JAllemandou)
[14:51:05] grblblbl
[14:56:15] joal: wrong workdir, the script is run from /
[14:56:23] I should really have tested it out locally before doing all those changes
[14:56:33] for curl , I guess we can have it run in silent mode
[14:56:47] sure, that is fine :)
[14:57:01] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (10Nuria) Of use: {T226663}
[14:57:44] 10Analytics, 10Analytics-Kanban, 10Research: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (10Isaac) > quoting "database" should work. Thanks @Nuria ! > Now seeing the select spans all partitions for table I doubt it would work as...
[14:58:49] joal: sure, it already limits the hyperparam job (there are actually 18 of them, but only 1 at a time runs), but the second glent job is separate, i'll see if i can get it into the same bucket as the others for single-execution
[14:59:09] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (10Nuria) Some documentation for the dataset used to build data for the dashboards: https://wikitech.wikimedia.org/wiki/Analytics/Data_L...
[15:00:04] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (10Nuria) dashboard link: https://wmcs-edits.wmflabs.org/#wmcs-edits
[15:01:06] Many thanks ebernhardson - the hyperparam one is resource heavy :)
[15:01:45] nuria: standuuup
[15:02:26] at some point i need to evaluate if hyperparam even needs to run every week....but that takes time. Some day :)
[15:02:39] :)
[15:03:35] Starting build #10 for job analytics-refinery-update-jars-docker
[15:03:49] Project analytics-refinery-update-jars-docker build #10: 04STILL FAILING in 13 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/10/
[15:10:23] Starting build #11 for job analytics-refinery-update-jars-docker
[15:10:40] Project analytics-refinery-update-jars-docker build #11: 04STILL FAILING in 16 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/11/
[15:11:27] joal: I am muting the job and will polish it up
[15:12:10] sure hashar - Shall I wait for you or move with the old one (I'm eager for us not to make you keep the old machines :)
[15:12:31] joal: if you could wait a bit that would be cool :D
[15:12:37] no problem hashar :)
[15:12:54] thanks a lot for fixing hashar :)
[15:13:09] trying to fix up the netrc file ;)
[15:15:58] (03CR) 10Milimetric: "one naming nit, otherwise looks good" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[15:30:33] luca ops sync y/n?
[15:30:37] elukey: ^
[15:30:42] ottomata: I was about to ask
[15:30:43] :)
[15:30:56] from my side we can skip
[15:31:02] ok llets skip!
[15:31:10] ack!
[15:31:44] Ah nuria - automated pageviews - No change in data back in time?
[15:32:13] joal: no, I think it will be very confusing for reports that are already in place, as we report those quaterly
[15:32:27] ack nuria - Will only start from now
[15:32:30] thanks nuria
[15:32:39] joal: super thanks
[15:33:08] (03CR) 10Milimetric: [C: 04-1] Fix language dropdown for ios devices (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/589606 (https://phabricator.wikimedia.org/T246971) (owner: 10Fdans)
[15:33:17] Starting build #13 for job analytics-refinery-update-jars-docker
[15:33:29] joal: the credential file was not readable ;D
[15:33:36] (03PS1) 10Maven-release-user: Add refinery-source jars for v0.0.123 to artifacts [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593259
[15:33:36] Yippee, build fixed!
[15:33:37] Project analytics-refinery-update-jars-docker build #13: 09FIXED in 18 sec: https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars-docker/13/
[15:33:40] hashar: and it is needed?
[15:33:45] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/593259 Add refinery-source jars for v0.0.123 to artifacts
[15:33:46] yeah
[15:33:50] to in order to push
[15:34:05] when I have removed the archiva credentials, I have dropped a bit to make the git credential file readable bah
[15:34:09] that worked!
[15:34:31] hashar: You're my CI angel :)
[15:34:36] those weeks of confinement makes it challenging :(
[15:34:51] indeed - less available time
[15:34:55] so the change is that it now sends a review
[15:35:18] 10Analytics, 10Operations, 10Wikimedia-Logstash, 10observability, 10Performance-Team (Radar): Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) >>! In T205856#6072710, @Krinkle wrote: >>>! In T126989#5076715, @gerritbot...
[15:35:23] hashar: I'm gonna aprove and merge that, update the docs, and that's it, no?
[15:35:30] yeah
[15:35:31] finally!
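The fix hashar describes ("the credential file was not readable") is the classic credentials-file pitfall: the file has to exist, be readable by the job's user, and ideally not be readable by anyone else. A hedged sketch of the kind of pre-flight check a CI job could run before invoking curl; the path and return values are illustrative, not what the real job does:

```python
# Sketch: pre-flight check for a netrc-style credentials file, the kind of
# readability problem that kept builds #9-#11 above failing. The default
# path is the conventional one; adjust for the job's HOME. This is an
# illustrative helper, not part of the actual update-jars job.
import os
import stat

def check_credentials(path=os.path.expanduser("~/.netrc")):
    """Return (ok, reason) describing whether the credentials file is usable."""
    if not os.path.exists(path):
        return False, "missing"
    if not os.access(path, os.R_OK):
        return False, "not readable by current user"
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        # Usable, but credentials files should normally be chmod 600.
        return True, "readable, but consider chmod 600 for credentials"
    return True, "ok"
```

Running such a check at the top of the script turns a cryptic mid-build failure into an immediate, explicit error message.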
[15:36:20] 10Analytics, 10Analytics-Kanban, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, and 2 others: Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (10hashar)
[15:36:32] \o/
[15:37:03] elukey: would be ready for me to deploy AQS?
[15:37:53] 10Analytics, 10Analytics-Kanban, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, and 2 others: Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (10hashar) 05Open→03Resolved Paired with Joseph and the last bit has been c...
[15:37:53] joal: thank you so much for all the support!
[15:38:08] hashar: no no, thank you for doing all the work :)
[15:38:18] ;D
[15:39:02] joal: +1
[15:40:13] hashar: question for you: I have received a CR about refinery-source v0.0.123 - shall I discard that?
[15:40:24] ack elukey - starting deploy process
[15:41:00] joal: that is https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/593259/ which got generated by the new update jars job ;)
[15:41:34] 10Analytics, 10Analytics-Kanban, 10Research: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (10Milimetric) Yes, Isaac, that's the idea. Though the result of your query would be joined with the page titles from `mediawiki_page_histor...
[15:41:39] Ah! hashar - I thought the job was pushing (as you mentionned push earlier) - Ok great :)
[15:42:04] hashar: I'm gonna merge that, update the docs - anything else on your side or everything has been merged?
[15:42:16] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593259 (owner: 10Maven-release-user)
[15:45:18] joal: sounds good hopefully there is not too much doc to fix up
[15:45:27] hashar: super easy :)
[15:45:48] hashar: also, no more emails right, only IRC?
[15:47:37] joal: it is supposed to span analytics-alerts@wikimedia.org
[15:48:08] It has hashar - my box was not updated
[15:48:27] my bad- sorry for the noise hashar - I now leave you continue your confimned life ;)
[15:48:52] no worries ;)))
[15:49:44] 10Analytics, 10Analytics-Kanban, 10Research: Proposed adjustment to wmf.wikidata_item_page_link to better handle page moves - https://phabricator.wikimedia.org/T249773 (10Isaac) @Milimetric :thumbs up: looks like your approach then will help reduce the number of partitions that have to be searched for `media...
[15:49:45] docs updated - those oldies VMs can be dropped :)
[15:50:48] (03PS2) 10Lex Nasser: Fix project field of geoeditors public monthly and semantics [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597)
[15:55:15] (03PS1) 10Joal: Update aqs to 916afe4 [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/593264
[15:55:16] 10Analytics, 10Analytics-Kanban, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, and 2 others: Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (10JAllemandou) Thank you a lot @hashar for making us move to the newer system :)
[15:55:24] 10Analytics, 10Analytics-Kanban, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, and 2 others: Migrate analytics/refinery/source release jobs to Docker - https://phabricator.wikimedia.org/T210271 (10JAllemandou) Thank you a lot @hashar for making us move to the newer system :)
[15:56:10] (03PS4) 10Lex Nasser: Configure Oozie job for loading geoeditors data into Cassandra [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289)
[15:56:16] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/593264 (owner: 10Joal)
[15:57:55] !log Deploying AQS using scap
[15:57:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:00:34] fdans: see message in ops chan (sorry ...)
[16:00:35] :)
[16:00:41] joal: :D
[16:05:32] elukey: successful aqs deploy :)
[16:05:40] super!
[16:06:27] !log Deploying refinery using scap
[16:06:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:08:59] 10Analytics, 10Research: Check home/HDFS leftovers of flemmerich - https://phabricator.wikimedia.org/T246070 (10Isaac) @elukey I can't do the deletions myself, so I'll just quickly summarize: * Hive: all tables can be removed * HDFS: everything under /user/flemmerich can be deleted (there aren't any other fold...
[16:20:08] Pchelolo: Hi!
[16:20:24] hello hello
[16:20:26] Pchelolo: I just sent a PR to restbase for an update to the pageviews endpoints
[16:21:05] looks good
[16:21:11] when do you need this deployed?
[16:21:24] 10Analytics, 10Research: Check home/HDFS leftovers of flemmerich - https://phabricator.wikimedia.org/T246070 (10elukey) Thanks for the feedback! Cleaned up everything except the Hive db.
[16:21:27] Pchelolo: When you want/can - AQS is ready :)
[16:22:02] ok, will do at some point, maybe when there's some more changes stacked up
[16:22:36] sounds good Pchelolo - Can you add a reminder to keep us posted please?
[16:22:58] sure
[16:23:09] thanks :)
[16:28:16] 10Analytics, 10Event-Platform, 10Inuka-Team (Kanban), 10KaiOS-Wikipedia-app (MVP), 10Patch-For-Review: Capture and send back client-side errors - https://phabricator.wikimedia.org/T248615 (10Ottomata) Merged, let me know if it works now.
[16:31:24] 10Analytics, 10Research: Check home/HDFS leftovers of flemmerich - https://phabricator.wikimedia.org/T246070 (10elukey) So the tables in Hive have the following location: ` checking table: motivations.redirect_ar location:hdfs://analytics-hadoop/user/hive/warehouse/motivations checking table: motivations.redi...
[16:31:33] joal: for you when you have a min --^
[16:33:51] 10Analytics, 10Research: Check home/HDFS leftovers of flemmerich - https://phabricator.wikimedia.org/T246070 (10JAllemandou) @elukey : sounds good - You might even not need the hdfs-deletion, depending on how the table were created (`external` or not). Also, be sure to run the hive command as `hdfs` user, for...
[16:33:56] done elukey --^
[16:33:57] :)
[16:34:07] thanks :)
[16:34:32] I really want to find a way to make refinery deloyments faster
[16:34:49] It started half an hour ago :(
[16:38:25] Gone for diner, will finalize deploy after
[16:38:54] 10Analytics, 10Research: Check home/HDFS leftovers of flemmerich - https://phabricator.wikimedia.org/T246070 (10elukey) 05Open→03Resolved All cleaned up!
[16:40:48] 10Analytics: Check home/HDFS leftovers of anomie - https://phabricator.wikimedia.org/T250167 (10elukey) @AMooney sorry for the lag, I checked in puppet and you don't have ssh access to any host as far as I can see. Creating an account only for this seems a bit overkill, maybe somebody from your team with access...
[16:49:10] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Technical contributors emerging communities metric definition, thick data - https://phabricator.wikimedia.org/T250284 (10Nuria) Given that the emerging communities are of very different nature it is very likely that we need to think of a categorization sc...
[16:51:32] (03CR) 10Nuria: [C: 03+2] Add "automated" as a new dimension for total page views [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/593175 (https://phabricator.wikimedia.org/T251170) (owner: 10Fdans)
[16:52:04] joal: we also need to document this on wikitech for aqs and pageview_hourly
[16:52:47] (03Merged) 10jenkins-bot: Add "automated" as a new dimension for total page views [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/593175 (https://phabricator.wikimedia.org/T251170) (owner: 10Fdans)
[16:58:27] 10Analytics: Spike, see how easy/hard it is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (10Nuria) I think we definitely can, let's move it to kanban and assign to @Milimetric, who will have, I think, some bandwidth to tackle it.
[16:58:58] 10Analytics, 10Analytics-Kanban: Spike, see how easy/hard it is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (10Nuria) a:03Milimetric
[17:12:00] PROBLEM - Presto Server on an-presto1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:12:25] I think this might be Irene testing superset :D
[17:12:39] i think that was me
[17:12:41] sorry!
[17:13:08] bearloga: ah ok! Thanks for telling us, no big deal :)
[17:13:21] what query did you run ??
[17:13:38] PROBLEM - Presto Server on an-presto1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:14:06] I am running puppet on an-presto to bring up the nodes
[17:14:46] elukey: idk, possibly using DISTINCT and GROUP BY together by accident (I was prototyping the query with DISTINCT & LIMIT before switching over to GROUP BY). here's the query:
[17:15:26] elukey: the query with DISTINCT removed https://www.irccloud.com/pastebin/0vEaopP1/
[17:15:28] RECOVERY - Presto Server on an-presto1002 is OK: PROCS OK: 1 process with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:15:40] RECOVERY - Presto Server on an-presto1005 is OK: PROCS OK: 1 process with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:16:49] bearloga: how many hours of webrequest?
[17:16:57] elukey: just 1
[17:18:44] oh yeah, it's going much better without that accidental DISTINCT in there
[17:20:46] from https://grafana.wikimedia.org/d/pMd25ruZz/presto?orgId=1&from=now-1h&to=now there seems to be a ton of data moving anyway
[17:20:54] (workers section)
[17:21:19] but we still need to find out how presto behaves with these kinds of queries
[17:21:24] I broke it myself yesterday :)
[17:22:27] I'm surprised it didn't error when the sub-query included both DISTINCT and GROUP BY. I like that about Hive, when it saves me from myself
[17:25:05] elukey: 1 hour executes in 00:00:37.30, is it safe to remove the hour partition so the query runs on the whole day?
[17:25:34] or should I stick with hour-by-hour querying?
[17:25:45] bearloga: nono please, one day is too much data, hour by hour is safer
[17:25:57] glad I asked! :)
[17:27:43] elukey: the speed on this thing is incredible. thank you and everyone else on the team who made it happen.
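The "accidental DISTINCT" bearloga mentions is worth a small illustration. The actual webrequest query only exists behind the pastebin link, so the table and column below are placeholders; an in-memory SQLite table stands in for a Presto/Hive table. The point is that `SELECT DISTINCT ... GROUP BY ...` is legal SQL and returns the same rows as the plain `GROUP BY`, which is why the engine didn't error, while the redundant de-duplication can still cost the engine extra work:

```python
# Sketch only: placeholder table/column, SQLite standing in for Presto/Hive.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE webrequest (uri_host TEXT)")
conn.executemany(
    "INSERT INTO webrequest VALUES (?)",
    [("en.wikipedia.org",), ("en.wikipedia.org",), ("de.wikipedia.org",)],
)

# GROUP BY already yields one row per key ...
grouped = conn.execute(
    "SELECT uri_host FROM webrequest GROUP BY uri_host ORDER BY uri_host"
).fetchall()

# ... so adding DISTINCT on top changes nothing about the result,
# only (potentially) the work the engine does.
distinct_grouped = conn.execute(
    "SELECT DISTINCT uri_host FROM webrequest GROUP BY uri_host ORDER BY uri_host"
).fetchall()

assert grouped == distinct_grouped == [("de.wikipedia.org",), ("en.wikipedia.org",)]
```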
this is the future of querying the data lake and the future is awesome and SO FAST
[17:28:42] bearloga: nice :)
[17:30:29] Presto seems to die from not having enough memory to commit on the OS
[17:30:39] possibly the Xmx=124G is too much
[17:32:40] PROBLEM - Presto Server on an-presto1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:33:49] ah ok, so 124G is 124 gibibytes, so ~133 gigabytes
[17:33:57] more than the total on the host
[17:34:30] RECOVERY - Presto Server on an-presto1001 is OK: PROCS OK: 1 process with command name java, args com.facebook.presto.server.PrestoServer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto/Administration%23Presto_server_down
[17:45:01] !log roll restart Presto workers to pick up the new jvm settings (110G heap size)
[17:45:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:51:44] brb
[17:54:55] elukey: about presto and data movement: for the moment, almost all data needs to be moved to presto nodes for computation to happen
[17:55:32] Once we have presto co-located with HDFS, even with a small amount of memory/compute, the first data gathering will be local
[17:56:27] As most queries do a lot of filtering on raw data, the 'local' aspect of first data gathering is important :)
[17:56:51] yep I agree
[17:57:08] ok presto workers restarted with the new heap size
[17:57:19] * joal wants refinery to be deployed as fast as its thin env
[17:57:57] !log Deploy refinery on HDFS
[17:57:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:58:36] will check later, logging off! o/
[17:58:42] Bye elukey
[18:00:01] just a reminder: Analytics/Research office hours are starting now for the next hour. If you have questions for the two teams, show up in the wikimedia-research channel on IRC.
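elukey's unit arithmetic above is easy to verify: a JVM `-Xmx` suffix of `G` means gibibytes (2^30 bytes), not "marketing" gigabytes (10^9 bytes), so `124G` asks for more memory than a number like "124 GB of RAM" suggests, and per the log it exceeded the host's total. A quick check of both the old and the lowered heap settings:

```python
GIB = 2**30  # one gibibyte, the unit the JVM's -Xmx "G" suffix uses

# The original setting: -Xmx124G
xmx_old_gb = 124 * GIB / 1e9
print(f"124G = {xmx_old_gb:.1f} GB")   # ~133.1 GB, matching elukey's "~133 gigabytes"

# The lowered setting from the !log entry: 110G heap
xmx_new_gb = 110 * GIB / 1e9
print(f"110G = {xmx_new_gb:.1f} GB")   # ~118.1 GB, leaving headroom for the OS
```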
[18:00:27] milimetric: ^
[18:03:46] (03CR) 10Lex Nasser: "joal helped test this - PS4 resolved an issue with the properties file. It should be all good to go" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289) (owner: 10Lex Nasser)
[18:20:21] (03CR) 10Joal: "Another broader comment I didn't notice on the already merged code: the `geoeditors_public_monthly` dataset from the oozie/mediawiki/geoedi" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[18:23:54] (03CR) 10Joal: Configure Oozie job for loading geoeditors data into Cassandra (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289) (owner: 10Lex Nasser)
[18:24:36] joal: deploy went ok?
[18:27:29] ottomata: all good from a deploy perspective - now restarts :)
[18:27:43] also ottomata, we are fully dockerized in CI :)
[18:29:23] !log Kill-restart data-quality-stats-hourly bundle
[18:29:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:34:22] (03PS1) 10Joal: Remove data_quality mandatory user parameter [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289
[18:34:26] mforns: --^
[18:34:41] looking
[18:36:06] !log Kill-restart pageview-druid jobs (hourly, daily, monthly) to add new dimension
[18:36:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:36:22] joal: it worked! the docker deploy?
[18:37:01] ottomata: we've been docker-releasing to archiva for a month, and now we docker-update-refinery :)
[18:37:15] ottomata: wikitech doc is up-to-date :)
[18:37:23] niiiiice
[18:37:25] looking
[18:37:41] ottomata: kudos to hashar who's been doing all the work
[18:38:17] great stuff
[18:38:56] mforns: actually the pageview-druid jobs have been restarted already (21st of April)
[18:39:11] oh! the monthly one?
[18:39:19] yup
[18:39:24] ok! wait...
[18:39:34] I checked today in turnilo and there were no new dimensions...
[18:39:39] hm
[18:39:43] let's check in druid
[18:41:13] mforns: In druid they are (project_family, namespace_is_content...) for pageview_hourly
[18:41:34] (03CR) 10Mforns: [C: 04-1] "Isn't there a user param in the daily bundle?" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289 (owner: 10Joal)
[18:41:35] in pageview_monthly they are not, but no job has run yet
[18:41:49] aaah! that's why
[18:42:12] april has not finished
[18:42:44] (03CR) 10Lex Nasser: "joal: Oops, missed the changes for bundle.xml, will change those - thanks!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289) (owner: 10Lex Nasser)
[18:44:15] (03PS4) 10Ottomata: Add python/refinery/eventstreamconfig.py and use it in bin/camus to build dynamic topic whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593047 (https://phabricator.wikimedia.org/T241241)
[18:44:24] (03PS2) 10Joal: Remove data_quality mandatory user parameter [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289
[18:44:40] (03CR) 10Joal: "there was no user param in daily bundle, nope :)" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289 (owner: 10Joal)
[18:45:18] !log No restart needed for pageview-druid jobs
[18:45:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:46:28] !log Kill-restart pageview-hourly job
[18:46:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:47:31] (03PS3) 10Joal: Remove data_quality mandatory user parameter [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289
[18:47:59] (03CR) 10Joal: "Actually there was, my updated file was not saved /facepalm" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289 (owner: 10Joal)
[18:48:06] mforns: --^
[18:49:16] (03CR) 10Lex Nasser: "joal: I was thinking that the dropping would go here just because it's like that in other tables such as wmf.mediarequest, and there's no " [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[18:50:00] oh joal! so the user param comes from the environment variable?
[18:50:39] (03CR) 10Joal: "@lexnasser: makes sense. question then: should we check for '.org' before removing, or is it safe to say that all projects will end in .org" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[18:51:28] (03CR) 10Mforns: [C: 03+2] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289 (owner: 10Joal)
[18:51:56] joal: I let you verify and merge this time :]
[18:52:45] (03CR) 10Lex Nasser: "joal: right now, geoeditors is only supported for Wikipedia projects, which all end with .org . Maybe in the future, this may change, thou" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[18:53:49] (03CR) 10Ottomata: "Tested, it works!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593047 (https://phabricator.wikimedia.org/T241241) (owner: 10Ottomata)
[18:55:58] !log Kill-restart cassandra-daily-coord-local_group_default_T_pageviews_per_article_flat
[18:55:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:07:50] (03CR) 10Joal: "Indeed the geoeditor start-date is correct, my bad. I looked at geoeditors-edits-monthly, which started in 2018-11. Sorry about that :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289) (owner: 10Lex Nasser)
[19:08:26] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging quick-fix" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593289 (owner: 10Joal)
[19:10:48] (03CR) 10Joal: "Another way is to: REGEX_REPLACE(project, '\.org$', '')" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[19:17:20] (03PS5) 10Lex Nasser: Configure Oozie job for loading geoeditors data into Cassandra [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289)
[19:18:35] (03CR) 10Joal: [C: 03+2] "LGTM :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/582638 (https://phabricator.wikimedia.org/T248289) (owner: 10Lex Nasser)
[19:50:36] (03PS3) 10Lex Nasser: Fix project field of geoeditors public monthly and semantics [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597)
[19:51:40] (03CR) 10Lex Nasser: "joal: chose a different, more future-proof way, just dropping everything after the last period, including the last period, tested on a few" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/593246 (https://phabricator.wikimedia.org/T244597) (owner: 10Lex Nasser)
[20:46:10] 10Analytics, 10Event-Platform, 10WMF-JobQueue, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: EventBus extension must not send batches that are too large - https://phabricator.wikimedia.org/T232392 (10Pchelolo)
[20:46:47] 10Analytics, 10Event-Platform, 10WMF-JobQueue, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: EventBus extension must not send batches that are too large - https://phabricator.wikimedia.org/T232392 (10Pchelolo) p:05Medium→03High Bumping to high priority since this is happening in produ...
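The two project-suffix options discussed on change 593246 can be sketched side by side. The real patch is HiveQL; these Python stand-ins are illustrations only, and the `.example` input is a made-up hypothetical to show the difference:

```python
# Illustration of the two suffix-stripping options for the geoeditors
# project field (the actual change is HiveQL, not Python).
import re

def strip_org(project: str) -> str:
    # joal's suggestion: REGEX_REPLACE(project, '\.org$', '')
    return re.sub(r"\.org$", "", project)

def strip_last_label(project: str) -> str:
    # Lex Nasser's "more future-proof" choice: drop everything from the
    # last period onward, including the period itself.
    return project.rsplit(".", 1)[0]

# Identical for today's .org-only Wikipedia projects ...
assert strip_org("en.wikipedia.org") == "en.wikipedia"
assert strip_last_label("en.wikipedia.org") == "en.wikipedia"

# ... but only the second keeps working if a project ever used another TLD
# (hypothetical input, per the "maybe in the future this may change" comment).
assert strip_org("xx.wikipedia.example") == "xx.wikipedia.example"  # unchanged
assert strip_last_label("xx.wikipedia.example") == "xx.wikipedia"
```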