[00:03:11] Analytics, Analytics-Kanban, Operations: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (Dzahn) There are now: analytics-tool1001 - for superset analytics-tool1002 - for turnilo analytics-tool1003 - for hue more details on T202013#4516863 C...
[02:11:38] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Tbayer) >>! In T201124#4517240, @Krinkle wrote: > @Tbayer Nuria, Jon and I had a chat on...
[02:34:51] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Krinkle) Following another IRC chat, I proposed to @Tbayer that if sessionId+randId is s...
[03:14:28] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Nuria) >Again, this doesn't work for the intended purpose of having a pageview token th...
[07:07:39] Quarry, Operations, cloud-services-team (Kanban): Let quarry use the mariadb module - https://phabricator.wikimedia.org/T181205 (jcrespo) Actually, instead of: ``` class {'mariadb::packages_wmf': class {'mariadb::service': ``` `class { 'mariadb::packages'` should work to install the regular service...
[07:57:51] (CR) Joal: "@Milimetric: Let's discuss factorization later today - I've tried that before (first patchset of the same change for instance), but it was" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) (owner: Joal)
[08:02:59] milimetric: Thanks a lot for the reviews :) Let's discuss when you have time :)
[11:11:53] Hello people :)
[11:12:14] still on vacation till tomorrow but I am back in Bologna :)
[11:17:48] Hi elukey !!!! :) Enjoy your last day of holidays :)
[11:18:22] \o/
[11:18:44] wanted to say to Marcel that India was incredible, but he is not on IRC afaics.. will do it tomorrow :)
[11:18:53] :)
[11:18:53] ttl people!
missed you
[11:18:58] <3
[11:23:49] Taking a break team - later
[11:34:11] hi teaam :]
[12:02:46] (PS5) Mforns: Fix interval bugs in time range selector [analytics/wikistats2] - https://gerrit.wikimedia.org/r/450063 (https://phabricator.wikimedia.org/T200497)
[13:39:19] joal: ok, I'm online
[13:51:32] (CR) Milimetric: "Oh, I see you basically tried to do the same thing that I thought of here https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/4" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) (owner: Joal)
[13:52:32] Hi milimetric :)
[13:52:39] hi joal
[13:53:12] Thanks for the reviews
[13:54:00] I think it's better to keep the code with duplication - We discussed the readability with nuria_, and we agreed that it would be a lot easier to maintain the way it is
[13:57:24] ok joal, didn't know you all had talked about it already
[13:57:34] that's fine with me
[13:57:43] milimetric: no prob, it's actually an interesting question :)
[13:58:35] yeah, I think the parent class would be harder to read but we'd never have to change it, we'd only maintain the list of fields and checks
[13:58:51] so my opinion is still that it's better to factor, but if you and nuria decided otherwise, I'm ok
[14:04:09] joal: you're gonna kill me
[14:04:24] milimetric: Absolutely not :)
[14:04:45] for https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/454242/1/oozie/mediawiki/history/reduced/generate_mediawiki_history_reduced.hql, I remember we were thinking of making a service that served user/page data in response to an id
[14:04:56] milimetric: If we and nuria find a factorization that makes everybody happy, I'll go for it?here is
[14:05:19] milimetric: I recall that as well :)
[14:05:37] milimetric: we also discussed the possibility of using druid lookup-tables
[14:06:06] oh, yeah, but that's a different topic maybe, for user-id lookups we can use anything, including cassandra
[14:06:26]
milimetric: for sure
[14:06:31] (I think even the mediawiki api)
[14:06:48] milimetric: I actually think restbase can do that
[14:06:59] yeah, and we can call it directly from wikistats
[14:07:05] yup
[14:07:19] are there other reasons to do this change then?
[14:07:20] But it makes many calls
[14:07:21] the id -> text one?
[14:07:25] yeah, true
[14:10:43] milimetric: another reason I think it's interesting to index text vs ids is for stats per page/user
[14:11:08] We can now request edit/bytes-diff stats per page and user (we could, but it was by id)
[14:11:44] what was bad about per-id? Just that if someone had a user they'd have to first lookup the id and then request?
[14:12:04] correct milimetric, ease of use
[14:13:14] yeah, ok, this is a much easier way to get this stuff now without having to build a lookup
[14:13:17] now one of the downsides of the 'text' approach is the canonical-namespace for instance
[14:13:43] yeah, there's a few problems
[14:13:58] like in a lookup service we could provide a list of WikiProjects the page belongs to, etc.
[14:14:04] here we're limited to just namespace
[14:14:26] but that's ok, let's go this way, and if we make a lookup service later we can consider changing it back
[14:14:46] I have another comment I'll reply on the review
[14:15:05] great :)
[14:17:27] (CR) Milimetric: [V: 2 C: 2] Replace ids with text in mediawiki-history-reduced [analytics/refinery] - https://gerrit.wikimedia.org/r/454242 (https://phabricator.wikimedia.org/T201617) (owner: Joal)
[14:17:47] I'll let you merge that joal, I was wrong, I thought you forgot the : in between the namespace and title
[14:17:52] but you didn't :)
[14:18:22] milimetric: I originally made it a '/', and when testing realized it was a ':' :)
[14:18:52] milimetric: testing the datasource using AQS locally with a tunnel to druid has been really super fun :)
[14:19:12] milimetric: I ran a python script querying both the real AQS and the local one, and compared the results
[14:19:27] so you index a new datasource, and then test that? Yeah, that's great
[14:20:28] in private cluster: test_mediawiki_history_reduced_2018_07
[14:22:26] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis, Patch-For-Review: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828 (Tbayer) >>! In T186828#4516659, @mpopov wrote: >>>! In T186828#4505784, @chelsyx wrote: >> >> 1,...
[14:29:03] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis, Patch-For-Review: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828 (Tbayer)
[14:34:44] nuria_: staff?
[14:37:16] ottomata: going
[14:57:07] (PS4) Amire80: Measure articles published using CX2 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435)
[15:03:45] Analytics, Analytics-Kanban: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964 (Nuria)
[15:08:48] (CR) Milimetric: [C: 2] Measure articles published using CX2 [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/442860 (https://phabricator.wikimedia.org/T196435) (owner: Amire80)
[15:39:36] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis, Patch-For-Review: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828 (Tbayer) >>! In T186828#4516659, @mpopov wrote: >>>! In T186828#4505784, @chelsyx wrote: >> [...]...
[15:46:21] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics: No recent data in the Edit event log in the Data Lake - https://phabricator.wikimedia.org/T202348 (Ottomata) Yes, the 'Edit' schema is blacklisted from Hive Refinement, because the schema is so ~~bad~~ incompatible
[15:53:05] (CR) Zhuyifei1999: [C: 1] Implement user prefs and browser notifications (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/427952 (https://phabricator.wikimedia.org/T124625) (owner: Framawiki)
[15:54:29] Analytics, Analytics-Wikistats: "Total Article Count" Wikistats metric (per project and overall) - https://phabricator.wikimedia.org/T198425 (Nuria)
[15:54:31] Analytics-Kanban, Analytics-Wikistats: Vet calculation of total article count by summing pages created (with proper filters) over timespam - https://phabricator.wikimedia.org/T199734 (Nuria) Open>Resolved
[15:54:44] Analytics, Analytics-Kanban, User-Elukey: Refactor puppet code for the Hadoop Analytics cluster to roles/profiles -
https://phabricator.wikimedia.org/T167790 (Nuria) Open>Resolved
[15:59:10] (CR) Milimetric: "-1 for the typo. But a couple of questions came to mind as I was thinking about this." (6 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/454243 (https://phabricator.wikimedia.org/T201617) (owner: Joal)
[15:59:17] (CR) Milimetric: [C: -1] Update Wikistats2 top and per-editors/edited-pages [analytics/aqs] - https://gerrit.wikimedia.org/r/454243 (https://phabricator.wikimedia.org/T201617) (owner: Joal)
[16:26:14] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform (TEC2) - https://phabricator.wikimedia.org/T185233 (CCicalese_WMF)
[16:37:29] Analytics, Operations, ops-eqiad, Patch-For-Review: rack/setup/install analyticsmaster100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T201939 (Cmjohnson)
[16:50:16] Analytics, Analytics-EventLogging, Analytics-Kanban: [EL sanitization] Modify spark log4j params to output to stdout instead of stderr - https://phabricator.wikimedia.org/T202429 (mforns) p:Triage>Normal
[16:58:01] Analytics, MinervaNeue, Product-Analytics, Readers-Web-Backlog, and 2 others: [Spike ??hrs] Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[16:58:44] ottomata, I think I found the place to modify for the log4j issues
[16:58:58] I made this change that switches from stderr to stdout: https://gerrit.wikimedia.org/r/#/c/operations/puppet/cdh/+/454318/
[16:59:08] but not sure how to test this at all...
[16:59:25] do you have any suggestion for testing?
[16:59:26] Analytics-Kanban, Patch-For-Review: Refresh SWAP notebook hardware - https://phabricator.wikimedia.org/T183145 (Cmjohnson)
[17:04:08] ottomata: I'll be some more min late. joining right after the current meeting finishes. ;)
[17:05:41] ok we here!
[17:07:01] ottomata: mind merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454308/ when you get a chance
[17:18:04] done milimetric :)
[17:18:12] mforns: will look, in meeting....
[17:18:23] np :]
[17:19:16] ohhh interesting mforns....hm.
[17:19:47] hm
[17:19:53] (PS3) Fdans: Add druid snapshot deletion script [analytics/refinery] - https://gerrit.wikimedia.org/r/448551 (https://phabricator.wikimedia.org/T197889)
[17:19:57] :]
[17:20:43] it doesn't seem possible, from what I researched, to add this parameter to the spark-submit call
[17:21:16] the only parameter that the spark-submit call accepts is extraJavaOptions, where you can specify another log4j config file
[17:22:02] but I'm afraid that testing with another config file will not guarantee that the job works with the real puppet file... dunno
[17:22:49] and I thought I might encounter problems when trying to use a custom config file, so that's why I ask
[17:25:52] mforns: i think if you use a custom log4j file, setting there should override, but i'm not 100% certain
[17:25:56] have you tried?
[17:26:07] no
[17:26:28] mforns: -Dsome.setting.blah should work, right?
[17:26:36] from what I understand, if I use a custom config file, I have to repro all params from the puppet one
[17:26:41] no
[17:26:46] maybe? worth a try for at least testing to see if it does what you want?
[17:27:28] the only -Dparam that spark-submit accepts to affect log4j is:
[17:28:15] spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/apps/spark-1.2.0/conf/log4j.properties
[17:28:37] pointing to the actual file
[17:31:45] but I can test with a local config file, that is a copy of the puppet one, except for the stderr
[17:31:57] aye
[17:32:17] k, trying
[17:48:56] (CR) Milimetric: [C: 2] Fix interval bugs in time range selector (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/450063 (https://phabricator.wikimedia.org/T200497) (owner: Mforns)
[18:00:15] thx!
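For the local test mentioned at 17:31:45 (a copy of the puppet log4j config with only the stderr target changed), a minimal sketch could look like the following. It assumes the config contains a `log4j.appender.console.target=System.err` line, as Spark's stock log4j.properties template does; the function name `make_stdout_log4j` and every path shown are hypothetical placeholders, not the real puppet-managed locations.

```shell
#!/bin/bash
# Hypothetical helper: write a copy of a log4j.properties file whose console
# appender targets stdout instead of stderr. Assumes the source file has a
# line of the exact form "log4j.appender.console.target=System.err";
# every other line is copied unchanged.
make_stdout_log4j() {
    local src="$1" dst="$2"
    sed 's/^\(log4j\.appender\.console\.target\)=System\.err$/\1=System.out/' "$src" > "$dst"
}

# Usage (placeholder paths):
#   make_stdout_log4j /path/to/log4j.properties /tmp/log4j-stdout.properties
#   spark-submit \
#     --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j-stdout.properties" \
#     <the usual job arguments>
```

Because the copy differs from the original in a single line, a plain `diff` of the two files is a quick way to confirm that nothing else drifted from the puppet version.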
[18:00:23] mforns: i'm a little worried that changing it to stdout for all spark stuff might break things or make some things worse
[18:00:32] ottomata, makes sense
[18:00:32] like spark shell, etc.
[18:00:44] not sure though, it could be fine
[18:00:45] yeeaaaaa
[18:01:19] if a script is calling spark and parsing its stdout output, then this would break stuff for sure
[18:04:56] ottomata, I think the logger can be configured from within the code...
[18:05:39] ottomata, https://stackoverflow.com/a/31409774
[18:06:47] hm, but this would only affect ELSanitization logs, not Refine logs...
[18:06:54] unless we refactor everything...
[18:06:56] oh cool, that works where -Dlog4j.configuration.file doesn't?
[18:07:01] hmm right
[18:07:17] hmm actually mforns it might not be hard to do....
[18:07:23] except that i guess it wouldn't change spark logs?
[18:07:28] since I use the LogHelper class
[18:07:31] pretty much everywhere in Refine
[18:07:44] ottomata, oh...
[18:07:57] but yeah, spark logs..
[18:07:59] yeah
[18:08:16] -Dlog4j.configuration.file didn't work for spark logs?
[18:08:38] ottomata, I've tried many variations without success
[18:08:55] the log4j.properties file might need to be inside the jar...
[18:09:22] ottomata, there's also the other quick hack, that would make this work...
[18:10:53] mforns: wrapper script?
[18:10:57] actually mforns when this usually runs
[18:11:01] all of the application logs will be in yarn
[18:11:02] 2>/dev/null && if [ $? -ne 0 ]; then echo error; fi
[18:11:06] not on the CLI
[18:11:45] most of what will be in output will be just 'application status: RUNNING' over and over again
[18:11:57] since it runs in cluster mode, right?
[18:12:00] yes
[18:12:18] a single output line is enough to trigger a false alarm email no?
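The wrapper idea above could be sketched roughly as follows: keep the job's stderr out of cron's output and print a single line only when the job exits non-zero, so that cron has something to mail only on failure. Note that the exit status has to be checked after the command has run on its own; if the check is chained with `&&`, it only runs when the command succeeded. The function name `run_and_report`, the log path, and the job command are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command, append its stderr to a log file, and
# print one line (which cron would mail) only if the command fails.
# The command's stdout is passed through untouched.
run_and_report() {
    local log_file="$1"
    shift
    "$@" 2>> "$log_file"
    local status=$?
    if [ "$status" -ne 0 ]; then
        echo "Job failed with exit code $status at $(date), please check $log_file"
    fi
    return "$status"
}

# Usage (placeholder path and command):
#   run_and_report /tmp/refine_sanitize.log <the spark-submit command line>
```

Appending with `2>>` rather than discarding with `2>/dev/null` keeps the noisy output available for debugging without letting it reach cron.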
[18:12:19] so yeah, what you say could be fine, especially if you could include the yarn application id in the error you echo if there is a problem
[18:12:27] yes
[18:12:28] aha
[18:12:43] so maybe you could do something like
[18:14:10] oh, mforns why not just write to a log file?
[18:14:14] redirect i mean
[18:14:16] are you already doing that?
[18:14:34] no, but then how to determine whether the job failed and send an email?
[18:14:35] 2> /var/log/refinery/refine_sanitize.log and then your if echo error
[18:14:56] ah, yes, there's already a log file
[18:15:09] mforns: btw, if refine fails, it sends an email itself
[18:15:12] but logs go to both log file and stderr
[18:15:13] rather than letting cron do it
[18:15:28] yes, but the problem is when refine has not started
[18:15:36] ah right
[18:15:46] yeah i guess your && echo error thing will work
[18:15:47] like, an error while parsing the whitelist, which is what happened last week
[18:16:04] ok, will try that
[18:16:13] if [ $? -ne 0 ]; then echo "Refine Sanitization failed at $(date), please check /var/log/refine/..."; fi
[18:16:24] makes sense
[18:16:52] thank yooooouu :]
[18:17:13] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics: No recent data in the Edit event log in the Data Lake - https://phabricator.wikimedia.org/T202348 (Neil_P._Quinn_WMF) >>! In T202348#4519469, @Ottomata wrote: > Yes, the 'Edit' schema is blacklisted from Hive Refinement,...
[18:17:20] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Krinkle) Following another IRC chat, @Nuria said that two randomId does indeed suffice,...
[18:29:37] (CR) Joal: "Pushing typo correction now."
(2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/454243 (https://phabricator.wikimedia.org/T201617) (owner: Joal)
[18:30:07] (PS2) Joal: Update Wikistats2 top and per-editors/edited-pages [analytics/aqs] - https://gerrit.wikimedia.org/r/454243 (https://phabricator.wikimedia.org/T201617)
[18:34:58] Analytics, Analytics-Data-Quality, Contributors-Analysis, Product-Analytics: No recent data in the Edit event log in the Data Lake - https://phabricator.wikimedia.org/T202348 (Ottomata) https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/job/refine.pp#...
[18:36:40] (PS7) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[20:04:55] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Jdlrobson) a:Niedzielski>Jdlrobson
[21:30:25] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Tbayer) >>! In T201124#4517544, @Nuria wrote: > >>Again, this doesn't work for the inte...
[21:40:07] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 7 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Nuria) > but because it leaves a bit more of a buffer in case a browser's implementation...
[22:39:18] Analytics, EventBus, Product-Analytics, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), and 2 others: Load change tags into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Pchelolo) After today's deploy of group 0 I've tested some tag adding/re...