[07:06:46] Morning elukey :) https://blog.toggl.com/lightbulb-cartoon-developers/ [07:12:10] 10Analytics, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10JAllemandou) >>! In T172532#4554063, @elukey wrote: > The good thing is that all the crons become systemd units, that have their own journald configuration (... [07:12:47] ahahha nice :) [07:16:13] hi! [07:16:55] what's the meaning of prefixes (eqiad/codfw) in kafka topics? [07:17:14] dcausse: I'll let elukey confirm, but I think it's the producing DC [07:18:00] e.g. if on main-codfw I see a topic eqiad.xyz it means data was produced from eqiad to a kafka broker in codfw? [07:18:21] joal: ok thanks! [07:18:38] dcausse: I think some topics are replicated cross-dc [07:19:11] ok [07:19:24] so it can be that it was mirrored [07:19:53] dcausse: For analytics purposes, everything is mirrored to jumbo (eqiad only) - On main clusters however, I think eqiad is mirrored to codfw and vice-versa, with naming conventions to prevent infinite loops of mirroring [07:20:16] elukey: I am reasonably not incorrect? --^ [07:28:17] yep :) [07:28:27] (sorry I was afk) [07:28:55] we use mirror maker as you describe to replicate topics between the main clusters cross dc (that otherwise would have the same name) [07:29:55] elukey: so in a given DC it's impossible to consume/produce to another DC kafka broker [07:29:59] mirror maker must be used [07:30:21] dcausse: feasible - We do that for varnish for instance [07:30:54] dcausse: the reason we locally-produce and let mirror maker do the synchronization is to prevent data loss in case of a split [07:31:04] ok [07:31:07] dcausse: or at least it's my understanding [07:34:04] dcausse: in theory you can but it wouldn't be good in terms of dc failover policies [07:34:16] for example, eventbus in codfw produces only to kafka main-codfw [07:34:24] and then mirror maker replicates it to eqiad [07:35:00] in case of a failover between eqiad and codfw, we have a system that is transparently able to handle the new state [07:35:35] dcausse: any specific use case that you have in mind? [07:35:35] elukey: sure, I don't really want to do cross-DC stuff, I was wondering how to determine where the data comes from when I see eqiad.xyz in main-codfw, so it's very likely mirror maker [07:35:59] it is surely mirror maker in this case [07:36:04] thanks [07:36:08] np :) [07:45:03] elukey: do you have a magic command to extract the offset of a consumer group out of zookeeper? [07:47:40] dcausse: the offset is stored in kafka itself [07:47:48] in a special topic [07:48:04] only the old consumers were using zookeeper for offset commit [07:48:04] oh ok [07:48:27] so IIRC with python it is easy to get it, buuuut there should be an easy command [07:49:37] elukey: sure, thanks, don't lose time on this, I'll figure this out :) [07:50:31] do you have a specific consumer group in mind ? [07:51:06] ah wait!
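The "easy command" being alluded to here is most likely the consumer-groups tool that ships with Kafka: since the newer consumers commit offsets to the internal __consumer_offsets topic rather than to ZooKeeper, you ask a broker for them instead of ZooKeeper. A minimal sketch, with placeholder broker and group names:

    # topic, partition, committed offset, log-end offset and lag for one consumer group
    kafka-consumer-groups.sh --bootstrap-server BROKER:9092 --describe --group some-consumer-group

    # list all consumer groups known to the cluster
    kafka-consumer-groups.sh --bootstrap-server BROKER:9092 --list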
[07:51:09] I am stupid [07:51:25] https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag?orgId=1 [07:51:51] elukey: I just wanted to make sure that a consumer group is active [07:52:22] just by seeing if offsets are changing [07:52:25] ahh okok then the above graph (the one at the bottom) is good [07:52:29] kafka_burrow_partition_current_offset{exported_cluster="$cluster",topic=~"$topic",group=~"$consumer_group"}[5m] [07:52:37] this is the prometheus query to know the exact offset [07:52:56] but, with the graph in the dashboard, you can see the rate of offset commit for any cgroup [07:53:08] so it is easy to spot active vs non-active consumers [07:53:17] elukey: oh that is perfect! thanks! :) [07:53:25] super :) [07:55:29] 10Analytics, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey) There are still some open questions, namely: 1) unit logging 2) alarming to us when something breaks [08:13:05] joal: re systemd timers [08:13:35] there are some things that might need some changes if we want to keep going with that approach [08:14:08] for example, atm there is a unit that executes the following [08:14:10] ExecStart=/srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions --hdfs-mount /mnt/hdfs --datasets webrequest,raw_webrequest etc.. [08:14:40] now atm with cron if the script emits any stdout, then we get an email, which is our alert [08:15:12] the script always returns 0, so the exit code doesn't reflect any problem [08:15:21] so for systemd, the script's execution was "ok" [08:15:29] and the unit will not fail/alarm [08:16:29] ( Main PID: 10433 (code=exited, status=0/SUCCESS)) <--- this is the last execution [08:27:23] now in theory there is a way to send emails when a unit fails [08:27:29] there is an OnFailure option [08:28:48] elukey: this is the main diff I see in the change indeed (with cron, stdout = error, while with SystemD we need proper return codes - Which is way better by the way) [08:29:03] it is yes [08:56:56] ah icinga-wm is still not here [08:56:57] uffff [08:57:42] I "injected" an exit 1 in the check partitions script [08:57:43] elukey@analytics1003:~$ /usr/local/lib/nagios/plugins/check_systemd_state [08:57:47] CRITICAL - degraded: The system is operational but one or more units failed. [08:57:53] this translates to [08:57:54] PROBLEM - Check systemd state on analytics1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed [08:58:18] elukey: This seems like a violent alarm for a failing check :D [08:58:44] violent? [08:59:10] Meaning when I read an alarm like that, I have the feeling the host is dying! [09:00:43] ahhhhh [09:00:51] nothing major though :) [09:43:28] Wow - This is heavy - https://globenewswire.com/news-release/2018/09/04/1564650/0/en/data-Artisans-Introduces-Industry-s-First-Serializable-ACID-Transactions-Directly-on-Streaming-Data.html [09:45:54] In my mind, this --^ means streaming-applications will be responsible for global states.
Let's imagine: You push a revision-create event into a stream, and the revision-table maintained by the streaming system is updated in an ACID way [09:46:19] This is particularly relevant for Andrew when he wakes up :) [09:48:54] And the actual original post: https://data-artisans.com/blog/serializable-acid-transactions-on-streaming-data [09:50:31] very nice, will read those later :) [09:51:00] so still no luck adding icinga-wm back in here [09:51:14] I added a cry for help in https://phabricator.wikimedia.org/T202314 [09:51:19] :-( [10:17:21] ah!!! --^ [10:18:14] (if you don't have all the chan's logs: icinga-wm (~icinga-wm@einsteinium.wikimedia.org) joined #wikimedia-analytics) [10:18:32] so I added a ban exemption for everything coming from einstsenium [10:18:38] *einsteinium, where icinga lives [10:18:47] and then restarted ircecho [10:23:48] PROBLEM - Check systemd state on analytics1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [10:23:52] yesssssss [10:24:01] * elukey dances [10:24:42] I forced one script to exit 1 [10:25:58] RECOVERY - Check systemd state on analytics1003 is OK: OK - running: The system is fully operational [10:28:49] 10Analytics, 10Patch-For-Review, 10User-Elukey: Refactor analytics cronjobs to alarm on failure reliably - https://phabricator.wikimedia.org/T172532 (10elukey) I had to add a ban exemption to #wikimedia-analytics (more info T202314), and after forcing an exit 1 to one of the new systemd timer units I got this... [10:47:28] * elukey invite fdans #wikimedia-analytics [10:50:31] elukey: I'm in! [10:50:35] there you go [10:50:35] :) [10:50:39] yayyy irc sucks!! [10:50:43] elukey: can you op me? [10:51:07] you have powa now [10:51:16] graaazie I will use my powers exclusively for evil [10:51:30] thank you for your time elukey [10:51:49] going afk for a bit!
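To make the OnFailure option mentioned above concrete, here is a minimal sketch of how a timer's service could be wired to send mail when it fails; the unit names, the mail command and the recipient are illustrative assumptions, not necessarily what the puppet change ended up using:

    # refinery-dump-status-webrequest-partitions.service (the unit triggered by the timer)
    [Unit]
    Description=Dump status of webrequest partitions
    OnFailure=unit-status-mail@%n.service

    [Service]
    Type=oneshot
    ExecStart=/srv/deployment/analytics/refinery/bin/refinery-dump-status-webrequest-partitions --hdfs-mount /mnt/hdfs --datasets webrequest,raw_webrequest

    # unit-status-mail@.service (templated helper; %i expands to the failed unit's name)
    [Unit]
    Description=Send a status email for unit %i

    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c 'systemctl status --full "%i" | mail -s "Failed systemd unit %i" root@localhost'

Note that this only helps once the script itself exits non-zero on error: as long as it always returns 0 (as discussed at 08:15), the unit never reaches the failed state and OnFailure never triggers.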
Sorry, need to run an errand, will try to bb asap [11:14:44] (03PS8) 10Fdans: Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) [11:26:04] (03PS9) 10Fdans: Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) [11:27:26] (03CR) 10jerkins-bot: [V: 04-1] Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) (owner: 10Fdans) [11:30:27] * fdans is getting close to throwing the laptop through the window [11:32:24] (03PS10) 10Fdans: Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) [11:34:09] (03CR) 10jerkins-bot: [V: 04-1] Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) (owner: 10Fdans) [11:47:54] (03PS11) 10Fdans: Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) [11:55:46] (03PS12) 10Fdans: Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) [11:57:07] (03CR) 10jerkins-bot: [V: 04-1] Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) (owner: 10Fdans) [12:01:08] (03PS13) 10Fdans: Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) [12:01:21] this is the good one (sorry for the flood) [12:25:24] (03CR) 10Fdans: [C: 032] Add strategy, usability and advisory sites to pageview definition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) (owner: 10Fdans) [12:32:28] back! [12:32:33] all good with irc? [12:32:48] elukey: for me yes! [12:33:18] it is weird though [12:33:39] fdans: can you arrange to get a cloak in the next few days? [12:34:34] elukey: I gueeeess, but I don't think it's related [12:34:46] nuria didn't get banned and she doesn't have a cloak [12:40:17] we all should have one [12:40:38] but it is strange that you got banned even if your nick is registered [12:41:36] also fdans, let's op only when needed [12:49:42] anyhow, good career today for fdans.. from banned to op :D [12:50:09] elukey: but now I have to relinquish the power [12:50:44] fdans: you can re-op no? [12:50:51] it is like being spiderman [12:51:17] elukey: can't [12:51:37] wq [12:52:37] weird, I thought you had the possibility to +o now [12:52:43] how are you trying to op? [12:52:46] via chanserv? [12:55:13] joal: elukey any opposition to deploying source and the cluster? [12:56:10] nope [12:57:14] nope [12:58:00] fdans: how are you trying to op? [12:58:10] via chanserv? [12:58:17] elukey: sorry, i do /op fdans [12:58:55] fdans: can you try "/msg chanserv op #wikimedia-analytics fdans" ?
[12:59:09] msg chanserv op #wikimedia-analytics fdans [12:59:12] goddammit [12:59:14] ahahha [13:00:00] elukey: https://usercontent.irccloud-cdn.com/file/UlaLowF1/Screen%20Shot%202018-09-04%20at%202.59.37%20PM.png [13:00:23] #movetoslack [13:01:18] slack is the same thing, it is only a matter of finding how to grant you +o permanently :) [13:04:12] weird [13:11:27] elukey: I mean I've never been banned from a slack channel "just because", with no clear indication of what caused it [13:15:29] also I've no idea what was said in this channel between 9pm yesterday and 12:30pm today, since irccloud only saves chats when you're in the channel (afk or not) [13:17:09] you know that we have records for this chan right? [13:19:00] http://bots.wmflabs.org/~wm-bot/logs/ [13:19:04] fdans: --^ [13:19:33] plus being "banned" itself is not a flaw of IRC, but something that was triggered by whoever manages it [13:19:42] slack is only a new fancy version of IRC [13:19:44] nothing more :) [13:19:50] you can do damage in there too [13:20:27] ottomata: o/ [13:20:28] * joal whispers to elukey: don't feed the troooooool! :) [13:20:29] hellooo [13:21:00] ottomata: whenever you have time can we review analytics' labs instances? I'd need to nuke some of them to spin up new ones :) [13:23:21] elukey: no it's not (as someone who's used both a lot), and we can leave this conversation for the offsite, but irc as an enterprise communication tool is wildly insufficient [13:23:42] joal: let's not confuse trolling with a sincere opinion :) [13:24:25] fdans: Please accept my excuses, I was joking on an already-well-discussed topic :) [13:26:24] joal: no problem ;) [13:26:31] fdans: trust me that it is, we are functioning really well even without slack and other things [13:27:45] if you say that it is more fancy and nice to use for things like gifs etc.. I can absolutely agree, but the core of things that a chat system needs to do are there [13:28:37] ok whatever [13:29:09] very constructive [13:29:22] anyhow, we can chat in person as you said above [13:29:32] :) [13:33:06] (03PS2) 10Joal: Add python script importing xml dumps onto hdfs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/456654 (https://phabricator.wikimedia.org/T202489) [13:33:29] joal: I am trying to add some exit 1 to refinery-dump-status-webrequest-partitions but it is not that easy [13:33:50] elukey: it is not, indeed! [13:34:12] also there is [13:34:13] error() { echo "Error" "$@" >&2; exit 1 [13:34:13] } [13:34:46] (03CR) 10Joal: "Currently being tested on the cluster" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/456654 (https://phabricator.wikimedia.org/T202489) (owner: 10Joal) [13:37:32] elukey: Would the "HAS_FAULTY" variable be of ise? [13:37:37] s/i/u [13:39:52] elukey: ya nuke anything away!
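A sketch of the kind of change being discussed for refinery-dump-status-webrequest-partitions; the variable and helper names below are made up for illustration, only the pattern matters: keep reporting problems on stderr as before, but also accumulate a flag and exit non-zero at the end, so the systemd unit actually fails (and can alarm) instead of always returning 0:

    HAS_FAULTY=0
    for dataset in ${DATASETS}; do
        if ! check_dataset_partitions "${dataset}"; then   # hypothetical helper
            echo "Error: faulty/missing partitions for ${dataset}" >&2
            HAS_FAULTY=1
        fi
    done
    # With cron the echo above was the alert; with systemd the exit code is.
    [ "${HAS_FAULTY}" -eq 0 ] || exit 1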
i've been messing with hadoop-worker-1 (presto stuff) but it can be nuked [13:40:11] i have to run home real quick (gotta move car from where it is on the way), so i'll be back shortly and we can discuss anythingngggg [13:41:29] as for kafka stuff, also nuke as needed [13:41:43] haven't used them in a while, its nice to have 2 running kafka clusters for mirror maker testing, but we can recreate anything at any time [13:42:01] also, btw, i'm into these systemd timers :) [13:43:09] \o/ [13:43:11] (03PS1) 10Fdans: Update changelog.md for v0.0.73 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/457902 [13:43:15] I requested moar capacity for labs [13:43:38] (03CR) 10Fdans: [V: 032 C: 032] Update changelog.md for v0.0.73 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/457902 (owner: 10Fdans) [13:43:40] joal: I was checking that one as well, might be good! [13:45:52] elukey: great, i was running into that problem too [13:45:56] ok be back shortly! [13:46:51] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Tbayer) [14:04:48] joal: I am wondering if next q could be good for druid 0.12 [14:05:05] there is http://druid.io/docs/0.12.0-rc1/development/extensions-core/druid-basic-security.html that it is not bad [14:05:24] from https://github.com/apache/incubator-druid/releases they seem to have release 0.12.2 [14:08:38] elukey: why not ! [14:08:47] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Tbayer) [14:17:50] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Tbayer) a:05Tbayer>03None We did further checks of the token in PageIssues and Readi... [14:23:53] 10Analytics, 10Analytics-Data-Quality, 10Contributors-Analysis, 10Product-Analytics: Resume refinement of edit events in Data Lake - https://phabricator.wikimedia.org/T202348 (10Halfak) FYI: Here's the proposal I wrote a while ago: https://meta.wikimedia.org/wiki/Schema_talk:Edit#A_proposal_I_wrote_a_while... [14:28:22] !log deployed refinery source using jenkins [14:28:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:31:39] (03PS1) 10Fdans: Bump up jar version in webrequest load job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/457917 [14:33:43] (03PS2) 10Fdans: Bump up jar version in webrequest load job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/457917 [14:34:31] joal can you give your blessings to this? ^ :) [14:51:45] 10Analytics, 10Operations, 10vm-requests: eqiad (1) - VM request for Piwik/Matomo - https://phabricator.wikimedia.org/T202963 (10akosiaris) @elukey: Totally doable disk wise. Network wise, bohrium is at private1-a-eqiad, so no analytics network. Just making sure :D Can I also assume we will be deleting bo... [14:54:06] ottomata: wdyt about hosting piwik/matomo on analytics-tool1004? 
(new VM to request, the new bohrium basically) [14:54:11] rather than something like piwik1001 [14:54:19] (or maybe better, matomo1001) [14:54:24] (sounds weird) [15:00:58] ping joal [15:01:18] on the phone, will be minutes late [15:01:42] (03CR) 10Ottomata: [V: 032 C: 032] Bump up jar version in webrequest load job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/457917 (owner: 10Fdans) [15:02:04] 10Analytics, 10Analytics-EventLogging, 10Research: 20K events by a single user in the span of 20 mins - https://phabricator.wikimedia.org/T202539 (10Nuria) [15:02:35] analytics-tool1004: into it, but only if we can put it in the analytics vlan [15:02:37] elukey: ^ [15:03:26] ottomata: for bohrium? is it needed? [15:03:32] or for consistency [15:03:33] ? [15:04:07] for consistency i think, right? [15:04:18] would be confusing to have to note that 1001-1003 and all other analytics* hosts are in the analytics vlan [15:04:21] but analytics-tool1004 is not [15:04:56] yes yes, even if I'd prefer to keep matomo/bohrium outside the vlan.. [15:05:22] maybe this is a different use case, matomo1001.eqiad.wmnet could be good ? [15:05:30] +1 elukey [15:05:36] ack thanks :) [15:06:48] fdans: Was getting kids from school - sorry :( [15:08:01] joal: I should know by now :) [15:08:21] fdans: I could also warn - I lost my habit of doing so - will try to think about it [15:35:07] 10Analytics, 10Operations, 10vm-requests: eqiad (1) - VM request for Piwik/Matomo - https://phabricator.wikimedia.org/T202963 (10elukey) >>! In T202963#4556071, @akosiaris wrote: > @elukey: Totally doable disk wise. > > Network wise, bohrium is at private1-a-eqiad, so no analytics network. Just making sure... [15:35:24] 10Analytics, 10Analytics-Kanban: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (10Ottomata) FYI, this is scheduled to happen tomorrow Wed Sept 5 at about 13:30 UTC [15:44:48] joal: so after running fsck on / (corrected a lot of issues) and reboot, aqs1004 seems fine [15:44:51] really weird [15:45:06] elukey: maybe we'll have a disk failure one of these days? [15:47:08] smartctl does not show any sign of distress [15:47:12] maybe yes :( [15:50:58] elukey: That's good, at least the thing is fixed [15:54:52] (03CR) 10Ottomata: "Couple nits, and a Q that could probably be cleared up with more docs:" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/456654 (https://phabricator.wikimedia.org/T202489) (owner: 10Joal) [16:02:14] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Add AQS endpoint providing top editors and top pages (by number of edits, by net-bytes-diff and abs-bytes diff) - https://phabricator.wikimedia.org/T201617 (10Nuria) [16:04:29] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Reading-analysis: Assess impact of ua-parser update on core metrics - https://phabricator.wikimedia.org/T193578 (10fdans) I've stored the two jupyter notebooks I've used in github: https://github.com/fdansv/ua_compare_notebooks Here are the increases...
[16:06:19] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: [EL sanitization] Retroactively sanitize (including hash and salt appInstallId fields) data in the events database - https://phabricator.wikimedia.org/T199902 (10Nuria) [16:07:11] !log beginning refinery deployment [16:07:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:12:04] fdans: remember that we have a new archiva host, in theory everything should go fine but lemme know if you see any weirdness [16:12:29] elukey: so far everything's peachy [16:13:06] gooood [16:15:08] 10Analytics, 10Analytics-Kanban, 10Research: Automate XML-to-parquet transformation for XML dumps (oozie job) - https://phabricator.wikimedia.org/T202490 (10JAllemandou) [16:33:26] ottomata: do you have a minute on the python thing? [16:37:27] !log restarting webrequest-load bundle [16:37:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:38:52] Gone for diner team - will be back after [16:39:52] joal: yup! [16:39:57] ohhh missed ya sorry! [16:45:50] joal: so.... do you want me to deploy RB with your patch now-ish? [16:48:20] (03PS1) 10Ottomata: Add --table-whitelist flag to EventLoggingSanitization job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/457949 [16:49:24] joal,ottomata FYI I just repooled aqs1004 after another reboot for kernel+openjdk upgrade [16:49:28] it looks fine so far [16:49:33] coo [16:50:41] 10Analytics, 10security-team-backlog: Establish a process to periodically review and approve access for hadoop/hue users - https://phabricator.wikimedia.org/T121136 (10Bawolff) [16:52:05] (03CR) 10Nuria: [C: 031] "Let's please document in wikitech that this "reduction" of tables to refine can be done." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/457949 (owner: 10Ottomata) [16:55:05] oh nuria we cancelled staff? [16:55:12] ottomata: yes [16:55:15] oh ok [17:08:55] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10ovasileva) a:03Niedzielski [17:09:27] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Niedzielski) @niedzielski to write the docs. [17:13:27] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Jdlrobson) [17:13:44] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Jdlrobson) Copied remaining A/C to T203013 [17:14:14] (03CR) 10Ottomata: [V: 032 C: 032] "K, just tested this to rerun a sanitize job in prod, works great." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/457949 (owner: 10Ottomata) [17:14:56] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-extensions-WikimediaEvents, 10Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (10Niedzielski) 05Open>03Resolved @niedzielski to go ahead and update docs as part of T... 
[17:35:59] elukey: still around? [17:36:03] yep! [17:36:28] I was trying to decipher your emails about spark refine and failed miserably [17:36:51] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: RFC: Modern Event Platform: Schema Registry - https://phabricator.wikimedia.org/T201643 (10Ottomata) @kchapman I'm not totally familiar with the RFC process. What's next? Does this need to go through another RFC IRC meet... [17:36:54] I somehow thought that those failed jobs wouldn't have needed a re-run, so I didn't act on them :( [17:37:10] need to familiarize myself with failure scenarios and how to fix them [17:38:28] elukey: sorry irc closed for a min, got a sec to brain bounce presto puppetization? [17:39:09] sure [17:40:03] in bc [17:51:36] going of! [17:51:38] *off [17:51:39] * elukey off! [18:02:26] fdans: did you deploy the cluster? [18:02:55] fdans: ah yes, i see, did you restart jobs or are you waiting for joseph? [18:03:40] nuria: yea bundle is restarted [18:03:56] fdans: we got a warning of a pageview that is not in the whitelist [18:04:15] fdans: did you update the whitelist (you know, other than the code change it needs to be fed into the live table) [18:04:25] fdans: ahem... hopefully that made sense [18:04:58] yeayea, i need to add advisory nuria i think [18:05:12] fdans: k [18:24:54] 10Analytics, 10security-team-backlog: Establish a process to periodically review and approve access for hadoop/hue users - https://phabricator.wikimedia.org/T121136 (10chasemp) [18:54:25] hallo nuria , milimetric . I'm trying to figure out https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki . [18:55:28] It says to edit https://meta.wikimedia.org/wiki/Dashiki:CategorizedMetrics , but it's a bit strange: I don't see any reference to https://meta.wikimedia.org/wiki/Config:Dashiki:Interlanguage there. [18:56:22] There's a section called "Compact Language Links" there, and it was added by me, but I don't see how it is related. [18:59:02] omg I've asked j.oal about deployment and then completely forgot to check the answer [19:00:13] do you need the new endpoints exposed right now or is tomorrow good? [19:04:50] Hi Pchelolo - Was gone for dinner [19:05:00] Pchelolo: no rush, tomorrow is good :) [19:05:41] Restart cassandra-hourly-wf-local_group_default_T_pageviews_per_project_v2-2018-9-4-14 [19:05:46] !log Restart cassandra-hourly-wf-local_group_default_T_pageviews_per_project_v2-2018-9-4-14 [19:05:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:06:29] joal: ok, promise tomorrow [19:06:35] Thanks Pchelolo :) [19:06:41] I thought you were away [19:06:49] I was ! [19:08:51] ottomata: I second Luca in the idea of pushing forward the procedure to apply in case of refinement failure - It's not difficult to find from the OnCall page, so it should just be a matter of pushing for actions to be taken :) [19:09:25] joal: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Refine#Rerunning_jobs [19:09:31] Sanitize needs more docs from mforns i think [19:09:34] it was kinda hard for me to rerun that [19:09:37] ottomata: Indeed I found that [19:10:05] ottomata: meh :( [19:11:43] yeah [19:11:47] not the best, the command is very long [19:11:53] maybe we can make the wrapper script good for EL defaults [19:12:04] so you can provide fewer flags and it will fill in the good stuff for EL [19:12:05] ottomata: properties file ?? :) [19:12:11] haha, that would be great too actually! [19:12:14] we should do that [19:12:14] right!
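To illustrate the properties-file idea (and the Typesafe Config library mentioned just below): the EL defaults would live in a small config file loaded by the wrapper, and an operator rerunning a failed refinement would only override the few values that differ. All key names and values here are made up for illustration; they are not the Refine/sanitize job's actual options:

    # el-sanitize.properties (hypothetical keys and values)
    input_base_path  = /wmf/data/event
    output_base_path = /wmf/data/event_sanitized
    output_database  = event_sanitized
    whitelist_path   = /path/to/eventlogging_whitelist.yaml
    since_hours      = 26

With something like this, a rerun becomes a one-flag override instead of retyping the whole spark-submit line.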
[19:12:29] i had a wip patch to do that but got caught in the manywaystodothingsinscala/javaland [19:12:31] ottomata: I can help :) I have been suggesting that for months :) [19:12:32] what was the good one? [19:12:47] typesafe? [19:12:49] i think [19:14:06] ottomata: why not :) [19:14:20] ottomata: we could also write our own wrappers, but since those ones exist [19:14:51] ottomata: also, can I ask you for a minute in da cave about the python importer? [19:15:54] yes joal coming [19:31:53] 10Analytics: Upgrade Hive to ≥1.3 or ≥2.1 - https://phabricator.wikimedia.org/T203498 (10mpopov) [19:45:58] 10Analytics: Upgrade Hive to ≥1.3 or ≥2.1 - https://phabricator.wikimedia.org/T203498 (10mpopov) [21:09:26] aharoni: please look at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki#Test_your_dashboard_config_locally [21:10:04] aharoni: ping us once you have a local instance that works and can see the testing config files, once that is done you will be able to test your config locally [21:12:49] nuria: thanks, I'm trying to install it now. (I used to have it installed, but I had to reinstall my laptop recently...) [21:13:15] aharoni: ok, ping me when you can see things [21:17:13] (03PS1) 10Fdans: Add advisory and strategy sites to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458060 [21:17:41] nuria: can I get a +2 on that? ^ [21:18:13] fdans: did you look at the error? we might also need https://fixcopyright.wikimedia.org/wiki/Main_Page [21:19:47] nuria: hmmmm for now advisory and strategy are the only ones being complained about since they are not in the whitelist [21:20:13] fdans: funny, i would imagine that one would show too if it is fronted by varnish [21:20:27] nuria: nope because it's not being counted as a pageview [21:20:34] since it's not on the regex [21:20:40] >.< [21:20:50] fdans: ah, cause the regex does not match? [21:20:57] fdans: right right [21:21:01] we don't have fixcopyright in the regex [21:21:47] fdans: ya, i will add it once i figure out if that wiki is fronted by varnish [21:22:38] nuria: it probably is since it's part of the sitematrix, but that regex doesn't scale well [21:23:14] fdans: that regex is a total pain [21:23:21] in terms of having to deploy source + the cluster to add a wikimedia site [21:24:30] fdans: that happens only with the ones not named en-blah.blah.org [21:25:08] nuria: yeah en-blah.wikimedia.org, right? [21:27:15] fdans: right [21:27:54] nuria: can't we use the sitematrix to validate pageviews? [21:28:27] (excluding the private wikis, which are specified in the sitematrix) [21:29:34] fdans: sure, if there was a programmatic way to do it that did not include an http request [21:29:46] (03CR) 10Nuria: Add advisory and strategy sites to whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458060 (owner: 10Fdans) [21:34:43] nuria: I'm supposed to clone dashiki, npm install, install gulp, and gulp build, right? [21:34:55] When I do `cd semantic && gulp build`, I see: [21:34:59] [ERROR] you need --layout and --config parameters to build [21:36:21] fdans: if you add the new site and update the commit message we can merge that [21:40:49] nuria: should I do it? I know nothing about gulp [21:40:54] aharoni: updated docs see if they make more sense: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Dashiki#Test_your_dashboard_config_locally [21:47:09] nuria: mmm... not sure... what's supposed to be the sequence? 1. clone dashiki. 2. npm install 3.
sudo npm install -g gulp [21:47:53] and then `npm install -g bower` and `bower install`? [21:49:01] aharoni: did you follow the sequence on the wiki? [21:49:43] I think so, but `bower install` gives me: [21:49:45] bower ENOENT No bower.json present [21:49:52] and this doesn't look right [22:00:27] aharoni: let me try to repro [22:04:30] (03PS2) 10Fdans: Add advisory and strategy sites to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458060 [22:06:51] aharoni: ya, i think our setup of dashiki needs an update to work well, do send us a ticket about your dashboard and we will get to it. [22:07:53] (03CR) 10Nuria: Add advisory and strategy sites to whitelist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458060 (owner: 10Fdans) [22:11:55] 10Analytics, 10Analytics-Dashiki, 10CX-analytics, 10Language-2018-July-September: Setup Config:Dashiki:CX2Translations as a public chart and update the Dashiki documentation accordingly - https://phabricator.wikimedia.org/T203516 (10Amire80) p:05Triage>03High [22:13:39] (03PS3) 10Fdans: Add advisory, fixcopyright and strategy sites to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458060 [22:16:00] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Reading-analysis: Assess impact of ua-parser update on core metrics - https://phabricator.wikimedia.org/T193578 (10Nuria) > What percentage of global human pageviews (i.e. those with agent_type = 'user', a core metric we report to the board on a month... [22:16:06] (03CR) 10Nuria: [V: 032 C: 032] Add advisory, fixcopyright and strategy sites to whitelist [analytics/refinery] - 10https://gerrit.wikimedia.org/r/458060 (owner: 10Fdans) [22:17:16] CindyCicaleseWMF: yt? [22:18:44] CindyCicaleseWMF: will be here for a bit, ping me when you get back [22:24:12] nuria: yes, I'm back. [22:24:23] CindyCicaleseWMF: let me see if you can see this url [22:24:46] CindyCicaleseWMF: https://bit.ly/2M0lzwB [22:24:52] CindyCicaleseWMF: login is lDAP [22:24:58] CindyCicaleseWMF: sorry LDAP [22:25:20] Yes, I can! [22:25:53] Nice! [22:26:03] CindyCicaleseWMF: ok, that url is sampled requests at 1/128 so about 1/100 [22:26:27] The landing page is also now live in beta: https://meta.wikimedia.beta.wmflabs.org/wiki/Fix_copyright [22:26:50] CindyCicaleseWMF: i see, then two things [22:27:15] CindyCicaleseWMF: when you launch tomorrow you can use that url to see traffic (will be delayed <2 hrs) no, it is almost-real-time [22:27:29] CindyCicaleseWMF: the beta page will allow us to see vents one sec [22:27:32] *events [22:27:43] excellent [22:28:21] nuria: to be sure I understand, geocoded data is stored for all visits, but the graph only samples 1/128?
[22:28:36] CindyCicaleseWMF: that is not eventlogging, those are requests to varnish [22:28:47] CindyCicaleseWMF: as in traffic [22:29:17] ok, got it [22:29:20] CindyCicaleseWMF: geocoded data for eventlogging is not visible through that UI yet [22:29:47] CindyCicaleseWMF: but even if you do not have pageviews tomorrow that url will help you quantify traffic [22:29:59] got it [22:30:11] CindyCicaleseWMF: your page in beta uisending events [22:30:16] * is sending [22:30:33] CindyCicaleseWMF: those events will be visible here (once the site goes live) [22:30:47] great [22:31:07] CindyCicaleseWMF: https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&var-schema=MediaWikiPingback [22:31:18] CindyCicaleseWMF: change the name of the schema, i used pingback as an example [22:31:39] CindyCicaleseWMF: this last url will help you measure traffic going to your schema, makes sense? [22:31:50] CindyCicaleseWMF: just like you see it there for pingback [22:32:17] makes sense [22:32:22] CindyCicaleseWMF: which is sending a couple of events per sec [22:33:13] CindyCicaleseWMF: then, if you launch and you do not see any events on that graph for your schema.. mmm something is amiss, makes sense? [22:33:42] yes, makes sense for the EUCCStats eventlogging schema [22:33:51] the one bit I'm still missing is how we correlate the uselang query parameter w/ geocoded page views, since we no longer are using the second eventlogging schema (EUCCVisit) [22:34:23] that is, when somebody visits the page, we want to know what country they are visiting from and what language they are using [22:35:25] CindyCicaleseWMF: I see, you are not using the schema entirely or you removed country (country will always be imprecise from the client side) [22:35:56] CindyCicaleseWMF: you could still use the schema but not include the country [22:36:03] CindyCicaleseWMF: just the url bit [22:36:29] CindyCicaleseWMF: is that still an option? (you can get this data other ways, this would just be easier) [22:36:55] ok, that makes sense [22:37:05] nuria: sorry, I've got to run - thanks for the help [22:37:13] CindyCicaleseWMF: k [23:13:29] 10Analytics, 10Analytics-Kanban: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964 (10Nuria) What I think is happening: the url-decoded Farsi text from internal referrers does not match (encoding?) any page by name and thus those internal referr...
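A quick way to eyeball that last hypothesis from a shell: percent-decode a referrer title and compare it against the underscore form used by page titles. This is only an illustrative check, not the clickstream job's actual matching logic, and the encoded string below is an example value ("ایران"), not taken from the dataset:

    # decode a percent-encoded referrer title and normalize spaces to underscores
    python3 -c 'import sys, urllib.parse; t = urllib.parse.unquote(sys.argv[1]); print(t, "->", t.replace(" ", "_"))' '%D8%A7%DB%8C%D8%B1%D8%A7%D9%86'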