[00:27:28] 10Analytics, 06Community-Tech: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3230233 (10MusikAnimal) Another question: Would querying the database for page assessments be faster than going through the API? With the latter I see we have to loop...
[00:31:45] 10Analytics, 06Community-Tech: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3230239 (10MusikAnimal) >>! In T164178#3230233, @MusikAnimal wrote: > I ran a query on Biography (the biggest WikiProject I think), and it took around 1.5 minutes to f...
[00:36:34] 10Analytics, 10Analytics-Dashiki, 06Analytics-Kanban: Compare layout doesn't handle files with non daily resolution - https://phabricator.wikimedia.org/T164335#3230273 (10Nuria)
[00:41:14] (03CR) 10Nuria: "This fixes dataset api, the compare layout had a bug prior that was silently failing and now appears more obviously. I have filed bug here" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/351210 (owner: 10Nuria)
[00:43:09] 10Analytics, 10Analytics-Dashiki: Compare layout doesn't handle files with non daily resolution - https://phabricator.wikimedia.org/T164335#3230311 (10Nuria)
[00:49:52] 10Analytics, 06Community-Tech: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3230341 (10kaldari) If we assume there are 1.5 million biography articles, collecting all the articles from the API would take 1500 API requests. If we assume each req...
[03:11:56] 06Analytics-Kanban, 13Patch-For-Review: Update pivot to latest source - https://phabricator.wikimedia.org/T164007#3230408 (10Ottomata) A good idea! I bet we could!
[08:08:41] very interesting results from the mysql slow-log on bohrium
[08:09:13] from what I can see the archive cron that runs every 30 mins causes spikes in latency for all the reqs, INSERT included
[08:12:04] mmmm but I can see that the cron goes only every 4 hours (14400 seconds)
[08:12:55] !log set Piwik archive cron on bohrium to run every 3600s (rather than 14400)
[08:12:58] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:16:03] !log removed 2>&1 from the Piwik cron archive script
[08:16:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:20:29] !log GRANT FILE on *.* to piwik@localhost executed on bohrium (https://piwik.org/faq/troubleshooting/#faq_194)
[08:20:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:32:34] !log added "adapter=MYSQLI" to config.ini to enable LOAD FILE capabilities on piwik (restarted apache2)
[08:32:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:57:00] 10Analytics: Investigate the use of local_quorum for AQS - https://phabricator.wikimedia.org/T164348#3230766 (10elukey)
[09:00:59] 10Analytics: Investigate the use of local_quorum for AQS - https://phabricator.wikimedia.org/T164348#3230798 (10elukey)
[09:24:51] 10Analytics, 10Analytics-General-or-Unknown: Provide regular cross-wiki reports on flagged revisions status - https://phabricator.wikimedia.org/T44360#3230844 (10Nemo_bis) Such a tool should only query the FlaggedRevs wikis' API for available statistics (also to be consistent with what the users see on their w...
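For context on the 08:32 !log entry above: per the linked Piwik FAQ, bulk LOAD DATA INFILE needs both the MySQL FILE grant quoted in the log and the MYSQLI adapter. A minimal sketch of the relevant section of Piwik's config/config.ini; the surrounding keys are illustrative defaults, only the adapter line comes from the log:

    [database]
    host = localhost       ; illustrative
    username = piwik       ; illustrative
    adapter = MYSQLI       ; default is typically PDO\MYSQL; MYSQLI enables LOAD DATA INFILE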
[10:16:33] will make my day for sure: http://theoatmeal.com/comics/believe
[10:32:51] joal: read it yesterday, it's nice when oatmeal makes these long, beautiful strips ^^
[10:33:10] it is !
[10:37:05] * elukey reads
[10:38:04] elukey: --> until the end ;)
[10:38:20] (03PS1) 10Joal: Update daily unique devices druid loading job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351613
[10:41:14] wow
[10:41:40] I laughed a lot reading Core belief: God created velociraptors
[10:41:46] :D
[10:42:02] didn't know these strips, thanks!
[11:00:29] * elukey lunch!
[11:48:39] taking a break a-team
[12:01:04] * fdans lunch!
[12:12:35] 06Analytics-Kanban, 15User-Elukey: Metrics and Dashboards for Piwik - https://phabricator.wikimedia.org/T163204#3231264 (10elukey) Created https://grafana.wikimedia.org/dashboard/db/piwik and added bohrium's host metrics. Next step is to have Apache and possibly Mysql metrics.
[12:34:13] so from https://grafana.wikimedia.org/dashboard/db/piwik?orgId=1&from=now-7d&to=now it seems that IOPs dropped
[12:34:19] after May 1st
[12:34:31] and I didn't do much on that day since I was on vacation :D
[12:52:01] this is definitely a WTF
[12:52:21] iowait patterns changed too
[12:54:36] I don't like when things improve without an explanation :D
[13:11:25] hello team :]
[13:15:37] mforns: o/
[13:15:43] hello :]
[13:15:47] thanks for letting me know about staff yesterday marcel
[13:16:06] np ;]
[13:24:45] (03PS2) 10Mforns: [WIP] Add monthly sanitized job for banner activity [analytics/refinery] - 10https://gerrit.wikimedia.org/r/350219 (https://phabricator.wikimedia.org/T157582)
[13:31:43] 06Analytics-Kanban, 15User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3231432 (10elukey) Useful comment from Marcel about the whitelist: https://phabricator.wikimedia.org/T108850#1618861
[13:34:11] elukey, note also that this comment misses a detail (is outdated)
[13:35:32] elukey, the column "table_name" is not "table_name" any more, it's "schema", meaning that the final whitelist does not store version numbers
[13:35:35] just schemas
[13:36:05] mforns: thanks! I am reading the task now, but I found those comments useful (like what to do in case a table's attribute is not whitelisted and others are)
[13:36:16] elukey, probably the last comments on that task explain the thing a bit better
[13:36:17] ok
[13:36:45] atm I am trying to familiarize myself with the db, I'll probably ask you a lot of questions in Prague :)
[13:37:43] the thing is: if the white-list does consider version numbers, when people modify the schema, their schema will gain a new version number, and the white-list will stop applying to their schema
[13:37:57] elukey, does this make sense?
[13:38:19] so, the white-list must be robust to schema changes, hence cannot store the version number
[13:38:51] so, a white-list rule should apply to all versions of the listed schema
[13:38:53] sure sure! Bear in mind that I am still a newbie in this subject :)
[13:39:12] ok, ok, feel free to ask me whatever :]
[13:40:24] the only thing that I am wondering is if we'd prefer a script that runs manually (giving you a "delete preview" first) or a fully automatic purging
[13:41:01] maybe the former would be a good first step, and then it could become fully automated when we trust it
[13:46:42] elukey, yea I agree
[13:47:21] like it could have a -f flag, that makes it non-interactive, and once we trust it, we can set up a cron job for it, no?
[13:47:34] exactly
[13:47:53] or --no-preview
[13:47:58] whatever
[13:51:34] as an FYI we are switching back to eqiad in ~9 mins
[14:01:29] (03PS3) 10Mforns: Add monthly sanitized job for banner activity [analytics/refinery] - 10https://gerrit.wikimedia.org/r/350219 (https://phabricator.wikimedia.org/T157582)
[14:07:47] 06Analytics-Kanban, 13Patch-For-Review: Label mediawiki_history snapshots for the last month they include - https://phabricator.wikimedia.org/T163483#3231479 (10Milimetric) @Neil_P._Quinn_WMF: heads up that this is now done.
[14:16:01] Hi milimetric
[14:17:06] milimetric: yesterday you said you had feedback for me on wks2 backend
[14:17:25] milimetric: many meetings today,
[14:17:33] but maybe we could find some time?
[14:17:43] I'm free now joal
[14:17:51] and I have no meetings today, so free whenever you are
[14:17:55] Let's go then :)
[14:18:21] (there)
[14:30:10] 12 minutes to switch back to eqiad, really nice job from ops!
[14:30:20] everything seems green atm
[14:41:25] 10Analytics, 06Community-Tech: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3231620 (10Milimetric) While the purpose of this bot is awesome, the approach to get data is wrong. This kind of data should be fetched in batch not in millions of ti...
[14:43:24] 06Analytics-Kanban: Create purging script for mediawiki-history data - https://phabricator.wikimedia.org/T162034#3231621 (10fdans) a:03fdans
[14:44:10] 10Analytics, 06Research-and-Data-Backlog: Host API for token persistence dataset - https://phabricator.wikimedia.org/T164280#3231623 (10Halfak)
[14:44:50] 10Analytics, 06Research-and-Data-Backlog: Host API for token persistence dataset - https://phabricator.wikimedia.org/T164280#3228458 (10Halfak)
[14:44:59] 10Analytics, 10Analytics-General-or-Unknown: Provide regular cross-wiki reports on flagged revisions status - https://phabricator.wikimedia.org/T44360#3231626 (10Milimetric) I'm still a little lost here, I don't really know how FlaggedRevs or Special:ValidationStatistics work and what kind of data you need. S...
[14:47:15] (03PS2) 10Milimetric: Implement showLastDays option on tab layout [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/350692 (https://phabricator.wikimedia.org/T160796)
[14:58:59] 10Analytics, 06Scoring-platform-team, 10rsaas-articlequality, 07Spike: [Spike] Store article quality data inside hadoop and make AQS outputs a public API - https://phabricator.wikimedia.org/T164377#3231651 (10Ladsgroup)
[15:00:44] ottomata: forgot to ask your review for https://gerrit.wikimedia.org/r/#/c/350542/
[15:01:21] joal: standuppp
[15:10:50] 06Analytics-Kanban, 15User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3231683 (10Tbayer) >>! In T156933#3231432, @elukey wrote: > Useful comment from Marcel about the whitelist: https://phabricator.wikimedia.org/T108850#2454271 > > Addendum fr...
[15:11:28] 06Analytics-Kanban, 15User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3231688 (10Tbayer) Please note that there is an open task about the whitelist at T164125.
[15:39:09] 06Analytics-Kanban, 15User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3231878 (10Tbayer) >>! In T156933#3231683, @Tbayer wrote: > >> https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention_and_auto-purging > BTW that...
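A rough shape for the interactive purge script elukey and mforns discuss in the 13:40-13:47 exchange above, following the two points they agree on: the whitelist is keyed on schema name only (no revision number, so rules survive schema changes), and the default mode prints a "delete preview" while -f makes it non-interactive. A minimal sketch; the whitelist entries, table naming scheme, revision id, and timestamp format are hypothetical illustrations, not the real EventLogging layout:

    #!/usr/bin/env python3
    # Sketch of the whitelist-driven EventLogging purge script discussed above.
    import argparse

    # Whitelist keyed by schema name only -- no revision number, so a rule
    # keeps applying when a schema gains a new revision.
    WHITELIST = {
        'Edit': {'event_action', 'wiki'},  # hypothetical entries
    }

    def purge_statements(schema, revision, columns, retention_days=90):
        """Yield UPDATEs that NULL every non-whitelisted column of old rows."""
        table = '{}_{}'.format(schema, revision)  # e.g. Edit_11448630 (hypothetical)
        keep = WHITELIST.get(schema, set())
        drop = [c for c in columns if c not in keep]
        if drop:
            sets = ', '.join('{} = NULL'.format(c) for c in drop)
            yield ("UPDATE `{}` SET {} WHERE timestamp < "
                   "DATE_FORMAT(NOW() - INTERVAL {} DAY, '%Y%m%d%H%i%S')"
                   .format(table, sets, retention_days))

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='EventLogging purge sketch')
        parser.add_argument('-f', '--force', action='store_true',
                            help='execute instead of only printing the preview')
        args = parser.parse_args()
        # Hypothetical column list; a real run would read it from the DB.
        for stmt in purge_statements('Edit', 11448630,
                                     ['event_action', 'event_user_id', 'wiki']):
            print(stmt)  # the "delete preview"
            if args.force:
                pass  # here the statement would run against the analytics slave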
[15:49:15] (03CR) 10Ottomata: [C: 031] Add --generate-jar and --jar-file options [analytics/refinery] - 10https://gerrit.wikimedia.org/r/349723 (https://phabricator.wikimedia.org/T143119) (owner: 10Milimetric)
[15:49:45] elukey: (respond whenever), q about the zkCli init thing
[15:50:10] won't that mean that hadoop-hdfs-zkfc will try to start if zk is down?
[15:51:26] ottomata: you both +1-ed it, does that mean I have to self-merge 'cause you're not sure?
[15:54:26] milimetric: haha, it means that I've skimmed your responses to my nits and think they are fine, i can +2 :)
[15:54:46] (03CR) 10Ottomata: [V: 032 C: 032] Add --generate-jar and --jar-file options [analytics/refinery] - 10https://gerrit.wikimedia.org/r/349723 (https://phabricator.wikimedia.org/T143119) (owner: 10Milimetric)
[15:54:52] ottomata: my main question is, is the jar in the right place? Like, do we need to do anything with archiva?
[15:55:01] no, that's fine
[15:55:16] k, then I'll get to updating the cron in puppet, thanks!
[15:55:17] unless we wanted to automate generation of the jar into refinery source release or something
[15:55:19] which we don't want to do
[15:55:27] oh
[15:55:28] hmmmm
[15:55:29] no, this version will be fine until they change the schema
[15:55:35] do we need to? hmm
[15:55:41] it will need to be present on archiva server
[15:55:46] git fat push?
[15:55:47] milimetric: i'll try
[15:56:09] not sure what that means... but ok
[15:56:59] milimetric: do you have a real copy of that jar somewhere?
[15:57:01] stat1002 or something?
[15:58:25] on my local I guess
[15:58:28] should I git push?
[15:58:37] git fat skinny push fat?
[15:58:50] i'm not sure actually, hang on
[15:58:59] i think you can't push
[15:59:00] not sure
[15:59:14] but yes, I have it here, I can put it wherever
[15:59:33] ok, we need to upload it to archiva
[15:59:38] via the UI
[15:59:48] i could put it in git fat manually, but would be better to go through archiva
[16:00:23] https://wikitech.wikimedia.org/wiki/Archiva#Uploading_dependency_artifacts
[16:00:33] cept it doesn't go into mirrored
[16:01:00] urandom: Hello - As usual, conflict for me today :(
[16:01:02] milimetric: will you put it somewhere i can get it?
[16:01:22] one sec
[16:01:22] urandom: I am 5 minutes late for the meeting but I'll be there if you will :)
[16:01:59] elukey: k
[16:02:02] ottomata: stat1002.eqiad.wmnet:/home/milimetric/mediawiki-tables-sqoop-orm.jar
[16:02:11] k
[16:02:12] danke
[16:02:51] joal: that's too bad, we're giving out door prizes today
[16:03:44] 10Quarry: Quarry should remember my login - https://phabricator.wikimedia.org/T164390#3231927 (10Dvorapa)
[16:07:00] (03CR) 10Bartosz Dziewoński: "I think I left a +1 because I don't have permissions to +2 here." [analytics/multimedia] - 10https://gerrit.wikimedia.org/r/324379 (https://phabricator.wikimedia.org/T98449) (owner: 10Gergő Tisza)
[16:08:06] ok milimetric we good to go
[16:08:36] thanks ottomata, I'll do the puppet in a bit, after this pivot page
[16:09:42] ok, I'm going to add a readme for this
[16:14:10] (03PS1) 10Ottomata: Fix broken select_missing_sequence_run_query [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351666
[16:14:12] (03PS1) 10Ottomata: Add README.mediawiki-tables-sqoop-orm [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351667 (https://phabricator.wikimedia.org/T143119)
[16:14:22] (03CR) 10Ottomata: [V: 032 C: 032] Fix broken select_missing_sequence_run_query [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351666 (owner: 10Ottomata)
[16:14:26] milimetric: https://gerrit.wikimedia.org/r/#/c/351667/
[16:15:17] cool ottomata, you wanna call it <>.README instead so they show up together if someone ls-es?
[16:18:48] a-team: I wrote https://wikitech.wikimedia.org/wiki/Analytics/Systems/Pivot but I stopped abruptly because I realized we need to check with the Imply team about disclosing certain details
[16:18:56] will continue when their lawyer gets back to us
[16:19:35] unless anyone needs me, I'm gonna get some lunch
[16:19:56] k ya sounds good milimetric
[16:20:29] (03PS2) 10Ottomata: Add README.mediawiki-tables-sqoop-orm [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351667 (https://phabricator.wikimedia.org/T143119)
[16:23:51] fdans: I'm sorry I've been super distracted and have been doing a bad job of starting this wikistats front-end project
[16:24:27] fdans: so are you finishing stuff up or waiting for me to give you stuff?
[16:24:55] milimetric: oh man please, np at all!!
[16:25:20] I'm currently finishing stuff
[16:25:50] but it's pretty asynchronous so I could start looking at something
[16:25:57] ok, fdans when you're ready to go make a fork: https://github.com/milimetric/wikistats-prototype/network/members
[16:26:29] fdans: then take a look at the current webpack config, the README section with what needs to get done
[16:26:45] fdans: and when you're ready with that, schedule time with me so you can school me on what's going on
[16:26:48] milimetric: i was also editing pivot page!
[16:26:51] then we can figure out together what to do
[16:26:54] https://www.irccloud.com/pastebin/XJXgCH4v/
[16:27:05] milimetric: awesome
[16:27:37] ok, fdans, I'm relying on you to execute that plan, 'cause see note on distraction :)
[16:27:37] milimetric: maybe you can incorporate my edit?
[16:27:58] nuria_: but :) I said I was going to and you said it sounded like a good structure
[16:28:16] fdans: did you get all info you needed to change the approach on the eventlogging changeset?
[16:28:26] milimetric: yessss I gots your back
[16:28:30] fdans: it is quite different so it would need to be thoroughly tested on beta
[16:28:47] milimetric: yes, I KNOW you are right, i just had a few mins
[16:28:55] milimetric: feel free to scrape
[16:29:11] np, I'll merge when we hear back
[16:30:06] nuria_: I'm good with what's needed in the backend, but I've never touched the consumers
[16:32:18] fdans: do sync up with ottomata / mforns in that regard, you probably want to have that tested on beta before starting deep on wikistats FE
[16:37:22] gotcha
[16:42:56] ottomata: here I am! Re: zkInit - my understanding is that the exec will be performed if no mtime is found in the zk request (by the grep), that is exactly what happens sometimes when one zk node is down..
[16:43:13] so I thought to add the ERROR string as a workaround
[16:51:25] hmm, we need to fail the hdfs zk start though too
[16:51:26] so
[16:51:33] maybe we need another exec as a dependency?
[16:51:48] doing it unless => will mean the init exec succeeds
[16:57:18] ottomata: isn't grep -q returning 1 only if the field is not present?
[16:57:36] we don't want to exec if ctime is there
[16:57:38] right?
[16:58:39] I mean, if ctime or ERROR are present the new check will return 0, so unless will not be 1 and the exec will not be executed
[17:05:11] right, the init won't be executed
[17:05:40] but, the only thing that is keeping service { 'hadoop-hdfs-zkfc': from starting at that point is this init exec
[17:05:42] so
[17:05:45] say, new cluster situation
[17:05:48] zookeeper not running
[17:06:00] zkCli.sh will ERROR
[17:06:09] and with that grepped for in unless
[17:06:14] means the exec will succeed
[17:06:36] the zkfc format exec will succeed according to puppet
[17:06:44] and then service { 'hadoop-hdfs-zkfc': will have started
[17:06:49] or, at least puppet will try to start it
[17:06:55] even though one of its dependencies has actually failed
[17:07:06] elukey: ^
[17:07:34] ottomata: just applied the changes so that userAgent object contains an is_bot field
[17:07:35] https://gerrit.wikimedia.org/r/#/c/350234/
[17:10:01] ottomata: sorry I am still missing something.. :( ... new cluster -> zkCli.sh returns ERROR -> egrep -q ctime|ERROR will return 0 -> unless will not execute
[17:11:45] ah ok starting to get it, let me check puppet
[17:12:24] mmmm the service call requires Exec['hadoop-hdfs-zkfc-init'], that will not execute no?
[17:13:08] cool fdans added comments, looks mostly good
[17:13:40] elukey: service requires that Exec['hadoop-hdfs-zkfc-init'] is successful, not necessarily executed
[17:13:46] if unless returns true
[17:13:58] Exec['hadoop-hdfs-zkfc-init'] will be successful
[17:14:14] so, even though the hadoop-hdfs-zkfc-init format command will not actually be run
[17:14:23] the puppet Exec resource will be considered successful
[17:14:53] unless 0 is true??
[17:15:05] haha
[17:15:07] true/0 whatever
[17:15:09] success is
[17:15:15] yes, retval 0 == success sorry
[17:15:41] if the unless command returns 0, the whole exec will be considered successful
[17:15:48] so anything that depends on it will be allowed to happen
[17:16:08] ok this was the missing part
[17:16:09] sigh
[17:16:27] so
[17:17:03] we could add another Exec in there that would be a dependency of the init
[17:17:06] that checks if zookeeper is reachable
[17:17:38] exec { 'check-zookeeper-up': command => '...' }
[17:17:41] ottomata: edited pivot page please take a look cc milimetric
[17:17:59] exec { 'zk init': ... require => [..., Exec['check-zookeeper-up']] }
[17:18:24] nuria_: link?
[17:19:36] or
[17:19:49] elukey: we could add your grep for ERROR in the actual exec[zkfc init] command
[17:19:52] hmmm
[17:19:53] no
[17:19:58] that wouldn't work sorry, the unless would be run first
[17:20:21] nuria_: nm, found link
[17:20:47] ottomata: sorry https://wikitech.wikimedia.org/wiki/Analytics/Systems/Pivot cc milimetric
[17:21:25] thx nuria_
[17:22:33] nuria_, milimetric: that looks great
[17:22:49] ottomata: ok, will craft response linking to that
[17:24:04] ottomata: it feels like we are adding layers of patches, maybe there is a cleaner solution.. will work on it!
[17:27:51] ottomata: any chance you got a couple mins to talk specifics of the other half of the task?
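To make the fix ottomata sketches at 17:17 concrete, here is roughly how the three resources could fit together. A minimal sketch, not the actual cdh module code: the zookeeper server name, the zkCli.sh/hdfs paths, and the exact command lines are assumptions; only the resource names and the unless/require structure come from the discussion above.

    # Guard exec: fails (non-zero) when zookeeper is unreachable, so dependent
    # resources are skipped instead of silently "succeeding".
    exec { 'check-zookeeper-up':
        command => '/bin/sh -c "! /usr/bin/zkCli.sh -server conf1001:2181 ls / 2>&1 | /bin/grep -q ERROR"',
    }

    exec { 'hadoop-hdfs-zkfc-init':
        command => '/usr/bin/hdfs zkfc -formatZK -nonInteractive',
        # unless only guards against re-formatting an already-initialized znode;
        # grepping for ctime here must not also match ERROR output, which was
        # the trap discussed above.
        unless  => '/usr/bin/zkCli.sh -server conf1001:2181 stat /hadoop-ha | /bin/grep -q ctime',
        require => Exec['check-zookeeper-up'],
    }

    # With the guard in place, a dead zookeeper fails check-zookeeper-up, which
    # fails the init, which keeps puppet from trying to start zkfc at all.
    service { 'hadoop-hdfs-zkfc':
        ensure  => running,
        require => Exec['hadoop-hdfs-zkfc-init'],
    }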
[17:32:37] going afk people, talk with you tomorrow!
[17:32:42] fdans: ya fo sure
[17:32:55] batcave?
[17:33:08] batk8?
[17:33:13] yesh
[17:36:56] (03CR) 10Nuria: Fix broken select_missing_sequence_run_query (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351666 (owner: 10Ottomata)
[17:41:26] (03CR) 10Nuria: "Looks good, 1 minor nit in commit." (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351613 (owner: 10Joal)
[17:56:31] 10Analytics, 06Community-Tech: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3232481 (10MusikAnimal) >>! In T164178#3231620, @Milimetric wrote: > While the purpose of this bot is awesome, the approach to get data is wrong. This kind of data sh...
[18:01:59] AH!
[18:02:01] This means you are trying to run druid on java 1.7 which is not supported anymore. You might have jdk1.8.0_131 but you probably only have jre1.7.?
[18:02:02] uh oh
[18:02:25] new druid doesn't work with java 1.7
[18:02:40] hmmm
[18:06:43] joal, you still there?
[18:07:43] been reading druid docs on how to specify different output in template, but it's a bit confusing, have you done it before?
[18:09:07] Hey mforns
[18:09:14] hey :]
[18:09:17] you specify a different datasource as output, no >
[18:09:19] ?
[18:09:33] not sure, can you batcave?
[18:09:46] yes
[18:09:50] k :]
[18:09:58] milimetric, ottomata : stalled community version of pivot: https://github.com/yahoo/swiv
[18:10:28] 10Analytics, 10Analytics-EventLogging: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3232541 (10Ottomata)
[18:10:49] !!
[18:11:31] Iinnteresting
[18:12:24] ottomata: still, stalled for the most part
[18:12:42] ya
[18:12:47] ha, just our luck, yahoo of all companies
[18:14:13] they do fix the two bugs we care about most: custom granularities and instability of metric choices. So it's better than our current version
[18:17:29] oh joal I forgot, so my plan is to pass a banner_activity_directory that points to my home folder in hdfs and I'll mock the _SUCCESS file structure for the monthly job, does this make sense?
[18:17:50] mforns: do you have that page you and Helen worked on? With the plan to move ee-dashboard?
[18:17:57] milimetric, looking
[18:20:03] 10Analytics, 06Developer-Relations, 10MediaWiki-API, 06Reading-Admin, and 5 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079#3232581 (10NHarateh_WMF)
[18:20:28] kaldari: the data didn't go anywhere, it's in a git repo, I'll find it in a sec
[18:20:48] kaldari: but no, it would take a whole bunch of hacking to revive limn, that's why we shut it down in the first place
[18:21:10] kaldari: while I find the data, if you don't mind let me know what exactly you're looking for; maybe it's easier to get now
[18:21:20] rats, wish I had known it was being decommissioned. I would have at least taken screenshots of all the graphs :P
[18:21:58] The anti-harassment team wants the usage data for the WikiLove and Thanks extensions.
[18:22:19] WikiLove had a dedicated graph and Thanks was part of the Notifications graph.
[18:24:45] milimetric, I can not find it, do you mean a wiki page?
[18:25:03] kaldari: https://github.com/wikimedia/limn-editor-engagement-data that has all the data and history in it
[18:25:21] kaldari: looking for the wikilove data, one sec, for which wiki?
[18:26:12] Mostly for en.wiki
[18:26:44] which might have been the only one we set up data for, can't remember
[18:27:48] milimetric: also would be good to update https://wikitech.wikimedia.org/wiki/EE_Dashboard and https://wikitech.wikimedia.org/wiki/EE_Dashboard
[18:28:02] oops... https://wikitech.wikimedia.org/wiki/Mobile_Reportcard
[18:28:03] kaldari: https://datasets.wikimedia.org/public-datasets/enwiki/wikilove/
[18:28:20] kaldari: I'll update those, sure
[18:28:50] thanks!
[18:28:56] kaldari: I'm not seeing anything about the "Thanks" extension though
[18:29:08] It was part of the Echo data
[18:29:14] for notifications
[18:29:30] Thanks is a type of Echo notification
[18:30:39] kaldari: gotcha, well, go to https://github.com/wikimedia/limn-editor-engagement-data/tree/master/datasources and search for "enwiki_echo" and you'll get a bunch of hits, all JSON files. If you open one, it'll link you to the corresponding CSV
[18:31:10] kaldari: but I believe they're all in here: https://datasets.wikimedia.org/public-datasets/enwiki/echo/
[18:31:23] OK, probably not something I actually have time to dig into, but I'll let the anti-harassment team know.
[18:31:58] really sad we don't have those dashboards running anywhere though
[18:32:02] kaldari: just send them https://datasets.wikimedia.org/public-datasets/enwiki/echo/ and https://datasets.wikimedia.org/public-datasets/enwiki/wikilove/ and tell them if they have trouble to come to me
[18:33:01] kaldari: it's sad that we built limn in the first place, ever since it went up everyone in the world screamed for it to go down. It is indeed weird that now people are asking for it to go back up, after we said we'd take it down over two years ago
[18:33:14] but I guess that's how it goes
[18:33:32] who wanted them taken down and why?
[18:34:06] they seemed really useful to me
[18:34:09] every director and above at WMF since it went live, because it was written in-house and it didn't meet their needs
[18:34:26] hmmph
[18:34:39] that's why we went with dashiki, which is mostly glue code for outside libraries
[18:35:07] that puts the pressure on people to structure their data into dashboards, and this particular data has no owners, that's the real problem
[18:35:14] not the infrastructure, which is better now
[18:43:12] mforns: That makes total sense :)
[18:43:20] k joal
[18:44:06] milimetric, I think it was an etherpad, but I can't find a way to list them, so... can't find it
[18:48:18] 10Analytics, 03Community-Tech-Sprint: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3232663 (10Niharika)
[18:52:22] mforns: helen made a table on-wiki with all the dimensions and combinations of data that was available, and a plan for what she wanted to start migrating. It's ok if you don't find it, but if you remember sometime let me know
[18:52:29] milimetric: makes sense
[18:52:53] milimetric: What are "monthly pagecounts" (at https://analytics.wikimedia.org/dashboards/reportcard/#pagecounts-dec-2007-dec-2016/monthly-pagecounts)
[18:53:29] I have a random question - why is the pageviews API designed in a way that we can only query for one page's views at a time? (I'm thinking of the URL structure )
[18:53:35] kaldari: are you asking what pagecounts are as opposed to pageviews or how the monthly aggregation works?
[18:53:47] the first :)
[18:53:52] just curious
[18:53:59] Niharika: will answer in a moment
[18:54:41] kaldari: pagecounts was the previous definition that did not filter out spiders and was kind of loose. Data for it was collected via lossy UDP and in a lossy infrastructure. So it's basically older, lower quality
[18:54:59] kaldari: more info here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic
[18:55:02] got it, thanks!
[18:55:38] Niharika: that's on purpose because batch operations should not happen through an API. So the API was meant to provide page-centric or project-centric gadgets
[18:56:13] I guess that means we're abusing the API then :)
[18:56:29] yes, a few people are, I'm getting more and more worried about it
[18:56:32] milimetric: But most of the tools out there using the API are probably making multiple requests like we are.
[18:56:56] yeah, it doesn't make a lot of sense to do that kind of work through the API in my opinion
[18:57:15] The MediaWiki API works like that though, doesn't it?
[18:57:20] Is there a way to access the data more directly from Tool Labs?
[18:57:39] Niharika + kaldari: it would perhaps be very productive to get together a lot of the batch use cases and examine them as a whole. Perhaps there's another API we can build that makes more sense, or a regular data dump
[18:57:53] a-team, druid looking good with older pivot in labs
[18:58:04] package and puppet ready to go
[18:58:07] milimetric: How hard is it to get the data in a table on tool labs?
[18:58:11] o/
[18:58:13] we can upgrade at any time
[18:58:25] Niharika: depending on what data, it could be impossible because it wouldn't fit
[18:58:39] Niharika: that's the main difference from mediawiki api, where data is relatively small
[18:58:42] I'm going to be off this week
[18:58:45] next week*
[18:58:45] we're talking about TBs here
[18:58:47] today is wednesday
[18:58:50] Right.
[18:58:56] i think we should do it tomorrow morning
[18:59:16] yall ok if I send an email right now and schedule this? it'll be a druid/pivot downtime, but I don't expect it to take more than 10 minutes
[18:59:34] am I being too hasty? :)
[18:59:38] ottomata: good for me, do it
[18:59:49] ottomata: oh wait
[18:59:51] turkey thing
[18:59:59] maybe wait until nuria / managers say ok
[19:01:02] Niharika: but again, depends on exactly what data, and we can custom-build data dumps that might either fit in a table on labs or in flat files that can be easily read by a generic aggregating API
[19:02:22] milimetric: Yeah. Will depend on what data most tools are querying for.
[19:03:04] right, that's why it's important to get a good overview of most tools we care about
[19:04:39] milimetric: If/when you collect such data, I'd be happy to write up about our use cases for the API.
[19:07:40] Niharika: I have zero visibility into that right now. I hope to get more later this year as I look at community backlogs of work (hopefully with your team's help). In the meantime, I think you know/have many more use cases than I do.
[19:08:02] Niharika: are you mostly concerned with popular pages bot?
[19:08:06] milimetric: Alrighty, I'll try and write up some of them.
[19:08:10] nuria_: Yup.
[19:10:14] Niharika: ya, that is not the most efficient way to do what you want, even if that bot could be improved a ton. What you want is a big data job that uses the data available; that being said, your code (w/ improvements) will still 'work', it will just not be efficient.
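To make the one-page-per-request constraint Niharika asks about concrete: the public per-article endpoint takes exactly one title in its URL path, so a bot fetching views for N pages has to issue N HTTP calls. A minimal sketch, where the article list, date range, and User-Agent string are illustrative:

    #!/usr/bin/env python3
    # Why per-article pageview lookups don't batch: one round trip per page.
    import requests

    API = ('https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/'
           '{project}/all-access/user/{title}/daily/{start}/{end}')

    def total_views(project, titles, start, end):
        views = {}
        for title in titles:  # N titles -> N requests, the crux of T164178
            url = API.format(project=project, title=title, start=start, end=end)
            r = requests.get(url, headers={'User-Agent': 'popular-pages-sketch'})
            r.raise_for_status()
            views[title] = sum(item['views'] for item in r.json()['items'])
        return views

    print(total_views('en.wikipedia', ['Barack_Obama'], '2017040100', '2017043000'))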
[19:11:07] milimetric: ok, CRing your dashiki code now
[19:15:36] hmm, on second thought, I think it is wiser to be cautious
[19:15:53] there could be hadoop/indexing task dependency issues I'm not seeing in labs.
[19:16:05] would rather upgrade when i have plenty of time to respond to things
[19:17:43] ottomata: +1
[19:46:32] 10Analytics: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3222702 (10mforns) Hi @Tbayer We should make sure though that all these schema_revision pairs do not have any fields that can constitute - together with the user agent map - a sensitive structure. For example: user...
[19:46:55] 10Analytics, 10EventBus, 06Services, 05Multiple-active-datacenters, 15User-mobrovac: WANObjectCache relay daemon or mcrouter support - https://phabricator.wikimedia.org/T97562#3232934 (10Krinkle)
[19:52:56] (03CR) 10Nuria: [V: 04-1 C: 04-1] "Two things:" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/350692 (https://phabricator.wikimedia.org/T160796) (owner: 10Milimetric)
[19:58:34] (03PS2) 10Nuria: Update daily unique devices druid loading job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351613 (https://phabricator.wikimedia.org/T164183) (owner: 10Joal)
[19:58:58] 10Analytics, 10EventBus, 06Services, 07Availability (Multiple-active-datacenters), 15User-mobrovac: WANObjectCache relay daemon or mcrouter support - https://phabricator.wikimedia.org/T97562#3233022 (10Krinkle)
[19:59:01] (03CR) 10Nuria: [V: 032 C: 032] "Updated commit, merging." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/351613 (https://phabricator.wikimedia.org/T164183) (owner: 10Joal)
[19:59:46] 06Analytics-Kanban: Update druid unique Devices Dataset to only contain hosts having more than 1000 uniques - https://phabricator.wikimedia.org/T164183#3233027 (10Nuria)
[20:00:45] 06Analytics-Kanban: Update druid unique Devices Dataset to only contain hosts having more than 1000 uniques - https://phabricator.wikimedia.org/T164183#3224787 (10Nuria)
[20:09:28] 10Analytics, 03Community-Tech-Sprint: Investigation: How can we improve the speed of the popular pages bot - https://phabricator.wikimedia.org/T164178#3233060 (10Niharika) a:03Niharika
[20:14:11] 06Analytics-Kanban, 15User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3233093 (10mforns) @Tbayer > Can we set them to NULL instead of an arbitrary value? That would make it much easier to avoid the accidental inclusion of garbage data in analys...
[20:16:32] COOOL
[20:16:37] curl -XPOST -H'Content-Type: application/json' http://localhost:8082/druid/v2/sql/ -d '{"query": "SELECT country_code, COUNT(*) as cnt FROM \"unique-devices-daily\" GROUP BY country_code ORDER BY cnt DESC LIMIT 10" }' | jq .
[20:30:26] ottomata: That is indeed very cool !
[20:30:36] * joal wants to test :)
[20:32:58] druid101 in labs
[20:33:05] i only have one day of that datasource loaded though
[20:33:28] ottomata: I'll wait for upgrade, my questions are more around perf ;)
[20:33:55] :)
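For anyone who wants to poke at the Druid SQL endpoint ottomata demos at 20:16, the same query is easy to issue from Python. A minimal sketch; the host and datasource match the curl example above (run against druid101 in labs), everything else is plain requests:

    #!/usr/bin/env python3
    # Same Druid SQL query as the curl example above, via Python.
    import json
    import requests

    query = ('SELECT country_code, COUNT(*) as cnt FROM "unique-devices-daily" '
             'GROUP BY country_code ORDER BY cnt DESC LIMIT 10')
    resp = requests.post('http://localhost:8082/druid/v2/sql/',
                         headers={'Content-Type': 'application/json'},
                         data=json.dumps({'query': query}))
    resp.raise_for_status()
    print(json.dumps(resp.json(), indent=2))  # what `| jq .` does in the curl version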