[00:03:06] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3080331 (faidon) What's the status of this (and stat1003 too)? Who is this waiting on? Is it @ottomata or @RobH?
[00:07:23] (CR) Jforrester: "If we can fold the data into Wikistats 2.0 in a year's time or whatever that might be good, but for now…" [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/344054 (https://phabricator.wikimedia.org/T160454) (owner: Nuria)
[00:14:25] James_F: Since you are here... one question
[00:15:05] nuria: Sure. :-)
[00:16:09] James_F: You know the edit analisys dashboards you were mentioning as of recent
[00:16:14] *analysis
[00:16:23] James_F: let me get exact link
[00:16:28] The 'ee' ones?
[00:17:00] https://edit-analysis.wmflabs.org/editor-engagement/#projects=eswiki,enwiki,ruwiki/metrics=Daily%20Unique%20Nonbot%20Registered%20Editors
[00:17:05] James_F: thsi one
[00:17:06] *this
[00:17:07] Yes.
[00:17:21] James_F: this data is on data lake on pivot now
[00:17:28] OK…
[00:17:31] James_F: much richer than this one
[00:17:34] But how does that help?
[00:17:44] This is a dashboard. For public consumption.
[00:18:03] Pivot doesn't serve that use case.
[00:18:27] Unless I'm not understanding and Pivot is accessible to all?
[00:18:53] James_F: no it is not but from our user stats seems that these dashboards are just used by you pretty much, thsu my question
[00:18:55] *thus
[00:19:34] Yeah, we're trying to close the old ones which get more hits (but are Limn-based) and move the data over but we don't have much time.
[00:20:03] James_F: we already moved those to non limn, that is no issue, limn is dead
[00:20:10] No?
[00:20:21] James_F: ya, those are not on limn
[00:20:23] The old content from http://ee-dashboard.wmflabs.org/
[00:20:48] Most of which isn't visible any more.
[00:21:11] James_F: ok, let me see, one thing at a time:
[00:21:15] There's product-specific metrics there for VE, Flow, Echo, Guided Tour, and a few others.
[00:21:26] That's all eventually going to be moved over to https://edit-analysis.wmflabs.org/editor-engagement/
[00:23:13] James_F: ok, so in other words you still use these, I was asking cause they have very limited use and thus pivot might be sufficient
[00:23:38] We /want/ to consolidate everything, but it's exceptionally slow-going.
[00:24:29] Thanks!
[00:24:54] James_F: ok, we will resurrect this one: https://edit-analysis.wmflabs.org/editor-engagement/#projects=eswiki,enwiki,ruwiki/metrics=Daily%20Unique%20Nonbot%20Registered%20Editors
[00:25:22] James_F: limn will die at the end of month and all dashboards that were called out to be migrated to dashiki have been so
[00:25:30] Oh, did it get actively switched off without asking? That's not very nice. :-(
[00:25:57] James_F: no, we asked plenty: https://phabricator.wikimedia.org/T146308
[00:26:00] Where was the calling out? Did all the stuff we want to migrate get moved?
[00:26:20] James_F: it has been called out by labs since at least 6 months ago that the machine that hosted this will die on March 31st
[00:26:30] OK…
[00:26:44] Does that mean all the data will be deleted? Or is it just the box serving the data that's going?
[00:27:11] James_F: No, data has never been on labs, just rendering layer
[00:27:20] OK, then I don't care. :-)
[00:27:46] (Though we should get any data that's not rendered anywhere shown somewhere useful.)
[00:27:54] James_F: ya, that is why we migrated the main things and they have been in dashiki for a while
[00:28:13] Which instance of Dashiki is "in Dashiki"?
[00:28:17] James_F: last outstanding i think is the one you mentioned cause we thought helen was doing that work: https://edit-analysis.wmflabs.org/editor-engagement/#projects=eswiki,enwiki,ruwiki/metrics=Daily%20Unique%20Nonbot%20Registered%20Editors
[00:28:34] From your words it seems not https://edit-analysis.wmflabs.org/editor-engagement/ right?
[00:28:55] Helen was, yes, but then she left. :-(
[00:29:32] James_F: dashiki is deployed in several places, it's just a client side app and thus doesn't need backend, the editing dashboards rendering is happening on labs
[00:29:51] Yes.
[00:30:02] James_F: the browser dashboards and others are rendered in prod machines: https://analytics.wikimedia.org/dashboards/
[00:30:06] Is there meant to be a single place from which… ah.
[00:30:15] Interesting.
[00:30:31] James_F: ok, we will finish that one, i will put it in our queue so it updates, it's just missing crons for the data
[00:30:35] I don't think I'd consciously seen that.
[00:30:40] Nice. Thank you!
[00:31:16] Analytics-Kanban, Editing-Analysis: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3120463 (Nuria)
[00:32:02] Analytics-Kanban, Editing-Analysis: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3111614 (Nuria) Adding to kanban so we can finish this work, I think only crons for data might be missing.
[00:32:46] James_F: ok, so you know, all this will be available in wikistats: https://analytics.wikimedia.org/dashboards/standard-metrics/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki/metrics=(Beta)%20Monthly%20New%20Editors
[00:32:54] James_F: this was a 1 off computation effort
[00:34:04] But eventually that data will be in the Wikistats 2.0 thing?
[00:36:39] James_F: Yes, data IS already on data lake which is the data layer for wikistats 2.0, neilpquinn is using it. There is no frontend for wikistats yet and the metric computation needs to happen recurrently but wikistats 2.0 data is available now internally and will be available externally later on this year.
[00:36:48] James_F: makes sense?
[00:36:58] * James_F nods.
[01:42:31] Analytics, Analytics-EventLogging: Investigate logging on right-click link navigation - https://phabricator.wikimedia.org/T46480#3120572 (Krinkle) Is this task still relevant? It seems the logic provided by EventLogging was simplified to just the logging of the beacon. The attachment to click handler is...
[02:49:29] Analytics, Analytics-EventLogging: Investigate logging on right-click link navigation - https://phabricator.wikimedia.org/T46480#3120680 (Nuria) Doesn't seem relevant, we do not track generic clicks on pages. Declining
[02:49:42] Analytics, Analytics-EventLogging: Investigate logging on right-click link navigation - https://phabricator.wikimedia.org/T46480#3120681 (Nuria) Open>declined
[08:09:08] hi a-team :]
[08:14:08] Analytics-Tech-community-metrics: Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io? - https://phabricator.wikimedia.org/T157898#3120900 (Lcanasdiaz) >>! In T157898#3110328, @Aklapper wrote: > Looks like [[ https://github.com/Bitergia/mediawiki-identities/commits/master/wikimedia-...
[09:49:41] a-team - maybe something else to try??? https://mariadb.com/products/mariadb-columnstore
[09:55:43] milimetric: Thanks for the patch in table creation !
[10:12:10] joal, hellooooo
[10:12:21] Hi mforns !!
[10:12:25] What's up?
[10:12:26] :]
[10:12:39] can you do me a favor and be there while I deploy AQS?
[10:12:49] mforns: Here I am :)
[10:12:56] xD, batcave?
[10:13:00] sure !
[10:13:02] k
[10:13:30] joal, give me 1 min
[10:47:50] Analytics, Analytics-EventLogging: Investigate logging on right-click link navigation - https://phabricator.wikimedia.org/T46480#495221 (Tbayer) >>! In T46480#3120680, @Nuria wrote: > Doesn't seem relevant, we do not track generic clicks on pages. Declining To avoid misunderstandings, there are actually...
[10:59:20] elukey: Hi !
[10:59:31] elukey: I'd need some root on stat1004 please :)
[11:01:34] joal: isn't luca out this week?
[11:01:44] fdans: Ahhh, forgot about that :)
[11:01:56] sorry for the ping elukey ! will wait for ottomata :)
[11:02:06] Thanks fdans :)
[11:02:12] ;)
[11:13:58] thans fdanks :D
[11:49:37] (PS1) Fdans: Add legacy pageviews metric [analytics/dashiki] - https://gerrit.wikimedia.org/r/344114 (https://phabricator.wikimedia.org/T143906)
[12:23:44] (CR) MaxSem: [C: 1] Adding renamed tables to sql union statements [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/344049 (https://phabricator.wikimedia.org/T160454) (owner: Nuria)
[12:35:00] a-team - Taking a break now, will be back for standup
[12:35:09] k joal :]
[12:42:27] (PS1) Mforns: Correct details in legacy pageviews endpoint [analytics/aqs] - https://gerrit.wikimedia.org/r/344131 (https://phabricator.wikimedia.org/T156391)
[12:58:05] Analytics, ChangeProp, Citoid, ContentTranslation, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#3121707 (MaxSem)
[13:11:21] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3121732 (Ottomata) stat1003 ticket is T159839. This stat1002 replacement ticket is waiting on feedback from @dartar and @halfak about acceptable GPU specs.
[13:17:22] Analytics, Analytics-EventLogging, DBA, ImageMetrics, Patch-For-Review: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407#3121750 (Marostegui) So far so good! ``` root@EVENTLOGGING m4[log]> show tables like 'ImageMetr...
[13:17:36] Analytics-Kanban, Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3121752 (Ottomata)
[13:24:18] Analytics-Kanban, Editing-Analysis: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3121756 (mforns) a:mforns
[13:31:08] mforns: I'm trying to clone in a new folder to see why you're both having trouble
[13:31:17] milimetric, ok thanks :]
[13:33:56] ottomata, I think rsync from stat1003 to thorium (previously stat1001 right?) is broken
[13:34:09] meaning from report files
[13:34:32] btw hi! :]
[13:37:06] oh ya? hm
[13:37:10] ok will look in a few mins mforns...
[13:41:06] Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3121825 (Andrew) Any update on this?
[13:41:56] thanks ottomata
[13:42:24] milimetric: having a look now at the prototype :)
[13:42:37] fdans: something's broken about the build
[13:43:00] I'm in the batcave if anyone wants to take a look with me
[13:43:33] this is a good opportunity to debug the webpack setup too, maybe
[13:44:40] hm, yeah, it's the webpack version or one of the other dev tools
[13:46:42] Analytics-Kanban, Editing-Analysis: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3121878 (mforns) Looking into this, RU is working as expected and the report files are generated correctly in stat1003:/srv/reportupdater/output/me...
[13:51:34] mforns: can you give me an example file?
[13:51:40] that you expect to see but it's missing?
[13:51:47] oh ticket...
[13:51:48] i see it!
[13:51:49] looking
[13:51:57] ok, sorry, should have pasted it
[13:52:21] ottomata, https://datasets.wikimedia.org/limn-public-data/metrics/ee/daily_edits/
[13:52:44] oh mforns
[13:52:48] no longer at datasets.wm.org
[13:52:50] milimetric: seems all good now :) except the 404 on semantic
[13:52:54] because of refactor
[13:52:58] analytics.wikimedia.org/datasets
[13:53:04] ottomata, oh!
[13:53:12] https://phabricator.wikimedia.org/T132594
[13:53:24] milimetric: has been handling it since, should we redirect?
[13:53:28] not sure what the status is
[13:53:54] we could change dashiki
[13:53:59] i think dan was doing that
[13:54:33] fdans, you have to: cd semantic; gulp build
[13:54:44] ohhh right
[13:54:49] thank you marcel :)
[13:54:53] np :]
[13:56:39] (I added that to the readme)
[13:57:55] ottomata: I haven't reached out to potential owners of the "other" datasets, I wanted to finish the prototype first
[13:58:12] let me know if I should move that up in priority
[13:59:48] milimetric: i guess the crons have been changed so that potential owners of other datasets aren't getting updates
[13:59:55] https://phabricator.wikimedia.org/T160807#3121878
[14:01:36] milimetric, I think the problem is that Helen's editor engagement dashboard is not in the config.yaml file for deploying
[14:01:51] ah!
[14:01:57] yeah, I only deployed the dashboards that were configured
[14:02:08] this makes sense
[14:02:28] but there's a conversation with James and nuria above where they kind of cover this
[14:02:52] I feel like there's a lot of confusion, but it seems easy to fix the dashboard, just have to redeploy with latest dashiki code, which has the new analytics.wikimedia.org/datasets root
[14:03:33] meanwhile, fdans / mforns: ready to hang out and talk prototype?
[14:03:42] Analytics-Kanban, Editing-Analysis: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3121935 (mforns) After talking with @Ottomata and @Milimetric we recalled T132594, where the report files now are being rsync'd to analytics.wikime...
[14:03:46] sure
[14:03:56] ok
[14:31:30] ottomata: weekly ops or not?
[14:31:45] sure! ya
[14:31:48] ottomata: if yes, https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave-2
[14:38:55] (PS1) Ottomata: Add analytics1003 as refinery scap target [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/344154
[14:39:52] (CR) Joal: [V: 2 C: 2] "LGTM, merging!" [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/344154 (owner: Ottomata)
[14:54:22] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3080357 (Halfak) I was just told that this task is blocking on me to provide GPU specs. Is that right?
[14:58:36] Analytics-Kanban: Kill limn1 - https://phabricator.wikimedia.org/T146308#3122033 (Nuria) Work will be completed by end of month by the time instance is turned off.
[15:28:40] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3122125 (Halfak) OK I did some digging. My assessment is that the Nvidia Tesla K80 is most desirable, but the closed source drivers will be a problem. The AMD Fir...
[15:29:05] ottomata, https://phabricator.wikimedia.org/T159838#3122125
[15:29:09] I hope that's helpful
[15:29:29] Oh I should have pinged in -operations so we can chat with Robh
[15:29:32] * halfak does that
[15:41:35] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3122155 (RobH) I'd say the closed source driver is a blocker. We've blocked the use of some PCIe flash memory cards due to closed source driver usage in years past...
[15:43:59] oh, ottomata - Back in da cave?
[15:44:21] urandom: Hello - I'm sorry I have a conflict again today :(
[15:45:02] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3122164 (RobH) I'll plan on getting some quotes generated with the Dell and HP options, with the more open source friendly GPU options to start. If the closed sour...
[15:45:50] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3122167 (MoritzMuehlenhoff) Leaving our FLOSS policy aside, the proprietary Nvidia is going to be a significant problem both in terms of - maintainability (we tend...
[15:46:35] joal: kk; still open to finding a better time
[15:47:15] urandom: It's not a bad time, it's really bad luck I think - Other meetings I get at that time are involving lots of others, so I think it's not related
[15:47:20] But it's still not cool :(
[15:47:46] i'm +2 from the US west coast now (or UTC-5), and i get up quite early
[15:48:14] everyone else is in europe, so something earlier would certainly be OK I think
[15:48:18] something to consider
[15:48:30] Would be very ok for me obviously
[15:49:08] never thought i'd say this, but i think i'm a 'morning person'
[15:49:21] :)
[15:49:22] joal: gimme few, fixing crons...
[15:49:26] sure ottomata
[15:49:27] logrotate, etc.
[15:49:46] urandom: I have changed my mind on myself a lot on this respect !
[15:51:18] joal: i get up at 6am most mornings and *go to the gym*, of all things
[15:51:30] me from 20 years ago would be shocked to hear this :)
[15:51:43] :d
[15:51:45] :D
[15:52:02] urandom: Not so early for me, but yeah, morning is what I prefer
[15:52:04] Analytics, Recommendation-API: productionize recommendation vectors - https://phabricator.wikimedia.org/T158973#3122176 (leila) @Capt_Swing's recommendation based on https://meta.wikimedia.org/wiki/Research:Evaluating_RelatedArticles_recommendations#Results is for us not to productionize this service. I...
[15:53:32] oh noes, elukey too!
[15:53:59] Et tu, Brute?
[16:29:41] Analytics-Kanban, Editing-Analysis: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3122273 (Nuria) a:mforns>Nuria
[16:30:21] bye team! see you tomorrow
[16:31:58] fdans: my internet is kaput, can you cover for me for the time I'll be off please
[16:32:15] joal: sure thing
[16:32:25] actually, I'm back
[16:32:59] joal: I'm with cam off because my internet is being mean too
[16:40:05] ok joal cron ready
[16:40:23] awesome ottomata :)
[16:40:26] in bc if you want to try together
[16:40:27] Many thanks !
[16:40:40] ottomata: still in meeting, will ping you when finished
[16:40:51] k
[16:41:00] (PS1) Nuria: Removing unused site configuration (vital-signs) [analytics/dashiki] - https://gerrit.wikimedia.org/r/344167
[16:47:07] joal: https://yarn.wikimedia.org/proxy/application_1488294419903_71249/
[16:47:18] hmmm, joal
[16:47:19] WOoohoo :)
[16:47:26] should we make the dates that get run for the previous month?
[16:47:27] not sure.
[16:47:53] ottomata: nope, should be ok (dates are to be previous to the first of the month IIRC)
[16:47:59] ok
[16:47:59] cool
[16:48:08] I'll start oozie jobs :)
[16:48:13] Thanks a lot ottomata !P
[16:48:16] fyi, stdout goes to /var/log/refinery/sqoop-mediawiki.log on analytics1003
[16:48:37] great ottomata
[16:49:17] ottomata: I also realised I forgot to put some parameters in the sqoop command that might make it fail :(
[16:49:26] ottomata: let me know if it does
[16:50:36] oh?
[16:50:39] ok... which ones?
[16:51:25] ottomata: number of processors
[16:51:53] ottomata: it will try to use CPU number, and I think it'll fail because too many parallel for labs
[16:51:57] But let's wait and see
[16:52:02] ls -lah
[16:52:04] oops
[17:18:19] going prtettttyy slow joal :)
[17:18:24] heading out to get lunch/change locations
[17:18:26] back in a bit
[17:18:29] ottomata: This is a known thing :)
[17:18:50] ottomata: I think it'll fail when starting a group with a lot of wikis (trying to parallelize too much for labs)
[17:18:55] ottomata: We'll see :)
[17:19:36] Analytics-Kanban: User history in hadoop - https://phabricator.wikimedia.org/T134793#3122447 (Milimetric)
[17:19:38] Analytics, Spike: Research spike: load enwiki data into Druid to study lookup table performance - https://phabricator.wikimedia.org/T141472#3122446 (Milimetric)
[17:19:40] Analytics, Spike: Spike - Slowly Changing Dimensions on Druid - https://phabricator.wikimedia.org/T134792#3122444 (Milimetric) Resolved>Open This was not resolved, we never loaded slowly changing dimensions the way we imagined here. It's fine if we decide we no longer want to do that, but then w...
[17:19:52] Analytics-Kanban: Wikistats 2.0. - https://phabricator.wikimedia.org/T130256#3122449 (Milimetric)
[17:19:54] Analytics-Kanban: Redact data so it can be public - https://phabricator.wikimedia.org/T145091#3122448 (Milimetric) Resolved>declined
[17:35:29] neilpquinn: hello, meeting data lake?
[17:41:59] joal: should I just stop and relaunch it?
[17:42:08] ottomata: I think it would be better
[17:42:12] ok
[17:42:39] ottomata: adding -k 5 will be good
[17:44:30] joal: "default is the number of
[17:44:30] processors on the machine"
[17:44:30] ?
[17:44:59] ottomata: yes, that's what python does by default
[17:45:11] hm, oh so too many then?
[17:45:16] correct ottomata
[17:45:24] we are very limited by labs
[17:46:04] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3122489 (faidon) Yeah, we had a similar conversation over email with Adam (@dr0ptp4kt) who was also inquiring about TensorFlow. I had the same considerations that w...
[17:53:00] joal
[17:53:01] 17/03/22 17:52:47 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://analytics-hadoop/wmf/data/raw/mediawiki/tables/revision/snapshot=2017-03/wiki_db=enwiki already exists
[17:53:03] on second run
[17:53:08] it doesn't overwrite! :o
[17:53:14] ottomata: makes sense
[17:53:22] ottomata: it is supposed to do so
[17:53:26] oh ya?
[17:53:30] ok so
[17:53:31] ottomata: Let's remove the snapshot
[17:53:36] i should stop job again and
[17:53:37] ok..
[17:53:37] yeah
[17:54:29] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3122500 (dr0ptp4kt) I've emailed with a contact re: OpenCL support for the Nvidia Tesla `P100` (presumably facilitated by mainstreaming of OpenCL), and also shared...
[17:55:24] sorry ottomata :( Would you prefer to have sqoop overwrite instead of fail?
[17:55:34] hm, joal dunno!
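(Editor's note: the two operational points in this exchange, capping parallelism below the CPU-count default and refusing to re-import a snapshot whose output directory already exists, can be sketched roughly as below. This is not the refinery sqoop script. Every name here (`capped_workers`, `snapshot_dir`, `import_snapshot`, `import_one`) is hypothetical, a local filesystem path stands in for HDFS, and the limit of 5 simply mirrors the `-k 5` mentioned in the chat.)

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def capped_workers(requested=None, limit=5):
    """Return a worker count between 1 and `limit`.

    With no explicit request, start from the CPU count (the default
    discussed in the chat) and clamp it to `limit`.
    """
    wanted = requested if requested is not None else (os.cpu_count() or 1)
    return max(1, min(wanted, limit))


def snapshot_dir(root, table, snapshot):
    """Build <root>/<table>/snapshot=<snapshot>, mirroring the layout in the error message."""
    return Path(root) / table / f"snapshot={snapshot}"


def import_snapshot(root, table, snapshot, wikis, import_one, requested=None, limit=5):
    """Fail fast if the snapshot exists; otherwise import each wiki with a capped pool.

    `import_one` is a caller-supplied callable (hypothetical) that imports
    a single wiki and returns some result.
    """
    target = snapshot_dir(root, table, snapshot)
    if target.exists():
        # Stop before any job is launched, instead of failing mid-run.
        raise FileExistsError(f"snapshot already imported: {target}")
    with ThreadPoolExecutor(max_workers=capped_workers(requested, limit)) as pool:
        return list(pool.map(import_one, wikis))
```

(A cron wrapper could call `import_snapshot` and exit non-zero on `FileExistsError`, so that an accidental second run is reported immediately rather than after a job has been submitted.)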
[17:55:37] up to yall
[17:55:41] probably makes sense to fail
[17:55:47] especially since we are running as a cron
[17:56:04] ottomata: We're not supposed to import multiple times the same snapshot
[17:56:07] but, maybe it should fail explicitly? like, if the snapshot exists at all, just stop there, instead of launching a hadoop job
[17:56:29] ottomata: makes a lot of sense
[17:56:45] ottomata: Will create a task about updating the script
[17:56:51] k +1 :)
[17:58:05] Analytics: Update refinery sqoop script to explicitely fail in case a snapshot / destination folder already exists - https://phabricator.wikimedia.org/T161128#3122505 (JAllemandou)
[17:58:08] ottomata: --^
[18:04:29] danke
[18:08:17] oook joal this one
[18:08:18] https://yarn.wikimedia.org/cluster/app/application_1488294419903_71401
[18:08:22] running with -k 5
[18:09:41] ottomata: Thanks a lot !
[18:09:49] ottomata: We can let this one bake
[18:11:58] cool
[18:14:25] gerrit might be kaput
[18:15:22] (PS1) Nuria: Upgrading config for editor-engagement dashboard [analytics/dashiki] - https://gerrit.wikimedia.org/r/344178
[18:17:00] Analytics-Kanban, Operations, Traffic, Wikipedia-iOS-App-Backlog, and 2 others: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#2915651 (JMinor) The version returning to 10% sampling rate is up on the app store.
[18:32:29] ottomata1: Do you have a minute to discuss python packages?
[18:36:20] joal: sure
[18:36:47] ottomata: batcave or IRC?
[18:38:58] irc
[18:39:00] if ok
[18:39:02] at cafe
[18:39:06] sure ottomata
[18:39:56] ottomata: I think, from what I have managed to play with, that I'd only need debian packaged python things to be installed - Rest could be passed as eggs files - But, I have no way to test without trying :(
[18:40:49] debian packaged python things?
[18:40:57] ottomata: yes
[18:41:00] oh, the ones that are already packaged you mean?
[18:41:07] ottomata: correct
[18:41:08] joal: you can try in labs? i still have a hadoop cluster there i think...
[18:41:22] ottomata: feasible !
[18:42:07] ya, cdh3-* nodes
[18:42:13] feel free to sudo do whatever you like :)
[18:42:32] cool ottomata, I'll see if I can get something out of that :)
[18:42:34] Thanks !
[18:42:38] cool!
[18:50:43] Analytics-Kanban, Editing-Analysis, Patch-For-Review: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3122815 (Nuria) Dashboards now have data until today: http://localhost:5000/dist/metrics-by-project-EditorEngagement/#projects...
[18:52:10] Analytics-Kanban, Editing-Analysis, Patch-For-Review: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3111614 (Nuria) FYI we track usage of this dashboard on piwik.wikimedia.org together with the rest of edit dashboards, they s...
[18:57:17] ottomata: actually it won't be easy with debian packages
[18:57:42] ottomata: Just checked with the trusty repo, and most packages, when existing, are outdated
[18:57:53] ottomata: mwarf
[18:57:56] what about jessie?
[18:58:00] since we are about to upgrade all to jessie
[18:58:03] one of the nodes there is jessie
[18:58:08] will check
[18:58:08] cdh3-5
[19:01:22] HaeB: do you happen to know...
[19:01:24] ottomata: better on jessie
[19:01:59] HaeB: if we have an opt out on android so users do not send appinstall ids (and thus they are not counted towards android active users?)
[19:02:14] yes
[19:02:16] joal: better enough to make it work?
[19:02:22] ottomata: only big one missing for python3 (and outdated for python 2) is scikit-learn
[19:02:26] there's an opt-out
[19:02:37] rest seems manageable
[19:03:10] ottomata: I'll try to have it working on cdh3-5 tomorrow and touch base after
[19:03:20] Thanks again for the heads up
[19:03:29] Gone for tonight a-team - Laters
[19:04:58] ok joal if scikit-learn doesn't itself have too many deps
[19:05:03] that aren't satisfied by debs
[19:05:06] we might be able to make a deb of it
[19:05:40] ottomata: hi! sorry for the bother, just following my silly accidental deletion of a file in my home dir on stat1002, I've been asked to check 100% that there are no backups of such stuff anywhere... this is correct? Many thx!!!
[19:07:16] ottomata There actually are debian tags in scikit-learn github repo - maybe something worth looking
[19:09:27] let me triple check AndyRussG but i don't think so
[19:09:44] ottomata: thx!!! :)
[19:10:08] !!! AndyRussG I am wrong!
[19:10:09] include role::backup::host
[19:10:10] backup::set { 'home' : }
[19:10:13] HMMMMM
[19:10:19] i have never accessed backups before
[19:10:28] ottomata: oooh!
[19:10:32] you may be in luck...
[19:10:36] let's ask in ops
[19:10:42] or, lemme search wikitech first..
[19:10:52] HaeB: got it , thank you.
[19:11:40] nuria: yw. are you asking because of the monthly/daily uniques oozie job?
[19:12:03] AndyRussG: whats the exact path and name of the file?
[19:12:18] and, when do you think it was truncated?
[19:12:28] or, when was the last time you knew it was complete?
[19:13:19] ottomata: /home/andyrussg/banner_history_nov_2016.out
[19:13:46] HaeB: because i did not see a mention on your readers report, so i wasn't sure if our numbers came only from opt-in
[19:14:15] AndyRussG: the oldest backup we have for that file is feb 28
[19:14:34] The last time I know it was complete was... sometime around Feb. 27 this year.
[19:14:38] That sounds right
[19:14:51] 586 MB?
[19:15:19] I may have truncated it shortly after I created it...
[19:15:48] oof i have to restore the whole home dir into /var uhhh
[19:16:11] not sure... (I found the culpable command in my bash history, but I don't know when exactly I executed it)
[19:16:45] ergh, AndyRussG i'm going to have to ask alex K about this
[19:17:14] Analytics, Patch-For-Review: Eventstreams graphite disk usage - https://phabricator.wikimedia.org/T160644#3122900 (fgiunchedi) Open>Resolved a:fgiunchedi `eventstreams` is at 110G and cleaned up periodically, good enough for now
[19:17:14] ottomata: K thanks so much!!!!
[19:17:38] nuria: yeah, it's probably worth mentioning that as a caveat, will do that next time. (IIRC i checked a long time ago and it seemed small enough for practical purposes, but there's always the possibility of opt out rates changing for some reason. in that respect pageviews remain the most reliable measure of overall usage for both apps...)
[19:18:01] HaeB: and pageviews do not send appinstallId?
[19:18:43] they should be counted for all users, just like on the web [19:18:43] HaeB: if you have opt-ed out that is, i know opt-out prevents from sending EL data if I remember this right [19:18:58] no i mean the webrequest data [19:19:09] AndyRussG: email sent, we'll see what he says [19:19:27] HaeB: yes, but what i was wondering is that if you have opted out [19:19:37] ottomata: cool beans, thx again :) [19:19:42] HaeB: it makes sense that appinstallid is not sent with your pageview [19:20:16] i understand that's what is happening, yes [19:20:35] HaeB: this has no effect on pageview counting of course, but it has privacy implications, let me see [19:20:41] 10Analytics: Provide historical redirect information in Data Lake edit data - https://phabricator.wikimedia.org/T161146#3122912 (10Neil_P._Quinn_WMF) [19:22:00] 10Analytics: Provide historical redirect flag in Data Lake edit data - https://phabricator.wikimedia.org/T161146#3122925 (10Neil_P._Quinn_WMF) [19:24:01] running home, back in a bit [19:24:05] milimetric: if you are so kind to CR changes for editor engagement dashboard we can call that done, i have updated dashboard on labs and it is all good [19:24:52] 10Analytics: Provide cumulative edit count in Data Lake edit data - https://phabricator.wikimedia.org/T161147#3122928 (10Neil_P._Quinn_WMF) [19:25:26] (03CR) 10Milimetric: [C: 04-1] "I'd like to keep this here because otherwise it's very hard to know what piwik ID to use when deploying the vital signs dashboard. 
If it'" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344167 (owner: 10Nuria) [19:26:04] (03CR) 10Milimetric: [V: 032 C: 032] Upgrading config for editor-engagement dashboard [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344178 (owner: 10Nuria) [19:28:12] (03CR) 10Milimetric: [C: 04-1] Adding renamed tables to sql union statements (031 comment) [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/344055 (https://phabricator.wikimedia.org/T160454) (owner: 10Nuria) [19:28:35] (03CR) 10Milimetric: [C: 04-1] Adding renamed tables to sql union statements (031 comment) [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/344055 (https://phabricator.wikimedia.org/T160454) (owner: 10Nuria) [19:29:13] nuria: the ee dashboard is good, I +2 it, but I don't want to remove vital signs config unless we can put the piwik id somewhere else [19:29:16] we talked about that before [19:37:20] 10Analytics: Provide edit tags in the Data Lake edit data - https://phabricator.wikimedia.org/T161149#3122965 (10Neil_P._Quinn_WMF) [19:39:46] (03CR) 10Nuria: "But note that that config refers to a non-existing site on labs, how to build analytics.wikimedia.org (piwik included) is documented here:" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344167 (owner: 10Nuria) [19:40:52] (03CR) 10Milimetric: [V: 032 C: 032] "that's fine, I find that harder to remember than the central config, but ok" [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/344167 (owner: 10Nuria) [19:41:24] 10Analytics: Use native timestamp types in Data Lake edit data - https://phabricator.wikimedia.org/T161150#3122979 (10Neil_P._Quinn_WMF) [19:45:19] 06Analytics-Kanban, 10Pageviews-API, 13Patch-For-Review: Monthly aggregate endpoint returns unexpected results and invalid timestamp - https://phabricator.wikimedia.org/T156312#3123018 (10Nuria) 05Open>03Resolved [19:46:22] 06Analytics-Kanban, 13Patch-For-Review: Productionise standard metrics from mediawiki denormalized history - 
https://phabricator.wikimedia.org/T160151#3123019 (Nuria) Let's make sure the dataset with these standard metrics is documented [19:46:43] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#3123021 (Nuria) [19:46:45] Analytics-Kanban, Patch-For-Review: Create cron job in puppet sqooping prod and labs DBs - https://phabricator.wikimedia.org/T160083#3123020 (Nuria) Open>Resolved [19:47:06] Analytics-Kanban: Enable Pageviews API for test.wikipedia.org - https://phabricator.wikimedia.org/T160484#3123022 (Nuria) Open>Resolved [19:47:32] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#2836132 (Nuria) [19:47:34] Analytics-Kanban: Load edit history data into Druid - https://phabricator.wikimedia.org/T131786#3123026 (Nuria) [19:47:36] Analytics-Kanban, Patch-For-Review: Productionize loading of edit data into Druid (contingent on success of research spike) - https://phabricator.wikimedia.org/T141473#3123024 (Nuria) Open>Resolved [19:47:52] Analytics-Kanban, Patch-For-Review: Add hive table that maps wikiCode to projectName - https://phabricator.wikimedia.org/T158330#3123027 (Nuria) Open>Resolved [19:48:05] Analytics-Kanban, Wikimedia-Stream: EventStreams Blog Post - https://phabricator.wikimedia.org/T160080#3123028 (Nuria) Open>Resolved [19:48:08] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#3123029 (Nuria) [19:48:27] Analytics-Kanban, Operations, Performance-Team, Reading-Admin, Traffic: Preliminary Design document for A/B testing - https://phabricator.wikimedia.org/T143694#3123030 (Nuria) Open>Resolved [19:48:31] Analytics, Operations, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3123031 (Nuria) [19:49:20]
Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#3123036 (Nuria) [19:49:23] Analytics-Kanban, Patch-For-Review: Create hive tables and queries for metrics computation out of mediawiki denormalized history - https://phabricator.wikimedia.org/T160155#3123035 (Nuria) Open>Resolved [19:49:42] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#2836132 (Nuria) [19:49:44] Analytics-Kanban, Patch-For-Review: Provide 2 static files to differenciate prod and labs projects to sqoop in - https://phabricator.wikimedia.org/T160153#3123037 (Nuria) Open>Resolved [19:49:58] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#2836132 (Nuria) [19:50:00] Analytics-Kanban, Patch-For-Review: Create oozie job for mediawiki edit history job - https://phabricator.wikimedia.org/T160074#3123039 (Nuria) Open>Resolved [19:50:22] Analytics-Kanban: Productionize Edit History Reconstruction and Extraction - https://phabricator.wikimedia.org/T152035#2836132 (Nuria) [19:50:24] Analytics-Kanban: Scale MySQL edit history reconstruction data extraction - https://phabricator.wikimedia.org/T134791#3123043 (Nuria) [19:50:26] Analytics-Kanban, Patch-For-Review: Extract edit history denormalized data from intermediate data - https://phabricator.wikimedia.org/T144717#3123041 (Nuria) Open>Resolved [19:50:53] Analytics-Kanban, Patch-For-Review: Add desktop only tab for browser reports on analytics.wikimedia.org - https://phabricator.wikimedia.org/T160642#3106427 (Nuria) Open>Resolved [20:00:49] HaeB: on your report I think you are mentioning 2013 when you mean 2015 in "May 2013 (the [20:00:49] earliest time for which we have data according to the current pageview definition):" [20:03:48] nuria: no, that's correct.
it's referring to the earlier R implementation of the new definition (remember that we went through some work to assess the differences to the current implementation - it's paying off here and in other places ;) [20:06:00] HaeB: it is based on sampled data thus significantly different but ok, your call. [20:08:34] we discussed this extensively in late 2015 ;) the sampling error was examined too (even with a direct comparison for the timespan where both implementations overlapped, IIRC) [20:15:18] HaeB: as i said, your call. Regarding app pageviews only 1/1000 is sending an app install id (seems to me from my brief queries on 1 hour of app traffic) thus the number of uniques on iOS and Android must be really far off from the actual one, right? [20:16:27] Analytics-Kanban, Patch-For-Review: Populate aqs with legacy page-counts - https://phabricator.wikimedia.org/T156388#3123115 (Nuria) [20:20:09] Hey folks. Can you get the monthly pageviews for all pages in a wiki in any convenient way? [20:20:26] ^? [20:22:49] Hi halfak [20:23:00] halfak: easiest is through hive or spark [20:23:37] if you want a list I mean halfak - API style, AQS serves the info but throughput is not the same [20:24:04] Analytics-Kanban, Pageviews-API: Pageviews missing for article that received on-wiki edits - https://phabricator.wikimedia.org/T158681#3123154 (Nuria) Looking at history for this page it was a draft until 16 February 2017. And we have pageviews for the draft page. See: https://wikimedia.org/api/rest_v1... [20:24:15] Analytics-Kanban, Pageviews-API: Pageviews missing for article that received on-wiki edits - https://phabricator.wikimedia.org/T158681#3123155 (Nuria) a:Milimetric>Nuria [20:26:28] * joal go back to family life [20:27:01] halfak: aggregated? pageview api has that data [20:27:31] nuria, cool! Last time I looked (a year ago?) it did not do aggregations at that level.
[20:28:04] joal, yeah, I'm looking for a dump of *all the pageviews for all the wikis* [20:28:54] halfak: https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/all-projects/all-access/all-agents/monthly/2015100100/2016103000 [20:29:19] halfak: i see, so these are just "aggregated" pageview NUMBERS but you want titles [20:29:24] halfak: correct? [20:29:52] nuria, yeah, that's right. [20:31:21] ok, then, the answer would be spark/hive: the data is in pageview_hourly in a way that it will not be hard to get what you want [20:31:47] Damn. Asking for a volunteer. :\ [20:32:36] Maybe I could make a dump for him [20:32:50] Or maybe we could work from the clickstream datasets [20:33:38] halfak: he can also query pageview api in a loop [20:33:45] halfak: that is what throttling is all about [20:34:00] halfak: ah no, wait, cause he would need a list of titles [20:34:00] We can certainly load test the API :) [20:34:19] halfak: scratch that it wouldn't work [20:34:32] halfak: because you do not need the titles beforehand [20:34:46] We could query the wiki and do a lookup for every title. [20:34:59] But I need view rates for all the wiki-titles [20:35:09] halfak: i am not worried about load, after the rebuild it is one order of magnitude below what we can sustain [20:35:27] OK we could try and see if throughput is high enough [20:35:41] halfak: ya, you will miss all renames and such so depends how precise that needs to be [20:36:09] halfak: it might be easy to parse files in this case [20:38:51] Yeah. That's what I was afraid of. I have a few things coming up that will benefit from "page view rate" signal and parsing files & handling redirects is gross [20:38:56] Oh yeah and renames :( [20:40:51] halfak: this wiki, page, view_count, date could be public data though [20:41:08] halfak: no reason for it not to be on a db somewhere on labs [20:42:00] nuria, +1. I'll come back some time soon to look into what aggregation and publishing might look like.
This sounds like an oozie thing. [20:42:30] Renames seem like a good use of history reconstruction :) [20:54:01] nuria: (1/1000) that's surprising; what's the exact query you used? would love to look into this a bit more myself [20:54:58] nuria: for comparison, this is the drop we saw when the ios app switched to opt-in https://phabricator.wikimedia.org/T130432 [20:55:25] Analytics, EventBus, MW-1.29-release (WMF-deploy-2017-03-28_(1.29.0-wmf.18)), Services (done): Page properties-change event is rejected if page was deleted - https://phabricator.wikimedia.org/T158702#3123239 (Pchelolo) Patch merged. Moving to 'done' until it gets deployed. [20:57:36] Analytics, Analytics-Cluster, Operations, hardware-requests: EQIAD: stat1002 replacement - https://phabricator.wikimedia.org/T159838#3123246 (Halfak) To be clear, it's likely that using AMD/opencl could involve "a ton of extra effort", but I don't see a good alternative given what @MoritzMuehlenh... [21:00:52] Analytics-Kanban, Editing-Analysis, Patch-For-Review: editor-engagement dashboard on edit-analysis stopped updating on ~ 2017-02-21 - https://phabricator.wikimedia.org/T160807#3123248 (Jdforrester-WMF) Open>Resolved >>! In T160807#3122815, @Nuria wrote: > Dashboards have now data until today:... [21:01:14] a-team: who wants access to wmf-deployments to help with deploying mediawiki code when needed? I'm filing a task now, ottomata? nuria? [21:02:11] i probably should!
[21:03:28] k, adding you [21:03:36] https://phabricator.wikimedia.org/T161157 [21:31:33] nuria: i took a quick look myself; it appears the apps send the id in two different ways: in the header and as query parameter [21:32:32] ...with that, i'm counting 29% of app requests having an id (of those taken into account by the uniques oozie job, which may differ a bit from pageviews) [21:33:20] ..i also think there may be a bug in the uniques job in that it may now be counting some (opted-in) ios apps towards the android count :-o [21:33:38] ...need to work on something else now, but will write this up later on phabricator [21:34:08] (29% is still too low, but larger than 0.1% ;) [21:34:27] (oh and that's for both apps together) [22:54:45] Analytics-Tech-community-metrics: Updated data in mediawiki-identities DB not deployed onto wikimedia.biterg.io? - https://phabricator.wikimedia.org/T157898#3123683 (Aklapper) Thanks! Now I only need to see the content of mediawiki-identities also reflected in the frontend. :P /me crosses fingers [23:04:31] Anyone around who understands the MW History beta data in Pivot? Was trying to answer the e-mail halfak sent to analytics-l, but the numbers don't seem real and I can't work out why. [23:07:47] bearloga: joining the meeting? [23:08:30] Specifically, https://tinyurl.com/muo6dgw (content namespace unique 1+ revision non-bot logged-in editors in October 2016) gives 117.2k; dropping the content namespace filter balloons that to 203k, but skip forward to November and the numbers are 71k/438k and December is 50k/254k, which makes me suspect that I'm doing something wrong. :-) [23:08:39] bearloga: (we're in the hangout) [23:44:10] HaeB: crap crackers!
I'm so sorry; I somehow missed the meeting when looking at the calendar for today and then had to go pick my partner up from work [23:44:48] bearloga: we're still in it ;) [23:48:45] Analytics, ChangeProp, Citoid, ContentTranslation, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#3123813 (Pchelolo) Vagrant was updated to node 6 as well.