[08:33:57] !log rerun wikidata-specialentitydata_metrics-wf-2018-4-1 [08:34:00] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:20:10] (03Abandoned) 10Amire80: [WIP] Interlanguage links SQL [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/362877 (https://phabricator.wikimedia.org/T158835) (owner: 10Amire80) [11:25:09] !log Repair cu_changes hive table after successful sqoop import and add _PARTITIONED file for oozie jobs to launch [11:25:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:55:06] o/ hi all, sorry am on a bit late, did an early pottery session and had to finish and clean up! (am still at studio) [14:01:30] ottomata: hey, just a heads up, doing the nfs switchover from dataset1001 to labstore1006/7 today, starting with instances in a few, will be ready for stat in an hour or two if things go well. I have a plan here https://etherpad.wikimedia.org/p/dumps-migration (look at Part 2: stat*) and all the gerrit patches are linked from there [14:01:55] ok great, thanks madhuvishy! [14:04:03] ottomata: cool :) also could you look at T189283 when you get a chance? thanks! [14:04:03] T189283: Replace cron jobs from EZachte's home directory on stat1005 with rsync fetches - https://phabricator.wikimedia.org/T189283 [14:40:10] hey all! I'm going to try to bike home real fast before standup, i think i can make it but i might be a couple of minutes late [14:50:20] joal: btw the load job that adds partitioned flags for cu_changes is merged, but I didn't want to deploy on Monday. I'll be gone tomorrow but should I deploy it now so we don't forget? [15:00:05] a-team: standduppp [15:19:24] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4097274 (10Pchelolo) While resolving the cirrus search issues the next bulk of jobs can be switched.
Here's what I propose:... [15:23:25] 10Analytics, 10Analytics-Wikistats: Adding ranks to the map tooltip - https://phabricator.wikimedia.org/T191141#4095313 (10Milimetric) p:05Triage>03Low [15:25:39] 10Analytics, 10Analytics-Wikistats, 10Patch-For-Review: Page views and Country name table columns overlapping in the Page Views By Country metric on Dashboard - https://phabricator.wikimedia.org/T191121#4094664 (10Milimetric) p:05Triage>03High [15:31:19] 10Analytics, 10EventBus, 10Wikimedia-Stream: EventStreams consumer backpressure for slow HTTP clients - https://phabricator.wikimedia.org/T191207#4097302 (10Milimetric) p:05Triage>03High [15:36:28] 10Analytics: Varnishkafka does not play well with varnish 5.2 - https://phabricator.wikimedia.org/T177647#3665564 (10Milimetric) p:05Triage>03Normal [15:36:30] 10Analytics: Varnishkafka does not play well with varnish 5.2 - https://phabricator.wikimedia.org/T177647#3665564 (10Milimetric) p:05Triage>03Normal [15:37:23] 10Analytics, 10Analytics-Cluster: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3421167 (10Milimetric) p:05Triage>03Normal [15:38:02] 10Analytics, 10Analytics-Cluster: hdfs password file for mysql should be re-generated when the password file is changed by puppet - https://phabricator.wikimedia.org/T170162#3421167 (10Milimetric) p:05Normal>03Low [15:38:58] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Add the prometheus jmx exporter to all the Zookeeper daemons - https://phabricator.wikimedia.org/T177460#3659754 (10Milimetric) p:05Triage>03High [15:39:09] 10Analytics: Give +w permission for users in /srv folder in SWAP Machines - https://phabricator.wikimedia.org/T176093#4097416 (10Ottomata) 05Open>03Resolved a:03Ottomata On new notebook hosts, /home is actually in /srv, so you should have more space to write in your home dirs. 
[15:40:26] 10Analytics, 10EventBus, 10Wikimedia-Stream: Hits from private AbuseFilters aren't in the stream - https://phabricator.wikimedia.org/T175438#4097421 (10Milimetric) ping @Catrope: this seems normal, right, these changes wouldn't show up in Recent Changes? @Nirmos, do you see this showing up on the Recent Cha... [15:40:56] 10Analytics, 10EventBus, 10Wikimedia-Stream: Hits from private AbuseFilters aren't in the stream - https://phabricator.wikimedia.org/T175438#3593335 (10Milimetric) p:05Triage>03Low [15:42:42] 10Analytics, 10Android-app-feature-Compilations-v1, 10Reading-Infrastructure-Team-Backlog: Track zim file downloads - https://phabricator.wikimedia.org/T171117#3454614 (10Milimetric) @JMinor we're just putting this in radar for now, it looks like the project is stalled, let us know otherwise. [15:43:47] 10Analytics, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice: Make banner impression counts available somewhere public - https://phabricator.wikimedia.org/T115042#4097441 (10Milimetric) p:05Normal>03Triage [15:44:48] 10Analytics: Edit analysis dashboard Failures by User Type chart does not update correctly - https://phabricator.wikimedia.org/T148656#4097444 (10Milimetric) p:05Normal>03Triage [15:45:48] 10Analytics, 10EventBus, 10Wikimedia-Stream, 10service-template-node, 10Services (watching): Tests for swagger spec stream routes in EventStreams - https://phabricator.wikimedia.org/T150439#4097448 (10Milimetric) 05Open>03declined the automated testing doesn't work easily with this, it's too much wor... 
[15:45:51] 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching), 10User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#4097450 (10Milimetric) [15:46:16] 10Analytics, 10Performance-Team (Radar): Eventlogging client needs to support offline events - https://phabricator.wikimedia.org/T162308#4097451 (10Milimetric) p:05Normal>03Triage [15:46:48] 10Analytics, 10Wikimedia-Stream, 10Patch-For-Review: Create /v2/schema/:schema_uri endpoint for eventstreams that proxies schemas from eventbus - https://phabricator.wikimedia.org/T160748#4097456 (10Milimetric) p:05Normal>03Low [15:47:16] 10Analytics, 10Wikimedia-Stream, 10Patch-For-Review: Create /v2/schema/:schema_uri endpoint for eventstreams that proxies schemas from eventbus - https://phabricator.wikimedia.org/T160748#3109641 (10Milimetric) p:05Low>03Triage [15:48:05] 10Analytics, 10EventBus, 10Patch-For-Review, 10Services (watching): Check eventbus Kafka cluster settings for reliability - https://phabricator.wikimedia.org/T144637#4097463 (10Milimetric) 05Open>03Resolved a:03Milimetric reopen if necessary [15:50:21] 10Analytics, 10Operations, 10Traffic, 10User-Elukey, 10Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#4097473 (10Milimetric) p:05Normal>03Triage [15:51:03] 10Analytics, 10Operations, 10Traffic, 10User-Elukey, 10Varnish: Sort out analytics service dependency issues for cp* cache hosts - https://phabricator.wikimedia.org/T128374#2072088 (10Milimetric) p:05Triage>03Low [15:51:28] 10Analytics, 10Analytics-Dashiki: Have dashiki read and write GET params to pass stateful versions of dashboard pages {crow} - https://phabricator.wikimedia.org/T119996#4097485 (10Milimetric) p:05Low>03Triage [15:52:45] 10Analytics: Investigate lowering "per-article" resolution data in AQS - https://phabricator.wikimedia.org/T144837#4097489 (10Milimetric) p:05Low>03Triage [15:53:41] 10Analytics, 10Analytics-Dashiki: Provide 
filterable line graph for browser-family/browser-major - https://phabricator.wikimedia.org/T150713#4097496 (10Milimetric) p:05Low>03Triage [15:55:02] 10Analytics, 10Analytics-Dashiki: Provide filterable line graph for browser-family/browser-major - https://phabricator.wikimedia.org/T150713#2794164 (10Milimetric) p:05Triage>03Normal [15:55:21] 10Analytics: Investigate adding user-friendly testing functionality to Reportupdater - https://phabricator.wikimedia.org/T156523#4097500 (10Milimetric) p:05Low>03Triage [15:56:17] 10Analytics, 10ChangeProp, 10EventBus, 10Services (later): Support per-topic configuration in EventBus service - https://phabricator.wikimedia.org/T157092#4097509 (10Milimetric) p:05Low>03Triage [15:57:00] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: cdh::hadoop::directory (and other hdfs puppet command?) should quickly check if namenode is active before executing - https://phabricator.wikimedia.org/T130832#4097512 (10Milimetric) 05Open>03Resolved a:03Milimetric Andrew made a good enough patch f... [16:01:05] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: cdh::hadoop::directory (and other hdfs puppet command?) should quickly check if namenode is active before executing - https://phabricator.wikimedia.org/T130832#4097529 (10Milimetric) p:05Low>03High [16:01:36] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: cdh::hadoop::directory (and other hdfs puppet command?) should quickly check if namenode is active before executing - https://phabricator.wikimedia.org/T130832#2147910 (10Milimetric) 05Resolved>03Open [16:01:52] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: cdh::hadoop::directory (and other hdfs puppet command?) should quickly check if namenode is active before executing - https://phabricator.wikimedia.org/T130832#2147910 (10Milimetric) 05Open>03Resolved [16:02:20] 10Analytics, 10Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. 
{pika} - https://phabricator.wikimedia.org/T89397#4097543 (10Milimetric) p:05Low>03High [16:02:37] 10Analytics, 10Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. {pika} - https://phabricator.wikimedia.org/T89397#1035278 (10Milimetric) p:05High>03Low [16:03:50] 10Analytics, 10EventBus, 10Operations, 10User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#4097549 (10Milimetric) p:05Low>03Triage [16:04:04] 10Analytics, 10EventBus, 10Operations, 10User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3452124 (10Milimetric) p:05Triage>03Low [16:19:05] 10Analytics, 10Analytics-Cluster: Requesting account expiration extension - https://phabricator.wikimedia.org/T183291#4097603 (10Jdcc-berkman) OK. I do think we will be able to collaborate on this - the timing just hasn't worked out yet. Can we extend for a few more months? [16:26:30] 10Analytics, 10Analytics-Cluster: Requesting account expiration extension - https://phabricator.wikimedia.org/T183291#4097627 (10Nuria) Let's maintain access until the end of June. Thank you. [16:28:34] 10Analytics, 10Data-release, 10Research, 10Privacy: An expert panel to produce recommendations on open data sharing for public good - https://phabricator.wikimedia.org/T189339#4097633 (10Nuria) I think @Ijon is real interested on this too. [16:32:04] 10Analytics, 10Analytics-Wikistats, 10Easy, 10Patch-For-Review: [Wikistats2] The detail page for tops and maps metrics does not indicate time range - https://phabricator.wikimedia.org/T182990#4097636 (10Nuria) >'for February' makes more sense than 'on February', maybe? The copy used in the dashboard is "Co... 
[16:32:31] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Intervals/buckets for data arround pageviews per country in wikistats maps - https://phabricator.wikimedia.org/T188928#4097638 (10Nuria) a:03Milimetric [16:33:28] (03CR) 10Nuria: "Can you please add a bug number describing feature/issue." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/423336 (owner: 10Jonas Kress (WMDE)) [16:59:10] ottomata: getting close to being done with instances. could you take a look at the updated patches for rolling out on stat and +1? https://gerrit.wikimedia.org/r/#/c/420083/, https://gerrit.wikimedia.org/r/#/c/422892/, https://gerrit.wikimedia.org/r/#/c/422896/ [17:03:29] thank you! [17:07:13] (03CR) 10Smalyshev: [C: 031] Add wikidata tag to webrequest refine process [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/423064 (https://phabricator.wikimedia.org/T191022) (owner: 10Nuria) [17:07:58] (03PS1) 10Milimetric: Add interlanguage dashboard [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/423506 [17:08:09] (03CR) 10Milimetric: [V: 032 C: 032] Add interlanguage dashboard [analytics/dashiki] - 10https://gerrit.wikimedia.org/r/423506 (owner: 10Milimetric) [17:14:59] 10Analytics, 10MobileFrontend, 10Readers-Web-Backlog, 10Performance-Team (Radar), 10Technical-Debt: Figure out XAnalytics stuff - https://phabricator.wikimedia.org/T190381#4097790 (10Jdlrobson) @MaxSem do you have an memories of this? [17:20:58] milimetric: you also would need hiera configs for this: https://gerrit.wikimedia.org/r/#/c/423506/ right? [17:21:44] (03CR) 10Nuria: [V: 032 C: 032] "Looks good, we need to add the hiera config for this domain." 
[analytics/dashiki] - 10https://gerrit.wikimedia.org/r/423506 (owner: 10Milimetric) [17:22:00] 10Analytics, 10EventBus, 10Wikimedia-Stream: Hits from private AbuseFilters aren't in the stream - https://phabricator.wikimedia.org/T175438#4097800 (10Nirmos) Filter hits aren't in RecentChanges, regardless of whether the filter is private or not. [17:26:27] nuria: no, but we do need to have that pivot feature that Cindy was also asking for [17:26:44] nuria_: when it's a new domain we need Hiera, but this is just a subfolder of the language-reportcard one [17:27:02] milimetric: ah, i see [17:27:04] o/ Anyone know of an API call to get the total number of active users on a wiki? [17:27:30] I see there's the `allusers` API, but it seems to list the actual users, but I'd just like a total count. [17:27:31] dbrant: how do you define active? [17:27:37] dbrant: active users? [17:27:59] Accessed the site within 30 days [17:28:07] dbrant: as editors? [17:28:11] dbrant: or readers? [17:29:15] The `allusers` API has a flag called "activeusers", so whatever criteria it uses. [17:29:35] I'm guessing it's either readers or editors. [17:32:12] 10Analytics, 10Analytics-Dashiki: Add pivot parameter to tabular layout graphs - https://phabricator.wikimedia.org/T126279#4097833 (10Milimetric) p:05Normal>03High [17:33:08] dbrant: allusers api? [17:33:17] dbrant: I think we need a bit more context [17:33:38] dbrant: can you explain a little bit what api are you talking about? [17:33:49] dbrant: as readers would be hard, because we don't track that [17:34:06] (like, we don't track browsing for logged-in users by their user id) [17:34:45] as editors, making at least 1 edit, we do have that in the AQS api, but it's more # of users making at least 1 edit per month, as a timeseries, does that work? 
[17:34:53] dbrant: we count devices when we talk about "active" [17:35:04] dbrant: as in "devices that have accessed wikipedia last month" [17:35:51] dbrant: https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/editors, bottom left split by activity level and filter to 1..4 [17:36:17] oh, sorry, don't filter, the total is the number you're looking for [17:36:22] 'cause it's at *least* 1 [17:36:53] and if you want the API call, look in the console, that XHR will show you the URI you need [17:36:55] dbrant: i think if you give us a bit more context we can probably help you better [17:38:30] Right; so a bit more high-level context: In the Android app, when we show the user a list of possible Wiki languages to select, we'd like to sort them by 'popularity' or 'activity'. [17:38:54] dbrant: ahahahaha [17:39:09] dbrant: ok, got it [17:39:23] Here's the API I'm referring to: https://en.wikipedia.org/w/api.php?action=query&format=json&list=allusers&formatversion=2&aulimit=50&auactiveusers=1 [17:39:38] dbrant: language and project are not the same thing but Now i understand what you are looking fow [17:39:40] *for [17:40:30] dbrant: this api query is editor data [17:41:10] dbrant: ok, the data that most closely answers your question is Unique Devices data for mobile [17:41:25] dbrant: let me know if you can access http://pivot.wikimedia.org [17:41:58] yep! [17:43:39] dbrant: these are unique devices daily: https://pivot.wikimedia.org/#unique-devices-per-domain-daily [17:45:04] dbrant: popularity will be driven by usage, thus, there you see what you would expect : "more popular wikipedias - in terms of reading- have more unique devices" [17:46:20] dbrant: activity can be readers activity (usage) or editors activity, this second one is influenced by how big the site is so bigger sites get more edit activity (in terms of absolute numbers) [17:47:30] nuria_: Thanks, that's promising...
[17:47:37] But is there a way to get the data programmatically, in a structured way? [17:47:55] e.g. How does WikiStats get the data for active users: https://wikistats.wmflabs.org/display.php?t=wp [17:48:01] dbrant: yes, there is an api [17:48:08] 10Analytics, 10MobileFrontend, 10Readers-Web-Backlog, 10Performance-Team (Radar), 10Technical-Debt: Figure out XAnalytics stuff - https://phabricator.wikimedia.org/T190381#4097902 (10MaxSem) No I don't. > Or: Consider a way for MF/ApiMobileView to communicate the title in a more interoperable way to Wik... [17:48:25] dbrant_: but data -in terms of pageview ranking of sites- does not change [17:48:41] dbrant_: much such you have to query it frequently to "do" a ranking [17:48:44] dbrant_: makes sense? [17:49:07] dbrant_: see https://wikitech.wikimedia.org/wiki/Analytics/AQS/Unique_Devices [17:50:16] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Goal, 10Services (doing): FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus - https://phabricator.wikimedia.org/T190327#4097921 (10mobrovac) [17:50:55] nuria_: I see; will look into that, then. Thanks! [17:51:25] dbrant_: wikistats data for say jp.wikipedia [17:51:54] dbrant_: https://stats.wikimedia.org/v2/#/ja.wikipedia.org [17:52:12] dbrant_: the link you passed along is just a labs tool with an unfortunate similar name [17:52:57] nuria_: Yes, but the labs tool is showing the exact kind of data that I'd like [17:53:14] dbrant_: mmm.. no, "users" are defined as registered users [17:53:28] dbrant_: someone sometime registering an account [17:53:38] dbrant_: 15 years ago for all you know [17:53:57] dbrant_: it does not define activity as "consumption" of wikipedia content [17:54:58] Ah yes indeed. The AQS rest service that you linked looks the most promising so far. 
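The Unique Devices endpoint recommended at the end of the exchange above is served by the public Wikimedia REST API. A minimal Python sketch of building that request URL, assuming the path layout documented on the wikitech page linked in the log; the project, site, and date values here are purely illustrative:

```python
# Sketch of querying the AQS Unique Devices endpoint referenced above
# (https://wikitech.wikimedia.org/wiki/Analytics/AQS/Unique_Devices).
# Path layout assumed from that documentation; values are example inputs.

AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/unique-devices"

def unique_devices_url(project, access_site, granularity, start, end):
    """Build the REST URL for unique-devices counts.

    access_site: all-sites | desktop-site | mobile-site
    granularity: daily | monthly
    start/end:   dates as YYYYMMDD strings
    """
    return f"{AQS_BASE}/{project}/{access_site}/{granularity}/{start}/{end}"

# Example: monthly unique devices on Japanese Wikipedia's mobile site
url = unique_devices_url("ja.wikipedia.org", "mobile-site", "monthly",
                         "20180301", "20180331")
print(url)
```

Fetching that URL (e.g. with `urllib.request.urlopen`) returns JSON with one item per granularity step, which an app could use to rank languages by reading activity.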
[17:55:00] dbrant_: "active users" is also not "users" (in the web sense) rather is editors [17:55:58] dbrant_: you could also define activity including edits and say "active devices"/number-of-editors but i do not think in any case for your use case the "total number of registered accounts" is relevant [17:56:24] nuria_: yep, agreed. [17:58:41] dbrant_: ok, let us know if we can help you further [18:00:49] nuria_: cool! thanks again [18:01:07] milimetric: isn't this data a better fit for vital signs layout? [18:01:10] mihttps://language-reportcard.wmflabs.org/interlanguage/#desktop/interlanguage-navigation-source-project [18:01:17] sorry [18:01:19] milimetric: https://language-reportcard.wmflabs.org/interlanguage/#desktop/interlanguage-navigation-source-project [18:01:50] milimetric: ah, i think the way files are produced are not per wiki [18:02:50] milimetric: also data shown it is labeled from 2017-12-31? [18:04:48] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4098045 (10mobrovac) [18:04:58] nuria_: there's historical data we just merged in [18:05:21] nuria_: there's no way to output metrics-by-project compatible files from reportupdater Hive scripts [18:05:27] that's a task that got deprioritized [18:05:36] but if we do the pivot one, we don't need ot [18:05:38] *to [18:07:14] milimetric: wait, i think we are talking about two different things, the time interval of the data shown, now there are data points for 1 week and that's it [18:07:47] milimetric: sorry, let me explain better [18:08:09] 10Analytics, 10MobileFrontend, 10Readers-Web-Backlog, 10Performance-Team (Radar), 10Technical-Debt: Figure out XAnalytics stuff - https://phabricator.wikimedia.org/T190381#4098054 (10Krinkle) >>! In T190381#4097902, @MaxSem wrote: >> Or: Consider a way for MF/ApiMobileView to communicate the title in a m... 
[18:09:01] milimetric: data returned is from 2017-12 to 2018-03, yet the table is showing only data from 2017-12 [18:09:45] milimetric: see data : https://analytics.wikimedia.org/datasets/periodic/reports/metrics/interlanguage/percent_interlanguage_navigation_curr.tsv [18:10:02] milimetric: makes sense? [18:10:30] yes [18:11:01] nuria_: I mean, yes, it makes sense what you're saying but I think that's a feature of dashiki, looking to remember [18:11:12] "feature" haha, 'cause it has a filter so you can set the dates you want to see [18:11:22] but it's pretty bad for this kind of data [18:11:27] milimetric: mmm, no i do not think so , see browser reports [18:11:41] yeah, but they have fewer points per date [18:11:45] this has 300+ [18:12:13] milimetric: it is a bug [18:12:32] nuria_: nope, feature :) https://github.com/wikimedia/analytics-dashiki/blob/master/src/components/visualizers/table-timeseries/table-timeseries.js#L17 [18:12:46] limits to 200 rows [18:13:07] milimetric: it should show latest first though [18:13:21] yeah, that would make more sense [18:13:28] but it works the same in browser reports [18:13:43] milimetric: that is the bug [18:13:51] milimetric: agreed [18:13:59] yeah, to me the whole thing's the bug [18:14:12] I just mean, we saw this at some point and didn't care and obviously nobody else ever looks at it either [18:14:17] milimetric: i mean the bug in both places is showing oldest data first [18:14:22] so I think this view in general must not be very useful [18:14:57] yeah, agreed, but I'm saying nobody looks at this view, because if they did and actually tried to use it, they'd complain right away [18:15:39] milimetric: agreed but if that is the case then we should not setup the language dashboards like this by default, right? [18:17:44] nuria_: oh sorry, I see what you mean, no these are just a placeholder until we do the pivot thing in a few weeks.
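The bug discussed above is a combination of two behaviors: the table-timeseries component caps the table at 200 rows, and it fills those rows oldest-first, so with 300+ data points the most recent months never appear. A Python sketch (illustrative only, not the actual dashiki JavaScript) of the current slice versus the latest-first fix being proposed:

```python
# Why a 200-row cap plus oldest-first ordering hides recent data,
# and the latest-first behavior proposed in the conversation above.
# This mirrors the logic, not the real table-timeseries.js code.

LIMIT = 200  # row cap, as in table-timeseries.js#L17

def visible_rows_oldest_first(rows, limit=LIMIT):
    # current behavior: the first `limit` rows, i.e. only the oldest dates
    return rows[:limit]

def visible_rows_latest_first(rows, limit=LIMIT):
    # proposed fix: the most recent `limit` rows, newest shown first
    return rows[-limit:][::-1]

# 300+ points, as in the interlanguage data discussed above
rows = [f"point {i}" for i in range(300)]

print(visible_rows_oldest_first(rows)[-1])  # newest visible row is still old data
print(visible_rows_latest_first(rows)[0])   # the actual newest point is shown
```

With oldest-first ordering the last 100 points (the recent months) fall past the cap and are silently dropped; latest-first keeps the cap but surfaces the data people actually look for.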
I can fix up the table view then as well [18:17:54] milimetric: we can talk in batcave if you want [18:18:29] I'm there nuria_ [18:18:36] ok [18:19:02] milimetric: batcave? [18:19:07] I'm there, I see you [18:32:09] ottomata: /mnt/data is moved over :) [18:34:49] nice! [18:34:50] awesome [18:39:22] 10Analytics, 10MobileFrontend, 10Readers-Web-Backlog, 10Performance-Team (Radar), 10Technical-Debt: Figure out XAnalytics stuff - https://phabricator.wikimedia.org/T190381#4098150 (10Tgr) I'm not familiar with XAnalytics but it seems like b9d8ead6485 broke it; either it should have fixed the buffer logic... [18:46:03] ottomata: I have follow up questions on the rsync script thing, I'm gonna get a bit and then will ping back [18:49:48] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4098187 (10Pchelolo) [18:53:46] k [19:04:33] nuria_: one other thing: the python reportupdater pivoting won't work by itself for Amir's case 'cause he still has 300+ columns and the line graph wouldn't work for that [19:04:50] so the visualizer has to also do something else to deal with too many lines, like show only the top 20 [19:05:45] milimetric: right, amir's best layout is the vital signs one i think that is the learning, in the absence of that however just plotting a subset might work [19:05:57] milimetric: we can talk to him and see [19:21:35] quick question about eventlogging server: when one of the fields in the schema definition is an enum, the server validates the received event data against the schema, right? so if the accepted values for field 'action' are 'a','b','c' but the instrumentation sends 'd' (same revision), those events are ignored due to invalidity, correct? [19:22:46] bearloga: i believe that is correct [19:35:00] ottomata: thanks!
I thought that was the case but wanted to confirm with AE to be 100% sure :) [19:40:47] bearloga: i'm not 100% sure, but I'm pretty sure that's how json schema validation works [19:40:58] if that's how jsonschema works, then that's how eventlogging works too [19:44:49] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Services (doing): Add support for catch-all rule in ChangeProp - https://phabricator.wikimedia.org/T191238#4098324 (10Pchelolo) p:05Triage>03Normal [19:51:24] 10Analytics, 10Analytics-EventLogging, 10Performance-Team, 10Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#4098355 (10Imarlier) 05Open>03Resolved Coal has now been running for 4 days, and appears to be performing as expected. [19:54:28] ottomata: hey so your note about using the hardsync script and puppet class - I think to fetch from our end we already have a dumps::web::fetches::job generic class set up [19:54:36] this is already fetching things from stat1005 [19:54:59] oh [19:55:01] ok looking [19:55:05] in dumps::web::fetches::stats [19:55:10] ah ok [19:55:11] yes [19:55:14] that's fine, for the rsync part [19:55:41] my suggestion was mostly about making a default way for anybody to publish to dumps.wm.org via putting files in a dir on stat1005 (or other hosts) [19:55:47] if you know you'll only ever have one host that will do that [19:55:54] then you don't need this hardsync thing [19:56:07] right got it [19:56:15] that way you wouldn't need to configure a dumps::web::fetches::job for each of ezachte's job outputs [19:56:21] we'd just make one dir on stat1005 [19:56:23] that you'd rsync over [19:56:27] to the right place [19:56:33] and then it would show up on dumps.wm.org [19:57:02] I see, that's cool. I'll look into that more.
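The EventLogging behavior confirmed in the [19:21]–[19:40] exchange above comes down to JSON Schema's `enum` keyword: a value outside the allowed list fails validation, so the event is dropped as invalid. A minimal stdlib sketch of that check, using the 'a'/'b'/'c' example from bearloga's question; this is illustrative code, not the actual eventlogging validator:

```python
# Sketch of JSON Schema "enum" validation as discussed above: an event
# sending action='d' against a schema allowing 'a'/'b'/'c' fails
# validation and is ignored. Illustrative only; the real EventLogging
# server runs full JSON Schema validation, of which this is one rule.

schema = {"properties": {"action": {"enum": ["a", "b", "c"]}}}

def validates(event, schema):
    """Return True if every enum-constrained field holds an allowed value."""
    for field, spec in schema["properties"].items():
        if "enum" in spec and field in event and event[field] not in spec["enum"]:
            return False
    return True

print(validates({"action": "b"}, schema))  # True  -> event accepted
print(validates({"action": "d"}, schema))  # False -> event ignored as invalid
```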
I think I want to have the basic thing done with the current set up first though [19:57:39] Is there some place in puppet I can maybe add a /srv/dumps directory on stat1005 so that ezachte can then write to it? [20:00:27] madhuvishy: i think profile::statistics::private [20:00:30] is probably the best place [20:00:57] eventually this should probably be in statistics::compute class [20:01:03] like the published_datasets thing (for thorium) is [20:01:19] but if you are only going to make it work for stat1005, then profile::statistics::private [20:03:29] ottomata: right, this is all only until these datasets are published from eric's jobs i think [20:05:55] 10Analytics, 10Operations, 10Research, 10Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3773297 (10gh87) I was using the Apple Safari 11.0.3 (13604.5.6). Then I enabled the "Develop" tab in the Menu, then right-clicked to open... [20:07:30] 10Analytics, 10Android-app-Bugs, 10Wikipedia-Android-App-Backlog: EventLogging sees MobileWikiAppFindInPage parsing errors - https://phabricator.wikimedia.org/T147196#4098410 (10Dbrant) 05Open>03Resolved [20:14:06] ottomata: https://gerrit.wikimedia.org/r/#/c/423533/ [20:17:00] 10Analytics, 10Operations, 10Research, 10Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#4098467 (10Tgr) Apparently it doesn't implement fallbacks as per spec (the last fallback value is `origin`). I don't think there's anythin... [20:41:45] ottomata: I can't launch `./sbin/start-thriftserver.sh` in yarn :( [20:43:33] chelsyx: k gimme a few... [20:45:24] ottomata: Thanks! [20:51:35] ottomata: one more for you, hopefully last :) https://gerrit.wikimedia.org/r/#/c/423540/ [20:51:50] no hurry on this [20:53:25] madhuvishy: i don't think you need that [20:53:32] oh?
srv is already an rsync module on the stat boxes [20:53:39] interesting [20:53:43] and i'm pretty sure you added labstore* to the list of 'statistics_servers' [20:53:43] so [20:53:46] ::srv/dumps/ ... [20:53:46] yeah [20:53:47] should work [20:53:49] awesome [20:54:03] I'll abandon this then [20:54:12] k [20:54:30] thanks a ton for all the things :) [20:55:15] !log deployed multi-instance mirrormaker for main -> jumbo. 4 per host == 12 total processes [20:55:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [20:56:43] chelsyx: oook [20:56:56] so first (dumb) q: did you source ./spark-env.sh [20:56:57] ? [20:57:10] yes [20:58:47] k so what's happening? [20:59:50] ottomata: so this is in beeline: [20:59:58] https://www.irccloud.com/pastebin/Ln0KTK2a/ [21:00:44] ohhh yes chelsyx ok that makes sense [21:00:48] in this case, it won't be localhost [21:00:51] hmm [21:00:54] chelsyx: actually [21:01:00] how are you starting thriftserver in yarn? [21:01:15] `./sbin/start-thriftserver.sh --files /etc/hive/conf/hive-site.xml --master yarn` [21:04:31] hmmm [21:04:38] yes ok i think still makes sense [21:04:39] ok [21:04:51] so in yarn, the thriftserver process is actually going to be launched elsewhere [21:05:08] i had kinda thought that maybe with just --master yarn, you'd be in client mode, and the thriftserver would run in the spark master [21:05:10] but i guess not!
so, we have to find it [21:05:17] lemme try and see what the output says [21:06:58] ohh chelsyx hah, your process hasn't even launched [21:07:04] so, since that command backgrounds itself [21:07:11] you should tail -f the log file it gives you [21:07:12] so you get more inf [21:07:13] o [21:07:36] when i started mine, it said [21:07:38] starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/otto/spark-2.3.0-bin-hadoop2.6/logs/spark-otto-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-stat1005.out [21:07:41] so I did [21:07:46] tail -n 1000 -f /home/otto/spark-2.3.0-bin-hadoop2.6/logs/spark-otto-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-stat1005.out [21:07:49] to see what it says [21:08:05] ooooh [21:08:15] also, your yarn app is here: https://yarn.wikimedia.org/cluster/app/application_1521803817821_34093 [21:08:21] looking like it is having problems starting [21:08:23] we will see.. [21:13:08] hm interesting, from yarn logs I got [21:13:12] 2018-04-02 21:01:53 ERROR ApplicationMaster:70 - Failed to connect to driver at stat1005.eqiad.wmnet:43339, retrying ... [21:13:21] i got that by running yarn logs -applicationId application_1521803817821_34093 [21:16:00] ottomata: let me stop it and start it again [21:16:32] something is def weird, i haven't gotten it to work yet either [21:20:44] so after I create the ssh tunnel, in the log: [21:20:51] https://www.irccloud.com/pastebin/8YndFomA/ [21:21:06] yeah [21:23:03] ottomata: No rush for this. I can create a ticket for it if that's better for you? [21:23:27] hang on a few [21:23:31] trying... :) [21:35:07] chelsyx: i [21:35:15] i'm going to edit some things in your spark-2.3.0 dir [21:35:19] i know why it isn't working [21:35:33] ottomata: Yay! Thanks!!!
ok chelsyx [21:36:32] the reason it wasn't working was because I had set (in spark-env.sh) SPARK_CONF_DIR to /etc/spark2/conf [21:36:46] which caused yarn jobs to upload the spark 2.1 dependencies we use with the spark2 package I built [21:36:56] instead of the 2.3 version that this uses [21:37:12] also, for yarn to get the hive settings, the file needs to be in the conf/ dir [21:37:12] so [21:37:26] i've done that, and modified your spark-env.sh file [21:37:30] so it does not set SPARK_CONF_DIR [21:37:33] this way it will use the ./conf/ one [21:37:36] also [21:37:40] to make things more transparent [21:37:50] i added a bin/thriftserver.sh script [21:37:53] that will not background the process [21:38:09] that way you can see what is happening (and just ctrl-c the thing when you aren't using it) [21:38:19] so, new instructions for launching thriftserver: [21:38:26] source ./spark-env.sh [21:38:40] ./bin/thriftserver.sh [--master yarn, or whatever else you want] [21:38:43] that should be it! [21:38:50] i don't think you have to pass --files hive-site.xml anymore [21:39:15] try it, lemme know how it goes [21:39:20] cool! trying it [21:39:21] also, localhost:10000 will still work [21:39:23] for beeline [21:43:30] ottomata: so in beeline, `show databases` works, but then when I tried to execute a simple query `SELECT dt, referer, client_ip, user_agent_map FROM wmf.webrequest WHERE year=2018 and month=3 and day=20 and hour=1 LIMIT 5;`, i got `permission denied` [21:44:05] https://www.irccloud.com/pastebin/6StOIleo/ [21:46:01] hmmmmm [21:46:11] user=anonymous, interesting. [21:53:34] ah! [21:54:52] chelsyx: duh [21:54:56] when connecting with beeline [21:54:59] enter your username :p [21:55:01] no pw [21:55:14] aha! [21:59:42] ottomata: Yay! it works! [22:00:03] ottomata: Thank you so much! and sorry for bothering you about this for sooo long... [22:00:17] glad it works!
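The `user=anonymous` permission error above was fixed by supplying a username (and no password) when connecting; `-u` and `-n` are beeline's standard connection-URL and username flags. A Python sketch assembling that invocation, assuming the localhost:10000 SSH tunnel mentioned in the exchange (the username shown is just an example):

```python
# Sketch of the working beeline connection from the exchange above:
# connect to the tunneled thriftserver with your username via -n and
# no password, so queries don't run as the "anonymous" user.
import getpass

def beeline_cmd(user=None, host="localhost", port=10000):
    """Build the beeline argv for a tunneled HiveThriftServer2 connection."""
    user = user or getpass.getuser()  # default to the local shell user
    return ["beeline", "-u", f"jdbc:hive2://{host}:{port}", "-n", user]

print(" ".join(beeline_cmd(user="chelsyx")))
```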
[22:00:22] excited to hear how you use it [22:28:53] !log bounce mirror maker to pick up client_id config changes [22:28:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [22:48:34] 10Analytics: Mount dumps on SWAP machines (notebook1001.eqiad.wmnet / notebook1002.eqiad.wmnet) - https://phabricator.wikimedia.org/T176091#4098944 (10madhuvishy) You'd need to apply class https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/dataset_mount.pp, and add the servers to ht... [22:55:27] (03PS7) 10Nuria: Label map and top metrics with the month they belong to [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/423144 (https://phabricator.wikimedia.org/T182990) (owner: 10Amitjoki) [22:57:59] (03CR) 10Nuria: "I think this change might have a syntax error as it does not render." (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/423144 (https://phabricator.wikimedia.org/T182990) (owner: 10Amitjoki)