[00:46:37] Analytics, Analytics-Kanban: Install pyArrow in Cluster - https://phabricator.wikimedia.org/T202812 (Ottomata) Alright, I spent a few hours trying to solve this for the distributed YARN case, and it is not easy! I'm not giving up yet, but I think the solution will be one of: - Download the pyarrow (0.8...
[05:41:50] Morning A Team!
[05:52:11] o/
[06:15:16] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639 (elukey) The current status is: * archiva.wikimedia.org is controlled by meitnerium via `letsencrypt::cert::integrated` * archiva-new.wikimedia.o...
[06:23:21] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639 (elukey) >>! In T192639#4534751, @Gehel wrote: > * wdqs still needs to be validated by @Smalyshev Hi @Smalyshev, any news about wdqs? :)
[06:24:27] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey)
[07:08:35] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639 (Smalyshev) Seems to work OK for archiva-new for me.
[07:09:08] Analytics: Upgrade bohrium (piwik/matomo) to Debian Stretch - https://phabricator.wikimedia.org/T202962 (elukey) p:Triage>Normal
[07:12:51] Analytics, Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642 (elukey)
[07:15:59] Analytics, Operations, vm-requests: eqiad (1) - VM request for Piwik/Matomo - https://phabricator.wikimedia.org/T202963 (elukey) p:Triage>Normal
[07:16:18] Analytics, Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642 (elukey)
[07:16:20] Analytics: Upgrade bohrium (piwik/matomo) to Debian Stretch - https://phabricator.wikimedia.org/T202962 (elukey) Open>stalled VM requested in T202963
[07:24:02] Analytics, Operations, vm-requests: eqiad (1) - VM request for Piwik/Matomo - https://phabricator.wikimedia.org/T202963 (elukey)
[07:29:42] Hi elukey :)
[07:29:48] Hi addshore
[07:29:58] Morning :D
[07:30:29] :)
[07:30:52] elukey: I'm gonna start the deploy process now (merging everything that need etc)
[07:31:17] joal: from everybody's test it seems that the new archiva is good, the only bit remaining is the let's encrypt certificate part
[07:31:24] (I also reviewed it with Andrew)
[07:31:35] so in theory we could even do the switch before the deploy
[07:31:37] elukey: We can try and use it:)
[07:31:50] Analytics, Analytics-Wikistats: Wikistats for Wikidata lists several bots as normal users - https://phabricator.wikimedia.org/T59379 (Addshore)
[07:31:57] Analytics, Analytics-Wikistats, Wikidata: Wikistats for Wikidata lists several bots as normal users - https://phabricator.wikimedia.org/T59379 (Addshore)
[07:32:32] but I'd need to wait Valentin (traffic) to validate my assumptions for switching the TLS cert before proceeding
[07:32:47] not sure if you want to deploy now or if you can wait a couple of hours
[07:32:55] elukey: depends on how many hours ;()
[07:33:01] ;)
[07:33:03] Analytics, Analytics-Wikistats, Wikidata, User-Addshore: Wikistats for Wikidata lists several bots as normal users - https://phabricator.wikimedia.org/T59379 (Addshore)
[07:33:36] in theory the best case scenario is Valentin logs in in an hour or so, tells me that I am not crazy, I merge two code reviews and archiva is done :)
[07:33:53] if in 2h I have no updated I'll skip the upgrade for today
[07:34:03] elukey: sounds god to me :)
[07:34:06] +o
[07:34:10] super
[07:34:13] maaaan - need more coffee
[07:34:50] Analytics, MediaWiki-API, Patch-For-Review: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321 (Addshore) Open>stalled Just giving this a poke 1 year on as it blocked {T174474} what's the status here? Marking as stalled until this is...
[07:34:54] Analytics, Developer-Advocacy, MediaWiki-API, Reading-Admin, and 4 others: Metrics about the use of the Wikimedia web APIs - https://phabricator.wikimedia.org/T102079 (Addshore)
[07:35:01] Analytics, MediaWiki-extensions-WikibaseRepository, Wikidata: ApiAction log in data lake doesn't record Wikibase API actions - https://phabricator.wikimedia.org/T174474 (Addshore) Open>stalled Stalled, blocked on {T137321}
[07:35:18] Analytics, MediaWiki-extensions-WikibaseRepository, Wikidata, wikidata-tech-focus: ApiAction log in data lake doesn't record Wikibase API actions - https://phabricator.wikimedia.org/T174474 (Addshore)
[07:35:51] Analytics, MediaWiki-API, Patch-For-Review, User-Addshore: Run ETL for wmf_raw.ActionApi into wmf.action_* aggregate tables - https://phabricator.wikimedia.org/T137321 (Addshore)
[07:37:19] joal: it is early
[07:37:31] I was up at 6am today to maybe see the sunrise but it is all cloudy :(
[07:37:53] meeh addshore - no justice
[07:38:48] addshore: About having wikidata on HDFS - Would a parse of the JSON dumps onto parquet a good solution ?
[07:39:01] hahahaa, I was just about to sk you about this again :D
[07:39:22] So, I don't know what would be a good idea, my hadoop foo is not as amazing as yours
[07:39:23] I try to make a habit of not reading minds too much, but sometimes fail ;)
[07:39:28] what would the alternatives be?
[07:40:00] I was wondering if we would be able to easily answer questions like https://phabricator.wikimedia.org/T202894 with this data in hadoop, rather than scaning the dumps
[07:40:19] addshore: the json -> parquet job already exists (needs some love cause there probably have been changes), and that means ~monthly updates
[07:40:52] addshore: easy indeed
[07:42:19] so, what does the parquet solution not allow us to do? :)
[07:42:36] or is that more of a question that we will find the answer to as we throw problems at it?
[07:43:15] addshore: parquet allows for reasonably easy querying, either by entity or by graph (need to load the data as graph, tutorial to be provided)
[07:43:15] I guess the parquet job will allows to trigger other jobs after it doing extra analysis too?
[07:43:48] Main issue is availability of data (depending on dumps)
[07:43:59] hmm, how do you mean>
[07:44:00] ?
[07:44:14] I guess this would only be of current revisions of everything?
[07:44:56] We'd have full history, but updated through XML-dumps, therefore monthly
[07:45:13] gotcha, dont want to use the json dumps?
[07:45:32] Oh ! Very true - current revision you're absolutely right
[07:45:40] the json dumps only provide current revision only though
[07:45:44] yeah
[07:45:45] it would be super awesome to have all revisions
[07:46:08] addshore: we'll have them, through XML-dumps (much more difficult parsing I think)
[07:46:29] The JSON provided in the XML dumps text field and the JSON in the JSON dumps can differ, especialy for the older revisions
[07:46:42] mwarf
[07:46:53] and when we start looking at the lexeme entities there is currently no guarantee that the raw text json will remain the same
[07:47:14] That's what I expected - We can try reading the JSON historically, but could be cumbersome
[07:47:43] yup, it would be much better to grab the json dumps
[07:48:22] This would mean: monthly updates, data written in parquet queryiable through Hive and Spark, and graph oriented quering/analysis using spark
[07:48:49] we also have incremental json dumps
[07:49:07] hm - Tell me more :)
[07:50:43] well, full json dumps are weekly https://dumps.wikimedia.org/wikidatawiki/entities/
[07:51:28] maybe I was pretending we have incremental json dumps actually
[07:51:32] https://phabricator.wikimedia.org/T72246
[07:52:14] addshore: also seems we have "truthy" and "all" alternate every week
[07:53:38] yup
[07:54:11] json dumps don't currently include the revisionId, I guess that could be usefull..... https://phabricator.wikimedia.org/T192715
[07:54:26] addshore: we could have 2 datasets, one for truthy, the other for all, but I wouldn't mix them :)
[07:54:58] addshore: For linking dataset from json-dumps to another one (for instance historical xml-dumps), it would !
[07:59:36] okay, :)
[07:59:47] well, current revisions only would be an awesome start :)
[08:00:08] I guess once old revisions are loaded we don't actually have to load them again and again and again? or?
[08:00:30] addshore: unfortunately we have no good way not to reload them
[08:00:59] addshore: as of today, we plan on monthly dumping full history onto hdfs
[08:01:33] >addshore: unfortunately we have no good way not to reload them
[08:01:52] addshore: We also have ideas about more efficient was of having updated data, but it will still involve regular full reload
[08:04:40] ack
[08:04:54] when do i get to write my first job against the parquet data then? :D
[08:05:48] addshore: demo data is already available :)
[08:06:32] And you can have a look at https://wikitech.wikimedia.org/wiki/User:Joal/Wikidata_Graph#Playing_with_GraphFrames_-_v1
[08:07:59] addshore: Was actually playing with it - For the 2018-01-08 json dumps: we had 42336942 entities
[08:08:30] i may have a play today :) and see if I can answer the question in the ticket i linked above
[08:09:11] addshore: sounds a fun idea :)
[08:14:31] !log Restart virtualpageview-hourly-wf-2018-8-27-21
[08:14:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:19:51] (PS4) Joal: Update MediawikiHistoryChecker adding reduced [analytics/refinery/source] - https://gerrit.wikimedia.org/r/441378 (https://phabricator.wikimedia.org/T192481)
[08:20:21] (CR) Joal: [C: 2] "Merging after rebase." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/441378 (https://phabricator.wikimedia.org/T192481) (owner: Joal)
[08:41:48] (PS1) Joal: Update changelog.md for v0.0.71 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/455774
[08:41:57] elukey: --^ please :P)
[08:48:42] (CR) Elukey: [C: 1] Update changelog.md for v0.0.71 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/455774 (owner: Joal)
[08:49:47] still waiting for archiva, can you hold off a bit?
[08:59:50] elukey: I'm on hold :)
[09:19:44] joal: I still didn't get any ping for archiva, I don't want to delay you any further, please go ahead if you want
[09:20:22] elukey: as you wish - I'm in no real rush, but still would like to be able to deploy today :)
[09:20:35] elukey: Do you want me to wait for another hour?
[09:21:36] joal: I have no idea when Valentin will come online, so it might be this afternoon.. not sure what work plans you made, so if you prefer to proceed before lunch please go ahead, we'll test archiva during the next round of deployment
[09:22:06] ok elukey - Starting Jenkins then
[09:22:29] (CR) Joal: [C: 2] Update changelog.md for v0.0.71 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/455774 (owner: Joal)
[09:27:49] (CR) Joal: [V: 2 C: 2] Update changelog.md for v0.0.71 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/455774 (owner: Joal)
[09:29:10] Analytics, Analytics-Wikistats, Wikidata, User-Addshore: Wikistats for Wikidata lists several bots as normal users - https://phabricator.wikimedia.org/T59379 (ezachte) Less than before. I am the only maintainer of Wikistats 1 and since July I work 30 hours per month, on my own request. Wikistats...
[09:46:19] Analytics, Analytics-Wikistats, Wikidata, User-Addshore: Wikistats for Wikidata lists several bots as normal users - https://phabricator.wikimedia.org/T59379 (Addshore) p:Normal>Low
[09:47:23] a-team: Another interesting read - https://tamino.wordpress.com/2018/08/08/usa-temperature-can-i-sucker-you
[10:22:38] (PS14) Joal: Add validation step in mediawiki-history jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/440005 (https://phabricator.wikimedia.org/T192481)
[10:22:49] !log Refinery-source v0.0.71 deployed onto archiva
[10:22:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:23:08] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/440005 (https://phabricator.wikimedia.org/T192481) (owner: Joal)
[10:26:06] (PS4) Joal: Add check to mw-history-reduced druid indexation [analytics/refinery] - https://gerrit.wikimedia.org/r/445373 (https://phabricator.wikimedia.org/T192483)
[10:30:00] (CR) Joal: "> Patch Set 3:" [analytics/refinery] - https://gerrit.wikimedia.org/r/445373 (https://phabricator.wikimedia.org/T192483) (owner: Joal)
[10:30:19] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/445373 (https://phabricator.wikimedia.org/T192483) (owner: Joal)
[10:34:41] (PS3) Joal: Fix oozie mediawiki-history druid indexation job [analytics/refinery] - https://gerrit.wikimedia.org/r/450962
[10:35:33] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/450962 (owner: Joal)
[10:36:12] (PS2) Joal: Replace ids with text in mediawiki-history-reduced [analytics/refinery] - https://gerrit.wikimedia.org/r/454242 (https://phabricator.wikimedia.org/T201617)
[10:36:43] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/454242 (https://phabricator.wikimedia.org/T201617) (owner: Joal)
[10:41:52] (PS1) Joal: Update mediawiki-history-reduced jar version [analytics/refinery] - https://gerrit.wikimedia.org/r/455809
[10:42:13] (CR) Joal: [V: 2 C: 2] "Merging for deploy." [analytics/refinery] - https://gerrit.wikimedia.org/r/455809 (owner: Joal)
[10:42:37] elukey: Can I deploy using scap?
[10:43:58] sure!
[10:44:04] !log Deploying refinery from scap
[10:44:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:44:06] :)
[10:59:24] !log deploying refinery onto HDFS
[10:59:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:11:44] CRAP ! I missed a needed change :(
[12:12:26] :(
[12:14:29] Good side of things is I realized it before prod job ... But still
[12:27:12] addshore: I have a result for you :)
[12:30:20] oooooooh
[12:33:09] addshore: I have found {nbEntities: 42336942, nbEntitiesWithInLinks: 10312826, nbEntitiesWithOutlinks: 40096647}
[12:33:19] how long did that take?
[12:33:28] From this, you can derive entities without in-links and without out-links
[12:33:57] addshore: most of it was me figure out how to get it :)
[12:34:37] addshore: graph building takes some time (less than an hour), the rest is fast
[12:35:06] hmm, but that doesnt mean there are 2 million entities without does it?
[12:35:37] what that means to me is there 2M entities without OUT links
[12:35:48] addshore: also, it was in 2018-01
[12:36:13] addshore: And that also means ~32M entities without IN links !!!
[12:37:13] ahh wait, i just re read the ticket :)
[12:37:42] So the number of entities without links from other entities is _32 million
[12:38:07] How hard would be to to also get the number with either our or in links?
[12:38:09] *out
[12:38:57] addshore: minutes
[12:39:16] :D
[12:39:26] addshore: nb entites with at least one link (either in or out): 40115806
[12:40:37] amazing, would you mind dumping how you got to that somewhere for me to have a read?
[12:41:24] addshore: another funny result - https://gist.github.com/jobar/75120f03565407b8ec7db0aa06c0a54e
[12:42:51] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Tbayer) >>! In T201124#4528382, @Jdlrobson wrote: > Skipping sign off. QA will be handle...
[12:43:29] joal: how about if you count sitelinks as an in / out link?
[12:43:57] essentially also counting them as entities in the graph
[12:44:03] addshore: a lot more difficult in that it means the construction of the graph changes
[12:44:15] ack
[12:46:10] joal: ^^ in that gist I think that is the most linked to entities?
[12:47:03] should we have a ticket for tracking this work on getting wikidata in hadoop?
[12:47:41] correct addshore :)
[12:48:05] those entites are linked-to in the order of magnitude 10^7 :)
[12:48:21] addshore: the code for the degrees-analysis: https://gist.github.com/jobar/ec44542614c0fe261a23cc3b4acf8e00
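A minimal pyspark sketch of the degree computation joal describes here, for readers who don't want to leave the log; his linked gists are the canonical version. Everything structural below is an assumption: the table name and the idea that each entity row carries its mainSnaks as an array of structs are hypothetical stand-ins for the demo parquet layout.

  from pyspark.sql import SparkSession, functions as F
  from graphframes import GraphFrame  # graphframes package must be available on the cluster

  spark = SparkSession.builder.getOrCreate()
  entities = spark.table("joal.wikidata_parquet")  # hypothetical table name
  vertices = entities.select("id")

  # One edge per mainSnak pointing at another entity; qualifier and reference
  # links are left out, matching what joal describes. Recent dumps spell the
  # relevant dataType values "wikibase-item" and friends.
  edges = (entities
           .select(F.col("id").alias("src"), F.explode("mainSnaks").alias("snak"))
           .where(F.col("snak.dataType").startswith("wikibase"))
           .select("src", F.col("snak.dataValue.value.id").alias("dst"))
           .dropna())

  g = GraphFrame(vertices, edges)
  total = vertices.count()
  with_in = g.inDegrees.count()    # entities some other entity links to
  with_out = g.outDegrees.count()  # entities that link out at least once
  # With the 2018-01 numbers above: 42336942 - 10312826 = 32024116 entities
  # have no in-links, and 42336942 - 40096647 = 2240295 have no out-links.
  print(total - with_in, total - with_out)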
[12:50:05] addshore: if you look at that, you'll realize the triky bit is building the graph from the json - What I do is use mainSnaks only, for which dataType is "wikidata-%", and extract the item ID from the dataValue, making this a link between two entities
[12:51:15] addshore: with this approach, qualifiers-links and references-links are left out
[12:51:31] okay
[12:52:09] I left a comment on https://phabricator.wikimedia.org/T202894#4538268 with the details so that we can always track down that code in the future :)
[12:53:53] addshore: it's in my wish-list to manage to build the full RDF graph (https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#/media/File:Rdf_mapping-vector.svg)
[12:54:15] addshore: for reference, most of it comes from https://wikitech.wikimedia.org/wiki/User:Joal/Wikidata_Graph#Playing_with_GraphFrames_-_v2
[12:57:21] (PS1) Joal: Correct mediawiki-history-reduced checker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/455825
[12:58:43] elukey: I'll wait for a review of --^, and redeploy tomorrow - Maybe a good moment for archiva test?
[13:00:04] !log Restart mediawiki-history and mediawiki-history-druid jobs
[13:00:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:02:19] joal: yep! I had a chat with Brandon and I realized that there was one step missing, I've updated https://phabricator.wikimedia.org/T192639#4537062
[13:05:54] elukey: will all that be feasible before tomorrow?
[13:06:52] tomorrow morning I'd say, it should take 10 minutes more or less
[13:07:00] \o/ !
[13:07:16] I already merged the change to reduce the TTL from 1H to 5M in the DNS
[13:07:18] o/
[13:07:22] hiiiii
[13:07:38] elukey: You're magic :)
[13:08:32] elukey: asking the permission to manually index a new datasource on druid public
[13:10:56] joal: do I need to grant you permissions to do that ?? :D
[13:11:42] elukey: I prefer to ask, like that you know, and you lao tell me if you think I'm doing something wrong :)
[13:13:14] elukey: I take your answer as a yes ;)
[13:14:27] :)
[13:17:37] wow a-team, i just discovered that snakebite has autocomplete for hdfs paths! :o
[13:19:00] WAT?
[13:19:16] ottomata: also, snakebite only supports py2 :(
[13:19:31] oh rly?
[13:19:40] ottomata: I'm using it nonetheless, but that's not super great
[13:21:53] ya
[13:24:31] its also been a awhillle since they committed anything
[13:24:34] we are on the latest :/
[13:24:50] mwarf :(
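For context, snakebite is Spotify's pure-Python HDFS client: it speaks the namenode's protobuf RPC directly instead of shelling out to `hdfs dfs`, which is why it is fast enough for CLI niceties like path autocompletion. A minimal sketch of the library side — Python 2 only, as joal points out, and the namenode host is a placeholder:

  from snakebite.client import Client

  client = Client('namenode.example.org', 8020, use_trash=False)  # placeholder host/port
  for entry in client.ls(['/wmf/data']):  # ls() takes a list of paths and yields dicts
      print entry['path']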
[13:27:01] elukey: are we restarting nodes currently?
[13:27:20] elukey: I'm surprised, it's the second job failing in a small time
[13:27:22] nope
[13:27:27] grumble
[13:28:00] elukey: when you have a moment: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/455743/1/hieradata/role/common/cache/text.yaml wanted to also discuss naming (and the little bit of role name refactoring i did last week)
[13:29:11] also, i'd like to install ipython everywhere
[13:29:31] i realized that I can do PYSPARK_PYTHON=ipython3
[13:29:33] which is quite nice on CLI
[13:29:43] but it needs to be installed everywhere to work in YARN
[13:29:48] so the change looks fine, I don't remember though how the directors are mapped to domains
[13:29:57] elukey: that will comein a different change
[13:29:57] but I guess it is somewhere in puppet :)
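Why ottomata's PYSPARK_PYTHON=ipython3 trick above needs ipython on every node: the variable names the interpreter Spark launches for the executors too, not just for the driver shell. A quick sanity check from inside a session started with something like PYSPARK_PYTHON=ipython3 pyspark2 --master yarn (the element and partition counts below are arbitrary):

  import sys

  # sys is resolved again on each executor, so sys.executable reports the
  # interpreter the executors actually run, not the driver's.
  interpreters = (sc.parallelize(range(100), 20)
                  .map(lambda _: sys.executable)
                  .distinct()
                  .collect())
  print(interpreters)  # every path listed must exist on every NodeManager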
[13:30:01] ya
[13:30:09] gonna add those one at a time
[13:30:11] first hue/yarn
[13:30:17] makes sense yes
[13:30:28] I saw that you were able to solve the stretch deps for hue
[13:30:35] so we should be ready to go
[13:30:36] ya hackily
[13:31:09] for ipython it looks fine, don't have any opposition if it makes sense (as you explained)
[13:31:25] k coo
[13:31:59] oh right, elukey so i made a new role:
[13:31:59] role::analytics_cluster::hadoop::ui
[13:32:05] that will be applied on tool1001
[13:32:11] for hue and yarn
[13:32:18] best name i came up with, wonder if you have something better
[13:33:21] all good, clear for me
[13:33:43] k thx
[13:35:51] ottomata: I have an upgrade plan for archiva (https://phabricator.wikimedia.org/T192639#4537062), planning to send an email now and do it tomorrow morning
[13:35:54] (EU time)
[13:36:13] saw it, looks great elukey :)
[13:36:31] super :) in theory we could even proceed in ~30 mins
[13:36:35] when the cache expires
[13:36:39] but there are meetings etc..
[13:36:40] uff
[13:36:44] elukey: if you like, i'm here now? oh rght
[13:36:49] oh ya we got the one with chase
[13:40:22] Analytics, New-Readers, Browser-Support-Opera, Easy: Split opera mini in proxy or turbo mode - https://phabricator.wikimedia.org/T138505 (Liuxinyu970226)
[13:40:28] Analytics, New-Readers, Browser-Support-Opera, Easy: Split opera mini in proxy or turbo mode - https://phabricator.wikimedia.org/T138505 (Liuxinyu970226)
[13:41:12] Analytics, Analytics-General-or-Unknown, Browser-Support-Opera: Count X-CS=502-16 only if it came through an Opera Mini PROXY - https://phabricator.wikimedia.org/T58118 (Liuxinyu970226)
[13:42:08] milimetric: if you have a couple minutes do you mind batcaving to pair?
[13:42:23] a-team i'll be switching yarn.wm.org and hue.wm.org to the new VM in a bit, let me know if anything is weird!
[13:42:55] I’m in jury duty fdans
[13:43:04] ohhh that's right sorry
[13:45:55] fdans but if you have questions I can answer on IRC I can do my best
[13:46:06] we’ll be waiting around for a while
[13:48:04] yarn looks good to me!
[13:48:06] doing hue
[13:49:04] +1
[13:55:11] hue looks good too!
[13:55:32] +1 :)
[13:59:07] hm, gonna keep going, turnilo next!
[14:03:05] elukey: whatcha think, shoudl I remvoe the pivot.wm.org redirect and domain?
[14:03:09] as part of this move?
[14:06:55] coool turnilo looks good too, new version!
[14:07:52] proceeding with superset
[14:07:54] ottomata: nah let's leave it there
[14:07:57] ok
[14:19:33] !log Restart Workflow pageview-druid-hourly-wf-2018-8-28-11
[14:19:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:20:59] hmm superset broken after upgrade, trying to figure out why...
[14:21:07] looks like bad db migration or something
[14:50:56] Analytics, Operations, Traffic, Services (blocked): Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (mobrovac) While the proposed solution will work for us in this case, I second @Ottomata's thoughts that having this header (or the lack thereof) included in the lo...
[15:00:38] a-team: hangout not working too well
[15:00:55] luca and i will be in standup soon...finishing up meeting
[15:01:30] ping ottomata elukey
[15:10:06] ping ottomata elukey coming?
[15:10:40] nuria: soirry we are ina good meeting with chase
[15:10:42] nuria: we are still in the meeing with Chase
[15:10:56] np elukey ottomata - We're starting
[15:18:56] Analytics, Maps: Switch maps metrics from hourly to daily - https://phabricator.wikimedia.org/T150708 (Mholloway) p:Triage>Normal Per convo with @Gehel, it sounds like this concerns a script used to track usage of the Kartographer-specific tags across all projects. Maybe there is a more standard...
[15:27:56] joal:
[15:28:00] https://www.irccloud.com/pastebin/80vD1jGJ/
[15:31:32] we will be at retro soon!
[15:31:34] sorry yall
[15:31:50] ottomata: no retro - tomorrow to have Dan
[15:32:02] oh
[15:32:02] ok
[15:32:29] ack thanks!
[15:32:44] joal: are you guys in the cave or already off?
[15:32:48] I guess we'll send the e-scrum
[15:33:05] elukey: discussing with nuria in the cave, but standup done and no retro
[15:34:17] elukey: also, tomorrow morning I plan on redeploying for my missing-change, and to deploy AQS with the new metrics (not yet publicly available) - Ok for oyu?
[15:35:00] sure, please wait for me to upgrade archiva first :)
[15:36:12] elukey: of course :)
[15:36:37] I already sent an email to engineering@ and analytics@ as heads up
[15:41:19] Analytics, Product-Analytics, Reading-analysis: Assess impact of ua-parser update on core metrics - https://phabricator.wikimedia.org/T193578 (Nuria) Given that we have provided an upper bound for the effect of this change, neither @fdans nor me understand what is the ask. The wmf-grown regex has not...
[16:21:32] superset fixed!
[16:23:11] \o/
[16:25:58] upgrade instructions added here https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Upgrading
[16:26:07] the part that got me was the export PYTHONPATH=/etc/superset step
[16:26:24] the db migration i did created a new sqlite db, it didn't pick up the mysql confis we had
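The export PYTHONPATH=/etc/superset step ottomata calls out is what points the migration at the right metadata database: superset imports a superset_config module from the Python path for its settings, and when that import fails it silently falls back to defaults, including a fresh sqlite db under ~/.superset/. Roughly what the variable achieves:

  import sys

  sys.path.insert(0, "/etc/superset")  # the effect of `export PYTHONPATH=/etc/superset`
  import superset_config               # superset does the equivalent at startup
  # Before running `superset db upgrade`, this should print the MySQL URI,
  # not a sqlite path:
  print(superset_config.SQLALCHEMY_DATABASE_URI)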
[16:47:51] Analytics, Analytics-Kanban, Patch-For-Review: Update superset (we have 0.20.6, 0.26.3 is available) - https://phabricator.wikimedia.org/T201430 (Ottomata) This is done! Plz try it out :)
[16:51:53] Analytics, Analytics-Kanban: Reimage thorium to Debian Stretch - https://phabricator.wikimedia.org/T192641 (Ottomata) a:Ottomata
[16:52:47] elukey: any objections to wed sept. 5th for thorium upgrade?
[16:52:57] nope! +1
[16:58:05] * elukey off!
[17:00:11] nuria: should the wikis in the sqoop list match the ones in the pageview whitelist?
[17:00:24] it seems like we haven't updated the sqoop list since february
[17:02:05] there's 736 wikis in the sqoop list, while there are way more active wikis that we have no editing metrics for
[17:02:14] (joal ^)
[17:03:50] fdans: please update the list !
[17:04:00] yessir
[17:04:47] ottomata: first superset try with my dashboard is great -> Snapier, prettier
[17:13:14] ottomata: I hwever get a 500 when trying to access the "Sources --> Databases" menu
[17:17:32] Gone for diner team, back after
[17:18:58] fdans: let's sync up on your tasks in 1 hr?
[17:23:38] nuria: I'm working on the 6h train, so connection might be a bit spotty
[17:24:10] right now I'm with updating the super out of date sqoop list
[17:27:55] joal: I think we don't need the edit count from 2016 on the sqoop list right? just the group is necessary
[17:28:09] like, the sqoop script doesn't take the edit count into account
[17:28:10] https://github.com/wikimedia/analytics-refinery/blob/master/bin/sqoop-mediawiki-tables#L61
[17:29:04] we could add the wikis to the list, and update the count numbers when all are already loaded
[17:39:00] joal: loking!
[17:39:20] ah ha me too!
[17:39:22] ok looks obvious on it
[17:51:57] Analytics, Analytics-Kanban, Patch-For-Review: Update superset (we have 0.20.6, 0.26.3 is available) - https://phabricator.wikimedia.org/T201430 (Ottomata)
[18:01:32] fdans: only interesting reason for the count is to build the groups
[18:03:02] joal: yeah but how do you get the count when we haven't yet sqooped the tables?
[18:04:16] shoudl be fixed joal
[18:04:30] ottomata: indeed, testing a hive query from sql-LAb :)
[18:04:34] Many thanks ottomata :)
[18:04:57] fdans: I know Dan did some magic here, but I can't recall
[18:05:16] fdans: i updated sqoop list recently though to add a wiki
[18:05:42] fdans: just last month, let me see
[18:05:50] nuria: you sure you're not referring to the whitelist?
[18:06:03] fdans: whitelist is just used for pageviews
[18:06:19] nuria: https://github.com/wikimedia/analytics-refinery/blob/master/static_data/mediawiki/grouped_wikis/prod_grouped_wikis.csv
[18:06:24] fdans: in this case it was a wiki for which we HAD pageviews
[18:06:31] fdans: but no edit info
[18:08:01] fdans: one sec
[18:10:43] ottomata: not sure if we can do anything for that, but the job-tracking button for hive queries in SQL-Lab leads to http://analytics1001.eqiad.wmnet.... instead of yarn.wikimedia.org
[18:11:57] nuria: ohhhhh i see
[18:12:11] nuria: but that’s the labs list right?
[18:12:29] fdans: right, cause we scqqop from labs not prod
[18:12:49] when you did the atk language
[18:13:09] fdans: so, yeah added new language july 12
[18:13:32] ooooo right
[18:15:14] nuria: we’re still missing about 150 wikis (in difference with the pageview list)
[18:16:25] fdans: the labs list has 738 wikis that exist on labs replicas
[18:16:32] fdans: not all wikis are in teh labs replicas
[18:16:48] fdans: makes sense?
[18:17:05] fdans: is this a ticket ?
[18:18:07] nuria: yes, coming from T187414
[18:18:08] T187414: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414
[18:19:30] fdans: that is a closed wiki: https://aa.wikipedia.org/wiki/Main_Page
[18:19:39] fdans: so it might not be in teh labs replicas
[18:20:36] fdans: makes sense?
[18:21:24] nuria: i think that ticket we can decline
[18:22:52] fdans: if data not in labs replicas
[18:23:07] fdans: or ask dbas to put it on, do check if data exists and let us know
[18:23:41] nuria: so the criteria to add items to that list is people requesting it via phab?
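A sketch of the comparison fdans is doing here: which wikis have pageview data but no entry in the sqooped grouped-wikis list. Both file layouts are assumptions — the grouped-wikis csv is read as if the wiki db name sits in its second column, and the pageview side is a hypothetical one-wiki-per-line export; check the real layouts in refinery's static_data before trusting this.

  import csv

  with open("prod_grouped_wikis.csv") as f:
      sqooped = {row[1] for row in csv.reader(f) if len(row) > 1}  # assumed column order

  with open("pageview_wikis.txt") as f:  # hypothetical export of the pageview list
      pageview = {line.strip() for line in f if line.strip()}

  print(sorted(pageview - sqooped))  # candidates to add to the sqoop list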
[18:24:04] joal hmm looking not sure if we can either
[18:24:18] ottomata: small detail, just saying :)
[18:25:37] fdans: we can surface anything on labs replicas but the dbas do not replicate there all wikis that exists, we whitelisted 750 wikis and we have been adding others by hand
[18:25:46] joal where is job tracking button?
[18:26:33] ottomata: you need to start a query in SQL-Lab using hive - Waiting for the result, tracking button in addition to advancement-bar!
[18:28:50] fdans: if you give a list of projects, I can check it against labs for you
[18:30:05] Analytics, Analytics-Kanban, Analytics-Wikistats: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414 (Nuria) @Krinkle: usability.wikimedia.org does not have pageviews cause i suspect we have a bug on our end and we are n...
[18:30:47] fdans: the issue with usability wikimedia.org is a different one , we are not parsing that domain cause i bet we have a bug on pageview code that does not identify it those as pageviews
[18:30:51] joal: these are the réplicas in mysql in labs right? i can check myself thank you :)
[18:31:19] fdans: as you wish - I have a script is all ;P)
[18:32:56] nuria: looking
[18:33:41] fdans: so solving that ticket requires two different fixes
[18:39:25] Analytics, Analytics-Kanban, Patch-For-Review: Update superset (we have 0.20.6, 0.26.3 is available) - https://phabricator.wikimedia.org/T201430 (Nuria) Ping @mpopov please check superset and let us know, it is upgraded to latest now.
[18:45:17] nuria: yeah 1-add usability to whitelist and 2-investigate why it didn't produce an alert right?
[18:46:35] fdans: sure, you will need to add some tests, error is here: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L76
[18:47:10] nuria: niiice, thanks
[18:47:45] * fdans cranks up intellij idea
[18:48:09] fdans: regex can be changed or add "usability"
[18:48:31] the [a-zA-Z]{2,3}) covers chapters websites
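A simplified Python stand-in for the hostname pattern being discussed — not the actual refinery regex, and the named subdomains are illustrative — just to show the shape of the bug and of the fix fdans lands below: the alternation admits a handful of known subdomains plus two-to-three-letter chapter codes, and "usability" (or "strategy") matches neither until listed explicitly.

  import re

  before = re.compile(r"^(commons|meta|incubator|species|[a-zA-Z]{2,3})\.wikimedia\.org$")
  after = re.compile(r"^(commons|meta|incubator|species|usability|strategy|[a-zA-Z]{2,3})\.wikimedia\.org$")

  host = "usability.wikimedia.org"
  print(bool(before.match(host)), bool(after.match(host)))  # False True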
[18:50:38] Analytics-Kanban: Deploy wikistats from master branch - https://phabricator.wikimedia.org/T203017 (Nuria)
[18:50:43] Analytics-Kanban: Deploy wikistats from master branch - https://phabricator.wikimedia.org/T203017 (Nuria)
[18:51:42] Analytics-Kanban: Deploy wikistats from master branch - https://phabricator.wikimedia.org/T203017 (Nuria) Please update deployment docs
[18:52:35] Analytics, Discovery-Analysis, Product-Analytics, Reading-analysis, Patch-For-Review: Productionize per-country daily & monthly active app user stats - https://phabricator.wikimedia.org/T186828 (mpopov) Open>stalled We'll discuss with Josh and Charlotte (once she's back from vacation)
[18:59:44] (PS1) Fdans: Add usability.wikimedia to pageview whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/455894 (https://phabricator.wikimedia.org/T187414)
[19:01:46] (CR) Nuria: [V: 2 C: 2] "Remember that we need to add this value to table via inserting it too." [analytics/refinery] - https://gerrit.wikimedia.org/r/455894 (https://phabricator.wikimedia.org/T187414) (owner: Fdans)
[19:02:42] Gone for tonight - see you tomorrow team
[19:04:25] baiiii joal
[19:54:19] nuria: we wouldn't want to consider pageviews those in private wikis such as office.wikimedia, checkuser.wikimedia, board.wikimedia, etc, right?
[19:54:35] fdans: and many others yes
[19:54:36] they are in the sitematrix and therefore are searchable in wikistats
[19:54:43] fdans: for edits
[19:54:54] nuria: but for pageviews?
[19:55:32] fdans: they are searchable for edits if data is pushed sorry, i doubt that it is
[19:56:09] fdans: so they are "searchable" in the sitematrix menu, right, is that what you mean?
[19:56:38] nuria: I'm asking if we would want to add those sites to the whitelist
[19:56:45] fdans: no we do not
[19:57:22] nuria: hm, and we don't mind that they are searchable in wikistats?
[19:58:02] fdans: ideally they should not be in our menu, we can file a ticket for that
[20:00:09] i wonder if there is any indication in the sitematrix that those wikis are private
[20:00:58] nuria: ohhhh they have a "private" key on the sitematrix
[20:01:14] so it would be super easy to disable them on the site
[20:01:24] fdans: well , good, that we can use to exclude them. If you file a ticket we can do that
[20:01:34] much like closed wikis, they have a "closed" key
[20:02:18] fdans: ya, closed we do serve if they had been closed somewhat recently
[20:03:06] cool
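The private and closed flags fdans found can be pulled straight from the sitematrix API. A sketch; the structural assumptions are that with formatversion=2 the markers come back as booleans, that language sections are keyed by number with their wikis under "site", and that the special wikis sit in a flat "specials" list.

  import requests

  r = requests.get("https://meta.wikimedia.org/w/api.php", params={
      "action": "sitematrix", "format": "json", "formatversion": "2"})
  matrix = r.json()["sitematrix"]

  hidden = set()
  for key, section in matrix.items():
      if key == "count":  # scalar total, not a section
          continue
      sites = section if key == "specials" else section.get("site", [])
      for site in sites:
          if site.get("private") or site.get("closed"):
              hidden.add(site["dbname"])

  print(len(hidden))  # wikis wikistats should not offer in its search menu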
[20:17:55] Analytics, Analytics-EventLogging, Analytics-Kanban, Research: 20K events by a single user in the span of 20 mins - https://phabricator.wikimedia.org/T202539 (bmansurov) @Nuria anything we can do to mark such user agents as 'bot'?
[20:30:05] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414 (fdans) @Nuria I've been checking sites listed under "special" in the sitematrix that aren't privat...
[20:34:57] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414 (fdans) In the case of aa.wikimedia.org, we should probably change the "INVALID" label to "CLOSED"....
[20:35:30] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (RobH) So we ordered spare systems on T195418. The price per unit is listed on that #procurement task. The specs are: Dual Intel Xeon Silver 4110 2.1G, 8C/16...
[20:37:30] Analytics, Operations, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345 (RobH)
[20:38:55] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Wikistats 2.0: "aa.wikipedia.org" exists and has data available, but marked "Invalid" - https://phabricator.wikimedia.org/T187414 (Nuria) @fdans : we should probably open tickets to refactor all regexes but we can do that at a la...
[20:57:28] Analytics, Analytics-EventLogging, Analytics-Kanban, Research: 20K events by a single user in the span of 20 mins - https://phabricator.wikimedia.org/T202539 (Nuria) @bmansurov nothing easy I can think of at this time server side. This schema suffers much more than others regarding issues with bo...
[21:01:44] Analytics, Analytics-EventLogging, Analytics-Kanban, Research: 20K events by a single user in the span of 20 mins - https://phabricator.wikimedia.org/T202539 (bmansurov) @Nuria thanks!
[21:14:46] (PS1) Fdans: Fix strategy and usability sites not being counted as pageviews [analytics/refinery/source] - https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414)
[21:17:31] (CR) jerkins-bot: [V: -1] Fix strategy and usability sites not being counted as pageviews [analytics/refinery/source] - https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) (owner: Fdans)
[21:23:24] Hallo.
[21:23:49] I cannot open any Config:Dashiki:* pages on Meta.
[21:23:52] Is it a known issue?
[21:26:33] Analytics, Analytics-Kanban, Patch-For-Review: Install pyArrow in Cluster - https://phabricator.wikimedia.org/T202812 (Ottomata) > Download the pyarrow (0.8.0) release and include it in the spark2 .deb package we make, and install it on all nodes in /usr/lib/spark2/python somewhere. I was able to d...
[21:26:40] bearloga, fdans, milimetric, nuria ^
[21:27:02] Analytics, Analytics-Kanban, Patch-For-Review: Install pyArrow in Cluster - https://phabricator.wikimedia.org/T202812 (Ottomata)
[21:27:35] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (Ottomata)
[21:53:08] aharoni: not known, if you file a ticket and cc release team it will be best
[21:53:24] aharoni: the pages themselves had not changed in forever
[21:53:36] aharoni: will ping people in mediawiki-dev
[21:55:09] nuria: Yes, they probably haven't chanhed in a while. I want to add a new chart based on report updater. If I understand correctly, one of the steps to do it is to add such a page. But I cannot see any of the existing ones.
[21:56:39] Analytics, Analytics-Dashiki: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Amire80)
[21:56:46] nuria: https://phabricator.wikimedia.org/T203029
[21:57:42] Analytics, Analytics-Dashiki: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Amire80)
[21:57:47] Analytics, Release-Engineering-Team: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Nuria)
[21:58:32] Analytics, Release-Engineering-Team: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Nuria) Ping @greg et team, while we edit those pages we have not changed them in a while so probably somthing is a miss with the latest wikipedia deployment that has broken them?
[21:58:51] Analytics, Release-Engineering-Team: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Krenair) PHP fatal error: Call to undefined method JsonConfig\JCContent::getLicenseObject()
[22:01:30] Analytics, MediaWiki-extensions-JsonConfig, Release-Engineering-Team: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (greg) p:Triage>High
[22:01:37] Analytics, MediaWiki-extensions-JsonConfig, Release-Engineering-Team: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Jdforrester-WMF) Same bug as {T203006}. :-(
[22:02:38] Analytics, MediaWiki-extensions-JsonConfig, Release-Engineering-Team: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Krenair) https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/JsonConfig/+/450397/ is looking fishy
[22:05:07] (CR) Nuria: Fix strategy and usability sites not being counted as pageviews (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/456022 (https://phabricator.wikimedia.org/T187414) (owner: Fdans)
[22:10:46] (CR) Nuria: Correct mediawiki-history-reduced checker (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/455825 (owner: Joal)
[22:39:15] Analytics, MediaWiki-extensions-JsonConfig, Release-Engineering-Team, Patch-For-Review: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Jdforrester-WMF) Unfortunately it seems my optimism that I50250b483b would fix this was mis-placed; I can't see why the Das...
[23:19:32] Analytics, Research, Services: Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (bmansurov)
[23:26:15] Analytics, Research, Services: Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (Nuria) @bmansurov Please consult with services team (now part of Core platform) as data used to power features on mediawiki is stored in service's cluster. Seems that storage requireme...
[23:31:07] Analytics, MediaWiki-extensions-JsonConfig, Release-Engineering-Team, MW-1.32-release-notes (WMF-deploy-2018-08-28 (1.32.0-wmf.19)), and 2 others: Config:Dashiki:* on Meta can't be opened - https://phabricator.wikimedia.org/T203029 (Krinkle)
[23:46:54] Analytics, Research, Services: Storage of data for recommendation API - https://phabricator.wikimedia.org/T203039 (leila)