[00:02:56] (PS1) MaxSem: Fix metric name [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307457
[00:15:58] (CR) Yurik: [C: 2] Fix metric name [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307457 (owner: MaxSem)
[00:16:08] (CR) Yurik: [V: 2] Fix metric name [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307457 (owner: MaxSem)
[00:17:36] (CR) Yurik: [C: 2 V: 2] Fix graphite port [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307449 (owner: MaxSem)
[00:18:40] (CR) Yurik: [C: 2 V: 2] Refactor the logging command [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307448 (owner: MaxSem)
[00:19:37] (CR) Yurik: [C: 2 V: 2] Remove unused uses [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307447 (owner: MaxSem)
[00:45:01] (PS1) MaxSem: Merge branch 'master' into production [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307460
[01:12:18] Does Hadoop (Or some other analytics data stream) store info on what cookies people make requests with?
[06:10:49] Just checked the oozie alarms
[06:11:09] the number of dt:"-" is decreased a lot
[06:11:13] but still present sometimes
[06:24:54] ah and now a ERROR
[06:24:55] sigh
[06:52:20] Hey elukey
[06:52:37] Can I help?
[06:53:41] joal: o/
[06:53:57] ah yes we need to run oozie with a more tolerant % of failures :(
[06:55:58] I am investigating the new source of dt:"-"
[06:56:05] that are a lot less than yesterday but still
[06:56:36] I believe that we are now seeing some VSL timeouts that were covered before by the VSL store overflow errors
[06:56:55] hm
[06:57:20] so now vk is able to keep more incomplete records in memory without dropping the oldest one periodically
[06:57:22] elukey: just checked the % of errors: we move between 0.5 and 5
[06:57:37] and more of them are able to go to timeout :)
[06:58:02] yesterday before the patch the errors were in the thousands per hour
[06:58:07] now it is like hundreds
[06:58:21] (I am talking per host errors)
[06:59:07] elukey: What strategy do we go for for refinery? 2 options: re-run only errors or restart a new coordinator for all?
[07:00:09] joal: I thought that the only option was to run a more permissive coordinator
[07:00:15] elukey: Thanks A LOT for going with the VK stuff - It's really complex !
[07:00:53] elukey: correct, but I can either run a coord with 1 action, and run it every time we have an error, or restart a global new coord (as we did for misc)
[07:01:26] ahhh okok! So a one time thing vs a more permanent one
[07:01:37] elukey: exactly
[07:01:42] I'd say a one time coord run would be fine for the moment
[07:01:47] I hope to solve this mess today
[07:02:15] k sir, I'll do that now
[07:02:35] thanks for the sir :D
[07:02:59] * elukey thinks that the last time he said these kind of strong statement he ended up coding for a month
[07:03:20] huhuhu :D
[07:08:33] elukey: Job started
[07:10:11] thanks!
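The reasoning at 06:56-06:57 is that a larger VSL store lets varnishkafka (vk) hold more incomplete records. Records that used to be evicted as store overflows now survive long enough to hit the VSL timeout instead. The toy model below is only a minimal sketch of that trade-off under stated assumptions. It is not Varnish's or varnishkafka's actual code, and `TxBuffer`, `capacity`, `timeout` and the drop counters are hypothetical names. The same event stream, containing one stalled transaction, produces an overflow drop with a small store and a timeout drop with a large one.

```python
from collections import OrderedDict


class TxBuffer:
    """Toy model of a store of incomplete transactions (hypothetical, not varnishkafka code)."""

    def __init__(self, capacity: int, timeout: float) -> None:
        self.capacity = capacity      # max incomplete transactions held at once
        self.timeout = timeout        # seconds an incomplete transaction may wait
        self.pending = OrderedDict()  # txid -> start time, oldest first
        self.overflow_drops = 0
        self.timeout_drops = 0
        self.completed = 0

    def _expire(self, now: float) -> None:
        # Transactions begin in time order, so the oldest entry is always first.
        while self.pending:
            start = next(iter(self.pending.values()))
            if now - start <= self.timeout:
                break
            self.pending.popitem(last=False)
            self.timeout_drops += 1

    def begin(self, txid: str, now: float) -> None:
        self._expire(now)
        if len(self.pending) >= self.capacity:
            # Store is full: evict the oldest incomplete transaction.
            self.pending.popitem(last=False)
            self.overflow_drops += 1
        self.pending[txid] = now

    def end(self, txid: str, now: float) -> None:
        self._expire(now)
        if self.pending.pop(txid, None) is not None:
            self.completed += 1


def replay(capacity: int, timeout: float = 10.0) -> TxBuffer:
    """Replay one event stream: 'a' stalls, 'b' completes quickly."""
    buf = TxBuffer(capacity, timeout)
    buf.begin("a", 0.0)
    buf.begin("b", 1.0)
    buf.end("b", 2.0)
    buf.end("a", 20.0)  # too late: 'a' has already been dropped one way or the other
    return buf


small, large = replay(capacity=1), replay(capacity=10)
print(small.overflow_drops, small.timeout_drops, small.completed)
print(large.overflow_drops, large.timeout_drops, large.completed)
```

With `capacity=1` the stalled transaction `a` is counted as an overflow. With `capacity=10` the same stream yields a timeout instead. The total loss is unchanged, but it is reported as a different kind of error.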
[09:02:32] https://gerrit.wikimedia.org/r/#/c/307483/ - created a patch to raise the timeout to 1500 seconds (now: 700)
[09:02:47] I don't like this approach since it is not really data driven
[09:03:07] but I don't have any data from varnish so I need to make guesses
[09:16:13] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2593914 (Aklapper)
[09:16:15] Analytics-Tech-community-metrics: Deployment of Demography panel - https://phabricator.wikimedia.org/T138757#2593912 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:17] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:20] Analytics-Tech-community-metrics: Deployment of Gerrit Delays panel for engineering - https://phabricator.wikimedia.org/T138752#2593915 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:21] Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2593918 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:24] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:26] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:28] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:29] Analytics-Tech-community-metrics: Deployment of Mailing List panel - https://phabricator.wikimedia.org/T138001#2593921 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:32] Analytics-Tech-community-metrics: Deployment of Gerrit Backlog panel for engineering - https://phabricator.wikimedia.org/T138000#2593924 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:34] Analytics-Tech-community-metrics: Deployment of Gerrit (basic) panel - https://phabricator.wikimedia.org/T137999#2593926 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:36] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:16:38] Analytics-Tech-community-metrics: Deployment of Git panel - https://phabricator.wikimedia.org/T137998#2593930 (Aklapper) Open>Resolved This is now available as a panel on https://wikimedia.biterg.io
[09:16:40] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper)
[09:18:03] Analytics-Tech-community-metrics: Deployment of IRC panel - https://phabricator.wikimedia.org/T138004#2386456 (Aklapper)
[09:18:05] Analytics-Tech-community-metrics: IRC support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138005#2593937 (Aklapper) Open>Resolved This is done.
[09:18:40] Analytics-Tech-community-metrics: Deployment of Mediawiki panels - https://phabricator.wikimedia.org/T138006#2593945 (Aklapper)
[09:18:42] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Mediawiki support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138007#2593943 (Aklapper) Open>Resolved This is done.
[09:52:35] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016), Security, Vuln-XSS: Potential XSS on korma.wmflabs.org - https://phabricator.wikimedia.org/T132966#2594002 (Aklapper) Open>Resolved This is deployed and resolved.
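The 09:02 patch raises the timeout from 700 to 1500 seconds as an educated guess, because no duration data from Varnish was available. If such data could be collected, one common data-driven rule is to take a high percentile of the observed durations and multiply it by a safety margin. The sketch below assumes a plain list of durations in seconds. `suggest_timeout`, the 99.9th percentile and the 1.5x headroom are illustrative assumptions, not values from the discussion.

```python
import math
from typing import Sequence


def suggest_timeout(durations_s: Sequence[float], percentile: float = 99.9,
                    headroom: float = 1.5) -> int:
    """Hypothetical helper: nearest-rank percentile of durations, times headroom."""
    if not durations_s:
        raise ValueError("need at least one observed duration")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(durations_s)
    # Nearest-rank method; round() absorbs float noise such as 999.0000000000001.
    rank = math.ceil(round(percentile / 100 * len(ordered), 9))
    value = ordered[max(rank, 1) - 1]
    return math.ceil(value * headroom)


print(suggest_timeout([100, 200, 300, 400], percentile=50))
print(suggest_timeout(list(range(1, 1001))))
```

Raising the percentile or the headroom trades more incomplete records held in memory for fewer timeouts.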
[09:52:38] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016), Security, Vuln-XSS: Potential XSS on korma.wmflabs.org - https://phabricator.wikimedia.org/T132966#2594004 (Aklapper)
[09:56:24] Analytics, Labs: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594010 (Stigmj) Any chance for this to be expedited? At nowiki we are currently using these datasets for a local stats-service (https://tools.wmflabs.org/pagecount/) and the users are complaining (ht...
[09:59:15] Analytics-Tech-community-metrics: Maniphest support to be added to GrimorieLab - https://phabricator.wikimedia.org/T138003#2594011 (Aklapper) p:Normal>High
[10:00:30] Analytics-Tech-community-metrics: Mismatch between numbers for code merges per organization - https://phabricator.wikimedia.org/T129910#2594013 (Aklapper) p:Normal>Low a:Lcanasdiaz>None Moved to Backlog. Does not make sense to investigate in korma (legacy) hence lowering priority.
[10:00:44] Analytics-Tech-community-metrics: korma: Mismatch between numbers for code merges per organization - https://phabricator.wikimedia.org/T129910#2594016 (Aklapper)
[10:05:50] merged!
[10:19:23] taking a break a-team, see you in a bit
[10:57:14] wikimedia/mediawiki-extensions-EventLogging#593 (wmf/1.28.0-wmf.17 - efc8c4d : Antoine Musso): The build has errored.
[10:57:14] Change view : https://github.com/wikimedia/mediawiki-extensions-EventLogging/commit/efc8c4d0986d
[10:57:14] Build details : https://travis-ci.org/wikimedia/mediawiki-extensions-EventLogging/builds/156186051
[11:52:16] Analytics-Tech-community-metrics: korma: Profile names in UTF-8 incorrectly displayed as ??? - https://phabricator.wikimedia.org/T119540#2594097 (Aklapper) p:Low>Lowest
[11:53:57] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2594100 (Aklapper)
[11:54:36] Analytics-Tech-community-metrics, Developer-Relations: Migration to new Bitergia's development dashboard - https://phabricator.wikimedia.org/T137997#2386336 (Aklapper) p:Triage>Normal
[11:55:10] Analytics-Tech-community-metrics: korma: Font used for "Organizations" header on contributors.html looks a bit out of place - https://phabricator.wikimedia.org/T100569#2594106 (Aklapper)
[11:58:42] Analytics-Tech-community-metrics: korma: "Last 30 days" stats for specific mailing list display an account as one list item per username character - https://phabricator.wikimedia.org/T123927#2594109 (Aklapper) p:Low>Lowest
[11:59:27] Analytics-Tech-community-metrics: korma: Clicking "Age of open changesets by Affiliation" explanation link / legend goes to top of page - https://phabricator.wikimedia.org/T110874#2594111 (Aklapper)
[11:59:42] Analytics-Tech-community-metrics: korma: Time axis on repository.html only displays two months, repeated several items - https://phabricator.wikimedia.org/T115872#2594112 (Aklapper)
[11:59:56] Analytics-Tech-community-metrics: korma: Empty "subject" and "creator" fields for mailing list thread on mls.html - https://phabricator.wikimedia.org/T116284#2594113 (Aklapper)
[12:00:11] Analytics-Tech-community-metrics: korma: Illegible overlapping tables on narrow screens due to CSS - https://phabricator.wikimedia.org/T97115#2594115 (Aklapper)
[12:00:24] Analytics-Tech-community-metrics, JavaScript: korma: Failed to load resource: the server responded with a status of 404 (Not Found) - https://phabricator.wikimedia.org/T65061#2594116 (Aklapper)
[12:00:26] Analytics-Tech-community-metrics, JavaScript: korma: Syntax error, unrecognized expression on Korma profiles - https://phabricator.wikimedia.org/T126325#2594117 (Aklapper)
[12:01:29] Analytics-Tech-community-metrics: korma: Panel for "Wiki revisions" on people.html does not provide 2016 data - https://phabricator.wikimedia.org/T141228#2594123 (Aklapper) p:Triage>Lowest
[12:07:50] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Create basic/high-level Kibana (dashboard) documentation - https://phabricator.wikimedia.org/T132323#2594129 (Aklapper)
[12:18:56] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep-2016): Identify Wikimedia's most important/used info panels in korma.wmflabs.org - https://phabricator.wikimedia.org/T132421#2594147 (Aklapper)
[12:53:03] Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594241 (Ottomata) p:Triage>High a:Ottomata
[12:54:43] HIIII ottomata
[12:59:31] hiIi
[13:01:42] o/ joal
[13:01:46] Live systems chatting?
[13:44:32] o/ joal and millimetric
[13:44:41] Hard crash. Trying to recover
[13:44:49] good luck halfak|Mobile !
[13:48:27] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2594422 (elukey) Next step that I'd like to do is compare the variation in traffic from the 17th onwards, to see if different queries started to land on AQS.
[14:10:25] joal: do you have time for a naive question about where to find aqs logs?
[14:10:32] sure
[14:10:42] elukey: I think you know more than me ;)
[14:12:20] no no like webrequest logs :P
[14:12:24] ah ok :)
[14:12:51] can I find them via beeline with something like
[14:12:53] FROM wmf_raw.webrequest
[14:12:53] WHERE uri_host = "wikimedia.org" AND uri_query like '%/api/rest_v1/metrics/pageviews/per-article%'
[14:13:05] or am I completely out of track?
[14:13:27] sounds correct, but why wmf_raw instead of wmf?
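The query at 14:12:53 filters on `uri_query`. However, the `/api/rest_v1/...` path is normally stored in `uri_path`, and `uri_query` holds only the `?...` part of the URL. Queries against the webrequest tables should also be restricted to a date range so that Hive only scans the relevant partitions. The sketch below builds such a query string in Python. It assumes that `wmf.webrequest` is partitioned by `year`, `month`, `day` and `hour` and exposes `uri_host` and `uri_path`; check the table schema before relying on it. `build_aqs_query` is a hypothetical helper.

```python
from typing import Iterable


def build_aqs_query(year: int, month: int, day: int,
                    hours: Iterable[int] = range(24),
                    table: str = "wmf.webrequest") -> str:
    """Hypothetical helper: HiveQL counting per-article AQS requests for one day.

    Assumes the table is partitioned by year/month/day/hour and has
    uri_host and uri_path columns; verify against the real schema first.
    """
    hour_list = ", ".join(str(int(h)) for h in hours)
    return (
        "SELECT uri_path, COUNT(*) AS requests\n"
        f"FROM {table}\n"
        # Partition predicates let Hive prune to the requested hours
        # instead of scanning the whole table.
        f"WHERE year = {int(year)} AND month = {int(month)} AND day = {int(day)}\n"
        f"  AND hour IN ({hour_list})\n"
        "  AND uri_host = 'wikimedia.org'\n"
        # The API path lives in uri_path; uri_query only holds the '?...' part.
        "  AND uri_path LIKE '/api/rest_v1/metrics/pageviews/per-article/%'\n"
        "GROUP BY uri_path\n"
        "ORDER BY requests DESC\n"
        "LIMIT 100"
    )


print(build_aqs_query(2016, 8, 26, hours=range(0, 6)))
```

The printed text can be saved to a `.sql` file and run with `hive -f`, as suggested at 18:42 in this log.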
[14:14:34] also, if you want to do traffic analysis, look at wmnf.aqs_hourly
[14:14:44] elukey: --^
[14:15:21] mmm I thought that wmf was refined, so I preferred as much info as possible.. aaaand I didn't know about wmnf.aqs_hourly :D
[14:18:53] (CR) Mforns: [C: 2 V: 2] "LGTM Thanks a lot for doing that!" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/307112 (https://phabricator.wikimedia.org/T144119) (owner: Hashar)
[14:21:28] (PS3) Mforns: Support passing the exploded values by file path [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306966 (https://phabricator.wikimedia.org/T132481)
[14:21:49] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2594536 (elukey) The last outage that we registered on the 26th was related to a OOM: Dates in UTC: ``` 03:09 PROBLEM - cassandra CQL 10.64.48.117:9042 on aqs1003 is CRITICAL: C...
[14:23:56] ottomata: Hi !
[14:24:05] hiya!
[14:24:11] ottomata: Do we go for zookeeper change for Druid ?
[14:24:28] OH!
[14:24:37] yes let's do it, i'm on the phone, and in the middle of a thought
[14:24:45] so hmm, gimme a few mins, hopefully we can start before standup
[14:24:56] ottomata: you tell me when you're ready :)
[14:25:40] mforns: thanks :]
[14:25:47] mforns: will add CI conf later on
[14:26:19] actually right now since I have found the patch :D
[14:26:55] hashar, thank you!
[14:29:46] elukey: does wmf.aqs_hourly contains your need?
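The 13:48 plan is to compare AQS traffic from the 17th onwards with the days before it, and `wmf.aqs_hourly` is suggested as the data source. The table's exact columns are not shown in this log, so the sketch below assumes the relevant rows have already been exported as `(date, hour, requests)` tuples with ISO `YYYY-MM-DD` dates. `daily_totals` and `compare_traffic` are hypothetical helpers, and the example rows are invented for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int, int]  # (ISO date, hour, request count) - assumed export format


def daily_totals(rows: Iterable[Row]) -> Dict[str, int]:
    """Sum hourly request counts into per-day totals."""
    totals: Dict[str, int] = defaultdict(int)
    for date, _hour, requests in rows:
        totals[date] += requests
    return dict(totals)


def compare_traffic(rows: Iterable[Row], cutoff: str) -> Tuple[float, float, float]:
    """Mean daily requests before/after cutoff and the percent change."""
    totals = daily_totals(rows)
    before = [n for d, n in totals.items() if d < cutoff]
    after = [n for d, n in totals.items() if d >= cutoff]
    if not before or not after:
        raise ValueError("need days on both sides of the cutoff")
    mean_before, mean_after = float(mean(before)), float(mean(after))
    return mean_before, mean_after, 100.0 * (mean_after - mean_before) / mean_before


rows = [("2016-08-15", 0, 60), ("2016-08-15", 1, 40),
        ("2016-08-16", 0, 100),
        ("2016-08-17", 0, 90), ("2016-08-17", 1, 60),
        ("2016-08-18", 0, 150)]
print(compare_traffic(rows, cutoff="2016-08-17"))
```

ISO dates sort correctly as plain strings, so `d < cutoff` is enough to split the days around the 17th.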
[14:29:47] (CR) Hashar: "recheck" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/307112 (https://phabricator.wikimedia.org/T144119) (owner: Hashar)
[14:30:00] nuria_: hello
[14:30:41] nuria_: I added a line in the cassandra-backfilling etherpad to double check with you that month 2015-11 has been loaded (it has no line in the doc)
[14:31:05] Analytics, Continuous-Integration-Config, Patch-For-Review: Add test runner and CI configuration to analytics/reportupdater - https://phabricator.wikimedia.org/T144119#2594575 (hashar) All good. Thank you @mforns
[14:31:24] mforns: it is all set now. To reproduce CI run, you should be able to just run "tox" on your local machine
[14:31:30] should give you the same environment
[14:31:44] hashar, yes it works! already tried it :]
[14:32:11] awesome
[14:32:11] joal: it is like candy land for me
[14:32:18] elukey: ;)
[14:32:19] thanks :D
[14:32:30] elukey: if you want I have spark scripts to help
[14:32:33] elukey: let me know :)
[14:32:51] Analytics, Continuous-Integration-Config, Patch-For-Review: Add test runner and CI configuration to analytics/reportupdater - https://phabricator.wikimedia.org/T144119#2594581 (hashar) Open>Resolved
[14:34:27] joal: I'd be really happy to look at them with you during the next days, I didn't know that I had so much data available :(
[14:34:40] I feel super lazy to not have it checked before
[14:34:59] elukey: https://gist.github.com/anonymous/d7ac82770f52d8eaa71edead8f9e3a28
[14:35:16] elukey: we can go through that whenever you want, I think it can help :)
[14:35:51] wow really nice!
[14:39:00] joal / mforns: I'm just catching up on email, we can talk anytime you two want (oh man, it's almost standup!)
[14:39:35] milimetric / mforns : I have time now if you want, or after standup :)
[14:39:47] sure, batcave!
[14:39:50] ok
[14:40:58] ok joal ah you busy now?
[14:41:09] so, i'm just going to merge the puppet patch, run puppet, and then restart druid, ja?
[14:41:14] this will set up a zk cluster on the druid nodes
[14:41:17] and point the druid configs at it
[14:41:29] ottomata: I think that's right
[14:41:47] ottomata: Since everything is stored in deep storage, should be ok
[14:41:50] k
[14:42:16] ottomata: I'll still triple check
[14:54:10] joal: corrected etherpad, loading 0003741-160826130408204-oozie-oozi-C was for 2015-11
[14:54:24] thx nuria_ !
[14:57:08] elukey: where can i see the logs that had the memory OOM for cassandra?
[15:00:07] nuria_: /var/log/cassandra/ and then system.log.1.zip IIRC
[15:00:23] we are using Xmx 16Gb
[15:00:27] that is huuuge
[15:01:06] (PS3) Milimetric: [WIP] Script sqooping mediawiki tables into hdfs [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476)
[15:01:16] ottomata: standddupppp?
[15:02:09] ottomata: holaaa
[15:03:17] Analytics-Kanban, Analytics-Wikimetrics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2594681 (Nuria) a:Nuria
[15:03:34] AHHHH
[15:03:35] sorrry
[15:08:37] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2594696 (jcrespo)
[15:43:25] Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594755 (Ottomata) Done! I'm running the first rsync over now. I didn't create a specific rsync module, it seems like the ::dumps one should be enough. @Stigmj, I don't...
[15:47:46] mmmm ottomata why puppet was failing for zk on 100[12] ?
[15:47:53] (curious)
[15:48:53] elukey: that is druid 1001, etc.
[15:48:58] elukey: not sure yet, just started to ask _joe_ a q in mw sec
[15:49:17] but, it looks like the host specific override i have in hiera is not being used
[15:49:25] ah okok
[15:59:47] joal: ops sync?
[15:59:50] sure
[16:00:32] milimetric, are you planning on continuing with denormalized table today?
[16:03:36] mforns: no, I think I will start writing spark for it
[16:03:43] or at least thinking of an algorithm that would be more efficient
[16:03:50] drawing on my chalkboard :)
[16:03:56] wanna do it together? cc joal
[16:04:03] (right now I'm just updating the calendar)
[16:04:37] milimetric, sure
[16:05:56] milimetric, in one of the patches of the page/user scala code there's this method I ended up deleting, that historifies the admin user name
[16:07:07] milimetric, this might be similar with what we want to do... dunno
[16:07:19] * mforns looks
[16:08:11] a-team: I think I cleaned up the ops-duty events. I tried to adjust them to everyone's work schedule (later for nuria earlier for joal). Feel free to move them yourselves, the events are all editable by everyone
[16:08:33] thx a lot milimetric !
[16:08:33] milimetric, looks perfect for me, thanks!
[16:09:07] milimetric: super thanks
[16:09:17] mforns: a-batcave-2 to look together?
[16:09:46] Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594863 (Nuria) Open>Resolved
[16:09:58] Analytics-Kanban, Analytics-Wikimetrics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2594864 (Nuria) Open>Resolved
[16:10:29] milimetric, omw
[16:10:43] elukey: i can look at traffic patterns on aqs today if you want
[16:12:16] nuria_: if you do that, there is a spark script I wrote that can help : https://gist.github.com/anonymous/d7ac82770f52d8eaa71edead8f9e3a28
[16:12:35] joal: i was planning on borrowing/stealing that one, yes
[16:12:43] nuria_: great ;)
[16:12:53] nuria_: I was not sure you'd seen that
[16:13:18] joal: will let you know what i found cc elukey
[16:13:36] great, thx nuria_
[16:18:10] nuria_: please do, thanks a lot :)
[16:30:24] Analytics-Kanban, Labs, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594927 (Stigmj) @Ottomata yeah they are there. Thank you.
[16:31:43] nuria_: would it be possible to move the hw meeting from Monday? (ops meeting colliding)
[16:31:55] elukey: ah yes, let me allow edits
[16:32:13] elukey: done
[16:36:27] thanks!
[16:53:41] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:56:21] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[17:04:53] joal: FYI druid restarted with new zk cluster
[17:04:58] i think it looks ok
[17:05:43] ottomata: Just looked through pivot: no change for me !
[17:06:08] by the way a-team, no more chrome41 issue
[17:06:13] going afk team! byyyeeee
[17:06:22] ottomata: I'm gonna double check using coordinator UI
[17:06:26] Bye elukey !
[17:06:27] k cool
[17:06:28] cool
[17:06:33] i see lots of segments being loaded i think
[17:06:42] ottomata: Great :)
[17:06:46] That's good sign !
[17:06:59] milimetric: no more issue since aug 17th
[17:08:18] ottomata: Shall we remove the pageviews datasource (it was test) ?
[17:08:21] milimetric: --^
[17:12:07] joal: uhhhh, up to yall
[17:12:41] k ottomata :)
[17:12:54] ottomata: Thanks for the zookeeper thing
[17:21:27] yup, np!
[17:21:28] lunchtime!
[17:32:54] joal: the chrome issue went away with teh chnages brandon did to send "unauthorized"
[17:33:06] to those requests , it stopped teh spam
[17:33:08] *the
[17:33:45] makes sense nuria_, didn't follow the ticket (my bad)
[17:34:13] joal: np at all, it took several tries
[17:37:06] joal: yeah, we can remove that, sorry was distracted
[17:37:12] np milimetric
[17:39:37] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2594696 (DarTar) @jcrespo found the culprit. Let me talk to a few people but I think this is a legacy job that we can disable.
[17:40:01] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2595274 (DarTar) p:Triage>Normal a:DarTar
[17:45:29] !log Drop pageviews test datasource in druid
[17:45:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[17:47:26] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2595287 (DarTar) @Neil_P._Quinn_WMF @Jdforrester-WMF @Milimetric this is one of the legacy scripts populating dashboards such as http://...
[17:49:35] logging off a-team, bye !
[17:58:18] (CR) MaxSem: [C: 2 V: 2] Merge branch 'master' into production [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307460 (owner: MaxSem)
[18:19:07] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2595384 (Jdforrester-WMF) Yes, we want to migrate them to Dashiki. CCing @HJiang-WMF who's working on the broader objective.
[18:23:24] Analytics-Kanban: Capacity projections of pageview API document on wikitech - https://phabricator.wikimedia.org/T138318#2595390 (Nuria) https://wikitech.wikimedia.org/wiki/Analytics/AQS#Capacity_Projections
[18:30:53] Analytics: Dashboards working on mobile - https://phabricator.wikimedia.org/T144299#2595396 (Nuria)
[18:31:07] Analytics, Analytics-Dashiki: Dashboards working on mobile - https://phabricator.wikimedia.org/T144299#2595408 (Nuria)
[18:42:44] jdlrobson: did you get your question answered? hive -f blah.sql > out.txt does what you want
[18:42:56] jdlrobson: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries
[18:44:15] (PS3) Mforns: Disable the deprecated option by_wiki [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306968 (https://phabricator.wikimedia.org/T132481)
[18:45:05] (CR) jenkins-bot: [V: -1] Disable the deprecated option by_wiki [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306968 (https://phabricator.wikimedia.org/T132481) (owner: Mforns)
[18:54:02] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (bd808) >>! In T116206#2582429, @elukey wrote: > Thanks for reporting, this is my bad since analytics_hadoop_hosts is not in hiera labs. Since this value sh...
[19:07:35] ottomata: yt?
[19:10:02] hi yup
[19:10:04] nuria_:
[19:10:30] ottomata: wondering about this gist that joseph passed along: https://gist.github.com/anonymous/d7ac82770f52d8eaa71edead8f9e3a28
[19:10:52] ottomata: in order to look at the data
[19:11:27] i added couple lines:
[19:11:29] ottomata:
[19:11:40] https://www.irccloud.com/pastebin/BXWhZsza/
[19:12:12] nuria_: ok, not familiar with it, but what's up?
[19:12:22] ottomata: but executing this gives tons of errors on spark shell
[19:12:26] https://www.irccloud.com/pastebin/rESGWdaq/
[19:12:51] how did you launch spark shell?
[19:13:43] ./spark-hell
[19:13:45] ./spark-shell
[19:13:46] jajaj
[19:13:50] ^ ottomata
[19:13:56] do i need to add jars?
[19:14:23] hm, no, just wondering about that requesting executor thing
[19:14:39] nuria_: those are all warnings though
[19:14:41] does it not work?
[19:14:53] ottomata: well there are pages 7 pages & pages of warnings
[19:15:10] nuria_: i am not sure. hm
[19:15:22] nuria_: I'm here for a minute
[19:15:28] ottomata: ok
[19:15:28] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2595633 (bd808)
[19:15:29] this one sounds worrisome
[19:15:30] WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 25 total executors!
[19:15:36] but the others sound harmless
[19:15:49] nuria_: try launching spark-shell using --master yarn
[19:15:56] oh ja :)
[19:15:59] joal: ahahahaha
[19:16:32] nuria_: hey i sort of got the answer. The problem I have is I'm working with too many rows. I think I need to sample it somehow.
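At 19:16 jdlrobson has too many rows and wants to sample them. The first remedy is to restrict the Hive query to a date range. If the result is still too large to inspect, a uniform random sample of fixed size can be drawn in a single pass, without knowing the row count in advance. The sketch below is the textbook reservoir sampling algorithm (Algorithm R). It works on any iterable of rows, for example lines read back from the `hive -f blah.sql > out.txt` output mentioned at 18:42. `reservoir_sample` is a hypothetical helper, not part of any Wikimedia tooling.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniform random sample of k rows in one pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)       # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform over the i + 1 rows seen so far
            if j < k:
                sample[j] = row      # keep this row with probability k / (i + 1)
    return sample


# Example: 5 rows sampled from a stream of 10,000, with a fixed seed for repeatability.
print(reservoir_sample(range(10_000), 5, random.Random(42)))
```

Memory stays at `k` rows no matter how long the input is. That is the point of the technique when the full result set will not fit in a shell session.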
[19:16:49] joal : me forgot all about spark
[19:17:02] nuria_: also, to prevent having to kill the shell, use LIMIT in your select * query, you'll have reasonable result size :)
[19:17:26] (haven't worked out best way to do that yet)
[19:17:30] jdlrobson: amounts of data are huge, you need to query like this: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/Queries#Always_restrict_queries_to_a_date_range_.28partitioning.29
[19:17:44] nuria_: or, use results.take(10).foreach(println)
[19:17:45] yup that's what i figured
[19:18:15] joal: ya, i was using limit but errors were all over the place but hey, cause i was running locally
[19:18:32] I'll wait in case you have another error :)
[19:20:03] Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2595654 (Huji)
[19:22:39] nuria_: works better?
[19:22:40] joal: nah, do not worry, we will figure it out. it is running now
[19:22:45] joal: yes
[19:22:46] ok great :)
[19:22:50] tomorrow :)
[19:23:04] ciao
[19:37:49] ottomata: i take it back it no work
[19:37:53] https://www.irccloud.com/pastebin/PaXKedzK/
[19:39:25] ottomata: will look around to see if there is a setting we can use
[19:40:58] nuria_: have you been looking at the spark job UI too?
[19:41:06] ottomata: ah no
[19:41:13] ottomata: in yarn?
[19:41:32] ja
[19:41:42] ssh -N stat1002.eqiad.wmnet -L 8088:analytics1001.eqiad.wmnet:8088
[19:41:43] then
[19:41:50] localhost:8088
[19:41:51] find your job
[19:41:54] click on application master
[19:42:00] you might have to change the uri back to localhost:8088 a few times
[19:42:04] but ja
[20:35:25] (CR) Nuria: [C: -1] Bookmark for browser dashboard regarding graph and time (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/306980 (https://phabricator.wikimedia.org/T143689) (owner: Nuria)
[20:57:01] (PS1) MaxSem: Set permissions like mediawiki-config [analytics/discovery-stats] (refs/meta/config) - https://gerrit.wikimedia.org/r/307583
[20:59:33] Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2595935 (yuvipanda) whoops :( try again?
[21:56:49] Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596103 (Dzahn) I have rsynced the entire /var/lib/archiva from titanium over to meitnerium, the new jessie server. One single file, the conf/archiv...
[22:02:05] nuria_: you around?
[22:03:01] Analytics-Dashiki, Analytics-Kanban: Sort tabs layout alphabetically - https://phabricator.wikimedia.org/T144322#2596141 (Milimetric)
[22:04:38] (PS1) Milimetric: Sort legend alphabetically [analytics/dashiki] - https://gerrit.wikimedia.org/r/307634 (https://phabricator.wikimedia.org/T144322)
[22:07:15] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2596173 (Milimetric) @HJiang-WMF: happy to help you migrate these to reportupdater / dashiki. Actually, VERY happy to do that, because...
[22:30:30] Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596235 (Dzahn) Now we have the data but still get an Error 503 - Service Unavailable from the new server, even though the archiva service is running...
[22:33:18] Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596244 (Dzahn) the issue is caused by archiva user being a different UID on old and new server, which means permissions are messed up even when we p...
[22:46:45] Analytics, Analytics-EventLogging, DBA, Research-and-Data: Queries on PageContentSaveComplete are starting to pileup - https://phabricator.wikimedia.org/T144278#2596251 (DarTar) Same here. @HJiang-WMF @Milimetric: happy to help as needed, going over the SQL I used for the dashboards etc. I didn...
[22:50:06] howdy! Trying here alongside #wikimedia-operations. I have a question about https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups. The "analytics-users" group has access to Hadoop/Hive on stat1004, but "(NO PRIVATE DATA)". what is meant by that? can someone have access to hadoop/hive but not private data (e.g. a sanitized, PII-less subset of wmf.webrequest)?
[22:50:52] Analytics-Cluster, Operations, Patch-For-Review: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596254 (Dzahn) fix running: root@meitnerium:/var/lib# find /var/lib/archiva/ -uid 108 -exec chown archiva:archiva {} \;
[22:51:17] Analytics-Cluster, Operations: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596255 (Dzahn)
[22:56:02] Analytics-Cluster, Operations: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2596268 (Dzahn) fixed, restarted service. i got the Archiva web UI now on meitnerium (when hacking my /etc/resolv.conf to point archiva.wm.org to it).
[23:24:29] (CR) EBernhardson: [C: 2 V: 2] Set permissions like mediawiki-config [analytics/discovery-stats] (refs/meta/config) - https://gerrit.wikimedia.org/r/307583 (owner: MaxSem)
[23:55:53] Analytics: Can't log into https://piwik.wikimedia.org/ - https://phabricator.wikimedia.org/T144326#2596371 (Tbayer)
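The 22:33 and 22:50 updates trace the Archiva 503 to the `archiva` user having a different numeric UID on the new server. The fix was `find /var/lib/archiva/ -uid 108 -exec chown archiva:archiva {} \;`. Below is a minimal dry-run-first sketch of the same idea in Python. It lists every path under a root whose owner is the old UID, including the root itself as `find` does, and changes ownership only when asked. The helper names are hypothetical. UID 108 comes from the log, but the new UID and GID would have to be looked up on the target host, for example with `pwd.getpwnam("archiva")`. Unlike plain `chown`, `os.lchown` does not follow symlinks.

```python
import os
import tempfile
from typing import List


def paths_owned_by(root: str, uid: int) -> List[str]:
    """Like `find ROOT -uid UID`: root and everything below it owned by uid."""
    matches = []
    if os.lstat(root).st_uid == uid:
        matches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects symlinks themselves, as find does by default.
            if os.lstat(path).st_uid == uid:
                matches.append(path)
    return matches


def reown(paths: List[str], new_uid: int, new_gid: int, dry_run: bool = True) -> List[str]:
    """Change the owner of each path (needs root); with dry_run, only report."""
    for path in paths:
        if not dry_run:
            os.lchown(path, new_uid, new_gid)
    return paths


# Demo on a throwaway directory owned by the current user (dry run, nothing changes).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.jar"), "w").close()
    found = paths_owned_by(tmp, os.getuid())
    print(len(found), "paths would be re-owned")
```

On the real host the call would be something like `reown(paths_owned_by("/var/lib/archiva", 108), new_uid, new_gid, dry_run=False)`, run as root only after the dry-run list has been reviewed.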