[00:01:10] (CR) Elukey: "I had a chat with Madhu today about the overall code, I added some comments in the Fabric's code to make it clearer but nothing super impo" (11 comments) [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 (https://phabricator.wikimedia.org/T122228) (owner: Madhuvishy)
[00:47:45] madhuvishy: hey! around?
[00:47:54] YuviPanda: yes
[00:48:11] madhuvishy: can you help me debug why I think some of my events might be getting lost?
[00:48:20] YuviPanda: sure, where are you?
[00:48:33] madhuvishy: near the hammock
[00:48:39] madhuvishy: I can come there
[01:07:29] Analytics: Pageview API demo doesn't list be-tarask - https://phabricator.wikimedia.org/T119291#1933649 (Ijon) Ping @Milimetric ?
[01:12:15] Analytics-Tech-community-metrics, Developer-Relations, DevRel-January-2016: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1933663 (Qgil) Thank you! If we assume that it is true that we have lost 23% (within a...
[01:12:44] Analytics-Tech-community-metrics, Developer-Relations, DevRel-January-2016, developer-notice: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1933668 (Qgil)
[01:38:46] (PS7) Madhuvishy: Fabric deployment setup for wikimetrics [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 (https://phabricator.wikimedia.org/T122228)
[01:56:49] (CR) Madhuvishy: Fabric deployment setup for wikimetrics (10 comments) [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 (https://phabricator.wikimedia.org/T122228) (owner: Madhuvishy)
[02:00:30] (CR) Madhuvishy: "recheck" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263782 (owner: Madhuvishy)
[02:10:57] (PS8) Madhuvishy: Fabric deployment setup for wikimetrics [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 (https://phabricator.wikimedia.org/T122228)
[09:28:01] (PS1) Hashar: Restrain pep8 to 1.5.x [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/264057
[09:28:24] (CR) Hashar: "The flake8 issues are due to a new version of pep8. https://gerrit.wikimedia.org/r/264057 fix it." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263782 (owner: Madhuvishy)
[09:47:08] Analytics-Wikimetrics, Education-Program-Dashboard: I want WikiMetrics integration with the education dashboard that lets you easily pull reports about courses, institutions, etc. - https://phabricator.wikimedia.org/T92454#1934045 (awight)
[11:43:53] Analytics-Tech-community-metrics, DevRel-January-2016, Easy, Patch-For-Review: Entered text in Typeahead search field nearly not visible in Firefox 42: Fix the CSS - https://phabricator.wikimedia.org/T121101#1934145 (Aklapper) Merged upstream in https://github.com/VizGrimoire/VizGrimoireJS/commit/d7...
[12:46:26] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1934183 (Aklapper) ...and merged in https://github.com/VizGrimoire/GrimoireLib/commit/d646bcd07b584932a0f...
[13:36:32] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1934240 (Lcanasdiaz) Library deployed in our server. Waiting to generate a new set of JSON files.
[13:47:32] Analytics-Tech-community-metrics, DevRel-January-2016, Easy, Google-Code-In-2015, Patch-For-Review: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1934249 (Aklapper) >>! In T97117#1932785, @Nemo_bis wrote: >> Attracted: N...
[14:19:16] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1934262 (Krenair) Well since I was already on it, a review had already been requested...
[14:31:38] nuria: Please review my patch. I posted the link at Google Code-In website. :)
[14:44:28] Analytics-Tech-community-metrics, DevRel-January-2016: "Unavailable section name" displayed on repository.html - https://phabricator.wikimedia.org/T121102#1934309 (Lcanasdiaz) The reason is a wrong URL returned by the search box, if we add &ds=scr to the URL it works. My workmate Quan and I are working on...
[14:56:51] Analytics: Pageview API demo doesn't list be-tarask - https://phabricator.wikimedia.org/T119291#1934326 (Milimetric) We've been pondering this, @Ijon. I'm leaning towards declining it because this really was just a simple demo and proper support for *all* projects is not trivial (our project codes, dbnames,...
[15:05:04] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1934342 (Sadads) @Krenair Thanks for the review, also I am new to gerrit. Learning the ins and...
[15:17:00] Analytics: python-mwviews does not handle unicode in titles - https://phabricator.wikimedia.org/T123200#1934361 (Milimetric) Thanks @ResMar, getting to these issues will take a bit, as I have a bunch of high priority work, but I appreciate the code and thoughts.
[15:21:58] Analytics-Backlog: Update reportcard.wmflabs.org with July-October data - https://phabricator.wikimedia.org/T116244#1934362 (Milimetric) > Thanks for the explanation! (For the record, the Phabricator convention has since been [[https://lists.wikimedia.org/pipermail/analytics/2016-January/004763.html |changed...
[15:52:24] Analytics, Analytics-Kanban, Patch-For-Review: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1934472 (Milimetric) The new problem seems to be that all the databases Evan's scripts write to are now in read-only mode: ``` ERROR 1290 (HY000): The MariaDB server is...
[15:52:53] Analytics-Kanban: Foundation-only Geowiki stopped updating - https://phabricator.wikimedia.org/T106229#1934473 (Milimetric)
[15:53:33] does anyone know why s1-analytics-slave.eqiad.wmnet is in --read-only mode?
[15:53:44] (same with s*-analytics-slave)
[16:01:10] milimetric: hola
[16:01:14] hi
[16:01:40] milimetric: is this your ops week?
[16:02:04] no, but I kind of skipped mine during all-staff
[16:04:53] hi ottomata, maybe you know this:
[16:04:58] Analytics: Pageview API demo doesn't list be-tarask - https://phabricator.wikimedia.org/T119291#1934481 (Nuria) Agreed with @milimetric, our demo is just a showcase of what you can do with Api and it has many shortcomings,.We are planning on developing tools that are more robust.
[16:05:05] why is s1-analytics-slave.eqiad.wmnet in --read-only mode
[16:05:32] nuria: It's my week
[16:05:49] Analytics-Kanban: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1934482 (Nuria) NEW
[16:07:26] nuria: Is it for EL you were asking ?
[16:07:51] joal; yes, are you subscribed to ops list?
[16:07:58] Yes
[16:08:02] I have seen faidon comment
[16:08:29] Analytics-Tech-community-metrics, DevRel-January-2016: gerrit_review_queue can have incorrect data about patchsets "waiting for review" - https://phabricator.wikimedia.org/T121495#1934489 (Lcanasdiaz) It seems I was wrong! I was talking with @dicortazar and it seems the metric is right but **it does not r...
[16:09:03] its in read only mode?
[16:09:54] ottomata: yeah, I get an error about it from geowiki, which stopped updating because all the s*-analytics-slave(s) are in read only mode
[16:10:01] Analytics-Kanban, DBA: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1934492 (Nuria)
[16:10:32] by "they are in read only mode" I mean when you try to insert you get a db error about MariaDB being in --read-only mode
[16:10:52] Hm, ok, don't know why, that is a different db than the EL one, right?
[16:10:58] i will look into it in a bit
[16:11:21] Analytics-Kanban, DBA: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1934482 (Nuria) From ops list: "SELECT * FROM information_schema.processlist ORDER BY time DESC" informs us of this: | 5599890 | research | 10.64.36.103:53669 | enwiki...
[16:12:04] nuria: do we take a few minutes in cave to discuss what is expected as an answer to DB issues like that ?
[16:13:29] joal: sure, let's do it after standup. I have filed a ticket to compile the info, so far I think that is the only thing we can do.
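[Editor's note] The geowiki failure above is a write hitting a MariaDB server running with --read-only (ERROR 1290). As a generic illustration of what the script is running into, not the actual geowiki code, a client can probe writability by attempting a throwaway write and catching the error. This sketch uses sqlite3's read-only URI mode to simulate the behavior; the table and path names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Create a database file with one table, then reopen it read-only.
path = os.path.join(tempfile.mkdtemp(), "geowiki.db")  # hypothetical path
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE edits (id INTEGER PRIMARY KEY, n INTEGER)")
conn.commit()
conn.close()

ro = sqlite3.connect("file:" + path + "?mode=ro", uri=True)

def is_writable(conn):
    """Return False if the database rejects writes (here sqlite's
    'attempt to write a readonly database'; on MariaDB the analogous
    failure is ERROR 1290 about the --read-only option)."""
    try:
        conn.execute("INSERT INTO edits (n) VALUES (1)")
        conn.rollback()  # probe only; do not keep the row
        return True
    except sqlite3.OperationalError:
        return False

print(is_writable(ro))  # False: opened read-only
```

A probe like this lets a batch job fail fast with a clear message instead of dying mid-run when the slave flips to read-only.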
[16:13:40] ok, thx
[16:14:38] o/
[16:14:45] Hi elukey :)
[16:18:15] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1934501 (Tbayer) >>! In T123595#1933431, @Nuria wrote: > Tables will start existing once blacklisting is lifted, let us know when new sampling ratio has taken effect. > > I unde...
[16:21:46] Analytics, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API from the Event Bus perspective - https://phabricator.wikimedia.org/T112956#1934503 (Milimetric) >>! In T112956#1906630, @Tgr wrote: >>>! In T112956#1904116, @Milimetric wrote: >> I just mean, can we find like five to ten...
[16:24:10] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1934514 (Nuria) @Tbayer: Given the many issues we have in our data store right now. Hardware: https://phabricator.wikimedia.org/T123546 Replication: https://phabricator.wikimedia...
[16:28:04] Analytics: Pageview API demo doesn't list be-tarask - https://phabricator.wikimedia.org/T119291#1934519 (Milimetric) Speaking of a proper robust tool, check out this class of 10 students that's about to tackle the problem: T120497 !!
[16:28:42] (CR) Nuria: [C: 2] Restrain pep8 to 1.5.x [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/264057 (owner: Hashar)
[16:30:05] Analytics: Track pageview stats for outreach.wikimedia.org - https://phabricator.wikimedia.org/T118987#1934526 (Milimetric) >>! In T118987#1900916, @TFlanagan-WMF wrote: > Thanks, @Nuria. Is the process you mention quick and easy? I'm just thinking ahead if we need to report some pageview numbers for interna...
[16:32:57] Analytics-Kanban, DBA: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1934530 (Nuria) From mobile team: https://gerrit.wikimedia.org/r/263538 was merged Monday 11th so new sampling rate of 0 should be applied to all wikis from tomorrow (Thursday 14th)...
[16:33:17] Analytics-Tech-community-metrics, DevRel-January-2016: Legend for "review time for reviewers" and other strings on repository.html - https://phabricator.wikimedia.org/T103469#1390947 (Lcanasdiaz) Ok, so you want to display the content of the "desc" field for each metric displayed on the box on the left wh...
[16:35:48] Analytics: update comScore description on report card - https://phabricator.wikimedia.org/T122059#1934536 (Milimetric) does anything else need to happen here?
[16:39:27] a-team, FYI, see email I just sent to analytics list. i've disabled public access to yarn.wikimedia.org
[16:39:48] ok ottomata
[16:39:53] Awesome
[16:39:55] oh elukey, btw, you should add pingyness to the 'a-team' keyword
[16:39:55] ottomata: sweet, less holes
[16:39:57] ottomata: any issue ?
[16:39:59] we use that to ping each other
[16:40:07] :)
[16:40:32] cool ottomata
[16:40:33] joal: no, real issue. someone emailed the security list and was worried about it, i actually wasn't aware that the REST API was on the same port as the GUI
[16:40:46] you couldn't do anything more with the API than you could witih the GUI
[16:40:51] makes sense to close it :)
[16:41:02] but, it still wasn't good, it was very easy for me to curl the job history
[16:41:12] i couldn't POST any actions
[16:41:13] it sure was
[16:41:17] but i didn't like that I could try to POST them
[16:41:22] :)
[16:41:44] I think with job history you get 7 days of details infos (like full quesries and stuff)
[16:42:02] ottomata: So to access it now, VPN ?
[16:42:04] full queries I also wasn't aware of, i thought it was just the beginning
[16:42:08] ssh tunnel :/
[16:42:17] MwaaaaAAAAAAArf :(
[16:42:27] unless we put some auth in front of it
[16:42:39] That would be best
[16:42:50] Analytics-Kanban, DBA: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1934539 (Nuria) {F3227882}
[16:42:50] ja there is a ticket
[16:43:05] or we really need something to make hue stable (hue has a job browser)
[16:43:32] ok milimetric tell me about this db read only thing again?
[16:43:37] this is not the EL db, rigth?
[16:43:41] it's not, yes
[16:43:50] it's the mediawiki db slaves
[16:43:58] Hmmm ottomata I thought we could put it behind LDAP. But no?
[16:44:05] and i think it's the old name, but i didn't follow up with how they changed that setup
[16:44:24] madhuvishy: not via hadoop itself
[16:44:27] maybe with varnish, dunno
[16:44:41] though, ottomata some of those slaves have a "log" database in them, which is very weird IMO
[16:45:06] i guess maybe they're all just pointing to analytics-store? :/
[16:47:15] seesh no idea
[16:49:53] * elukey highlights a-team
[16:50:07] * elukey probably pinged everybody
[16:50:07] hmmm milimetric dunno, eventlogging_sync runs on this slave too?
[16:50:09] :)
[16:50:31] interesting...
[16:50:41] it's not just one slave, it's all s*-analytics-slave(s)
[16:50:50] there's like s1, s2, ..., s9 I think
[16:51:11] so we're lost without Jaime...
[16:51:13] hm
[16:51:50] why the heck does eventlogging get replicated to all these slaves?!
[16:55:29] oh hm
[16:56:23] oh
[16:56:25] wha?
[16:56:35] milimetric: s1-analytics-slave is one of 2 m4-masters....
[16:56:44] ....
[16:56:45] that are proxied to round robin by haproxy
[16:57:05] er.... so Evan's script was writing to a prod db?!!
[16:57:53] no um
[16:58:13] i *think* 'm4-master' is an alias for eventlogging prod db, but it is actually spread across two dbs for inserts
[16:58:21] and then this custom script selects from both of them
[16:58:26] and inserts into analytics-store
[16:58:43] * elukey is confused
[16:58:49] so, eventlogging inserts into the log db on a prod slave
[16:58:57] elukey is not confused, the setup is confused, elukey is normal
[16:59:00] and then stuff from all prod slaves is replicated to analytics-store
[16:59:10] oof
[16:59:13] i have no idea though
[16:59:16] i can't find any docs about this
[16:59:35] i'm about 65% that's how things are
[17:01:09] standuuuppp
[17:01:13] ping ottomata madhuvishy standdddupppp
[17:02:17] wha thaaa is going on?
[17:02:17] ok!
[17:04:48] Analytics-Kanban, Patch-For-Review: Add install instructions to script that calculates kanban metrics [1] - https://phabricator.wikimedia.org/T122626#1934565 (Nuria) Open>Resolved
[17:05:56] Analytics-Kanban: Quaterly review 2016/01/22 (slides due on 19th) - https://phabricator.wikimedia.org/T120844#1934568 (Nuria)
[17:05:58] Analytics-Kanban: Gather metrics about cluster usage {hawk} [5 pts] - https://phabricator.wikimedia.org/T121783#1934567 (Nuria) Open>Resolved
[17:08:31] Analytics-Kanban: Investigate sample cube pageview_count vs unsampled log pageview count [13 pts] {hawk} - https://phabricator.wikimedia.org/T108925#1934579 (Tbayer) Update: So @JAllemandou was in SF last week for the Dev Summit and All Hands, and he and I took the opportunity on Friday to sit down in person...
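[Editor's note] The topology ottomata describes above at ~65% confidence (inserts spread across two m4-master databases, with a custom script selecting from both and copying into analytics-store) can be sketched generically. This is an illustration of that described shape, not the real eventlogging_sync code; all table and variable names are hypothetical.

```python
def sync_to_store(master_a, master_b, store):
    """Merge event rows from two insert masters into one analytics store,
    deduplicating on uuid. Each event carries a uuid assigned at insert
    time, so the same event should only ever appear on one master, and
    re-running the sync is idempotent."""
    seen = {row["uuid"] for row in store}
    for source in (master_a, master_b):
        for row in source:
            if row["uuid"] not in seen:
                store.append(row)
                seen.add(row["uuid"])
    return store

# Hypothetical event rows as they might look in a 'log' database table.
a = [{"uuid": "u1", "event": "click"}, {"uuid": "u2", "event": "view"}]
b = [{"uuid": "u3", "event": "view"}]
store = sync_to_store(a, b, [{"uuid": "u1", "event": "click"}])
print(len(store))  # 3: u1 was already present, u2 and u3 are copied over
```

Keying on uuid is what makes a setup like this tolerate re-runs and overlapping dumps; without it, syncing from two masters would duplicate rows.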
[17:09:49] Analytics-Kanban, Patch-For-Review: Add piwiki beacon to financial report website [5] - https://phabricator.wikimedia.org/T123263#1934586 (Nuria) Open>Resolved
[17:10:24] Analytics-Kanban, Patch-For-Review: Piwik beacon on prod instance should be accessible [5 pts] - https://phabricator.wikimedia.org/T123260#1934589 (Nuria) Open>Resolved
[17:10:41] Analytics-Kanban: Quaterly review 2016/01/22 (slides due on 19th) - https://phabricator.wikimedia.org/T120844#1934592 (Nuria)
[17:10:43] Analytics-Kanban: Gather preliminary metrics of Pageview API usage for quaterly review {slug} [5pts] - https://phabricator.wikimedia.org/T120845#1934591 (Nuria) Open>Resolved
[17:26:06] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1934622 (Tbayer) Understood that these are timely and severe issues; I really appreciate the Analytics' team's hard work on fixing them. I should check with other stakeholders to b...
[17:30:01] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1934628 (Nuria) >would that be that a realistic timeframe for restoring them? Before adding more data we need to do the tokudb conversion, once we have an ETA for that we will u...
[17:35:04] Analytics-Tech-community-metrics, DevRel-January-2016: Legend for "review time for reviewers" and other strings on repository.html - https://phabricator.wikimedia.org/T103469#1934642 (Aklapper) @Lcanasdiaz: That was my idea, indeed. But please tell me if it does not make sense or if it is way too complica...
[17:39:32] Analytics-Tech-community-metrics, DevRel-January-2016, Easy, Google-Code-In-2015, Patch-For-Review: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1934662 (Nemo_bis) > No idea how not mentioning additional terms helps eit...
[17:40:34] Analytics: Find performance thresholds of piwik production instance - https://phabricator.wikimedia.org/T123640#1934665 (Nuria) NEW
[17:58:24] madhuvishy: jfyi, I'm now collecting data for https://meta.wikimedia.org/wiki/Schema:CommandInvocation
[17:58:32] about 50k events in 17h and steady state
[17:58:44] a-team, will be 5 or 10 late to grooming, ok?
[17:58:55] am running simple queries (select count(*)) on m4 master directly but will make sure to not overdo it
[17:58:57] np
[17:58:57] YuviPanda: okay - replication lag etc so data won't show up in slaves for a while
[17:59:01] * YuviPanda picks out lice from ottomata
[17:59:28] * milimetric watches in horror as YuviPanda starts to eat the lice
[17:59:29] YuviPanda: can you add the SchemaDoc template to your talk?
[17:59:31] madhuvishy: yeah, I'm just checking master to verify
[17:59:39] madhuvishy: what's the SchemaDoc template?
[17:59:51] I guess I can find out :D
[17:59:53] ok
[18:00:08] https://meta.wikimedia.org/wiki/Template:SchemaDoc
[18:00:11] YuviPanda: ^
[18:00:19] yup
[18:00:21] adding
[18:00:23] now
[18:00:27] I wasn't aware of this
[18:00:29] https://meta.wikimedia.org/wiki/Schema_talk:Echo
[18:00:30] cool
[18:13:55] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1934748 (Cmjohnson) The server is out of warranty but I have several spare DIMM for the R610's on-site. It appears that DIMM A3 is bad and needs to be replaced. I will need abou...
[18:14:15] Analytics, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest [1 pts] - https://phabricator.wikimedia.org/T119151#1934749 (Milimetric) a:madhuvishy
[18:14:21] ottomata: bam, ^ might explain thing!
[18:15:32] Analytics: Productionize last access jobs for daily and monthly calculations {bear} - https://phabricator.wikimedia.org/T122514#1934753 (Milimetric) p:Triage>High
[18:16:00] YuviPanda: the mem chip?
[18:16:01] Analytics-Kanban: Productionize last access jobs for daily and monthly calculations {bear} - https://phabricator.wikimedia.org/T122514#1906263 (Milimetric)
[18:16:06] yeah
[18:16:29] Analytics: Productionize last access jobs for daily and monthly calculations {bear} - https://phabricator.wikimedia.org/T122514#1906263 (Milimetric)
[18:17:07] ottomata: I think you can failover to use a different dbproxy first
[18:17:44] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1934760 (Nuria) @ottomata: can you coordinate a 5 minutes outage today?
[18:20:36] Analytics, Analytics-Cluster, Patch-For-Review: Single Kafka partition replica periodically lags - https://phabricator.wikimedia.org/T121407#1934764 (Milimetric) a:Ottomata>elukey
[18:21:18] Analytics-Tech-community-metrics: Mismatch between six names and certain email address in mediawiki-identities data - https://phabricator.wikimedia.org/T123643#1934774 (Aklapper) NEW
[18:21:47] Analytics: Pageview API demo doesn't list be-tarask - https://phabricator.wikimedia.org/T119291#1934787 (Ironholds_backup) It's worth noting that the API itself does include be-tarask data (I just checked it). Agreed on the need for a stats.grok.se replacement. Began idly noodling on one using the stats.grok...
[18:28:39] Analytics: Create Pageview API dashboard to monitor response times - https://phabricator.wikimedia.org/T121277#1934817 (Milimetric) a:GWicke
[18:28:49] Analytics: Create Pageview API dashboard to monitor response times - https://phabricator.wikimedia.org/T121277#1934819 (Milimetric) Open>Resolved Done thanks to Gabriel, re-assigning. https://grafana.wikimedia.org/dashboard/db/pageviews
[18:29:12] Analytics: 'is_spider' column in eventlogging user agent data {flea} - https://phabricator.wikimedia.org/T121550#1934823 (JAllemandou) The easiest way to add user-agent refinement to eventlogging would be to use the refinery code through hive or spark on eventlogging logged into hadoop.
[18:29:41] Analytics: 'is_spider' column in eventlogging user agent data {flea} - https://phabricator.wikimedia.org/T121550#1934826 (Nuria) Adding this column to the capsule requires work on the EL mysql database end of things which is having a lot of issues right now (as a new column needs to be added to every single...
[18:30:00] Analytics: Use a new approach to compute monthly top 1000 articles (brute force probably works) - https://phabricator.wikimedia.org/T120113#1934827 (Milimetric)
[18:31:11] ottomata: https://phabricator.wikimedia.org/T114199 - Educational task? What do you think??
[18:31:45] (we all love systemd)
[18:31:59] heheh, yeah! hm.
[18:32:01] could be good.
[18:32:08] tricky though, because eventlogging is currently all about upstart
[18:32:16] but ja
[18:32:20] * milimetric out for lunch
[18:32:45] elukey: the newish eventlogging-service (for eventbus) has been puppetized for systemd and jessie
[18:33:16] so we'll probably want to model the other daemons after that
[18:33:17] maybe.
[18:33:20] if I did it right :)
[18:33:43] Analytics, operations, ops-eqiad: Possible bad mem chip or slot on dbproxy1004 - https://phabricator.wikimedia.org/T123546#1934841 (Ottomata) Eeee, I'm not so sure. Are we sure eventlogging is the only user of m4-master?
[18:34:24] ottomata1: I'll need to familiarize with event logging first (and its new incarnations) so it might be really good for me
[18:35:01] otherwise if it is already in your plans I'll be glad if you CC me :)
[18:37:50] elukey: it is, but it isn't first priority
[18:38:01] elukey: i recently merged some eventlogging docker stuff
[18:38:08] :O
[18:38:19] if you get docker set up and check out recen tmaster, you'll be able to run eventlogging in a docker instance
[18:39:17] ottomata1: nuria I was talking to YuviPanda yesterday and we were wondering if we should make the wikimetrics dev setup docker too, instead on mw vagrant - because there's no real mediawiki dependency
[18:39:20] milimetric: i have no idea about the s*-analytics slaves being in read only
[18:39:32] madhuvishy: not a bad idea
[18:39:37] especially if only for development purposes
[18:39:42] ya just for dev
[18:39:43] although
[18:39:48] doesn't wikimetrics query mw dbs?
[18:40:07] yeah - YuviPanda says we can just use the labs replica directly
[18:40:59] ottomata: i don't know if it conflicts with the way it's currently setup, but seems feasible
[18:41:59] madhuvishy: what would change in the move between Vagrat to Docker only for dev purposes? Just curious
[18:42:12] neilpquinn: yt?
[18:42:21] It would be great to have the container even for Labs
[18:42:28] the labs replica is queriable from outside of labs?!
[18:42:49] elukey: we'd just stop using the mw vagrant puppet setup
[18:42:51] ottomata: kindof.
[18:42:53] ottomata: ssh tunnels
[18:42:57] ottomata: and I want to do it anyway
[18:42:59] ah
[18:43:06] since that'll make lcoal development for tools folks too far easier
[18:43:21] nuria: do you know who neil is? i want to contact him about this query
[18:43:28] i'm really not sure what we should do
[18:43:32] ottomata: i know neil
[18:43:37] oh?
[18:43:48] he's a product analyst in editing
[18:44:01] is it insane to just ask that no one do long queries on anatlyics-store until next week and jaime can address these issues?
[18:44:53] he has an select/insert query on analytics-store running for the last 20 hours
[18:44:57] YuviPanda: if you have time today/tomorrow it would be great to see the difference, I am still completely ignorant (about current dev cycle in wikimedia)
[18:45:00] ottomata: oh
[18:45:18] likely not the source of hte problems
[18:45:22] but it surely isn't helping
[18:46:10] ottomata: I usually just kill it (on labsdb at least)
[18:46:15] ottomata: we've been having replag too
[18:46:25] usually just 'KILL ALL THE QUERIES, FIND PEOPEL RUNNING QUERY AND STOP THEM'
[18:46:27] fixes it
[18:46:36] ottomata: yes ping neilpquinn
[18:46:57] ottomata: i think asking that should be fine
[18:46:57] elukey: sure! I wasn't going to come to office today but can come by in the afternoon to chat
[18:47:06] ottomata: given that those queries are not likely to succeed
[18:47:08] even tomorrow, don't worry :)
[18:47:40] elukey: i don't think i'm coming today too. YuviPanda lets do it tomorrow?
[18:47:49] madhuvishy: sure
[18:47:53] what is 'it' though?
[18:48:05] hello mforns. :-) I have left a comment about my assessment of the pageview stats tool (from the community wishlist) and mentioned the demo you have worked on. just fyi: https://meta.wikimedia.org/wiki/Talk:2015_Community_Wishlist_Survey/Top_10/Status_draft
[18:48:06] YuviPanda: he he docker things
[18:48:16] * leila says hi to folks in this channel.
[18:48:21] madhuvishy: ah ok
[18:48:22] hiii!
[18:48:44] leila: o/
[18:51:16] ottomata: I agree with YuviPanda on stopping queries that are not likely to work I mean, what else can we do?
[18:52:50] ottomata: pinging neilpquinn again ...
[18:53:05] yeah, plus I kill them first and then tell people
[18:53:07] :D
[18:53:11] rather than the other way around
[18:58:01] milimetric:
[18:58:18] nuria: can you think of any reason why this custom replication script would need to have records inserted in order of uuid?
[18:58:39] hmmMmMM
[18:58:45] maybe I should limit hte mysqldump query that this thing is doing
[18:58:50] to like 10000 ro 1000000 rose
[18:58:52] rows
[18:59:59] ottomata: it used to be we had an autoincrement on the tables that is no longer there , maybe the uuid order is from that?
[19:00:20] ottomata: cause no, there is no need to preserve uuid order that i can think of
[19:00:21] it might just be what mysqldump does by default
[19:00:30] this is for replicaiton though, so auto increment won't matter
[19:00:36] this is just the query the mysqldump does to grab the data
[19:00:40] so it does the inserts in order
[19:00:48] but uuid is already set in the master
[19:00:51] so it shouldn't matter
[19:00:53] that should help speed up
[19:01:04] as well as maybe limiting the number of records inserted at a time
[19:01:44] Ironholds: good news
[19:01:49] i special cased your table and ran a sync
[19:01:54] good news?
[19:01:55] WikipediaPortal_14377354 is up to date now
[19:02:00] ooohhhh
[19:02:03] ja, its just that big tables are blocking small ones
[19:02:06] since htey have so many records
[19:02:19] i'm going to try limiting the number of records batched by the mysqldump and see if it helps
[19:02:28] ottomata, thanks!
[19:02:51] will that persist, or?
[19:02:56] IOW, will my scripts break again tomorrow? ;p
[19:02:59] its up to date as of now
[19:03:03] i just ran a sync specially for your table
[19:03:10] the main sync is still busted blocking on other large tables
[19:03:15] i'm going to try to fix that
[19:03:24] so that those large tables might lag, but smaller ones won't
[19:03:27] not sure how it will go, but will try
[19:03:38] oh, weird meeting happening now, ja?
[19:03:43] gotta put that on in backrounnnd
[19:03:44] :)
[19:03:52] pffft ottomata join the meetingless :P
[19:05:19] ottomata: 4 years!
[19:06:38] woo ottomata!
[19:06:56] * YuviPanda is also 4y but won't count because contractor for part of it
[19:07:16] :)
[19:07:29] YuviPanda: I think it should? Talk to HR.
[19:07:41] James_F: I was also part time for another 5 months afterwards.
[19:07:51] when I was in uni
[19:07:55] but yeah, maybe I should
[19:08:15] YuviPanda: Correcting 'start' dates is a continuing effort.
[19:08:37] :D
[19:08:42] I'll talk to them
[19:08:46] thanks for the poke, etc :D
[19:09:18] I was also theoretically not working for the WMF for a few months in the middle (but wrote the Commons app at that time :P) but not sure if that's a leave of absence or not
[19:09:20] oh well
[19:11:30] wait whahhhh
[19:11:37] i thought i deleted on the MobileWebSectionUsage_15038458 tables nuria
[19:11:38] apparently not
[19:11:43] one of the masters (that i didn't know existed)
[19:11:47] whatata?
[19:11:48] still has it
[19:11:56] and it is being selected from and inserted back into
[19:12:12] inserted?
[19:12:16] ottomata: how so?
[19:12:28] ottomata: selected maybe
[19:12:33] ottomata: but inserted?
[19:12:34] dunno, i'm still confused by which databases are where
[19:12:38] i thought there were two
[19:12:40] but there are 3
[19:12:44] ottomata: what timestamps do inserts have?
[19:12:45] 2 of which are 'm4-master'
[19:12:55] 20160113192418 on analytlics-storre
[19:13:09] the dump processes are selecitng LOTS of data from db1046
[19:13:10] ottomata: so that is yesterday's
[19:13:22] i think that is hte real master
[19:13:29] yes
[19:13:32] db1046 for writes
[19:13:40] 1046 is teh master for writes
[19:13:43] db1047 is another slave?
[19:13:48] but also called m4-master?
[19:13:58] ?
[19:14:04] no idea
[19:14:30] [@dbproxy1004:/home/otto] $ cat /etc/haproxy/conf.d/db-master.cfg
[19:14:30] listen mariadb 0.0.0.0:3306
[19:14:30]     mode tcp
[19:14:30]     balance roundrobin
[19:14:30]     option tcpka
[19:14:31]     option mysql-check user haproxy
[19:14:31]     server db1046 10.64.16.35 check inter 3s fall 3 rise 99999999
[19:14:32]     server db1047 10.64.16.36 check backup
[19:15:14] let me see , i ahd : https://grafana.wikimedia.org/dashboard/db/server-board?from=1446931397231&to=1449523157231&var-server=db1046&var-network=eth0
[19:15:17] for master
[19:15:38] what is db1047, why is it running eventlogging_sync, and why does haproxy to m4-master include it in configs?
[19:16:05] ottomata, is there a phab task for these problems? Tomasz wants something to subscribe to
[19:16:21] ottomata: looking at graphs teh one working is 1046
[19:16:34] Ironholds: yes. https://phabricator.wikimedia.org/T123634#1934492
[19:17:12] ta!
[19:17:25] ok, i'm going to stop those queries and make sure this table is gone, we can talk about bringing it back later
[19:17:30] but re-replicating it all now is not good
[19:17:55] ottomata: agreed
[19:19:13] ottomata: if you look at graphana definitely 1047 is receiving cyclic updates: https://grafana.wikimedia.org/dashboard/db/server-board?var-server=db1047&var-network=eth0
[19:23:42] PROBLEM - Host erbium is DOWN: PING CRITICAL - Packet loss = 100%
[19:24:09] ottomata: is that you --^
[19:24:14] ?
[19:24:25] joal: i doubt it
[19:24:30] hmm
[19:24:38] nuria: what do we have on erbium ?
[19:24:39] joal: what's erbium ? wasn't it udp2log?
[19:24:49] no, but erbium is decommed
[19:24:49] hmmm, synchronicity :)
[19:24:55] probably someone just taking it down?
[19:25:01] ok ...
[19:25:09] joal: ya, https://wikitech.wikimedia.org/wiki/Erbium
[19:25:27] joal: so it is dying in fron of our eyes
[19:25:31] *in front
[19:26:07] ottomata must be happy :)
[19:26:21] ops channel have noticed as well
[19:27:25] leila, just read your line, thx!
[19:27:41] sure, mforns.
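[Editor's note] The haproxy stanza pasted above is less "round robin" than its `balance roundrobin` line suggests: db1046 is the only active server, and db1047 is marked `backup`, so it only receives traffic while db1046's health check has it marked down (and with `rise 99999999`, a downed db1046 would effectively never be readmitted automatically). A minimal model of that selection behavior, purely illustrative and not haproxy internals:

```python
def pick_backend(health):
    """Choose a backend the way the pasted stanza behaves: prefer any
    healthy non-backup server; fall back to a healthy 'backup' server
    only when no active server is up; return None if nothing is up."""
    active = [name for name, meta in health.items()
              if not meta["backup"] and meta["up"]]
    if active:
        return active[0]
    backups = [name for name, meta in health.items()
               if meta["backup"] and meta["up"]]
    return backups[0] if backups else None

health = {
    "db1046": {"up": True, "backup": False},
    "db1047": {"up": True, "backup": True},
}
print(pick_backend(health))   # db1046 while it is healthy
health["db1046"]["up"] = False
print(pick_backend(health))   # db1047 takes over only on failover
```

Read this way, the config answers part of the channel's confusion: db1047 is not load-shared with db1046, it is a failover target that happens to also be reachable under the m4-master name.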
[19:50:56] (CR) Hashar: "recheck" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263782 (owner: Madhuvishy) [19:51:17] hashar: thank you :) [19:51:29] madhuvishy: hello :-} [19:51:43] hashar: hi! [19:52:00] the new pep8 version is causing a bunch of errors on most of our python projects :-( [19:52:23] luckily we have some volunteers that set the upper version limit and/or fix the new linting errors :D [19:52:43] hashar: hmmm :/ should I change anything on wikimetrics? [19:53:55] madhuvishy: it got fixed via https://gerrit.wikimedia.org/r/#/c/264057/1/tox.ini [19:54:18] and your patch that was previously failing ( https://gerrit.wikimedia.org/r/#/c/263782/ ) is now passing [19:54:25] hashar: oh cool! awesome, thanks :) [19:54:26] (I triggered the job by commenting 'recheck' in Gerrit) [19:54:34] yeah, i saw that [19:54:37] which tests the patch against the tip of the branch after it got merged [19:55:00] every morning I look at all patches that have been rejected by Jenkins :-D [19:55:29] (CR) Hashar: "recheck" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263911 (owner: Wassan.anmol) [19:55:31] !log restarted eventlogging_sync script to insert batches of 1000 [19:55:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [19:55:43] hashar: i was gonna ping you myself but so many meeting [19:55:45] meetins [19:55:56] gah i can't type [19:56:31] madhuvishy: I left comments on the wikimetrics change, mostly minor [19:58:11] YuviPanda: thanks [19:58:16] madhuvishy: or you can ping folks in #wikimedia-releng as well :} [19:58:24] will do :) [19:58:39] YuviPanda: if you have time tomorrow I'd also like to talk about how to push "secrets" in labs to mimic as much as possible Prod (not this specific use case) [19:58:58] elukey: that one :D [19:59:00] elukey: sure [19:59:12] hmm, ja interesting nuria, the full MobileWebSectionUsage tables were not removed from db1047 [19:59:19] I think current consensus is 'you can
not replicate prod without hosting your own puppetmaster, and hosting your own puppetmaster makes YuviPanda sad' [19:59:30] ottomata: I'm so sorry, I didn't see the notifications on my phone! [19:59:32] haahhaha [19:59:45] yeah I don't want to make Yuvi sad [19:59:46] neilpquinn: its ok, i haven't killed your query :) [19:59:51] bad ottomata [19:59:52] just have been thinking about it [19:59:53] baad [20:00:01] but it would be great to have a "standard" process [20:00:06] must kill queries first before asking people, how else do you get them to loathe you? [20:00:08] elukey: I agree. [20:00:19] elukey: this is the first time we're using a 'private'ish gerrit repo [20:00:28] elukey: labs in general isn't supposed to have really private data [20:00:31] for example, we could use the "fake" private repo and then use a common solution to replace the placeholders with "secrets" in some way [20:00:36] ottomata: that query is for generating quarterly review metrics so it's moderately important. but the sky won't fall if you kill it. so if I'm causing a server meltdown feel free to do it. [20:00:37] nuria: , so we could possibly restore the old data from db1047, instead of backfilling it all from files [20:00:39] yep yep [20:00:43] they look like MyISAM tables (I think???) [20:01:04] oh, no [20:01:05] Aria? [20:01:18] elukey: yeah, problem is that you have to only give access to instances from a particular project [20:01:26] so, we could probably just shut down the dbs, copy the files over [20:01:28] elukey: and that's a hard problem to do totally right, since people have root on those machines and can lie [20:01:29] anyway thanks for not killing it! [20:01:29] iunno if it is worth it though [20:01:55] neilpquinn: how long do you think it will run?
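For readers following the "should we kill neilpquinn's query" thread: on MariaDB/MySQL this is normally done from another session, and you can kill just the statement without dropping the client's connection. A minimal sketch (the id 12345 is a placeholder, not a real process id from this incident):

```sql
-- List running statements and their ids; long Time values are the suspects
SHOW FULL PROCESSLIST;

-- Terminate only the statement, keeping the connection open
KILL QUERY 12345;

-- Or terminate the whole connection
KILL 12345;
```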
[20:03:21] YuviPanda: yep :( [20:03:57] elukey: in the Glorious Future(TM) projects like this will end up being on kubernetes, which has built in secret management [20:04:45] looking forward to it :) [20:05:36] ottomata: I'm honestly not sure—I didn't expect it to take this long. I imagine you've looked at the full query? [20:06:49] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1935163 (Ottomata) Ok, I've modified the eventlogging_sync.sh script to do the custom mysqldump | insert based replication in batches of 1000 rows. This means t... [20:07:17] neilpquinn: not in detail [20:11:12] madhuvishy: do you have a few minutes to help me with https://phabricator.wikimedia.org/T120900 [20:11:42] * madhuvishy looks [20:11:56] YuviPanda: ummm sure [20:12:23] madhuvishy: ok, so... [20:12:29] madhuvishy: they are all in the 'analytics' project, right? [20:12:44] YuviPanda: yes [20:12:49] so if you look at https://wikitech.wikimedia.org/wiki/Hiera:Tools [20:12:53] and we probably dont need it for wikimetrics [20:12:56] you see how valhallasw and scfc have added their root key [20:12:58] right [20:13:00] but for limn :D [20:13:03] right [20:13:06] which is still self hosted I think [20:13:12] anyway [20:13:15] yup [20:13:20] add a key called "passwords::root::extra_keys": [20:13:24] with the value being a dict [20:13:27] with username: ssh key [20:13:35] should i make a new key, or put my labs one? 
[20:13:39] to https://wikitech.wikimedia.org/wiki/Hiera:Analytics [20:13:43] madhuvishy: it's ok to put in your labs one [20:13:50] YuviPanda: okay [20:13:53] i'll add [20:14:11] madhuvishy: then need to run puppet on the instance to see if it worked [20:14:18] okay [20:14:21] and if not we need to just add the key manually (I can help with that too) [20:14:28] mm hmmm [20:14:34] give me a minute to add this [20:14:42] madhuvishy: and if/when you change your labs key you should remember to change this too [20:14:45] madhuvishy: thanks! [20:14:50] okay [20:14:52] this isn't as big a deal for the wikimetrics project [20:14:56] because it's fully puppetized [20:14:58] ottomata: how would we restore data from 1047? [20:15:02] and so we (labs roots) can fix underlying issues [20:15:07] drops must be propagated there too right? [20:15:30] no [20:15:33] drops? [20:15:46] "drop table" [20:15:51] naw [20:15:52] it's not a slave [20:15:53] Analytics-Tech-community-metrics, DevRel-January-2016: gerrit_review_queue can have incorrect data about patchsets "waiting for review" - https://phabricator.wikimedia.org/T121495#1935183 (Aklapper) @Lcanasdiaz: Pull request to fix this task created in https://github.com/Bitergia/mediawiki-dashboa...
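The Hiera edit being walked through above ("add a key called `passwords::root::extra_keys` with the value being a dict with username: ssh key") would look roughly like this on the Hiera:Analytics wiki page. The key material below is a made-up placeholder, not anyone's real key:

```yaml
# Hypothetical Hiera:Analytics entry; each item maps a username to the ssh
# public key that puppet will append to root's authorized keys.
"passwords::root::extra_keys":
  madhuvishy: "ssh-rsa AAAAB3NzaC1yc2E...placeholder madhuvishy@labs"
  milimetric: "ssh-rsa AAAAB3NzaC1yc2E...placeholder milimetric@labs"
```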
[20:15:56] log is not replicated like that [20:16:11] and i didn't run drop on this box [20:16:18] because i didn't know it existed yesterday [20:16:20] ottomata: ok, I know even less about this than i thought [20:16:23] this box is kinda like analytics-store [20:16:28] log is replicated there in the same way [20:16:43] Analytics-Tech-community-metrics, DevRel-January-2016: gerrit_review_queue's "waiting for review" column name misleading (also includes unmerged CR+1 patches) - https://phabricator.wikimedia.org/T121495#1935190 (Aklapper) a:Lcanasdiaz>Aklapper [20:17:01] Analytics-Tech-community-metrics, DevRel-January-2016, Patch-For-Review: gerrit_review_queue's "waiting for review" column name misleading (also includes unmerged CR+1 patches) - https://phabricator.wikimedia.org/T121495#1880244 (Aklapper) [20:17:22] can you run : SELECT table_name, (DATA_LENGTH + INDEX_LENGTH)/1024/1024/1024 as `TOTAL SIZE (GB)`, ENGINE, CREATE_OPTIONS FROM information_schema.tables WHERE TABLE_SCHEMA='log' /* AND `ENGINE` <> 'TokuDB' */ ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC LIMIT 30; [20:17:42] ottomata: and paste the results of that on 1047 on the ticket ? [20:17:52] ottomata: that way we get a glimpse of how big tables are [20:18:18] YuviPanda: done - should i run puppet on limn1? [20:18:22] madhuvishy: yup [20:18:25] ok [20:18:36] nuria [20:18:37] you want all tables? [20:18:45] you have a comment in there to not select tokudb [20:18:49] do you want to not select? [20:18:57] ottomata: that will get you just the 30 biggest [20:19:39] done [20:19:42] Analytics-Kanban, DBA, Patch-For-Review: EL replication having issues since at least January 11th - https://phabricator.wikimedia.org/T123634#1935198 (Ottomata) db1047(???) big tables: ``` MariaDB EVENTLOGGING m4 localhost log > SELECT table_name, (DATA_LENGTH + INDEX_LENGTH)/1024/1024/1024 as `TOT... [20:19:46] YuviPanda: i cannot get into limn1 :/ [20:19:53] hahaha [20:19:55] nice [20:20:50] ottomata: okay.
I've been reading scrollback, but I don't fully understand what the issue is or how helpful canceling my query would be. So if you just keep in mind that this is fairly (but not massively) important, I leave whether to cancel it up to you. If you think it's necessary, I trust you. Let me know if you need more info (e.g. to gauge its [20:20:50] importance). [20:21:12] madhuvishy: yeah, puppet has been broken for months [20:21:36] YuviPanda: hmm :/ [20:22:03] oh, I just saw the Phab ticket. That might clarify. [20:22:11] madhuvishy: try sshing in as root now [20:22:13] should work [20:22:18] I've added your and milimetric's key to it [20:22:24] YuviPanda: okay trying [20:22:25] but the instance itself is in many ways unrecoverable otherwise [20:22:35] elukey: ^ is why self hosted puppetmasters make me sad [20:22:50] YuviPanda: ya i can get in [20:22:59] madhuvishy: cool. [20:23:02] hopefully we can get limn out of self hosted soon as well [20:23:15] madhuvishy: I want to show you how to add other people to root keys [20:23:24] sure [20:23:31] madhuvishy: on a working labs instance, you can use ssh-key-ldap-lookup [20:23:33] to get their keys [20:23:37] madhuvishy: and then you can add it to [20:23:40] /etc/ssh/userkeys/root [20:23:43] and that's it [20:23:47] that's the same thing adding it to Hiera: does [20:23:57] when you have working puppet [20:23:58] okay [20:24:16] do i need to be root to edit userkeys/root? [20:24:20] madhuvishy: feel free to add others in your team to root key in hiera too ( elukey maybe?) so you're less reliant on us :) [20:24:22] madhuvishy: yes [20:24:25] madhuvishy: well, or have sudo [20:24:37] madhuvishy: but usually the root key manual change is only needed when puppet is completely broken [20:24:38] as is the case now [20:24:45] okay - don't we all have sudo on labs instances we can get to?
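A minimal sketch of the manual fallback described above: look up a user's public key and append it to the instance's root key file. `ssh-key-ldap-lookup` is the wikitech tool named in the chat; it is stubbed here with a fake key, and a temp file stands in for `/etc/ssh/userkeys/root`, so the sketch runs anywhere.

```shell
# On a real instance, as root, the whole fix is roughly:
#   ssh-key-ldap-lookup <user> >> /etc/ssh/userkeys/root
# The stub and temp file below only make the flow demonstrable offline.
ssh_key_ldap_lookup() { printf 'ssh-rsa AAAAB3...fakekey %s@wikimedia\n' "$1"; }

keyfile=$(mktemp)                        # stands in for /etc/ssh/userkeys/root
ssh_key_ldap_lookup milimetric >> "$keyfile"
ssh_key_ldap_lookup madhuvishy >> "$keyfile"
wc -l < "$keyfile"                       # one line per appended key
```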
[20:24:45] 90% of the time just hiera is good enough [20:24:47] right [20:24:53] makes sense [20:25:02] this is just the emergency manual fix [20:25:05] cool [20:25:11] i'll add anyone who needs to be, for our self hosted ones [20:25:14] madhuvishy: can you add milimetric's key to the analytics hiera? [20:25:19] and then I can call that ticket closed :D [20:25:23] doing [20:26:17] YuviPanda: done [20:26:18] neilpquinn: ok thanks [20:26:20] i think i don't need to [20:26:24] thx both [20:26:32] milimetric: \o/ thanks [20:26:35] err [20:26:38] madhuvishy: ^ \o/ thanks [20:26:41] YuviPanda: also fixed puppet stuff [20:26:47] based on your CR [20:26:59] madhuvishy: yup [20:27:08] madhuvishy: shall I merge now or do you want someone to merge the deploy patch first? [20:27:38] YuviPanda: those are independent i think - i wanna make sure the submodule will go away though [20:27:49] ok [20:27:51] yeah good point [20:28:14] I removed it from .gitmodules [20:28:16] ottomata: you've merged patches removing submodules before, right? [20:28:21] madhuvishy: I didn't follow 100%, but you're working on limn1? [20:28:28] milimetric: no [20:28:33] we got root [20:28:38] oh ok, good [20:28:47] but you're not changing the self-hosted setup there, right? [20:28:54] YuviPanda: yes [20:28:56] and hopefully in the near future we can kill its self hosted ness [20:28:58] noo [20:28:58] that instance has a few things that need to get manually migrated before we do that [20:29:04] i'm not touching it [20:29:07] k [20:29:26] ottomata: can you take a look at https://gerrit.wikimedia.org/r/#/c/260687/10 to see if it's doing it right?
[20:30:26] YuviPanda: if not we can first merge a separate remove submodule commit, and then do this [20:30:45] madhuvishy: it's all the same I think, this is probably easier anyway to do it in one go [20:30:52] okay [20:30:58] if it works i'm for it [20:31:01] ok [20:31:12] let's try to get it merged today then [20:31:57] milimetric: will you have time to review the fabric part? elukey did one pass yesterday, but it would be good for you to look and merge it [20:32:19] sure, looking [20:32:29] YuviPanda: , i think that should do it [20:32:31] milimetric: can you also merge https://gerrit.wikimedia.org/r/#/c/263782/ minor path change in wsgi file [20:32:40] ottomata: ok! [20:32:43] although i might do the removal of the submodule as a separate commit [20:32:47] just in case gerrit or something gets confused [20:32:55] (PS2) Milimetric: Change config path in wsgi file [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263782 (owner: Madhuvishy) [20:32:58] it'll get confused anyway right? [20:33:02] (CR) Milimetric: [C: 2 V: 2] Change config path in wsgi file [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/263782 (owner: Madhuvishy) [20:33:03] iunno [20:33:07] milimetric: thanks [20:33:08] teehee :D [20:33:18] ottomata: i'm going to merge now and see if anything gets confused.
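Background on the "remove submodule" worry above: deleting only the `.gitmodules` entry leaves the checked-out files behind, which is exactly the problem being discussed. A sketch of the full removal in throwaway repos (repo names and the `protocol.file.allow` override are just to make the demo self-contained; newer git versions block file-protocol submodules by default, older ones ignore the setting):

```shell
set -e
work=$(mktemp -d)
git init -q "$work/sub"
git -C "$work/sub" -c user.email=a@b -c user.name=t commit -q --allow-empty -m init
git init -q "$work/main"
cd "$work/main"
git -c protocol.file.allow=always submodule --quiet add "$work/sub" vendor
git -c user.email=a@b -c user.name=t commit -q -m "add submodule"
git rm -q vendor                 # drops the gitlink AND the .gitmodules entry
rm -rf .git/modules/vendor       # clears git's cached clone of the submodule
git -c user.email=a@b -c user.name=t commit -q -m "remove submodule"
test ! -e vendor && echo "submodule fully removed"
```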
[20:33:43] YuviPanda: ya - nothing will die because of it anyway [20:33:51] well [20:33:57] prod puppetmaster could theoretically die :D [20:34:03] if it's stuck on submodule stuff [20:34:10] lol that i dunno [20:34:18] wikimetrics wont die [20:34:22] k [20:34:48] madhuvishy: needs manual rebase [20:35:18] YuviPanda: i can try doing that [20:35:29] ok, update on EL stuff: at least tables are now being iterated through [20:35:32] 1000 rows at a time [20:35:43] it still takes way too long to do this though [20:36:25] YuviPanda: rebased and pushed [20:37:18] (CR) Milimetric: [C: 2 V: 2] Fabric deployment setup for wikimetrics [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 (https://phabricator.wikimedia.org/T122228) (owner: Madhuvishy) [20:37:36] looks great to me, Madhu [20:37:42] nice and clean [20:37:44] meh [20:37:48] it broke puppetmaster on prod [20:37:49] let me clean up [20:38:03] milimetric: thanks :) [20:38:06] YuviPanda: awww [20:41:53] milimetric: ok i'll setup a prod server with the role once YuviPanda fixes puppetmaster. we should load the backup from current prod into labsdb sometime [20:42:17] I think [20:42:18] we've to revert [20:42:21] and do it in two steps [20:42:24] as the wise ottomata suggested [20:42:25] YuviPanda: okay [20:42:29] me toooo [20:42:35] as everyone except me suggested [20:42:39] i'll make a separate patch [20:42:49] git does not take lightly to hubris [20:42:55] he he [20:43:05] you have to revert though [20:43:32] done [20:52:29] YuviPanda: i'm a little unsure of how to resubmit the old patch [20:52:39] if i rebase, it reverts everything [20:52:43] hmm [20:52:44] good question [20:52:59] i can make a new patch [20:53:12] madhuvishy: I think you can revert the revert [20:53:15] madhuvishy: with gerrit [20:53:18] oho [20:53:22] then checkout locally [20:53:25] and then do [20:53:30] git reset HEAD^ [20:53:33] and then add them back? [20:53:35] that will also have submodule remove stuff, that's okay? 
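On the "revert the revert" question above: in Gerrit this is done by pressing Revert on the revert commit itself; the plain-git equivalent, sketched in a throwaway repo (file names and messages are illustrative):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=a@b -c user.name=t commit -q --allow-empty -m "init"
echo "fabric setup" > deploy.py
git add deploy.py
git -c user.email=a@b -c user.name=t commit -q -m "add deploy"
# The revert: deploy.py disappears from the working tree
git -c user.email=a@b -c user.name=t revert --no-edit HEAD
# Reverting the revert: the original change comes back as a new commit
git -c user.email=a@b -c user.name=t revert --no-edit HEAD
test -f deploy.py && echo "change restored"
```

After this, resubmitting the old patch is just pushing the revert-of-the-revert for review, which avoids the "rebase reverts everything" confusion mentioned in the chat.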
[20:53:50] hmm [20:53:52] good point [20:53:54] idk [20:53:56] try reset HEAD^ [20:53:58] see what happen [20:54:00] s [20:54:30] i can't revert the revert [20:54:41] if there's even such a thing [20:54:56] no revert button [20:54:58] ? [20:55:12] there is [20:55:38] but that's not for reverting the revert no [21:03:56] milimetric: do you think we can backup the db and deploy to new prod today? [21:04:06] i would need your help in restoring the backup [21:04:18] because YuviPanda will soon break our prod [21:04:29] well, not necessarily [21:04:33] yeah [21:04:47] if you stop salt-minion in your instances salt won't hit them [21:04:51] but the solution is to stop puppet runs on the instance all together [21:04:58] no [21:05:00] salt [21:05:00] madhuvishy: sure, I'm ok with that. [21:05:03] puppet is already broken there I think [21:05:06] right [21:05:10] salt is just our remote command execution thingy [21:05:13] do we still have access to NFS from prod, I forget? [21:06:07] milimetric: i think so. we can access from anywhere on the analytics project right? [21:09:47] YuviPanda: do you have to first remove the submodule everywhere before merging? [21:10:04] merging new module [21:10:11] madhuvishy: yeah [21:10:19] madhuvishy: we removed the submodule already, problem is that the files are left behind [21:10:27] yeah [21:11:18] YuviPanda: i think you should go ahead - i'll set up the new prod instance meanwhile [21:11:30] madhuvishy: ok [21:11:34] we're planning in -labs [21:13:00] madhuvishy: so I'll send a note to wikimetrics-l and back up the db to NFS [21:13:14] milimetric: awesome, thanks [21:13:38] so wait, we're not expecting anything crazy to go wrong, right? [21:13:46] like, we're just backing up the db to be safe [21:13:57] because there are also the files to back up otherwise [21:14:12] milimetric: which files? [21:14:28] oh the reports [21:14:31] the report results [21:14:36] we should move them over too right? 
[21:15:45] milimetric: as far as I understand, nothing will happen to the prod instance. only the puppet folder will get deleted [21:15:58] YuviPanda: ^ [21:16:15] well [21:16:18] the wikimetrics module will [21:16:24] yeah [21:16:26] but I'm going to skip the current wikimetrics instances [21:16:26] that's fine [21:16:28] when doing it [21:16:32] okay [21:16:33] just to give you guys more breathing room [21:16:42] thanks [21:19:05] madhuvishy: ok, so the backup on that was running normally, except the redis file (which was broken anyway) [21:19:14] the last backup is in /data/project/wikimetrics/backup/wikimetrics1/hourly [21:19:18] and they're saved daily also [21:19:21] so we're good [21:20:09] madhuvishy: let me know when you're done running puppet so I can clear the crontab [21:21:40] milimetric: okay - waiting on YuviPanda to re-merge our new module [21:22:04] i will setup staging once again non self-hosted to make sure, and then prod [21:27:22] milimetric: what are the three endpoints you are referring to re the pagestats tool? [21:27:51] ooh, I should clarify [21:28:24] I'll wait for it then, milimetric. thanks! :-) [21:31:38] k, sent [21:32:27] * elukey wants documentation about how to deploy projects like wikimetrics on labs/prod [21:36:08] elukey: we can do it together over batcave if you want [21:36:15] after timo's talk [21:36:34] a-team, see you tomorrow, good luck with the db! [21:37:01] mforns: o/ [21:37:23] madhuvishy: let's do it tomorrow if you are in the office! [21:38:24] a-team: please check your ssh clients :) [21:38:34] Ref: [Wmfall] Update your openssh clients [21:39:12] homebrew has the new openssh version for the mac users (brew update && brew upgrade) [21:46:08] thanks elukey, I added UseRoaming no [22:11:12] YuviPanda: you haven't merged yet right? 
[22:11:16] madhuvishy: no [22:11:28] okay [22:11:29] madhuvishy: and don't wait on me, since I'm excluding your instances from all of this [22:11:52] YuviPanda: no, i want to setup a staging instance based on the merged module, and then prod [22:12:00] aaah [22:12:02] right [22:12:04] ok [22:12:06] makes sense [22:12:15] madhuvishy: I'll be able to merge in <10min, waiting for another run to complete [22:12:28] ya np, just let me know whenever [22:18:33] madhuvishy: so we're restoring the database to labsdb then? [22:18:40] milimetric: yeah [22:19:01] we'll need someone to create the db I think, not sure our users have that right [22:19:15] we do - if you clone wikimetrics-deploy with submodules, you should have the db name, host, creds in secrets/private/production [22:19:46] milimetric: they made it such that you can create dbs with names that are like labsdbuser__dbname [22:20:18] it'll get created as part of the initialize_server from fab [22:20:25] oh ok, I was thinking it would still be called wikimetrics, but doesn't have to be [22:20:27] and then we can restore [22:20:33] yeah [22:20:41] what'd you end up doing with the backup processes? [22:20:54] it doesn't exist anymore in puppet [22:20:54] is that still puppetized or [22:20:57] oh ok [22:21:04] so we'd just manually back up? [22:21:06] labsdb has its own backups [22:21:12] we dont have to do anything [22:21:21] well, the files and redis files [22:21:35] ah, are they backed up on nfs too [22:21:35] not that we ever used those [22:21:45] hmmm i didn't think of that [22:21:52] ok [22:22:07] we don't have nfs - and i'm not sure we should enable it for this [22:22:46] YuviPanda: it seems we have some persistent files that are also being backed up currently on nfs [22:23:59] milimetric: do you think we should figure out a way to back those up, or do those manually? [22:24:19] i dont know where we'd put the backups though [22:24:20] don't think the redis stuff needs to be actually backed up, right?
since it's just in-flight requests [22:24:25] how big are the files? [22:24:38] there are just a ton of files, and it's not the redis ones [22:24:46] it's kind of dumb [22:25:00] and complicated [22:25:20] quite a bit of work went into the backup scripts [22:25:35] and I don't think anyone really needs those files to be safe except for us as part of vital signs [22:25:59] and we really have to move that to another system anyway, 'cause the reports fail half the time [22:26:11] milimetric: hmmm [22:26:20] in my opinion, we don't need to back them up, we can just keep the current backups we have and go from there [22:26:34] ok lets do that then [22:26:41] sooo [22:26:44] I'll merge the puppet patch? [22:26:48] but we still have to copy them over before we change the web proxy [22:27:06] so as long as wikimetrics1 stays online and keeps NFS access, we're ok [22:27:12] yeah [22:27:15] milimetric: okay cool [22:27:21] YuviPanda: yeah i think you can merge [22:27:25] so you can just agent forward and scp it from old host to new host [22:27:31] or scp it through your local [22:27:51] i was never able to scp with agent forwarding [22:27:56] (-A you mean right?) [22:28:05] yeah [22:28:10] you agent forward and ssh to bastion [22:28:14] then scp from X to bastion [22:28:17] then from bastion to Y [22:28:19] that should work [22:28:23] oh, ok [22:28:26] even if X <-> Y directly doesn't [22:28:30] madhuvishy: merged [22:28:58] YuviPanda: thanks [22:29:15] I'll take down staging and set it up again [22:29:20] can I log on labs? [22:29:30] madhuvishy: sure! [22:29:34] !log wikimetrics [22:29:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [22:29:48] heh [22:29:50] ok [22:29:55] YuviPanda: Tsk. :-) [22:36:52] madhuvishy: what's this thing called again? I can't login to wikimetrics-01 or wikimetrics01 [22:37:11] milimetric: the new one? 
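An alternative to the two-hop copy described above (scp from X to bastion, then bastion to Y): an OpenSSH client config can tunnel the connection through the bastion so scp appears to go directly to the labs host. The host names below are placeholders, not the real labs bastion configuration:

```
# Hypothetical ~/.ssh/config fragment
Host *.eqiad.wmflabs
    # 'ssh -W' forwards the connection through the bastion (OpenSSH >= 5.4);
    # clients with OpenSSH >= 7.3 can write 'ProxyJump bastion.example.org'
    # instead.
    ProxyCommand ssh -W %h:%p bastion.example.org
    ForwardAgent yes
```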
[22:37:26] yes [22:37:42] milimetric: no prod instance yet - [22:37:52] staging instance is wikimetrics-staging.wikimetrics [22:37:59] it's in a different project [22:38:01] oh :) [22:38:01] you have access [22:38:17] makes sense that I can't connect to a nonexistent thing then [22:38:31] mmm, I might have to go shopping for a couple hours soon [22:39:34] milimetric: no problem [22:39:45] our prod is still up [22:39:57] right, 'course but the email I sent I probably shouldn't have sent [22:40:07] right [22:40:19] so what's left to do [22:44:15] milimetric: i'm setting up staging again [22:46:12] Analytics, Analytics-Cluster: https://yarn.wikimedia.org/cluster/scheduler should be behind ldap - https://phabricator.wikimedia.org/T116192#1935684 (Tbayer) Right now I'm getting the following error at https://yarn.wikimedia.org/cluster/scheduler : //Error: 404, Public access disabled. See https://wiki... [22:46:28] madhuvishy: ok, let me know where you leave off and I'll try to finish up if I can [22:46:45] milimetric: okay [22:46:57] milimetric: nuria thanks for prioritizing this and moving it off [22:47:03] madhuvishy: ^ [22:47:38] limn1 is still in bad shape though :) is almost unrecoverable now, and become more so with every day. [22:50:51] YuviPanda: i messed up the paths a little bit it seems [22:51:09] the wsgi file is looking for config files in /srv/wikimetrics [22:51:34] oh [22:51:43] madhuvishy: make additional patch? I'll just merge [22:51:54] when it's actually inside /srv/wikimetrics/config - which has the checked out wikimetrics-deploy - inside which there's a config folder [22:52:01] why does this seem stupid only now [22:52:19] paths are always stupid [22:52:24] Analytics, Analytics-Cluster: https://yarn.wikimedia.org/cluster/scheduler should be behind ldap - https://phabricator.wikimedia.org/T116192#1935708 (Ottomata) Earlier today I sent the following email to the analytics mailing list. 
**Public YARN ResourceManager HTTP UI disabled** Hi all, Due to a rece... [22:53:00] madhuvishy: the code for scap's source deployment is at: [22:53:02] root@tin:/srv/deployment/scap/scap/scap# ls [22:53:19] YuviPanda: okay i'll patch both fabric and the wsgi file - will call the config folder in wikimetrics-deploy as config_templates [22:53:24] which is what it really is [22:53:25] ok! [22:55:46] (PS1) Madhuvishy: Fix the config paths in wsgi file again [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/264202 [23:01:42] HaeB: (are you tbayer?) [23:02:10] Analytics: Restore MobileWebSectionUsage_14321266 and MobileWebSectionUsage_15038458 - https://phabricator.wikimedia.org/T123595#1935731 (Ottomata) [23:03:13] (PS1) Madhuvishy: Move config template yaml files to config_templates dir [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/264204 [23:03:34] ottomata: yes he is :) [23:04:13] Analytics, Analytics-Cluster: https://yarn.wikimedia.org/cluster/scheduler should be behind ldap - https://phabricator.wikimedia.org/T116192#1935736 (Ottomata) @Tbayer, the ssh command works for me, buuuuuut hm. What bastion do you usually use to connect to stat1002? You might even be able to replace... [23:04:14] YuviPanda: I'm working on some logic to fix limn1 as well, no worries [23:04:22] ok, now I go grocery shopping for realz, bbl [23:04:54] milimetric: :D [23:06:44] YuviPanda: added you to merge these two patches - i think that should fix it [23:07:32] madhuvishy: they're both wikimetrics repos right? 
[23:07:42] I can merge if that's ok with rest of a-team :D [23:07:44] yup i can self merge too i think [23:07:52] ah [23:07:54] ok [23:07:59] cool then self merge :D [23:08:05] they are minor, yup okay [23:08:19] Analytics, Research-and-Data: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#1935743 (Halfak) I've started trying to get this data loaded onto the altiscale Research Cluster so that I can use HIVE to query it. I'll be working on ways to flag bots... [23:08:23] (CR) Madhuvishy: [C: 2 V: 2] "Self merging - minor fix" [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/264204 (owner: Madhuvishy) [23:08:46] (CR) Madhuvishy: [C: 2 V: 2] "Self merging, minor fix" [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/264202 (owner: Madhuvishy) [23:12:09] YuviPanda: gah i missed one place [23:12:38] :D [23:16:48] (PS1) Madhuvishy: Fix path for local_config_dir [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/264207 [23:17:27] YuviPanda: anyway, it was a good idea to redo staging [23:18:14] yeah [23:18:16] caught these things [23:18:31] yup [23:19:06] * YuviPanda goes to stat1002 this time to check his events [23:20:34] YuviPanda: :) should I give the prod instance more RAM and CPUs? [23:20:51] madhuvishy: medium or large should do... [23:20:57] madhuvishy: what's the current one? medium? [23:20:57] okay [23:21:02] madhuvishy: also these are debian right? [23:21:04] not ubuntu? [23:21:12] YuviPanda: new ones debian [23:21:17] cooool [23:21:47] 8.2 [23:21:49] shinyy [23:21:53] :D [23:22:00] YuviPanda: old wikimetrics prod instance was large [23:22:06] ah [23:22:08] yeah then just use large [23:22:37] ok [23:27:45] hmm [23:27:52] we're producing 70k events per day [23:27:54] approximately [23:28:01] I suppose that's fine?
[23:28:14] that's less than 1 per second [23:28:16] so should be fine [23:48:44] (PS2) Madhuvishy: Make db root user configurable for different environments [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/264207 [23:49:16] Analytics, Research consulting, Research-and-Data: Update official Wikimedia press kit with accurate numbers - https://phabricator.wikimedia.org/T117221#1935826 (DarTar) @ezachte this hasn't seen any update in a while, is there a status update or shall we close this? [23:50:02] Analytics-General-or-Unknown, Community-Advocacy, Wikimedia-Extension-setup, Wikipedia-iOS-App-Product-Backlog: enable Piwik on ru.wikimedia.org - https://phabricator.wikimedia.org/T91963#1935829 (Krenair) See {T116308}
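The back-of-envelope check above (~70k events/day being well under one per second) works out like this:

```shell
# 70,000 events per day spread over 86,400 seconds in a day
awk 'BEGIN { printf "%.2f events/sec\n", 70000 / 86400 }'
# prints: 0.81 events/sec
```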