[06:23:21] 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668283 (10Marostegui) [06:26:35] 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668294 (10Marostegui) [06:55:53] (03CR) 10Fdans: [C: 04-1] "Looks good to me! The only thing right now is that there's no error or a 404 when we enter the detail page of a metric that doesn't exist." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382636 (https://phabricator.wikimedia.org/T167676) (owner: 10Milimetric) [06:56:19] gooooood morning equipo A! [07:35:41] (03PS1) 10Fdans: Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) [07:38:13] (03CR) 10jerkins-bot: [V: 04-1] Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [07:49:38] morning :) [07:50:48] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3668422 (10jcrespo) Quarry switched to the new servers on Tue, Sep 26 (less than 2 weeks ago), so if that has been happening "in last months", it is not that. The only change I can see, which applies to both s... [07:51:33] 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668424 (10Marostegui) [07:58:19] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3668444 (10jcrespo) Actually, running your first query takes: ``` 8 rows in set (7.85 sec) ``` on the new analytics server, while it took >200 seconds on labsdb1001: ``` 8 rows in set (3 min 52.05 sec)... [08:20:03] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3668482 (10jcrespo) I had to kill the second query after 1000 seconds: ``` | Query | 983 | Queried about 131240000 rows | select distinct(con... [08:35:44] Morning a-team [08:50:14] elukey, fdans: With your approval, I'll start deploy soon [08:50:46] my approval isn't worth that much, but you have it joal [08:51:10] fdans: Thank you for approval - And please don't think it's not valuable :) [08:53:30] !log Rerunning wikidata-articleplaceholder_metrics-wf-2017-10-7 after failure [08:53:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:53:32] joal: Bonjur :) There is an issue with datasets1001 atm, that should be completely not relevant but stat100[56] have high load due to nfs etc.. [08:53:47] 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668658 (10Marostegui) [08:53:48] err sorry bonjour [08:53:50] :) [08:53:53] elukey: Would you prefer me to wait for this to be finished?
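A side note on the Quarry thread above: the long-runner cleanup jcrespo describes ("I had to kill the second query after 1000 seconds") can be scripted against the processlist. A minimal sketch, assuming a MariaDB replica — the host, the 900-second threshold, and the KILL step are illustrative, not the procedure actually used:

```bash
#!/bin/bash
# Hedged sketch: list statements running longer than a threshold on a replica,
# then kill one by thread id. Host and threshold are assumptions.
HOST=labsdb1001.eqiad.wmnet
THRESHOLD=900

mysql -h "$HOST" -N -e "
  SELECT ID, TIME, LEFT(INFO, 80) AS query_head
  FROM information_schema.PROCESSLIST
  WHERE COMMAND = 'Query' AND TIME > $THRESHOLD
  ORDER BY TIME DESC;"

# After eyeballing the output, terminate a specific statement (id is a placeholder):
# mysql -h "$HOST" -e "KILL QUERY 123456;"
```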
[08:54:00] I’m out for about an hour to run some errands a-team [08:54:00] elukey: Bonjour à toi :) [08:54:02] yes please, if it is not an issue :) [08:54:07] ok fdans [08:54:13] elukey: not an issue at all :) [08:54:25] elukey: I'll wait for you to let me know when you think it's ready [08:54:38] elukey: Thanks for caring for our beloved machines [08:54:44] super, I'd say that after lunch is ok, we are going to reboot dataset1001 [08:55:07] if you are interested it seems an issue with nfs/xfs and slabcache https://phabricator.wikimedia.org/T169680#3668636 (recurrent) [08:55:19] I am totally ignorant in this part of the kernel [08:55:24] buuut it looks really interesting [08:55:45] elukey: I think I'm more ignorant than you are about nfs / xfs :( [09:17:32] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3668723 (10IKhitron) Hi, @jcrespo. All that you told me, I don't think you should it to me. I just use quarry, don't choose between analytics and something, I don't know what they are. You don't need to compar... [09:21:18] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3668749 (10jcrespo) @IKhitron It is ok, quarry maintainers will know what to do with the information I have written :-) [09:28:42] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3668761 (10IKhitron) Sure, @jcrespo, but you talked to me ("your query"), so I answered. [09:52:45] 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668825 (10Marostegui) [10:32:43] 10Analytics, 10DBA: Drop MoodBar tables from all wikis - https://phabricator.wikimedia.org/T153033#3668933 (10Marostegui) [10:53:44] joal: all good now! I am going to lunch now but feel free to deploy [11:23:37] Thanks elukey :) [11:27:22] (I am back) [11:47:16] !log Deploying refinery from scap [11:47:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [11:56:15] (03CR) 10Joal: [V: 032 C: 032] "LGTM ! Merged !" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/353309 (https://phabricator.wikimedia.org/T164497) (owner: 10Mforns) [11:56:51] 10Analytics-Kanban, 10Patch-For-Review: Cleaning scheme for banner data _SUCCESS files - https://phabricator.wikimedia.org/T164497#3235487 (10JAllemandou) Base script merged - Now it needs a puppet companion to launch the script :) [11:59:56] 10Analytics-Kanban, 10Patch-For-Review: Cleaning scheme for banner data _SUCCESS files - https://phabricator.wikimedia.org/T164497#3669114 (10JAllemandou) Interestingly, checking this patch allowed me to notice we don't currently use the new version of druid loading for banners ...
Mwarf [12:04:33] !log Deploy refinery onto HDFS [12:04:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:14:07] !log Kill-Restart oozie jobs loading banner data into druid [12:14:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [12:47:10] !log Kill restart oozie job loading mediawiki-history into druid [12:47:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:46:08] (03PS1) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/383127 [13:52:21] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/383127 (owner: 10Hashar) [14:15:33] (03CR) 10Mforns: "Logic looks good! But I think there's a typo?" (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [14:16:56] @mforns: sorry, thought I had already pushed that last fix! [14:17:04] thank you for the CR :) [14:17:14] fdans, wait, I might have missed it... [14:17:33] oh ok [14:17:47] (03PS2) 10Fdans: Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) [14:17:59] there it is mforns [14:32:14] (03CR) 10Mforns: [V: 031 C: 031] "Looks good to me! I think the animation is even better now. +1, because we still have to remove not-implemented metrics from the config li" (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382636 (https://phabricator.wikimedia.org/T167676) (owner: 10Milimetric) [14:32:48] (03CR) 10Mforns: [C: 032] Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [14:34:08] (03CR) 10Mforns: [V: 032 C: 032] Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [14:34:22] (03PS3) 10Mforns: Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [14:34:30] (03CR) 10Mforns: [V: 032 C: 032] Use B instead of G to represent billion in the detail page [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [14:48:02] 10Analytics-Kanban, 10Patch-For-Review: Cleaning scheme for banner data _SUCCESS files - https://phabricator.wikimedia.org/T164497#3669639 (10mforns) @JAllemandou Will do that :] [14:59:16] 10Analytics-Kanban: Correct (AGAIN ??!!) mediawiki_history cumulative count names - https://phabricator.wikimedia.org/T176600#3669652 (10Nuria) 05Open>03Resolved [14:59:26] (03CR) 10Nemo bis: "Is this website aimed only at USA people or what? stats.wikimedia.org used to be an international website." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [14:59:32] 10Analytics-EventLogging, 10Analytics-Kanban: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3669654 (10Nuria) 05Open>03Resolved [15:01:21] ping fdans standduppp [15:02:35] (03CR) 10Fdans: "@Nemo bis please see commit message about internationalisation."
[analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [15:05:51] 10Analytics-Kanban, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Move Wikistats 2 from Differential to Gerrit - https://phabricator.wikimedia.org/T177288#3669669 (10Nuria) 05Open>03Resolved [15:06:06] 10Analytics-Kanban, 10Patch-For-Review: Create purging script for mediawiki-history data - https://phabricator.wikimedia.org/T162034#3669670 (10Nuria) 05Open>03Resolved [15:09:02] 10Analytics-Kanban, 10Patch-For-Review: Add "PhantomJS" to the list of bots in webrequest definition. - https://phabricator.wikimedia.org/T175707#3669680 (10Nuria) 05Open>03Resolved [15:09:46] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Wikistats metrics should link to corresponding page in meta - https://phabricator.wikimedia.org/T176241#3669682 (10Nuria) 05Open>03Resolved [15:31:07] (03CR) 10Nemo bis: "@Fdans the commit message doesn't address my point. In Wikimedia, we usually strive to respect international standard even in English-only" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [15:39:51] (03CR) 10Fdans: "@Nemo in this context we felt that using "billion" to speak about pageviews felt more correct than using G as shorthand for "gigapageviews" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/383071 (https://phabricator.wikimedia.org/T170940) (owner: 10Fdans) [15:41:13] 10Analytics, 10Analytics-Wikistats: Alpha Release: Breakdown selection disapears when you change timespam - https://phabricator.wikimedia.org/T177646#3669721 (10Nuria) [15:41:48] 10Analytics-Kanban, 10Analytics-Wikistats: Alpha Release: Breakdown selection disapears when you change timespam - https://phabricator.wikimedia.org/T177646#3665552 (10Nuria) a:03fdans [16:02:21] ping mforns [16:02:30] oops coming [16:04:22] (03PS1) 10Joal: Fix oozie jobs loading druid proj-family uniques [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383151 [16:05:31] 10Analytics, 10Operations, 10Ops-Access-Requests: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3664354 (10Nuria) Approved, this is ops for FR tech [16:05:48] !log pausing all druid oozie coordinators in preparation for druid public separation [16:05:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:14:48] 10Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014#2931068 (10fdans) We'd like to have our editing APIs in place before we import this data. Aiming to have this in Q3. [16:19:44] 10Analytics: Improve joining mechanism between webrequest data and edit data for i.e. sampling pageviews - https://phabricator.wikimedia.org/T126290#2010185 (10fdans) You can join webrequest and edit data using page ids for desktop and mobile web traffic but not for app traffic [16:24:31] joal: what will happen if i stop druid services on the node where the banner realtime indexing is running?
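An aside, on the "pausing all druid oozie coordinators" !log a few lines up: suspension is a built-in Oozie CLI operation. A minimal sketch of doing it in bulk, assuming the coordinators can be matched by name — the Oozie endpoint and the grep pattern are assumptions, not the commands actually run:

```bash
#!/bin/bash
# Hedged sketch: suspend every running coordinator whose name mentions druid,
# as in the "pausing all druid oozie coordinators" !log. The endpoint and the
# name filter are assumptions.
OOZIE_URL=http://analytics1003.eqiad.wmnet:11000/oozie   # assumed endpoint

oozie jobs -oozie "$OOZIE_URL" -jobtype coordinator -filter status=RUNNING -len 500 \
  | awk '/druid/ {print $1}' \
  | while read -r coord_id; do
      oozie job -oozie "$OOZIE_URL" -suspend "$coord_id"
    done

# Later, bring them back (cf. the "resuming oozie druid indexing jobs" !log):
#   oozie job -oozie "$OOZIE_URL" -resume <coordinator-id>
```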
[16:24:51] ottomata: I looked into that, I have an answer :) [16:25:24] ottomata: tasks will die, and won't manage to be relaunched, because each of the new tasks has the same name as a previously dead one [16:26:04] A new task will be successfully created after 'segment-granularity' on another middlemanager, since it'll have a new name [16:26:26] ottomata: You can do it, there's no big issue, we'll have realtime back online tomorrow [16:26:40] oh ok [16:26:55] ottomata: However, on a more generic note, when using druid, you should always have a lambda architecture: realtime + batch [16:27:14] because in the case described, you really lose info if you rely on realtime only [16:27:17] 10Analytics, 10Analytics-Kanban: Add link to footer of wikistats with "file a bug" - https://phabricator.wikimedia.org/T177642#3669880 (10Nuria) a:03fdans [16:27:21] ottomata: --^ [16:27:49] aye [16:27:49] ya [16:27:57] which we have for this, ya? [16:28:51] correct ottomata [16:29:44] 10Analytics-Kanban, 10Operations, 10Traffic: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie - https://phabricator.wikimedia.org/T174640#3669886 (10Nuria) a:03JAllemandou [16:30:27] (03CR) 10Nuria: "Can we split the fixes so we can attach them to the different tickets?" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383151 (owner: 10Joal) [16:30:35] k proceeding [16:32:01] 10Analytics-Kanban, 10Patch-For-Review: Replace references to dbstore1002 by db1047 in reportupdater jobs - https://phabricator.wikimedia.org/T176639#3669892 (10Nuria) 05Open>03Resolved [16:32:27] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3669893 (10Nuria) [16:32:29] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3669896 (10Nuria) [16:32:36] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3447705 (10Nuria) 05Open>03Resolved [16:32:40] 10Analytics-Kanban, 10Patch-For-Review: Correct typo in oozie mobile_apps_session - https://phabricator.wikimedia.org/T176599#3669901 (10Nuria) 05Open>03Resolved [16:33:29] 10Analytics-Kanban, 10Analytics-Wikistats: Backend for wikistats 2.0 - https://phabricator.wikimedia.org/T156384#3669908 (10Nuria) [16:33:31] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Implement some example metrics as Druid queries - https://phabricator.wikimedia.org/T170882#3669907 (10Nuria) 05Open>03Resolved [16:34:54] !log stopping druid services on druid1006 [16:34:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:35:54] 10Analytics-EventLogging, 10Analytics-Kanban: Eventlogging refine popups, temporary cron - https://phabricator.wikimedia.org/T177783#3669920 (10Nuria) [16:37:40] 10Analytics-Kanban: Write document on wikitech on why do we want to migrate back to gerrit from differential - https://phabricator.wikimedia.org/T176145#3669935 (10Nuria) 05Open>03Resolved [16:38:18] 10Analytics-Cluster, 10Analytics-Kanban: Provision new Kafka cluster(s) with security features - https://phabricator.wikimedia.org/T152015#3669940 (10Nuria) [16:38:20] 10Analytics-Cluster, 10Analytics-Kanban, 10monitoring, 10Patch-For-Review, 10User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans -
https://phabricator.wikimedia.org/T175922#3669938 (10Nuria) 05Open>03Resolved [16:38:39] 10Analytics-Kanban, 10Analytics-Wikistats: Handle long project names in Wikiselector - https://phabricator.wikimedia.org/T173373#3669941 (10Nuria) 05Open>03Resolved [16:39:11] 10Analytics-Kanban, 10Operations, 10Patch-For-Review, 10User-Elukey: Tune Kafka logs to register clients connected - https://phabricator.wikimedia.org/T173493#3669942 (10Nuria) 05Open>03Resolved [16:41:03] ok joal, nuria_, druid1006 services are offline [16:41:15] i guess we are waiting for historical nodes to be reassigned some segments? [16:42:30] ottomata: let's see if there are 404s in pivot, give me a sec [16:42:45] ottomata: did you remove it completely from the ring? [16:43:31] ring? i've stopped all druid services on that node [16:43:52] ottomata: sorry, cluster [16:43:59] ottomata: ah i see, you removed services [16:44:12] ottomata: but the node is still visible to others? [16:44:13] it's not showing up in cluster status queries [16:44:17] no don't think so [16:44:58] so since segments are on hdfs it is only a matter of stopping daemons [16:45:11] curl -sL druid1005.eqiad.wmnet:8081/druid/coordinator/v1/loadqueue?simple [16:45:12] since the data is only in memory on druid's nodes [16:45:23] the daemons are stopped, but we should wait before stopping other nodes, right? [16:45:34] otherwise segments will be unavailable until they are reloaded [16:45:47] i dunno how long it will take druid to mark the underreplicated segments to be loaded into other nodes though [16:45:57] ottomata: it doesn't say in the docs [16:45:59] or really how to tell if that has been done [16:46:04] ah yes my point was more seeking a confirmation than a statement :) [16:46:14] aye [16:46:14] ottomata: i looked for that [16:46:55] ottomata: does the node come up in logs as not available? [16:47:03] ottomata: let me login [16:48:45] ottomata: but there is no place where all historical.log are aggregated right? [16:49:16] btw joal, is this new? http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html [16:49:19] i don't remember seeing this before [16:49:37] ottomata: no, i saw that in their netflix talk [16:50:25] ottomata: i think they talked about that in their explanation of real time consumption [16:50:34] ottomata: as their message bus was kafka [16:50:53] ottomata: is there a way for me to log into zookeeper? [16:51:22] nuria yes [16:51:24] on druid1001 [16:51:27] /usr/share/zookeeper/bin/zkCli.sh [16:51:54] ottomata: ok, cause that is where the info of what nodes have what segments is kept [17:01:45] 10Analytics-Cluster, 10Analytics-Kanban, 10monitoring, 10User-Elukey: Decide on casing convention for JMX metrics in Prometheus - https://phabricator.wikimedia.org/T177078#3669996 (10Ottomata) BTW, this will also be very relevant for both upcoming Prometheusization for Druid and Hadoop, even in non JMX con... [17:03:26] nuria_: i dunno. i guess i'm inclined to wait a day per node? [17:03:50] ottomata: I think zookeeper will let us know if i can find how to query it properly [17:03:58] ottomata: http://druid.io/docs/latest/dependencies/zookeeper.html [17:04:18] ottomata: i was trying curl http://localhost:2181/druid/analytics-eqiad/discovery [17:04:23] from druid1001 [17:04:26] hmm [17:04:29] ottomata: but ahem, nothing [17:04:31] and/or [17:04:34] in zkCli [17:04:34] ls /druid/analytics-eqiad/segments [17:04:40] there are druid1006* nodes [17:04:42] multiple ports [17:04:51] zkCli?
[17:05:02] ya ^^^ [17:05:04] on druid1001 [17:05:05] /usr/share/zookeeper/bin/zkCli.sh [17:05:28] ottomata: ah yes, but no disconnect right? [17:05:35] disconnect? [17:06:21] ottomata: sorry, i was looking at /var/log/zookeeper [17:06:27] in 1001 [17:07:49] 10Analytics, 10Analytics-Wikistats, 10Research: Renovation of Wikistats production jobs - https://phabricator.wikimedia.org/T176478#3670022 (10Erik_Zachte) script datamaps_views.sh, for updating WiViVi data, has been adapted to stat1005 viz. now shows data for Sep 2017 https://stats.wikimedia.org/wikimedia/a... [17:10:35] ottomata: I think this is it [17:10:38] https://www.irccloud.com/pastebin/SfpD4iHC/ [17:10:48] ottomata: although you might have known this ages ago [17:13:10] ottomata: and 1006 not present here: [17:13:12] ottomata: [17:13:15] https://www.irccloud.com/pastebin/m6KelPiE/ [17:13:46] ya, druid knows 1006 is offline for querying, i just want to know when it will have its segments fully reassigned. i'm not sure if it not being in segments is enough. [17:13:47] could be! [17:13:57] i'm inclined to wait a day and see [17:14:15] ottomata: i suspect this might be query driven [17:14:28] ottomata: so a day w/o queries might not do anything [17:14:40] ottomata: as it might not trigger the ttl to reassign segments [17:14:48] ottomata: but i am making this up... [17:16:58] going afk people! [17:16:59] * elukey off [17:17:48] ottomata: if you want me to follow up on any druid task tomorrow lemme know later on :) [17:18:48] ottomata: i think zookeeper knows now: [17:18:52] https://www.irccloud.com/pastebin/ctYyvImv/ [17:20:15] ottomata: according to zookeeper there are no segments on the node [17:20:56] hmm, nuria_ that seems unrelated, but maybe [17:21:28] nuria_: sorry the part about queries [17:21:38] ottomata: yaya, agreed [17:21:57] oh interesting, yeah if we ls some 1006 node ports, no segments [17:21:57] hm [17:21:58] ottomata: but the segment report from zookeeper seems pretty legit no? [17:23:09] yeah, guuuuess so, but you'd think some other nodes would be loading segments now then [17:23:19] OH! [17:23:20] they are! [17:23:23] curl -sL druid1005.eqiad.wmnet:8081/druid/coordinator/v1/loadqueue?simple [17:23:30] awesome [17:23:43] since we know there are no scheduled indexing tasks running [17:23:54] we can be pretty sure those are loading because 1006 is offline [17:23:54] so [17:23:57] when that goes back to 0 [17:24:16] ya for example [17:24:17] "mediawiki_history_reduced_2007-09-01T00:00:00.000Z_2007-10-01T00:00:00.000Z_2017-09-20T17:48:13.172Z" [17:24:17] ottomata: and even before cause they will be [ulled from deep storage [17:24:18] great [17:24:21] that's an old segment [17:24:23] *pulled [17:24:25] true [17:24:32] actually, now it's done? [17:24:40] hmm, i mean, that is really just a report of segments in the load queue [17:24:48] so, it might not mean all segments have been loaded [17:24:57] because i dunno how/when the now missing 1006 segments will be placed in the load queue [17:25:10] i dunno, maybe it is safe to proceed? [17:25:12] kinda hard to know [17:25:24] i mean, the worst that will happen is some queries will fail if we remove nodes too early [17:25:24] right? [17:25:31] until the segments are totally reloaded [17:25:36] do we have anything super critical? [17:25:40] just pivot really, right?
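Pulling the ZooKeeper exchange above into one place: the segment announcements live under /druid/analytics-eqiad/segments, with one child znode per serving host:port. A minimal sketch for checking whether a host is drained — the 8083 port and the stdin piping are assumptions; the paths are the ones used in the session:

```bash
# Hedged sketch: check what a historical still announces in ZooKeeper before
# deciding it is drained. Port and piping style are assumptions.
ZK=/usr/share/zookeeper/bin/zkCli.sh

# One child znode per serving host:port under the segments path:
echo "ls /druid/analytics-eqiad/segments" | $ZK 2>/dev/null

# List what one host:port pair still serves (empty once it is drained):
echo "ls /druid/analytics-eqiad/segments/druid1006.eqiad.wmnet:8083" | $ZK 2>/dev/null
```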
[17:26:17] ottomata: actually, i think that once zookeeper figures out 1006 is not there the orst it can happen is that segments are pulled from deep storage rather than memory [17:26:21] *worst [17:26:47] oh during the query you mean [17:26:48] hm. [17:26:49] maybe so! [17:27:09] alright nuria_ i feel fine about proceeding, given what we see in ZK, and that i've seen it load new segments since [17:27:26] ottomata: ok [17:28:04] ok, proceeding with 1005 [17:29:09] ottomata: ok, let me check zookeeper [17:30:09] ottomata: ok, now no segments on 1005 [17:30:17] ottomata: did you just turn it off? [17:30:53] yes [17:31:08] yup, and wow, segments are already being loaded [17:31:12] ottomata: ok, also appears removed from announcements [17:31:14] curl -sL druid1004.eqiad.wmnet:8081/druid/coordinator/v1/loadqueue [17:31:20] ya, just looked [17:32:02] oh awesome and [17:32:08] i just noticed [17:32:09] ssh -N druid1001.eqiad.wmnet -L 8081:druid1004.eqiad.wmnet:8081 [17:32:13] http://localhost:8081/#/ [17:32:13] has [17:32:21] datasource availability :) [17:33:36] nuria_: for good measure, i'll wait another 15 mins before doing 1004. [17:33:36] :) [17:34:03] oh, hm [17:34:03] ok [17:34:05] it is still loading [17:34:06] interesting [17:34:21] so i'm only seeing segments in the load queue, yeah [17:34:28] would be nice to know the total segments its still going to try to lod [17:34:29] lado [17:36:39] ottomata: documented all this: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Removing_hosts.2F_taking_hosts_out_of_service_from_cluster [17:37:44] ottomata: makes sense that as we remove hosts loading takes longer [17:38:18] nuria_: the curl command there does not show that the segments are loaded elsewhere [17:38:26] it just shows the segments currently to-be-loaded [17:38:32] ottomata: ah yes [17:38:34] yes [17:38:39] we need an under-replicated segments query [17:38:42] wonder if that exists.. [17:43:41] ottomata: queries are SLOWER, yeah [17:44:00] oh ya? [17:47:49] ottomata: yesyes [17:48:08] ottomata: but it might be the 1st hit only [17:48:13] ottomata: we'll see [17:48:17] ottomata: do we do 1004? [17:49:36] ottomata: reworked docs a bit too [17:51:55] nuria_: haha, you can know which segments those are if you remove ?simple [17:51:55] :p [17:52:08] "segmentsToLoad": [ [17:52:09] "pageviews-hourly_2017-07-12T00:00:00.000Z_2017-07-13T00:00:00.000Z_2017-07-13T01:01:59.560Z_1", [17:59:31] nuria_: ok, it is loading segments, but i think things will be ok [17:59:34] i'm going to take 1004 out [17:59:43] because, it is having segments from 1005/6 loaded into it [18:00:42] ottomata: ooohhh [18:01:21] ottomata: nice [18:01:36] ottomata: high five to druid team [18:01:43] ottomata: this was painless [18:01:48] OK now there are TONS of segments loading [18:01:57] 42.8 GB (126 segments) to load [18:03:50] this one is kinda nice [18:03:50] curl -sL druid1003.eqiad.wmnet:8081/druid/coordinator/v1/loadstatus [18:04:33] ottomata: ya, i am liking this system [18:04:42] ottomata: there is some tlc here [18:05:31] ah interesting! yeah so after stopping druid1004 [18:05:41] mediawiki_history_reduced dropped to < 99% available [18:05:49] so we finally made it lose segments :) [18:06:02] ANNND there we go!
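The loadstatus endpoint ottomata just used lends itself to a drain check between node shutdowns: wait until every datasource is back at 100% before taking the next host out. A minimal sketch, assuming jq is available and that loadstatus returns a datasource-to-percentage map, as it does here:

```bash
#!/bin/bash
# Hedged sketch: block until the coordinator reports every datasource fully
# available, at which point it is reasonably safe to stop the next historical.
# Host and polling interval are assumptions; requires jq.
COORD=druid1003.eqiad.wmnet:8081

while true; do
  # loadstatus maps datasource -> percent of segments available (100 == all)
  pending=$(curl -sL "http://$COORD/druid/coordinator/v1/loadstatus" \
    | jq '[to_entries[] | select(.value < 100)] | length')
  if [ "$pending" -eq 0 ]; then
    echo "all datasources 100% available"
    break
  fi
  echo "$pending datasource(s) still below 100%, waiting..."
  sleep 30
done
```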
[18:06:05] now 100% available [18:06:12] it's still loading segments [18:06:18] but, not for availability purposes i think [18:06:20] just for replication [18:06:26] COOL [18:07:35] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Create Druid public cluster such AQS can query druid public data - https://phabricator.wikimedia.org/T176223#3670120 (10Ottomata) Today I set druid100[456] as spare::system, and stopped druid services there. Depending on the outcome of T177511,... [18:08:31] 10Analytics-Kanban, 10Operations, 10User-Elukey: LVS for Druid - https://phabricator.wikimedia.org/T177511#3670122 (10Ottomata) [18:27:17] Hey ottomata, nuria_ - I was gone dining [18:27:47] i'm going to reenable indexing jobs [18:27:52] 1004-6 are offline [18:27:54] ottomata: ? [18:27:59] ottomata: Ha, got it [18:28:07] Awesome, saw that [18:28:12] !log resuming oozie druid indexing jobs, 1004-1006 are offline [18:28:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:28:18] ottomata: I can give you UI tricks for segments loading [18:28:25] joal: please! [18:28:42] ottomata: in coordinator UI, cluster tab [18:28:46] ya [18:28:48] looking at that [18:29:01] btw, if you know what the little green dots in the tier capacity image are [18:29:05] i would love to know [18:29:45] The blue dot (unfilled) means that all segments are loaded - They are red or yellow otherwise, if machines are loading segments [18:29:53] That is what I use [18:29:59] on the left side ya [18:30:01] I have not found either what the 3 dots mean [18:30:17] but, i think that's not what it means, i think it means that the segments are at least loaded in a node and are available for querying [18:30:27] not that the segment is fully loaded on all replicas (IIUC) [18:30:41] ^ the unfilled blue dot per datasource [18:30:41] correct ottomata - But that's what you are after knowing when a node is down, no? [18:30:46] true! [18:30:49] well [18:30:50] no [18:30:51] i mean [18:30:56] if there is only 1 replica online [18:31:01] it is available for querying [18:31:12] say i already turned off 1005 and 1006 [18:31:15] i'm about to do 1004 [18:31:28] i'd like to know if doing so will make any segments UNavailable [18:31:43] i just did this, and it did briefly make mediawiki_history_reduced unavailable [18:32:14] ottomata: right [18:32:29] ottomata: I wonder where I can find config about segment replication [18:32:45] in RAM I mean [18:33:32] I have found a way to check where some segments are, but no way to ask for a given node and check exactly whether its segments are available somewhere else [18:33:41] ottomata: It could be a nice API addition [18:33:55] aye [18:34:58] ottomata: I assume now the thing is done and I can't really help, right :( [18:35:12] ya we good! [18:35:14] no problems! [18:35:30] now we just need to figure out if they need to be moved out of analytics vlan [18:35:33] And also ottomata, I just took a quick look at the kafka consumption extension, it looks super interesting ! [18:35:55] that is new, right?
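On joal's question about where segment replication is configured: in Druid this is governed by the coordinator's load rules (replicant counts per tier), which the coordinator API also exposes. A hedged sketch — the example rule in the comment is an illustrative default, not the analytics cluster's actual configuration:

```bash
# Hedged sketch: read the load (replication) rules from the coordinator.
# Host is taken from the session; the example rule below is illustrative.
COORD=druid1001.eqiad.wmnet:8081

# Rules for all datasources, including the cluster-wide _default set:
curl -sL "http://$COORD/druid/coordinator/v1/rules"

# A "load forever, 2 replicas in the default tier" rule looks roughly like:
#   [{"type": "loadForever", "tieredReplicants": {"_default_tier": 2}}]
```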
[18:36:10] ottomata: I found the reading about realtime issues we hit with tranquility: https://groups.google.com/forum/#!topic/druid-development/Ot0z3n_Tp4E [18:36:13] ottomata: I think so [18:44:16] (03PS1) 10Joal: Fix oozie jobs loading druid proj-family uniques [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383171 (https://phabricator.wikimedia.org/T174640) [18:44:37] (03Abandoned) 10Joal: Fix oozie jobs loading druid proj-family uniques [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383151 (owner: 10Joal) [18:48:00] (03CR) 10Nuria: [V: 032 C: 032] Fix oozie jobs loading druid proj-family uniques [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383171 (https://phabricator.wikimedia.org/T174640) (owner: 10Joal) [18:48:22] (03PS1) 10Joal: Fix druid datasources for proj-family uniques jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383172 (https://phabricator.wikimedia.org/T175162) [18:51:41] (03CR) 10Nuria: [V: 032 C: 032] Fix druid datasources for proj-family uniques jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/383172 (https://phabricator.wikimedia.org/T175162) (owner: 10Joal) [18:51:49] Thanks nuria_ :) [18:52:52] joal: one step closer to perfection in datasources naming [18:53:16] nuria_: It's interesting to do it now since I'll need full reindex for the other change :) [18:55:24] 10Analytics-Kanban, 10Operations, 10Traffic, 10Patch-For-Review: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie - https://phabricator.wikimedia.org/T174640#3670314 (10JAllemandou) The change above doesn't change the behavior of cookies, but at least removes... [18:56:24] ottomata: can you merge this one? [18:56:28] ottomata: https://gerrit.wikimedia.org/r/#/c/381496/ [18:56:36] ottomata: PagecontentSavecomplete removal [18:59:56] trying [19:00:01] fdans: still here? [19:00:55] sort of! what’s up joal? :) [19:01:24] fdans: I was thinking it would be good to go for the suggestion: remove the global aspect of editors [19:01:42] nuria_: hm. gerrit is not giving me the option to submit [19:01:53] ottomata: ya, how come? [19:01:58] ottomata: me no compredou [19:02:05] me neither [19:02:23] joal: so, no all-projects for any of them right? [19:02:27] fdans: computing editors cross-projects with digests as-is is too big, so going for the simple one (no global), and then doing better with better digests computation and storage should be the way [19:02:36] ottomata: ok, let's ask on releng [19:02:55] nuria_: fyi [19:02:55] https://gerrit.wikimedia.org/r/#/c/382750/ [19:03:11] fdans: hmm, any of them? They are numerous? [19:03:20] joal: that makes sense [19:03:43] haha you sure it's safe to remove this event? :p [19:03:49] looks like performance team might be using it [19:03:49] joal: ohhh sorry [19:03:55] just editors [19:03:58] fdans: This means you can also remove the comment about no dedup :) [19:04:00] I misread [19:04:01] ottomata: the pagecontentsavecomplete? [19:04:24] yes see that patch i just linked [19:04:26] sounds good joal, will push a change in the morning ;) [19:04:35] fdans: no rush, was just thinking of that [19:04:43] fdans: sorry for having disturbed ;) [19:05:54] joal: noooo interaction with you is ever a disturbance :D [19:06:07] mwahaha :) [19:06:27] ottomata: is that what is going on? [19:06:39] ottomata: two conflicting changes? [19:06:55] not sure [19:06:56] ottomata: doesn't seem likely...
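For the Kafka indexing service joal and ottomata discussed above (druid.io's kafka-ingestion extension): rather than pushing events through tranquility, you POST a supervisor spec to the overlord, which then manages exactly-once ingestion tasks itself — sidestepping the dying-realtime-task problem described earlier. A heavily trimmed sketch; the overlord host, topic, broker, and schema are placeholders, not the banner job's real spec:

```bash
# Hedged sketch: submit a Kafka supervisor spec to the overlord. All names
# (host, topic, dimensions) are placeholders.
OVERLORD=druid1001.eqiad.wmnet:8090   # assumed overlord host:port

curl -s -X POST -H 'Content-Type: application/json' \
  "http://$OVERLORD/druid/indexer/v1/supervisor" -d '{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "banner_activity_realtime",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {"column": "dt", "format": "auto"},
        "dimensionsSpec": {"dimensions": ["campaign", "banner", "project"]}
      }
    },
    "metricsSpec": [{"type": "count", "name": "count"}],
    "granularitySpec": {"segmentGranularity": "HOUR", "queryGranularity": "MINUTE"}
  },
  "tuningConfig": {"type": "kafka"},
  "ioConfig": {
    "topic": "banner-impressions",
    "consumerProperties": {"bootstrap.servers": "kafka1012.eqiad.wmnet:9092"},
    "taskCount": 1,
    "taskDuration": "PT1H"
  }
}'
```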
[19:09:04] ottomata: BTW, I had never used /usr/bin/flock [19:09:24] ottomata: this would have saved hours of my life once [19:09:34] ottomata: just saw it here: https://gerrit.wikimedia.org/r/#/c/379234/2/modules/statistics/templates/published-datasets-sync.sh.erb [19:12:28] :) [19:23:06] ottomata: stat1005 is becoming unusable, maybe arnad user needs to be using nice? [19:24:33] ottomata: is he someone that works with halfak? [19:28:47] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670403 (10zhuyifei1999) 05Open>03Resolved Dunno if anything needs to be done here. [19:30:55] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670406 (10IKhitron) >>! In T160188#3670403, @zhuyifei1999 wrote: > Dunno if anything needs to be done here. Do you mean it will stay work quickly like this for good? [19:34:06] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670411 (10zhuyifei1999) >>! In T160188#3670406, @IKhitron wrote: > Do you mean it will stay work quickly like this for good? Quarry will use the new faster analytic servers, yes. But how the query executes i... [19:35:43] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670428 (10IKhitron) >>! In T160188#3670411, @zhuyifei1999 wrote: >>>! In T160188#3670406, @IKhitron wrote: >> Do you mean it will stay work quickly like this for good? > > Quarry will use the new faster anal... [19:37:02] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670432 (10zhuyifei1999) Or make faster :) [19:38:45] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670437 (10IKhitron) >>! In T160188#3670432, @zhuyifei1999 wrote: > Or make faster :) Make faster it's great, but if it's possible that it will make slower, then this task isn't closed forever, I'm affraid. [19:53:48] nuria_: i think arnad works with leila [19:53:51] will send email [19:54:05] ottomata: sounds good, thank you. [19:56:46] ottomata: are you around? [19:57:09] ya kinda [19:57:10] :) [19:57:15] what's up dsaez [19:57:18] hi [19:57:52] I've seen a lot of movement here, do you know why 1005 is so slow? [19:59:15] dsaez: arnad is running a large heavy job [19:59:24] i see [20:01:28] i just sent him an email asking him to nice it [20:02:26] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670531 (10zhuyifei1999) >>! In T160188#3670437, @IKhitron wrote: > Make faster it's great, but if it's possible that it will make slower, then this task isn't closed forever, I'm affraid. In that case, if on... [20:03:46] ;) thanks [20:38:08] dsaez: do you know where arnad works from? (his TZ i guess) [20:38:38] nuria_: no idea, I don't know him [20:38:51] dsaez: ok, hopefully we have this sorted out tomorrow [20:39:02] ok! [20:40:19] 10Analytics-Kanban, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10Patch-For-Review: Stop collecting Data for outdated schemas PageCreation, PageDeletion, PageMove, PageRestoration. Archive tables on hdfs - https://phabricator.wikimedia.org/T171629#3670598 (10Nuria) Ping @elukey pagedelet... [20:50:06] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3670621 (10IKhitron) Not 5 times, at least 1800 times. I see, thank you for this explanation, and for your help, @zhuyifei1999.
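Back to the /usr/bin/flock trick joal spotted in the published-datasets-sync patch: it serializes cron runs, so a slow sync cannot pile up behind itself. A minimal sketch of the pattern — the lock path and the guarded command are placeholders, not the actual contents of published-datasets-sync.sh.erb:

```bash
#!/bin/bash
# Hedged sketch of the flock pattern: take an exclusive, non-blocking lock on
# a lock file and bail out if a previous run still holds it. The lock path
# and the guarded command are placeholders.
LOCKFILE=/var/lock/published-datasets-sync.lock

(
  flock -n 9 || { echo "previous run still in progress, exiting"; exit 1; }
  # ... the long-running sync goes here, e.g. (placeholder):
  # rsync -rt --delete /srv/published-datasets/ remote::published-datasets/
) 9>"$LOCKFILE"

# Equivalent one-liner form, handy in a crontab:
#   flock -n /var/lock/published-datasets-sync.lock /usr/local/bin/sync-script
```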
[21:16:12] (03PS11) 10Joal: Add mediawiki-history-metrics endpoints [analytics/aqs] - 10https://gerrit.wikimedia.org/r/379227 (https://phabricator.wikimedia.org/T175805) [22:33:01] 10Analytics, 10Analytics-EventLogging, 10AbuseFilter, 10CirrusSearch, and 29 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3670811 (10Krinkle) [23:07:44] 10Analytics, 10Analytics-EventLogging, 10AbuseFilter, 10CirrusSearch, and 29 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3670863 (10Reedy) [23:08:27] 10Analytics, 10Analytics-EventLogging, 10AbuseFilter, 10CirrusSearch, and 29 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3541977 (10Reedy) Looks like we've gone down from 30 plus `func_get_args`, to 3... [23:11:32] 10Analytics, 10Analytics-EventLogging, 10CirrusSearch, 10Cognate, and 24 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3670866 (10Reedy) [23:13:23] 10Analytics, 10Analytics-EventLogging, 10CirrusSearch, 10Cognate, and 19 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3541977 (10Reedy) [23:15:53] 10Analytics, 10Analytics-EventLogging, 10CirrusSearch, 10Cognate, and 18 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3670870 (10Reedy) [23:17:22] 10Analytics, 10Analytics-EventLogging, 10CirrusSearch, 10Cognate, and 17 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3541977 (10Reedy) [23:21:16] 10Analytics, 10Analytics-EventLogging, 10CirrusSearch, 10Cognate, and 16 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3670872 (10Reedy)