[03:08:47] 10Analytics: Why do we allow "bot" in metrics/pageviews/per-article - https://phabricator.wikimedia.org/T178448#3692674 (10Milimetric)
[03:42:06] (03CR) 10Milimetric: [V: 032] Update mediawiki-history-reduced oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: 10Joal)
[03:42:49] (03CR) 10Milimetric: [V: 032] "looks good to me, went over it for my own benefit mostly" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: 10Joal)
[04:02:02] (03CR) 10Milimetric: "One nit (the first comment), but looks great. Tests pass for me after rm -rf node_modules && npm install, so that's really great. Nice w" (034 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/384590 (https://phabricator.wikimedia.org/T178312) (owner: 10Joal)
[07:28:13] good morning!
[07:37:24] morning :)
[07:55:57] https://config-master.wikimedia.org/pybal/eqiad/druid-public-broker \o/
[07:58:37] 10Analytics-Cluster, 10Analytics-Kanban, 10monitoring, 10Patch-For-Review, 10User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3692918 (10elukey)
[07:58:40] 10Analytics-Cluster, 10Analytics-Kanban, 10monitoring, 10Patch-For-Review, 10User-Elukey: Decide on casing convention for JMX metrics in Prometheus - https://phabricator.wikimedia.org/T177078#3692917 (10elukey) 05Open>03Resolved
[09:02:45] (03CR) 10Joal: "Thanks for reviews folks! I think we are ready to merge/deploy whenever :)" (034 comments) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/384590 (https://phabricator.wikimedia.org/T178312) (owner: 10Joal)
[09:03:18] (03PS3) 10Joal: Upgrade restbase-modules to latest [analytics/aqs] - 10https://gerrit.wikimedia.org/r/384590 (https://phabricator.wikimedia.org/T178312)
[09:04:57] Hi a-team
[09:05:28] fdans: The new-registered-users is fixed, hopefully it reflects on the UI :)
[09:05:59] awwwww yess this is the stuff joal_
[09:06:11] https://usercontent.irccloud-cdn.com/file/f2TNTZok/Screen%20Shot%202017-10-18%20at%2011.05.44.png
[09:06:33] * joal loves this UI :)
[09:06:44] This is really really awesome :D
[09:07:08] * joal dances in front of keyboard for the second day in a row (this doesn't happen often)
[09:07:31] joal: I'm going to push the changes to the config now, if you don't mind taking a look we can merge those and then do CRs for the UI stuff
[09:08:06] sure fdans
[09:11:14] (03PS7) 10Fdans: Add stub of new contributing and content metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268)
[09:11:20] (03CR) 10jerkins-bot: [V: 04-1] Add stub of new contributing and content metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268) (owner: 10Fdans)
[09:14:52] 10Analytics, 10cloud-services-team (Kanban): Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3693022 (10elukey) @Nuria Both tables right? ``` +-------------------------------------+ | Tables_in_log (CommandInvocation%) |...
[09:15:07] 10Analytics, 10User-Elukey, 10cloud-services-team (Kanban): Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3693023 (10elukey)
[09:17:18] (03PS8) 10Fdans: Add stub of new contributing and content metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268)
[09:17:31] joal: just pushed and rebased :)
[09:29:35] 10Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3693044 (10elukey) Sanity check before dropping: ``` MariaDB [log]> select count(*) from MobileWikiAppToCInteraction_10375484_15423246; +-----------...
[09:33:05] (03CR) 10Joal: "Comments inline." (0312 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268) (owner: 10Fdans)
[09:33:58] fdans: Quick review - Please disregard if not valuable :)
[09:34:15] thank you joal!
[09:39:42] (03PS9) 10Fdans: Add configuration of new contributing and content metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268)
[09:39:51] (03CR) 10Fdans: Add configuration of new contributing and content metrics (0317 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268) (owner: 10Fdans)
[09:40:09] 10Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3693078 (10elukey) @Nuria thanks a lot for this task, eventlogging_cleaner will be super happy not to clean huge tables like these :)
[09:44:27] (03CR) 10Joal: [C: 031] "LGTM ! Thanks fdans :)" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268) (owner: 10Fdans)
[09:47:10] (03PS1) 10Fdans: 2.0.9 release [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/384957
[09:47:58] (03CR) 10Fdans: [V: 032 C: 032] 2.0.9 release [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/384957 (owner: 10Fdans)
[09:50:53] (03PS1) 10Fdans: 2.0.9 release [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/384958
[09:52:14] woooow TIL change-ids can be repeated within a repo if they don't point to the same branch cc milimetric
[09:52:42] (03CR) 10Fdans: [V: 032 C: 032] 2.0.9 release [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/384958 (owner: 10Fdans)
[09:53:48] (03CR) 10Fdans: [C: 032] Add configuration of new contributing and content metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/382659 (https://phabricator.wikimedia.org/T175268) (owner: 10Fdans)
[09:59:58] 10Analytics-Kanban: Adapt components for new editing metrics - https://phabricator.wikimedia.org/T178461#3693136 (10fdans)
[10:00:12] 10Analytics-Kanban: Adapt components for new editing metrics - https://phabricator.wikimedia.org/T178461#3693152 (10fdans)
[10:26:27] 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10User-Elukey: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3693218 (10elukey) p:05Triage>03Normal
[10:47:07] 10Analytics-Kanban: Check data from new API endpoints against existing sources - https://phabricator.wikimedia.org/T178478#3693410 (10JAllemandou)
[10:47:21] 10Analytics-Kanban: Check data from new API endpoints against existing sources - https://phabricator.wikimedia.org/T178478#3693422 (10JAllemandou) a:03JAllemandou
[11:08:53] * elukey lunch!
[11:09:25] elukey: ! You were here all that long ?
[11:10:37] joal: I was! I've typed "morning" earlier on :D
[11:10:47] Ah ! Missed that elukey :)
[11:10:52] elukey: Enjoy lunch ;)
[11:10:56] thankssss :)
[11:53:14] taking a break a-team
[11:54:02] see you guys :]
[12:55:03] 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657517 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['db1107.eqiad.wmnet', 'db11...
[13:02:38] 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Create Druid public cluster such that AQS can query druid public data - https://phabricator.wikimedia.org/T176223#3693703 (10elukey) Main thing to remember for the "Create LVS endpoint for druid-public-overlord (for oozie job indexing)" task: usually...
[13:03:17] mforns: db110[78] are being reimaged now with Debian Stretch, they are already racked and working \o/
[13:03:22] joal, when you're back, can we talk about druid loading? ping me then :]
[13:03:32] elukey, woohooo
[13:03:56] we'll need to do some puppet cleaning but I believe that next week we should be able to transfer data
[13:04:05] amazing
[13:04:10] db1108 should be the new slave, so initially it will replicate from db1046
[13:04:26] then we'll switch analytics-slave.etc.. to it and move people
[13:04:32] last step will be to move the master
[13:04:53] eventlogging_cleaner should fly on db1108 :D
[13:06:35] great :D
[13:11:18] 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3693722 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1108.eqiad.wmnet', 'db1107.eqiad.wmnet'] ``` and were **ALL** successful.
[13:27:56] (03CR) 10Ppchelko: [C: 031] Upgrade restbase-modules to latest [analytics/aqs] - 10https://gerrit.wikimedia.org/r/384590 (https://phabricator.wikimedia.org/T178312) (owner: 10Joal)
[13:41:04] 10Analytics, 10DBA, 10Operations: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3693836 (10elukey) Updating this task in light of the recent discussions. The analytics and DBA teams have been fighting a lot with disk space consumption on dbstore1002 due t...
[13:46:05] ottomata, helloooo
[13:46:22] do you have 10 mins to talk about eventlogging refineeee?
[13:46:43] mforns: yes!
[13:46:59] ottomata, do we batcave?
[13:47:34] ya, gimme 2 mins
[13:47:39] k
[14:29:36] 10Analytics-EventLogging, 10Analytics-Kanban: Refine should parse user agent field as it is done on refinery pipeline - https://phabricator.wikimedia.org/T178440#3694022 (10Ottomata) Since the current code knows nothing about schemas ahead of time, this might be a little difficult. We could add a pluggable tr...
[14:33:59] (03PS1) 10DCausse: Add get_main_search_request_index [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/384987
[14:39:38] (03Abandoned) 10DCausse: Add get_main_search_request_index [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/384987 (owner: 10DCausse)
[15:02:08] ping milimetric
[15:04:54] 10Analytics-Kanban: Update mediawiki_history_reduced oozie job loading AQS druid backend - https://phabricator.wikimedia.org/T178504#3694141 (10JAllemandou) a:03JAllemandou
[15:08:48] google kicked me out team, trying to rejoin
[15:11:48] 10Analytics-Kanban: Archive tables to hadoop: MobileWikiAppToCInteraction_10375484_15423246 and Edit_13457736_15423246 - https://phabricator.wikimedia.org/T177960#3694180 (10Nuria)
[15:17:52] 10Analytics-Kanban: Upgrade AQS restbase-modules - https://phabricator.wikimedia.org/T178312#3687878 (10Nuria) Let's please test on beta before sending to prod
[15:58:10] dsaez: hello :)
[15:58:31] hi elukey
[15:58:35] stat1006 seems to be under a bit of pressure due to the memory consumption of your script :(
[15:59:38] the oom killer should have already killed it at least once
[16:00:02] I'm doing two in parallel, I'll kill one
[16:00:20] and, hopefully it will be all finished in 30 min
[16:00:42] is there a way to make it less aggressive on memory?
[16:00:52] it is causing alarms in wikimedia-operations :)
[16:10:57] really? It's supposed to not use more than 6G
[16:14:27] elukey, give me 30 min and I'll be finished
[16:15:32] dsaez: now it is fine, but look at https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?panelId=4&fullscreen&orgId=1&var-server=stat1006&var-datasource=eqiad%20prometheus%2Fops (Used Memory)
[16:15:53] elukey: ok
[16:16:02] the oom killer killed python processes that should be yours two times afait
[16:16:06] *afaict
[16:16:19] so nothing super horrible, was just notifying you :)
[16:16:27] yes, next time I'll move this to spark
[16:17:19] thanks!
[16:17:26] elukey: just out of curiosity, why is memory usage always over 48G?
[16:17:49] oh, no, zoom out
[16:18:04] forget my question :D
[16:20:43] :D
[16:51:37] elukey: i think one of the reasons i didn't put that stuff in jmx_exporter_instance
[16:51:47] was to keep use of ferm::service out of prometheus module
[16:51:54] but, i'd be happy to put it there
[16:51:57] whatcha think?
[16:53:39] 10Analytics, 10User-Elukey, 10cloud-services-team (Kanban): Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3694457 (10Nuria) Yes, pinging @bd808 that those two will be deleted soon.
[16:58:45] ottomata: I was thinking the same, it looks more sound in this way
[16:59:38] the way I have it now?
[17:00:32] elukey: ^?
[17:01:51] 10Analytics, 10User-Elukey, 10cloud-services-team (Kanban): Remove logging from labs for schema https://meta.wikimedia.org/wiki/Schema:CommandInvocation - https://phabricator.wikimedia.org/T166712#3694500 (10bd808) >>! In T166712#3694457, @Nuria wrote: > Yes, pinging @bd808 that those two will be deleted soo...
[17:03:37] ottomata: I was thinking something like the define that we have for druid service
[17:03:46] ?
[17:03:47] IIRC ferm is parametrized
[17:03:57] ah
[17:04:04] this is too i think, prometheus_nodes are passed in
[17:04:25] but, i was just talking about the location of this define logic. one of the points is to DRY up using exporter
[17:04:31] you shouldn't have to declare the ferm rule every time you want to use it
[17:04:43] completely agree
[17:04:51] so, i'm fine with putting the ferm::service into the prometheus module in prometheus::jmx_exporter_instance
[17:04:52] if you are
[17:04:59] so if the profile define combination is already used and is ok, let's do it
[17:05:04] and ditching the profile::prometheus::jmx_exporter
[17:05:21] the more I think about it the more I like the profile/define
[17:05:32] aye cool
[17:05:41] ok me too, i think it wouldn't be a rule
[17:05:48] since a module should not instantiate other classes like ferm etc..
[17:05:51] there's no other profile::prometheus:: defines
[17:06:03] btw I slightly disagree with that rule ^^ too :p
[17:06:09] it's too strict
[17:06:16] but
[17:06:23] ¯\_(ツ)_/¯
[17:06:25] let's wait for Filippo to chime in
[17:06:28] wdyt?
[17:06:50] depends on how long he takes! if you are ok with it, i'm inclined to merge it and keep moving
[17:07:07] i pinged him on monday
[17:07:17] ah so he might be on vacation
[17:09:58] this is really not breaking anything so worst case scenario we revert this profile and use something else
[17:10:17] so if it is blocking you I'd say to proceed and then wait before rolling it out everywhere
[17:10:28] (need to go now, but I'll check later!)
[17:10:29] k
[17:10:36] cool, ya will proceed, this will apply to kafka jumbo only
[17:10:36] * elukey off!
[17:10:40] we can apply to cassandra later
[17:10:40] super
[17:10:41] k by!
[17:10:42] bye!
[17:10:42] thanks
[17:29:44] joal, if there's nothing critical, I will deploy refinery tomorrow, cause today I still have a couple meetings and I think it would be too much for me
[17:53:45] 10Quarry: Make backups of Quarry's main database on quarry-main-01 - https://phabricator.wikimedia.org/T178519#3694650 (10zhuyifei1999)
[18:00:03] 10Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#3694676 (10zhuyifei1999)
[18:00:56] mforns: No prob for me :)
[18:01:04] joal, k :]
[18:01:40] mforns: I'll merge mediawiki_reduced stuff before you deploy, even if not ready for druid-snapshot
[18:01:54] ok
[18:03:20] 10Quarry: Find somewhere else (not NFS) to store Quarry's resultsets - https://phabricator.wikimedia.org/T178520#3694690 (10zhuyifei1999) The query runners must somehow store the results to somewhere the web server can access. Celery does not support sending large tables as a result of a job; doing so would floo...
[18:37:01] 10Quarry: Make backups of Quarry's main database on quarry-main-01 - https://phabricator.wikimedia.org/T178519#3694769 (10zhuyifei1999) I'm wondering, would it make sense for a separate instance to connect to quarry-main-01 to do the backups to its instance local storage? Or the backup process should be done on...
[19:08:53] (03PS16) 10Joal: Update mediawiki-history-reduced oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174)
[19:09:20] (03CR) 10Joal: [V: 032 C: 032] "Merging for deploy tomorrow" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/379000 (https://phabricator.wikimedia.org/T174174) (owner: 10Joal)
[19:17:18] joal: Thanks for the careful code review. These are excellent suggestions and I'm making them. I have a few questions for you if you are online...
[19:17:34] Hi Shilad :)
[19:17:44] I'm online, thanks for considering my ideas :)
[19:18:19] I appreciate them! A question about splitting out the arg handling class...
[19:18:40] Yes
[19:18:45] Would you put the apply() fn in that class as well (which is essentially the main)?
[19:18:53] I wasn't sure about the implications for testing that you mentioned.
[19:19:18] Ah :)
[19:20:08] Shilad: For testing Spark code I like to use spark-testing-base
[19:20:44] Shilad: This package provides helpers giving you a spark context in your test methods, allowing you to pass it to functions executing spark-oriented code
[19:21:12] I see. So you would like a more unit test format rather than the integrationy test I have?
[19:21:32] Shilad: This said, what I meant was: you could split argument parsing/context building into a dedicated class, and have another one doing spark-oriented code
[19:22:38] Shilad: Not at all - the tests you provide are definitely good :) I just find it easier to decouple spark from scala
[19:22:46] But it's more my personal view on things :)
[19:23:56] That makes sense. So I am creating a SessionPagesBuilderMain() class that contains the entry point for spark.
[19:24:57] Shilad: as for naming conventions, we'd rather go with SessionPagesBuilder as the spark class, and SessionPagesJobRunner for parsing and launching
[19:25:11] and by the way, we usually don't bother testing the runner ones
[19:25:18] WFM. Another question: I really like the idea of using an Array of structures.
[19:25:21] * joal feels a bit ashamed
[19:26:15] Cool Shilad - This would really make it simpler for others willing to take advantage of your dataset
[19:26:48] I'm a little worried about the bytes overhead in going from "pageid=secs=ns" to "{ page_id: pageid, namespace_id=ns, secs_since_prev=secs }"
[19:27:10] I was thinking about just using shorter struct names.
[19:27:39] Shilad: In hive at least, structures are arrays, and fieldname is just the index
[19:27:39] It probably is 1x for the original format, 2x for short struct names, 3x for long names. Is this all ridiculous to worry about?
[19:28:13] If I store them as TSVs, I think they are in plain text?
[19:28:28] Shilad: fieldname is not part of the data when using structures (just part of the schema), so it shouldn't be bigger
[19:28:45] Shilad: Good question, I don't know !
[19:28:52] About TSVs
[19:29:13] But I think TSVs for a reusable dataset is not a nice format ...
[19:29:26] I think if I use parquet they are pointers but in TSVs they are plain text.
[19:29:27] * joal thinks
[19:29:36] right
[19:29:38] Mwarf
[19:29:44] Tell me more about why TSVs are bad? Maybe compared to JSON?
[19:30:13] Shilad: TSV is not bad, I just never use it since I know parquet
[19:30:45] Shilad: I view TSV as a format to be exported, not worked inside the cluster
[19:30:58] Shilad: Actually, for middle-dataset, parquet could be good
[19:31:20] If people are using parquet that would be easier. I thought I had heard arguments against it from the group, but maybe I misunderstood.
[19:31:28] Not for your specific use-case (since you'll read every single session entirely), but for people willing to do stats over the sessions
[19:31:52] Got it. Why don't I just use parquet, then? It will be easier.
[19:32:04] Shilad: Let's go for parquet :)
[19:32:04] And I can use long struct names without concern!
[19:32:11] Awesome
[19:32:16] Last question:
[19:32:30] Then, when you need to extract data to be worked on by the word2vec algo, reformatting can be done
[19:32:38] Right.
[19:32:59] For the individual event timestamps I was thinking about using seconds since start of session instead of seconds since last event.
[19:33:13] Are you okay with that? It's a little easier to work with, IMO.
[19:34:12] Shilad: as of now I actually don't have examples for which one or the other would be better (or even timestamp)
[19:34:34] I think seconds since prev event reads better, but if you are doing things that filter events you need to worry about adding together the filtered out secs if it's since last event.
[19:34:39] Shilad: Let's use the one you like, and if use-cases tell us differently later, we'll update :)
[19:34:45] great. thanks!
[19:35:08] I'm hitting a busy patch at work so you may not hear from me for a while, but I am working on this, and appreciate your input!
[19:35:26] Shilad: No prob, it took me a while as well to get to it ;)
[19:35:49] Thanks for the work Shilad!
[19:40:43] Gone for tonight a-team
[19:40:57] byeeee
[21:05:29] 10Analytics-Kanban, 10Analytics-Wikistats: Adapt components for new editing metrics - https://phabricator.wikimedia.org/T178461#3695163 (10fdans)
[22:08:33] 10Analytics-Tech-community-metrics, 10Bugzilla-Migration, 10Phabricator, 10DevRel-November-2015: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#3695313 (10greg)
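For reference, a minimal Scala sketch of the layout joal and Shilad converged on in the conversation above: a SessionPagesJobRunner that only parses arguments and launches the job (and, by the team's convention, isn't unit tested), a SessionPagesBuilder holding the Spark logic so a test can hand it a SparkSession (e.g. via spark-testing-base), per-event structs with descriptive field names (which live in the Parquet schema rather than in every row), and events ordered by seconds since session start. The input shape (Pageview), the session_id grouping and the bare argument handling are illustrative assumptions, not the code that was eventually merged into analytics/refinery/source.
```
import org.apache.spark.sql.{Dataset, SaveMode, SparkSession}

// Hypothetical row shapes; the long field names are stored once in the Parquet
// schema, not repeated in every record, so they add no per-row overhead.
case class Pageview(session_id: String, page_id: Long, namespace_id: Int, secs_since_start: Long)
case class SessionEvent(page_id: Long, namespace_id: Int, secs_since_start: Long)
case class Session(session_id: String, events: Seq[SessionEvent])

// Spark-oriented logic only: no argument parsing, so tests can pass in a
// SparkSession and a small Dataset directly.
object SessionPagesBuilder {
  def buildSessions(spark: SparkSession, pageviews: Dataset[Pageview]): Dataset[Session] = {
    import spark.implicits._
    pageviews
      .groupByKey(_.session_id)
      .mapGroups { (sessionId, events) =>
        // Keep events in chronological order using seconds since session start.
        val ordered = events.toSeq
          .sortBy(_.secs_since_start)
          .map(e => SessionEvent(e.page_id, e.namespace_id, e.secs_since_start))
        Session(sessionId, ordered)
      }
  }
}

// Argument parsing and launching only; per the convention above, this part is
// not unit tested. A real job would use a proper option parser such as scopt.
object SessionPagesJobRunner {
  def main(args: Array[String]): Unit = {
    val Array(inputPath, outputPath) = args
    val spark = SparkSession.builder().appName("SessionPagesBuilder").getOrCreate()
    import spark.implicits._
    val pageviews = spark.read.parquet(inputPath).as[Pageview]
    SessionPagesBuilder
      .buildSessions(spark, pageviews)
      .write.mode(SaveMode.Overwrite)
      .parquet(outputPath) // array<struct> column; field names sit in the file schema only
  }
}
```
A downstream consumer that needs the flat "pageid secs ns" form for word2vec-style training can reformat from this Parquet output later, which is the trade-off discussed above: descriptive, self-documenting schema for reuse, plus a cheap export step when a leaner text format is needed.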