[00:37:33] (03PS9) 10Nuria: [WIP] UDF to tag requests [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/353287 (https://phabricator.wikimedia.org/T164021)
[01:06:45] (03PS1) 10Nuria: [WIP] Refactor PageviewDefinition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125
[01:07:47] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Refactor PageviewDefinition [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (owner: 10Nuria)
[01:09:19] (03CR) 10Nuria: "Please take a look @joal, I think we should clean up a bit PageviewDefinition before adding functionality that has to do with redirects. L" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (owner: 10Nuria)
[01:26:01] (03PS2) 10Nuria: [WIP] Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928)
[01:26:44] (03CR) 10Nuria: "Waiting for @joals opinion, will need to fix tests." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928) (owner: 10Nuria)
[01:28:06] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928) (owner: 10Nuria)
[07:43:05] 10Analytics, 06Operations, 15User-Elukey: Review Megacli Analytics Hadoop workers settings - https://phabricator.wikimedia.org/T166140#3299453 (10elukey)
[07:48:34] 10Analytics, 06Operations, 15User-Elukey: Review Megacli Analytics Hadoop workers settings - https://phabricator.wikimedia.org/T166140#3299460 (10elukey) On analytics1033: ``` sudo megacli -AdpBbuCmd -a0 elukey@analytics1033:~$ sudo megacli -AdpBbuCmd -a0 BBU status for Adapter: 0 BatteryType: BBU Volt...
[07:48:54] 10Analytics, 06Operations, 10ops-eqiad, 15User-Elukey: Review Megacli Analytics Hadoop workers settings - https://phabricator.wikimedia.org/T166140#3299462 (10elukey)
[08:11:50] so about --^ it seems that an1033 and an1039 have a broken BBU
[08:11:53] like an1030
[08:12:28] megacli returns the date of birth of those BBUs, 2011 :D
[08:13:07] since we are still under warranty it is good to replace them
[08:29:20] joal: I found a jvm on kafka1018 running with your username
[08:30:04] kafka.tools.ConsoleConsumer --zookeeper conf1001.eqiad.wmnet,conf1002.eqiad.wmnet,conf1003.eqiad.wmnet/kafka/eqiad --topic eqiad.mediawiki.revision_creata
[08:30:24] started May19
[08:30:34] is it meant to be there?
[08:38:09] elukey: wow, I thought I had killed that !
[08:38:43] can I kill it?
[08:38:57] please elukey
[08:39:08] thanks!
[08:39:31] (I am restarting the jvm daemons for upgrades and checking with lsof the DELs, this is why I found it, not to be the ops cop :P)
[09:05:36] (03CR) 10Joal: [C: 04-1] "Comments inline" (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928) (owner: 10Nuria)
[09:16:23] a-team: need to go to the dentist, will be back after lunch!
[09:16:27] * elukey afk
[09:16:44] good luck elukey !
[09:19:36] schana: Good morning
[09:19:40] schana: Are you around?
[09:34:20] joal, I should be back from lunch around 1
[09:34:27] 13:00
[09:34:38] ok, I usually take my break around 14:00
[10:28:52] (03PS4) 10Joal: Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/353310 (https://phabricator.wikimedia.org/T143928)
[10:51:11] joal: I'm back if you're available
[10:53:24] Hey schana
[10:53:45] hi
[10:54:00] schana: https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave
[10:54:25] You should use you WMF account
[10:54:28] schana: --^
[10:54:31] yeah, switching
[11:55:18] elukey, yt?
[11:56:36] mforns: sure
[11:56:39] heloooo
[11:57:11] what do you think of putting the whitelist inside your patch? I can update it with the new fields and add it to the eventlogging_cleaner project?
[11:58:35] mforns: it seems more a configuration option rather than something that should stay in a repo
[11:59:39] elukey, aha. Makes sense, so should I leave it in files/mariadb/eventlogging_purging_whitelist.tsv?
[12:00:38] mforns: still not sure, I am talking with all the dbas (and Riccardo) about what is the best location for the script.. Now it seems that it would be better to run from a maintenance host like neodymium, deploying to a db is not something super good
[12:01:16] mmmh
[12:01:25] k
[12:02:15] yeah I know, trying to find a compromise :D
[12:14:14] (03PS6) 10Joal: Add last access uniques global oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928)
[12:15:52] ok, taking a break a-team
[12:32:58] /me lunchy!
[12:33:05] wat
[12:54:28] 10Analytics-Tech-community-metrics, 06Developer-Relations (Apr-Jun 2017): Automatically sync mediawiki-identities/wikimedia-affiliations.json DB dump file with the data available on wikimedia.biterg.io - https://phabricator.wikimedia.org/T157898#3299959 (10Aklapper) This might not be related at all, but... las...
[12:57:20] 10Analytics-Tech-community-metrics, 06Developer-Relations (Apr-Jun 2017): Automatically sync mediawiki-identities/wikimedia-affiliations.json DB dump file with the data available on wikimedia.biterg.io - https://phabricator.wikimedia.org/T157898#3299962 (10Aklapper) And (back to topic) generally speaking, I'm...
[14:09:26] 10Analytics: Serbian Wikipedia edits spike 2016 - https://phabricator.wikimedia.org/T158310#3300136 (10Milimetric) Cool, could just close this task?
[14:28:49] 06Analytics-Kanban: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3300259 (10mforns) a:03mforns
[14:34:54] ok to install mysql-connector-java updates on the hadoop cluster, I can install these on a single host first
[14:44:44] moritzm: i assume it should be fine
[14:44:54] plenty of stuff does use that though
[14:45:09] it won't hurt anything to install it, but if there are bugs or something it could break stuff.
[14:45:12] i expect it to be fine though
[14:46:38] I'll update analytics1031 and if nothing breaks, the remaining ones tomorrow
[14:48:44] k, you should maybe to stat1002/stat1004
[14:48:48] they are client nodes
[14:48:58] and will probably talk to mysql on analyisc1003 for hive stuff
[14:49:08] oozie runs on analytics1003, and also talks to mysql there
[14:49:19] so the analytics1003 update will be the most impactful
[14:49:24] thanks moritzm
[14:53:29] yep, the others will also be dealt with next, doing stat* next
[15:00:56] ottomata: standdup
[15:01:00] oming
[15:02:57] stat* doesn't have it installed
[15:12:39] 06Analytics-Kanban: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3300396 (10mforns) I am currently adding the new userAgent fields to the EventLogging purging white-list. Between the 27 schemas listed in the task description, I found 7 that did not exist yet at the time of...
[15:18:50] 06Analytics-Kanban: Preserve userAgent field in apps schemas - https://phabricator.wikimedia.org/T164125#3300439 (10mforns) Sorry, I sent this comment by accident before finishing it (adding links and transcribing note-like lines to sentences). Well, I guess it's understandable in spite of that. @Tbayer please...
[15:49:50] fdans: not sure what you did to my brain but now each time I ear or read "team europe" I autoplay your voice in my head
[15:50:04] xDDDD
[15:50:05] YESSSS MISSION ACCOMPLISHED
[15:50:23] milimetric: I got some time now if you want to talk vertical strategy
[15:50:55] fdans: yeah, one sec, listening to Ross yelling Pivot :)
[15:51:03] well, actually, in 3min - making coffee
[15:55:26] xD
[16:04:23] elukey: the team europe thing comes from an old gameplay video: https://youtu.be/qDzzqLJqLjg?t=3m11s
[16:04:43] gavin starts saying gooooo blue team each time and the others get more and more irritated
[16:04:51] 10Analytics, 10RESTBase, 06Services: REST API entry point web request statistics at the Varnish level - https://phabricator.wikimedia.org/T122245#3300673 (10Nuria) I know this might come a bit late but wouldn't this be a good candidate for using the tagging process we are defining? see: https://docs.google...
[16:08:15] a-team: analytics-slave.eqiad.wmnet will go down for maintenance (probably max 1 hour) thursday at 1600 UTC
[16:08:22] to replace the raid BBU
[16:08:34] what email lists should I alert?
[16:08:45] analytics@ for sure, researchers
[16:08:50] elukey: i think analytics is just fine, we asked a while ago and folks said they don't really use db1047
[16:08:58] super
[16:09:02] going to send the email
[16:14:26] 06Analytics-Kanban, 10DBA, 06Operations, 10ops-eqiad: Degraded RAID on db1046 - https://phabricator.wikimedia.org/T166422#3295950 (10Cmjohnson) The disk was indeed bad...so it's been replaced. I don't know if I have enough bbu's to go around.. I am swapping them out of decom'd servers.
[16:16:33] 06Analytics-Kanban, 10DBA, 06Operations, 10ops-eqiad: Degraded RAID on db1046 - https://phabricator.wikimedia.org/T166422#3300760 (10Marostegui) If it helps there are three more servers totally ready for you to decomm them: T166486 T163778 T164702
[16:23:33] 06Analytics-Kanban, 10DBA, 06Operations, 10ops-eqiad: Degraded RAID on db1046 - https://phabricator.wikimedia.org/T166422#3300786 (10elukey) ``` elukey@db1046:~$ sudo megacli -pdrbld -showprog -physdrv\[32:3\] -aALL Rebuild Progress on Device at Enclosure 32, Slot 3 Completed 35% in 9 Minutes. Exit Code:...
[16:25:05] 06Analytics-Kanban, 06Operations, 10Traffic: Artificial spike in offset of unique devices from November to February 6th on wikidata - https://phabricator.wikimedia.org/T165560#3300795 (10Nuria)
[16:25:08] 06Analytics-Kanban, 13Patch-For-Review: Count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3300794 (10Nuria)
[16:25:37] Chris swapped the broken disk for db1046, raid is rebuilding (EL master db)
[16:26:13] going afk people!
[16:26:17] ttl!
[16:26:18] o/
[16:26:21] * elukey afk!
[16:28:56] thanks elukey laters!
[16:29:35] 10Analytics, 06Operations, 10ops-eqiad, 15User-Elukey: Review Megacli Analytics Hadoop workers settings - https://phabricator.wikimedia.org/T166140#3300849 (10Cmjohnson) Ordered both servers to get new cards You have successfully submitted request SR948957999. You have successfully submitted request SR948...
[16:40:55] joal: are you good abandoning the prior patch of is_redirect_to_pageview and working of the 2nd one?
[16:47:49] 10Analytics, 10EventBus, 06Operations, 10hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3293197 (10RobH)
[16:48:26] 10Analytics, 10EventBus, 06Operations, 10hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3293197 (10RobH) a:03RobH Please note that the specifications for this hardware are identical to the spare pool with 4 * 4TB SATA, projected for possible purchase on T166265...
[16:49:03] 10Analytics, 10EventBus, 06Operations, 10hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3300982 (10Ottomata) After talking with Faidon, this order should no longer happen this quarter. New scb nodes are budgeted for next FY, and lated to be purchased in Q3. It...
[16:49:53] 10Analytics, 10ChangeProp, 10EventBus, 06Services (later), 15User-mobrovac: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#3300990 (10Ottomata)
[16:49:56] 10Analytics, 10EventBus, 06Operations, 10hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3300988 (10Ottomata) 05Open>03declined I'm inclined to decline this task. We can create a new on in Q3 when it is time to order these.
[16:58:06] yes nuria_, works for me
[16:59:08] (03Abandoned) 10Nuria: Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/353310 (https://phabricator.wikimedia.org/T143928) (owner: 10Joal)
[17:03:17] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: eqiad: (3)+ nodes for Druid / analytics - https://phabricator.wikimedia.org/T166510#3301092 (10RobH) Those have SSDs, and cannot be ordered within the time line for this fiscal year.
[17:04:00] 10Analytics, 10Analytics-Cluster, 06Operations, 10hardware-requests: eqiad: (3)+ nodes for Druid / analytics - https://phabricator.wikimedia.org/T166510#3301093 (10Ottomata) p:05High>03Normal Oh foo, I forgot, druid nodes have SSDs. We won't be able to get this in time for this FY's remainder budget....
[17:16:53] (03CR) 10Mforns: "LVGTM overall! I think I found a minor thing, though, see comment." (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928) (owner: 10Joal)
[17:21:39] (03CR) 10Joal: [C: 04-1] "Bug found, thanks @mforns" (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928) (owner: 10Joal)
[17:22:26] (03PS7) 10Joal: Add last access uniques global oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928)
[17:24:01] (03CR) 10Mforns: [C: 031] "LGTM! :]" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928) (owner: 10Joal)
[17:24:22] mforns: I think this code needs a full revamp for names ...
[17:24:30] aha
[17:24:34] ok
[17:24:41] mforns: If it looks correct, the change should hopfully be only syntactic
[17:24:47] ok I see
[17:25:16] yes, it looks good to me!
[17:32:53] 10Analytics, 06Discovery, 10Elasticsearch, 06Discovery-Search (Current work), 13Patch-For-Review: Analytics cluster should connect to elasticsearch over SSL - https://phabricator.wikimedia.org/T157943#3301236 (10debt) 05Open>03Resolved a:03debt
[17:50:35] urandom: heyaaa, if you find some time, https://gerrit.wikimedia.org/r/#/c/355782/ is ready for review
[18:00:17] ottomata: kk
[18:01:47] 10Analytics, 10Analytics-Cluster, 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3301436 (10Cmjohnson)
[18:02:21] 10Analytics, 10Analytics-Cluster, 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3264256 (10Cmjohnson)
[18:02:58] 10Analytics, 10Analytics-Cluster, 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install replacement stat1006 (stat1003 replacement) - https://phabricator.wikimedia.org/T165366#3301438 (10Cmjohnson)
[18:08:27] 10Analytics, 10Reading Epics, 06Wikipedia-iOS-App-Backlog, 07Spike, 05iOS-app-v5.6.0-Goat-On-A-Train: Research and define initial technical requirements for app analytics - https://phabricator.wikimedia.org/T164801#3301455 (10NHarateh_WMF)
[18:28:27] (03PS3) 10Nuria: [WIP] Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928)
[18:32:12] 06Analytics-Kanban, 13Patch-For-Review: Count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3301522 (10Nuria) @joal: are there any updates we want to add here after our findings about content-type?
[18:34:09] 06Analytics-Kanban, 13Patch-For-Review: Count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3301540 (10JAllemandou) >>! In T143928#3301522, @Nuria wrote: > @joal: are there any updates we want to add here after our findings about content-type? There ar...
[18:59:04] 06Analytics-Kanban, 13Patch-For-Review: Count global unique devices per top domain (like *.wikipedia.org) - https://phabricator.wikimedia.org/T143928#3301667 (10JAllemandou) (Hopefully) last comment in that thread: We have found the problem. Thanks a lot @Nuria for all the bike-shedding and data vetting! **T...
[18:59:07] nuria_: --^
[18:59:18] nuria_: do you mind please read that and tell me if it makes any sense?
[19:02:15] joal: back now!
[19:02:23] heya nuria_
[19:02:59] was asking if you could review my last comment on uniques, just making sure it actually makes some sense to the only one who can understand it :)
[19:03:04] nuria_: --^
[19:03:33] joal: i feel SO special
[19:03:37] huhuhu :)
[19:03:38] joal: ya, it is good
[19:03:55] joal: i send another code change , will do a review myself and remove WIP if pertains
[19:04:16] nuria_: I'll CR tomorrow morning
[19:04:23] joal: yaya, no rush
[19:04:35] joal: we probably need to redo variability calculations
[19:05:25] nuria_: about changing names, shall we go with (instead of last_access_unqiues and last_access_uniques_global): unique_devices/per_domain and unique_devices/project_wide ?
[19:05:35] nuria_: very much yes
[19:05:56] joal: ya, that naming seems a lot better
[19:06:03] nuria_: As said in task, I also suggest we correct per-domain uniques for the offset patch
[19:06:15] Ok -- Will submit a big patch
[19:06:19] joal: let's try to do control chnages
[19:06:28] joal: rather than 1 big patch
[19:06:51] joal: so let's do project wide first, bake those changes,
[19:07:04] ok nuria_
[19:07:04] joal: and once we havebaked those
[19:07:19] joal: we can go back to "per domain" computation
[19:07:31] ok, that's my plan for tomorrow then :)
[19:07:45] joal: otherwise I feel it is hard to track too many chnages at once
[19:07:49] *changes
[19:08:00] thanks again for the support nuria_, I probably would have broken my nerves without your help
[19:08:17] joal: same here
[19:08:36] Gone for tonight with a plan, awesome :)
[19:08:41] Bye a-team !
[19:09:30] byyyee
[19:10:40] ottomata: if you have a minute if you could sanity check this chnages: https://gerrit.wikimedia.org/r/#/c/356125/
[19:15:27] looking
[19:16:08] (03PS4) 10Nuria: [WIP] Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928)
[19:16:32] ottomata: it does a mall refactor of pageview definition to better fit changes required for uniques and tagging
[19:16:53] ottomata: feel free to have strong opinions
[19:18:04] ottomata: just sent last patch (tests still pending)
[19:18:47] k
[19:34:21] nuria_: main comment:
[19:34:28] ottomata: yessir
[19:34:32] why not just have RedirectToPageview logic be a method on PagevieDefinition?
[19:34:49] PageviewDefinition.getInstance().isRedirectToPageview()
[19:34:49] ?
[19:35:19] and, if you do that, you don't need to move the pageview speciifc things to a Constants class
[19:35:25] ottomata: I can do that , i di d not do it at first cause I would like to migrate code "out" of pageview definition rtaher than in
[19:35:27] since they are only used by PageviewDefinition
[19:35:37] why? this one specifically is about Pageviews though
[19:35:39] , no?
[19:36:14] ottomata: no, the constants are use latter by tagging code and in other couple udfs, they are a non-belonger there according to all java conventions
[19:36:28] public static final Set PAGEVIEW_WORTHTY_HTTP_CODES = new HashSet(Arrays.asList(
[19:36:28] "200",
[19:36:28] "304"
[19:36:28] ));
[19:36:30] feels really bad
[19:36:37] its equivalent to a global, no?
[19:36:43] we'll end up sticking everything in there
[19:36:54] and, REDIRECT_HTTP_CODES
[19:36:54] ottomata: in constants?
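A minimal sketch of the option ottomata floats just above: keep the redirect check as an instance method on PageviewDefinition, with the status grouping as a class constant rather than in a shared Constants class. The getInstance() accessor and the PAGEVIEW_WORTHY set are quoted from the exchange; the redirect codes and the helper it delegates to are assumptions for illustration, not the merged patch.

```java
// Hypothetical sketch only -- follows the suggestion above, not the actual refinery code.
package org.wikimedia.analytics.refinery.core;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PageviewDefinition {

    private static final PageviewDefinition instance = new PageviewDefinition();
    private PageviewDefinition() { }
    public static PageviewDefinition getInstance() { return instance; }

    // HTTP statuses that can still count as a pageview, kept on the class that
    // uses them instead of in a global Constants class.
    public static final Set<String> PAGEVIEW_WORTHY_HTTP_CODES =
        new HashSet<String>(Arrays.asList("200", "304"));

    // Redirect statuses considered here (assumed values, for illustration).
    public static final Set<String> REDIRECT_HTTP_CODES =
        new HashSet<String>(Arrays.asList("301", "302"));

    /** True when a redirect response points at something that would itself be a pageview. */
    public boolean isRedirectToPageview(String httpStatus, String uriHost, String uriPath, String contentType) {
        return REDIRECT_HTTP_CODES.contains(httpStatus)
            && isPageviewTarget(uriHost, uriPath, contentType);
    }

    private boolean isPageviewTarget(String uriHost, String uriPath, String contentType) {
        // Placeholder: the real code would apply the usual pageview criteria here.
        return true;
    }
}
```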
[19:36:56] is not totally true
[19:36:57] (yes)
[19:37:03] beacuse it is not all REDIRECT_HTTP_CODES
[19:37:15] just the ones you are interested in for this use case
[19:37:18] ok, will change names
[19:37:31] i don't really like this core/Constants.java class :)
[19:37:57] a redirect pageview is a webrequest, a pageview is a webrequest
[19:37:58] but having a constants file is a well stablished convention , everytime there is an import of aclass to just use the constants defined there is a smell
[19:38:13] the pageview specific things belong in pageview, and the higher level webrequest specific things belong there
[19:38:29] but, if they are public static constants on the class
[19:38:43] you can just do
[19:38:55] PageviewDefinition.CONTENT_TYPES
[19:39:00] and it is clear what that means
[19:39:27] a class named refinery.core.Constants should not have things that are not super duper generic in it
[19:40:51] i can see what you mean but i do not agree with having a bunch of constants on pageview definition, the pageview definition uses those values
[19:40:58] ottomata: it does not define them
[19:41:15] ???
[19:41:16] why not?
[19:41:19] ottomata: everytime you need to assess a 200 on our code you end up calling pageviewDefinition
[19:41:23] ahh
[19:41:24] ok
[19:41:31] but, you did not name the constant
[19:41:38] HTTP_200
[19:41:45] you named it PAGEVIEW_WORHTY
[19:41:57] true, bad bad naming
[19:42:07] soooo
[19:42:15] what you want to do is make a constant for every value used by pageviewdef
[19:42:19] and then have pageview def group them???
[19:42:40] no, rather i want pagevboiew definition to use enums or constants defined elsewhere
[19:42:45] CONTENT_TYPE_TEXT_HTML = "text/html";
[19:42:45] CONTENT_TYPE_TEXT_HTML_ISO_ ... = ...
[19:42:46] etc.
[19:42:46] ?
[19:42:54] ottomata: cause those are common to a bunch of code
[19:43:06] ottomata: not just to the pageview definition
[19:43:09] the values might be geenric, the enums are not
[19:43:18] the groupings of those values are pageview specific
[19:43:25] httpstatus and content type are the best examples
[19:43:46] sure, but you didn't define the generic constants
[19:43:53] true
[19:43:54] you defined a pageiew specific groupings of values
[19:44:44] so, none of those http status or content type Constants as you defined will be used outside of pageview
[19:45:22] no, not true, the tag code i think will use those and so will be some of the code needed for uniques
[19:45:45] the tag code is going to use PAGEVIEW_WORTHY_HTTP_CODES
[19:45:45] ?
[19:47:30] or a way to asses that http status was either 200 or 304
[19:47:37] as an example
[19:47:44] ottomata: that is not far fetched
[19:47:59] what does 304 indicate again?
[19:48:05] ottomata: as all requests that serve content on our end that are not pageviews have that code
[19:48:33] so what you want are a grouping of successful status codes?
[19:48:46] ottomata: 304 (on our end) is sent out a bunch for maps requests for example
[19:49:14] ottomata: i guess conditional gets and others?
[19:49:15] ahh ok
[19:49:20] hm
[19:49:28] ok so yeah, you want successful http status codes
[19:49:34] nuria_: why not put that on webrequest?
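For illustration, a rough sketch of what "putting that on webrequest" could look like: generic status-code groupings and helpers on the Webrequest utility class. HTTP_STATUS_SUCCESSES and the isSuccess()/isRedirect() names come from the exchange that follows; the concrete values and signatures are assumptions, not the actual class.

```java
// Sketch only: webrequest-level groupings, generic across all requests,
// as opposed to pageview-specific groupings. Values and signatures are assumed.
package org.wikimedia.analytics.refinery.core;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Webrequest {

    private static final Webrequest instance = new Webrequest();
    private Webrequest() { }
    public static Webrequest getInstance() { return instance; }

    // Successful responses at the HTTP level: 200, plus 304 for conditional GETs
    // answered without a new body (the maps example above).
    public static final Set<String> HTTP_STATUS_SUCCESSES =
        new HashSet<String>(Arrays.asList("200", "304"));

    // Responses that redirect the client to another resource.
    public static final Set<String> HTTP_STATUS_REDIRECTS =
        new HashSet<String>(Arrays.asList("301", "302"));

    public boolean isSuccess(String httpStatus) {
        return HTTP_STATUS_SUCCESSES.contains(httpStatus);
    }

    public boolean isRedirect(String httpStatus) {
        return HTTP_STATUS_REDIRECTS.contains(httpStatus);
    }
}
```

PageviewDefinition (and later the tagging code) would then call Webrequest.getInstance().isSuccess(status) rather than importing a shared Constants class.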
[19:49:39] sounds pretty webrequset specific
[19:50:46] Webrequest.HTTP_STATUS_SUCCESSES (or somehting)
[19:50:59] and
[19:51:09] ottomata: i could do that, but boy that class itself needs work, thus far it is just a bunch of utility functions
[19:51:15] ottomata: but yes, i can do that
[19:51:28] yeah, agree, itd' be nice if this was all structure nice and OO like
[19:51:41] a pageiew is an instance (or composes) a webrequest
[19:51:46] (of a)
[19:52:03] but, yeah, this sounds likea webrequest utility function anyway
[19:52:36] Webrequest.getInstance().isSuccess(http_status) (or something)
[19:52:40] or even a pageview is a just a set of filters on top of webrequest data
[19:52:46] aye
[19:52:54] ottomata: which is the idea i ma trying to transition with the tags
[19:53:01] I am
[19:53:02] ya sounds good
[19:53:26] maybe we should make an effort to refactor webrequest so that it is a good model class
[19:53:36] might make alot of this easier
[19:53:57] haha, or we could jsut start doing scala :O
[19:54:03] ok, will move constants to webrequest but that makes me feel that i need to refactor that class too, i bet our refine will go way faster for example if we cache inputs and outputs of this function
[19:54:04] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L242
[19:54:06] (jk, sorta)
[19:54:20] ottomata: no, the main problem is not webrequest
[19:54:35] is the pageview definition lack of structure
[19:54:37] oh ya, and use that LRUcache thing?
[19:54:38] yeah
[19:54:41] ottomata: ya
[19:54:59] aye
[19:55:04] nuria_: what is the main problem?
[19:55:13] ottomata:in PV definition?
[19:55:46] oh sorry
[19:55:47] you said
[19:56:27] hm, nuria_ ya it would be nice if PV def used webrequest pojo or webrequest class, but it doesn't seem taht bad, does it?
[19:56:58] ottomata: pv using pojo i think i got fixed with the webrequestdata
[19:57:13] oh ya, cool right
[19:57:14] you did that
[19:57:14] cool
[19:57:19] nice
[19:57:24] so yeah, what's bad about it then?
[19:57:36] ottomata: what could be a lot better about PV is that criteria is not specific. it coudl be a composition of criteria
[19:57:59] like : criteria 1 : httpstatus, criteria2: content type, criteria3 : project view
[19:58:13] and any criteria not met renders request not a pageview
[19:58:14] hm, in that each is__Pageview function is a big conditional mess?
[19:58:31] ottomata: no is a loop through a filter chain
[19:58:51] ottomata: kind of like the tags
[19:58:51] ahhh ok, yeah, right.
[19:58:56] i mean, currently its a mess
[19:59:03] a mess of conditionals
[19:59:10] yes
[19:59:20] ya ok, sounds fine.
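The "composition of criteria" / "loop through a filter chain" idea just discussed, sketched under assumptions: the criterion interface, the WebrequestData accessors and the concrete criteria are hypothetical names, not refinery code. The point is only that each criterion can independently reject a request, mirroring how the tag chain works.

```java
// Hypothetical sketch of the filter-chain idea; WebrequestData accessors are assumed.
import java.util.Arrays;
import java.util.List;

interface PageviewCriterion {
    boolean accepts(WebrequestData request);
}

class HttpStatusCriterion implements PageviewCriterion {
    // criterion 1: only successful requests (200/304) can be pageviews
    public boolean accepts(WebrequestData request) {
        return Webrequest.getInstance().isSuccess(request.getHttpStatus());
    }
}

class ContentTypeCriterion implements PageviewCriterion {
    // criterion 2: only text/html content can be a pageview
    public boolean accepts(WebrequestData request) {
        return request.getContentType().startsWith("text/html");
    }
}

class PageviewFilterChain {

    private final List<PageviewCriterion> criteria = Arrays.<PageviewCriterion>asList(
        new HttpStatusCriterion(),
        new ContentTypeCriterion()
        // further criteria: project/host, path, query parameters, ...
    );

    // Any criterion not met renders the request "not a pageview".
    public boolean isPageview(WebrequestData request) {
        for (PageviewCriterion criterion : criteria) {
            if (!criterion.accepts(request)) {
                return false;
            }
        }
        return true;
    }
}
```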
[19:59:28] i still don't like thiws Constants class though :p
[19:59:40] i think you can make all of these go where they belong, and not make them global
[19:59:46] ottomata: my strategy towards refactoring PV that is moving could out so i can fit it in the tag structure
[19:59:49] either in PageviewDef, or Webrequest
[20:00:03] ya, put it in webrequset
[20:00:37] ottomata: and after have pv definition as is running and tag "pageview" running, when tag pageview and pageview definition give the same result we can delete all code in pv definition class
[20:00:59] ottomata: will move constants to webrequest
[20:01:08] ottomata: actually i am going to do that change just now
[20:01:23] ok, but when you have a way to make thsi filter chain, you can define the grouping constants (like PAGEVIEW_WORTHY_CONTENT_TYPES), in the chain where it matters
[20:01:36] ottomata: yes
[20:01:38] putting it in Constatns now just so you can use it in two places, but delete the first one later
[20:01:42] i think isn't worth it
[20:02:07] ottomata: ok, will move constants to webrequest but leave changes regarding POJO
[20:02:23] not all of these constants to Wr though, right?
[20:02:27] just the webreuest specific ones
[20:02:34] the pageview specific ones will stay in PageviewDefinition (for now?)
[20:03:14] ottomata: but which ones are pv definition specific? https://gerrit.wikimedia.org/r/#/c/356125/4/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Constants.java
[20:03:29] ottomata: a "portal" pageview (http://www.wikipedia.org)
[20:03:34] ottomata: will return 200
[20:03:42] ottomata: or who knows maybe 304
[20:03:50] https://gerrit.wikimedia.org/r/#/c/356125/4/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java
[20:03:53] ottomata: and it is not a pageview at all
[20:03:56] httpStatusesSet can go to Wr
[20:03:59] as something like
[20:04:09] HTTP_STATUS_SUCCESSES
[20:04:14] ottomata: k
[20:04:28] the rest sounds like they stay in PV def, no?
[20:04:35] with a function there isRedirectToPageview()
[20:04:36] ?
[20:04:47] oh
[20:04:48] REDIRECT_HTTP_CODES
[20:04:52] coudl go in Webrequest too
[20:05:18] the rest are just groupings of values used to identify types of pageviews
[20:06:11] ottomata: and content types also on webrqeuest right?
[20:06:39] no
[20:06:49] you want to put something called PAGEVIEW_CONTENT_TYPES on webrequest?
[20:06:52] ottomata: wait why not?
[20:07:07] ottomata: but it is just like http codes
[20:07:31] naw, the http codes are http level only, they have nothing to do pageviews, do they?
[20:07:52] you are saying: these http codes are succcessful http requests to any requested web resources
[20:07:55] or
[20:08:10] these http codes are redirects to a web resources somewhere else
[20:08:11] ottomata: yes,m but same argument for content types right?
[20:08:25] no
[20:08:28] a 200 to an image
[20:08:32] is a successful webrequest
[20:08:35] but not a pageview
[20:08:44] so, a http status is more generic
[20:08:48] your content types are only for pageview
[20:08:48] s
[20:08:53] (right?)
[20:09:13] by putting the generic status groupgins in webrequset, you coudl do
[20:09:22] webrequest.isSuccess
[20:09:22] or
[20:09:25] webrequset.isRedirect
[20:09:42] but, how woudl you use the PAGEVIEW_WORTHY_CONTENT_TYPES in webrequest?
[20:09:56] webrequest.isPageviewContentType() ?
[20:10:02] smells like it shoudl be in the pageview class
[20:10:03] no?
[20:10:29] webrequest.isMainDocContentType()
[20:10:38] as defined by what?
[20:10:45] maybe you want
[20:10:52] webrequest.isTextHtmlContentType
[20:10:53] that sounds fine
[20:10:59] if so, then sure, ya, put it in webrequest
[20:11:00] ?
[20:11:01] :)
[20:11:04] k, agreed
[20:11:16] but, nuria
[20:11:29] woudln't you then just want to check if content_type starts with text/html
[20:11:30] ?
[20:11:58] ottomata: well, anything better than regex
[20:12:06] ottomata: much rather have inSet
[20:12:10] aye, i guess i don't knwo what the charset bit is for in this usage
[20:12:21] sounds like your function is going to be super specific here
[20:12:41] isTextHtmlISO8859orUTF8ContentType() :p
[20:12:49] jaja
[20:12:52] maybe you just want
[20:13:02] ottomata: no wait i think string on 1.7 has a start with now, right?
[20:13:24] ottomata: ya, it does
[20:13:57] hmm, yeah, nuria, mabye this function is just
[20:14:03] isHtmlContentType
[20:14:09] or
[20:14:21] webrequest.isHtml()
[20:14:22] or something
[20:14:27] sure sure, ok, put it as is :p
[20:14:31] in webrequest with a nice name
[20:14:33] ottomata: ya, startswith better, there on 1.5
[20:14:45] well, it'd be more generic, if that is what you want
[20:14:58] maybe you want to only say that these 5 specific content types are for text/html
[20:15:02] and reject anything
[20:15:05] i don't know what this is really used for so
[20:15:48] ottomata: ya, i think we cannot really change it seeing how it might affect
[20:16:03] ottomata: will wrap things differently but not change functionality
[20:16:14] (03CR) 10EBernhardson: [cirrus] Distinguish morelike vs fulltext api search requests (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/345863 (owner: 10DCausse)
[20:17:48] ottomata: ok, will do changes now!
[20:17:50] k
[20:17:51] ottomata: many thanks
[20:18:05] (03CR) 10Ottomata: [WIP] Provide RedirectToPageview function and UDF (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928) (owner: 10Nuria)
[20:47:24] (03PS5) 10Nuria: [WIP] Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928)
[20:49:27] ottomata: moved: https://gerrit.wikimedia.org/r/#/c/356125/5/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java
[20:56:28] (03CR) 10Ottomata: "one nit, +1 to general idea :)" (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928) (owner: 10Nuria)
[22:12:28] (03PS6) 10Nuria: Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125 (https://phabricator.wikimedia.org/T143928)
[22:31:52] (03PS1) 10Nuria: Memoize host normalization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356307
[22:35:36] (03PS7) 10Nuria: Provide RedirectToPageview function and UDF [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356125
[22:36:56] 06Analytics-Kanban: Improve processing of host on refine step - https://phabricator.wikimedia.org/T166628#3302469 (10Nuria)
[22:37:16] 06Analytics-Kanban: Improve processing of host on refine step on Webrequest.java - https://phabricator.wikimedia.org/T166628#3302480 (10Nuria)
[22:37:57] (03PS2) 10Nuria: Memoize host normalization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/356307
[22:43:36] 06Analytics-Kanban, 13Patch-For-Review: Correct uniques computation to not exclude countries that don't have either underestimates or offset - https://phabricator.wikimedia.org/T165661#3302496 (10Nuria) 05Open>03Resolved
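The "Memoize host normalization" patch above (356307) follows up on the earlier remark about caching the inputs and outputs of normalizeHost(). A rough sketch of one way to do that with a bounded, access-ordered LinkedHashMap acting as an LRU cache; the cache size, the result holder type and single-threaded use are assumptions for illustration, not the contents of the actual patch.

```java
// Rough sketch, not the actual patch: memoize host normalization behind a small
// LRU cache so repeated hosts are parsed only once per task.
import java.util.LinkedHashMap;
import java.util.Map;

public class HostNormalizer {

    private static final int CACHE_SIZE = 10000;   // assumed bound

    // Access-ordered LinkedHashMap that evicts the eldest entry: a simple LRU cache.
    private final Map<String, NormalizedHost> cache =
        new LinkedHashMap<String, NormalizedHost>(CACHE_SIZE, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, NormalizedHost> eldest) {
                return size() > CACHE_SIZE;
            }
        };

    public NormalizedHost normalizeHost(String uriHost) {
        NormalizedHost normalized = cache.get(uriHost);
        if (normalized == null) {
            normalized = doNormalize(uriHost);   // the existing, expensive parsing
            cache.put(uriHost, normalized);
        }
        return normalized;
    }

    private NormalizedHost doNormalize(String uriHost) {
        // Placeholder for the real normalization logic in Webrequest.java.
        return new NormalizedHost(uriHost.toLowerCase());
    }

    public static class NormalizedHost {   // assumed result holder
        public final String host;
        public NormalizedHost(String host) { this.host = host; }
    }
}
```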
[22:44:08] 06Analytics-Kanban: Update druid unique Devices Dataset to only contain hosts having more than 1000 uniques - https://phabricator.wikimedia.org/T164183#3224787 (10Nuria) Ping @JAllemandou has data reloading happened?
[22:45:19] 06Analytics-Kanban, 13Patch-For-Review: Update restbase oozie job - https://phabricator.wikimedia.org/T163479#3302514 (10Nuria) 05Open>03Resolved
[22:46:57] 06Analytics-Kanban, 13Patch-For-Review: Load pivot pageview-hourly dataset every hour - https://phabricator.wikimedia.org/T164730#3302543 (10Nuria) 05Open>03Resolved
[22:48:26] 06Analytics-Kanban, 07Easy, 13Patch-For-Review: Don't accept data from automated bots in Event Logging - https://phabricator.wikimedia.org/T67508#3302545 (10Nuria) 05Open>03Resolved
[22:51:55] 06Analytics-Kanban, 13Patch-For-Review: Pre-generate mysql ORM code for sqoop - https://phabricator.wikimedia.org/T143119#3302555 (10Nuria) 05Open>03Resolved
[22:52:10] 06Analytics-Kanban, 13Patch-For-Review: Update mediawiki history oozie SLA - https://phabricator.wikimedia.org/T164713#3302556 (10Nuria) 05Open>03Resolved
[22:58:40] 06Analytics-Kanban, 13Patch-For-Review: Provide uniques estimate/offset breakdowns available in dumps - https://phabricator.wikimedia.org/T164597#3302561 (10Nuria) Looks like files are re-generated: https://dumps.wikimedia.org/other/unique_devices/2015/2015-12/ I think we are just missing docs: https://wikit...
[23:03:02] 06Analytics-Kanban, 13Patch-For-Review: Provide uniques estimate/offset breakdowns available in dumps - https://phabricator.wikimedia.org/T164597#3302568 (10Nuria) Updated docs, closing
[23:03:08] 10Analytics: Provide uniques offset/underestimate breakdowns in AQS - https://phabricator.wikimedia.org/T164596#3302570 (10Nuria)
[23:03:10] 06Analytics-Kanban, 13Patch-For-Review: Provide uniques estimate/offset breakdowns available in dumps - https://phabricator.wikimedia.org/T164597#3302569 (10Nuria) 05Open>03Resolved