Wikipedia:Bots/Requests for approval/PkbwcgsBot 9
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Pkbwcgs (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 21:56, Monday, December 17, 2018 (UTC)
Function overview: Fix CW Error #86 (External link with two brackets)
Automatic, Supervised, or Manual: Supervised
Programming language(s): AWB
Source code available: AWB
Links to relevant discussions (where appropriate):
Edit period(s): Once a week
Estimated number of pages affected: 100 to 200 a week
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details: The bot will use AWB to fix error 86 (External link with two brackets) by removing the doubled brackets around the link. For example, [[http://www.google.co.uk]] will become [http://www.google.co.uk]. General fixes will be switched on; spelling fixes will be switched off.
Discussion
Do you mean error #86? Primefac (talk) 22:00, 17 December 2018 (UTC)
- @Primefac: Yes, sorry for the confusion. Pkbwcgs (talk) 07:39, 18 December 2018 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 15:55, 23 December 2018 (UTC)
- @Primefac: Is it okay if I do the trial with general fixes switched on? Fixing error 86 is part of the general fixes, but spelling fixes will be turned off. Pkbwcgs (talk) 16:47, 23 December 2018 (UTC)
- Can you ensure that, if #86 is fixed with genfixes on, it will skip the page? Primefac (talk) 17:13, 23 December 2018 (UTC)
- @Primefac: I can, but it will mean very few edits. It is better to do them with genfixes on because fixing double brackets around weblinks is part of the genfixes. Pkbwcgs (talk) 17:20, 23 December 2018 (UTC)
- Okay, let me rephrase: are you doing this on a specific list of pages, so that my above question will be moot? Primefac (talk) 18:08, 23 December 2018 (UTC)
- @Primefac: Yes, I am doing this on a list of pages. My list of pages is located here. Pkbwcgs (talk) 18:13, 23 December 2018 (UTC)
- Okay. Primefac (talk) 19:06, 23 December 2018 (UTC)
- I would ideally like to do it with general fixes because error 86 fixing is part of the general fixes. I can't write a RegEx that does only the double-bracket fixing. I will see if there is anything in AWB that disables all general fixes apart from the double-bracket weblink fix. Pkbwcgs (talk) 22:13, 23 December 2018 (UTC)
- @Primefac: I have good news. I have found a regular expression to do this task, so I no longer need general fixes. Here is my regular expression:
Find: \[\[(https?://[^][<>\s"]+) *((?<= )[^\n\]]*|)\]\]
Replace: [$1 $2]
The bot is going to use that regular expression to complete this task. I have sharpened my programming skills over the last couple of days and have been practising regular expressions recently. Pkbwcgs (talk) 22:21, 24 December 2018 (UTC)
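A minimal Python sketch of the regex above, for illustration only. It assumes a straight translation from AWB's .NET syntax: `$1`/`$2` become `\1`/`\2`, and the `][` in the character class are escaped. The helper name `fix_first` is hypothetical. Run on a few sample links, it reproduces the problems reported in the trial: a bare link keeps a space before its closing bracket, a pipe is absorbed into the URL, and a triple-bracketed link loses only one pair.

```python
import re

# Python translation of the first regex; the character class keeps
# brackets, angle brackets, whitespace and double quotes out of the URL.
FIRST = re.compile(r'\[\[(https?://[^\]\[<>\s"]+) *((?<= )[^\n\]]*|)\]\]')


def fix_first(text: str) -> str:
    """Hypothetical helper: apply the first find/replace to wikitext."""
    return FIRST.sub(r"[\1 \2]", text)


if __name__ == "__main__":
    samples = [
        "[[http://example.com Example]]",  # link with text: fixed cleanly
        "[[http://www.google.co.uk]]",     # bare link: space left before ]
        "[[https://google.com|Google]]",   # pipe ends up inside the URL
        "[[[http://x.org]]]",              # only one pair of brackets removed
    ]
    for s in samples:
        print(s, "->", fix_first(s))
```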
- Trial complete. That went okay, but I wish it had gone better. The edits are located here and here, but the RegEx either doesn't work on some pages, removes only one set of brackets when there are three or more sets, or leaves a space before the last bracket. Can anyone please suggest changes to the RegEx? Pkbwcgs (talk) 16:26, 25 December 2018 (UTC)
Your regex simply doesn't take into account the situation where someone uses pipes in an elink (e.g. [[https://google.com|Google]]). I think the best regex would be along the lines of \[\[(http.*?)( |\|)?(.*?)?]] and replacing with [$1 $3]. This should cover all of the junk mentioned above, but you'll need to go back over those 50 edits and fix all of the pipe-not-space elink errors (don't you have a bot task that does this already?). Primefac (talk) 19:17, 25 December 2018 (UTC)
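A minimal Python sketch of this suggestion, assuming a direct translation of the pattern (`$1`/`$3` become `\1`/`\3`) and that AWB's .NET engine backtracks the same way Python's does, since both try lazy quantifiers shortest-first. The helper name `fix_second` is hypothetical. Because group 1 is lazy and everything after it is optional, the first successful match stops group 1 right after `http` and lets group 3 absorb the rest of the link, so the pipe survives:

```python
import re

# Group 1 is lazy and groups 2 and 3 are both optional, so the engine can
# end group 1 at "http" and let group 3 swallow the remainder of the link.
SECOND = re.compile(r"\[\[(http.*?)( |\|)?(.*?)?]]")


def fix_second(text: str) -> str:
    """Hypothetical helper: apply the second find/replace to wikitext."""
    return SECOND.sub(r"[\1 \3]", text)


if __name__ == "__main__":
    # The pipe is still present in the output, and the URL is split after "http".
    print(fix_second("[[https://google.com|Google]]"))
```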
- @Primefac: The first four edits were AWB genfixes; the rest were made with my own regular expression. Pkbwcgs (talk) 20:27, 25 December 2018 (UTC)
- I like your expression, with $2 representing the pipe, which should be taken out and so can't be in the replace expression. Pkbwcgs (talk) 20:29, 25 December 2018 (UTC)
- Sometimes, I corrected things manually as well when the RegEx wasn't doing the right thing. Pkbwcgs (talk) 20:36, 25 December 2018 (UTC)
- I fixed everything in today's edits. Pkbwcgs (talk) 20:44, 25 December 2018 (UTC)
- Your regular expression still doesn't remove the pipe. Pkbwcgs (talk) 20:49, 25 December 2018 (UTC)
- Also, I don't have a bot task that handles pipes inside links. I am thinking of opening another BRFA soon to handle that, but I need to come up with some RegEx for it first. Pkbwcgs (talk) 20:54, 25 December 2018 (UTC)
- Good point, I missed a set of parens. Try \[\[(http.*?)(?:(?: |\|)(.*?))?]], replacing with [$1 $2]. Primefac (talk) 17:29, 26 December 2018 (UTC)
- @Primefac: That works! Pkbwcgs (talk) 18:12, 26 December 2018 (UTC)
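A minimal Python sketch of the corrected pattern, under the same translation assumptions (`$1`/`$2` become `\1`/`\2`; `fix_third` is a hypothetical helper name). Making the link text capturable only after a space or pipe forces group 1 to run up to that separator, so piped links now convert cleanly, while a bare link leaves group 2 unmatched and therefore gets a trailing space:

```python
import re

# The separator (space or pipe) and the link text sit inside one optional
# non-capturing group, so text is only captured after a separator; this
# forces group 1 to extend up to that separator or to the closing "]]".
THIRD = re.compile(r"\[\[(http.*?)(?:(?: |\|)(.*?))?]]")


def fix_third(text: str) -> str:
    """Hypothetical helper: apply the corrected find/replace."""
    # An unmatched group 2 is substituted as an empty string (Python 3.5+).
    return THIRD.sub(r"[\1 \2]", text)


if __name__ == "__main__":
    print(fix_third("[[https://google.com|Google]]"))  # [https://google.com Google]
    print(fix_third("[[https://www.google.co.uk]]"))   # [https://www.google.co.uk ]
```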
Approved for trial (25 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 18:15, 26 December 2018 (UTC)
- @Primefac: It is still doing it incorrectly sometimes. For example, it replaces [[https://www.google.co.uk]] with [https://www.google.co.uk ], which is wrong: we don't need a space before the last square bracket. Pkbwcgs (talk) 19:03, 26 December 2018 (UTC)
- So put in a secondary find/replace for _] (_ used to indicate a space) and replace it with ]. Primefac (talk) 20:13, 26 December 2018 (UTC)
- @Primefac: That is still not working properly. It is now unable to identify the link with double brackets. Pkbwcgs (talk) 20:28, 26 December 2018 (UTC)
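A minimal Python sketch of chaining this secondary find/replace after the corrected regex. It models only the two substitutions themselves, not how the rules are entered in AWB, so it says nothing about the detection problem just reported; `fix_with_cleanup` is a hypothetical name. Because the second pass is a plain text replacement, it also removes any other space that happens to sit before a `]` on the page, not just the one left in repaired links.

```python
import re

THIRD = re.compile(r"\[\[(http.*?)(?:(?: |\|)(.*?))?]]")


def fix_with_cleanup(text: str) -> str:
    """Hypothetical helper: corrected regex, then a plain " ]" -> "]" pass."""
    text = THIRD.sub(r"[\1 \2]", text)
    # Second pass: drop a space sitting directly before a closing bracket.
    # This is a plain string replacement, so it is not limited to links.
    return text.replace(" ]", "]")


if __name__ == "__main__":
    print(fix_with_cleanup("[[https://www.google.co.uk]]"))
    print(fix_with_cleanup("x [ y ]"))  # unrelated text is changed too
```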
- Well, the other option is to just do two find statements:
\[\[(http[^ \|]*?)]] → [$1]
\[\[(http.*?)(?:(?: |\|)(.*?))?]] → [$1 $2]
- Do them in that order and it will catch everything. Primefac (talk) 17:59, 27 December 2018 (UTC)
- @Primefac: There is still that annoying space before the closing square bracket of the external link. Do you know how I can get AWB to perform the regular expressions in the order you stated? Pkbwcgs (talk) 18:58, 27 December 2018 (UTC)
- The order you put them into AWB is the order they'll run. Primefac (talk) 18:59, 27 December 2018 (UTC)
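A minimal Python sketch of running the two rules as an ordered list, with a hypothetical helper `apply_in_order` standing in for AWB's rule list (assumed, per the reply above, to run rules top to bottom). Rule 1 handles bare links, whose URL contains no space or pipe; rule 2 then handles links with text. Reversing the order lets rule 2 claim the bare links first, which brings the trailing space back:

```python
import re

# The two rules from the suggestion above, in the order they must run.
RULES = [
    # Rule 1: bare links with no space or pipe, e.g. [[http://x]] -> [http://x]
    (re.compile(r"\[\[(http[^ \|]*?)]]"), r"[\1]"),
    # Rule 2: links whose text follows a space or pipe -> [url text]
    (re.compile(r"\[\[(http.*?)(?:(?: |\|)(.*?))?]]"), r"[\1 \2]"),
]


def apply_in_order(text: str, rules) -> str:
    """Hypothetical helper: apply each (pattern, replacement) pair in list order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    page = "See [[http://a.org]] and [[http://b.org|B]]."
    print(apply_in_order(page, RULES))
    # Reversed order: rule 2 converts the bare link first, leaving "[http://a.org ]".
    print(apply_in_order(page, list(reversed(RULES))))
```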
Approved. As far as the edits themselves go, they're perfectly fine. The pages where they're found, and how the links are used there, are another matter entirely. I would suggest periodically piping the edit list to the MOS and GOCE WikiProjects so that they can fix them. Primefac (talk) 02:38, 28 December 2018 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.