path: root/searx/search/processors
Commit message (author, date)
* [mod] add option max_page (Markus Heiser, 2023-12-03)
|
|   Related: https://github.com/searxng/searxng/issues/2982
|   Closes: https://github.com/searxng/searxng/issues/2972
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [fix] spelling (jazzzooo, 2023-09-18)
|
* use logger.warning (pankaj, 2023-05-19)
|
|   logger.warn() is deprecated. logger.warning is already being used in some
|   files.
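The change above can be illustrated with the standard library alone. This is a minimal sketch, not code from the repository: the logger name and the helper `report_slow_engine` are hypothetical. It only shows the non-deprecated spelling, `Logger.warning()`, in place of the deprecated alias `Logger.warn()`.

```python
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("searx.example")


def report_slow_engine(engine_name: str, seconds: float) -> str:
    """Log a warning with the non-deprecated method and return the message."""
    message = f"{engine_name}: response took {seconds:.1f}s"
    # Logger.warning() is the supported spelling; Logger.warn() is a
    # deprecated alias and should not be used in new code.
    logger.warning(message)
    return message
```

Calling `report_slow_engine("demo", 2.5)` emits one WARNING record through whatever handlers the application has configured.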
* [fix] searxng_extra/update/update_engine_descriptions.py (part 1) (Markus Heiser, 2023-04-15)
|
|   Follow up of #2269
|
|   The script to update the descriptions of the engines no longer works since
|   PR #2269 has been merged.
|
|   searx/engines/wikipedia.py
|   ==========================
|
|   1. There was a misuse of zh-classical.wikipedia.org:
|
|      - `zh-classical` is dedicated to classical Chinese [1], which is not
|        traditional Chinese [2].
|
|      - zh.wikipedia.org has LanguageConverter enabled [3] and dynamically
|        shows simplified or traditional Chinese according to the HTTP
|        Accept-Language header.
|
|   2. update_engine_descriptions.py needs a list of all Wikipedias. The
|      implementation from #2269 included only a reduced list:
|
|      - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
|      - https://meta.wikimedia.org/wiki/List_of_Wikipedias
|
|   searxng_extra/update/update_engine_descriptions.py
|   ==================================================
|
|   Before PR #2269 there was a match_language() function that did an
|   approximation using various methods. With PR #2269 there are only the types
|   in the data model of the languages, which can be recognized by babel. The
|   approximation methods, which are needed (only here) in the determination of
|   the descriptions, must be replaced by other methods.
|
|   [1] https://en.wikipedia.org/wiki/Classical_Chinese
|   [2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
|   [3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter
|
|   Closes: https://github.com/searxng/searxng/issues/2330
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] remove obsolete EngineTraits.supported_languages (Markus Heiser, 2023-03-24)
|
|   All engines have been migrated from ``supported_languages`` to the
|   ``fetch_traits`` concept. There is no longer a need for the obsolete code
|   that implements the ``supported_languages`` concept.
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] Google: reversed engineered & upgrade to data_type: traits_v1 (Markus Heiser, 2023-03-24)
|
|   Partial reverse engineering of the Google engines, including improved
|   language and region handling based on the engine.traits_v1 data.
|
|   Whenever possible, the implementations of the Google engines try to make
|   use of the async REST APIs. The get_lang_info() has been generalized to a
|   get_google_info() function; in particular, the region handling has been
|   improved by adding the cr parameter.
|
|   searx/data/engine_traits.json
|     Add data type "traits_v1" generated by the fetch_traits() functions from:
|
|     - Google (WEB),
|     - Google images,
|     - Google news,
|     - Google scholar and
|     - Google videos
|
|     and remove data from obsolete data type "supported_languages".
|
|     A traits.custom type that maps region codes to *supported_domains* is
|     fetched from https://www.google.com/supported_domains
|
|   searx/autocomplete.py:
|     Reverse-engineered autocomplete from Google WEB. Supports Google's
|     languages and subdomains. The old API suggestqueries.google.com/complete
|     has been replaced by the async REST API:
|     https://{subdomain}/complete/search?{args}
|
|   searx/engines/google.py
|     Reverse engineering and extensive testing ..
|
|     - fetch_traits(): Fetch languages & regions from Google properties.
|     - always use the async REST API (formerly known as 'use_mobile_ui')
|     - use *supported_domains* from traits
|     - improved the result list by fetching './/div[@data-content-feature]'
|       and parsing the type of the various *content features* --> thumbnails
|       are added
|
|   searx/engines/google_images.py
|     Reverse engineering and extensive testing ..
|
|     - fetch_traits(): Fetch languages & regions from Google properties.
|     - use *supported_domains* from traits
|     - if it exists, freshness_date is added to the result
|     - issue 1864: result list has been improved a lot (due to the new cr
|       parameter)
|
|   searx/engines/google_news.py
|     Reverse engineering and extensive testing ..
|
|     - fetch_traits(): Fetch languages & regions from Google properties.
|       *supported_domains* is not needed but a ceid list has been added.
|     - different region handling compared to Google WEB
|     - fixed for various languages & regions (due to the new ceid parameter) /
|       avoid CONSENT page
|     - Google News no longer supports time range
|     - result list has been fixed: XPath of pub_date and pub_origin
|
|   searx/engines/google_videos.py
|     - fetch_traits(): Fetch languages & regions from Google properties.
|     - use *supported_domains* from traits
|     - add paging support
|     - implement an async request ('asearch': 'arc' & 'async':
|       'use_ac:true,_fmt:html')
|     - simplified code (thanks to '_fmt:html' request)
|     - issue 1359: fixed xpath of video length data
|
|   searx/engines/google_scholar.py
|     - fetch_traits(): Fetch languages & regions from Google properties.
|     - use *supported_domains* from traits
|     - request(): include patents & citations
|     - response(): fixed CAPTCHA detection (Scholar has its own CAPTCHA
|       manager)
|     - hardening XPath to iterate over results
|     - fixed XPath of pub_type (has been changed from gs_ct1 to gs_cgt2 class)
|     - issue 1769 fixed: new request implementation is no longer incompatible
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] replace engines_languages.json by engines_traits.json (Markus Heiser, 2023-03-24)
|
|   Implementations of the *traits* of the engines.
|
|   Engine's traits are fetched from the origin engine and stored in a JSON file
|   in the *data folder*. Most often traits are languages and region codes and
|   their mapping from SearXNG's representation to the representation in the
|   origin search engine.
|
|   To load traits from the persistence::
|
|       searx.enginelib.traits.EngineTraitsMap.from_data()
|
|   For new traits new properties can be added to the class::
|
|       searx.enginelib.traits.EngineTraits
|
|   .. hint::
|
|      Implementation is downward compatible to the deprecated
|      *supported_languages method* from the vintage implementation.
|
|      The vintage code is tagged as *deprecated* and can be removed when all
|      engines have been ported to the *traits method*.
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] make python code pylint 2.16.1 compliant (Markus Heiser, 2023-02-10)
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* searx.network: add "verify" option to the networks (Alexandre Flament, 2022-10-14)
|
|   Each network can define a verify option:
|   * false to disable certificate verification
|   * a path to an existing certificate.
|
|   SearXNG uses SSL_CERT_FILE and SSL_CERT_DIR when they are defined, see
|   https://www.python-httpx.org/environment_variables/#ssl_cert_file
* [fix] typos / reported by @kianmeng in searx PR-3366 (Markus Heiser, 2022-09-27)
|
|   [PR-3366] https://github.com/searx/searx/pull/3366
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] add 'Accept-Language' HTTP header to online processores (Markus Heiser, 2022-08-01)
|
|   Most engines that support languages (and regions) use the Accept-Language
|   header from the WEB browser to build a response that fits the language (and
|   region).
|
|   - add new engine option: send_accept_language_header
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [doc] add description of method EngineProcessor.get_params() (Markus Heiser, 2022-08-01)
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* Merge pull request #1443 from return42/fix-online_dictionary (Markus Heiser, 2022-07-07)
|\
| |   [fix] online_dictionary: regular expression
| * [fix] online_dictionary: regular expression (Markus Heiser, 2022-07-07)
| |
| |   The query term of an engine of type `online_dictionary` can consist of
| |   more than one word.
| |
| |   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* | Better explanation for the use of use_mobile_ui (Émilien Devos, 2022-07-06)
|/
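The `online_dictionary` fix in the merge above (2022-07-07) concerns query terms made of several words. The log does not show the regular expression itself, so the following is only a sketch under stated assumptions: the pattern, the `<from>-<to> <term>` query layout, and the names `PARSER_RE` and `parse_dictionary_query` are hypothetical. It shows how a final `(.+)` group lets the term span more than one word.

```python
import re

# Hypothetical pattern (not taken from the repository):
# "<from>-<to> <term>", where <term> may contain spaces.
PARSER_RE = re.compile(r"^([a-z]+)-([a-z]+)\s+(.+)$", re.IGNORECASE)


def parse_dictionary_query(query: str):
    """Return (from_lang, to_lang, term), or None if the query does not match."""
    match = PARSER_RE.match(query.strip())
    if match is None:
        return None
    from_lang, to_lang, term = match.groups()
    return from_lang.lower(), to_lang.lower(), term
```

A pattern that ended in `(\w+)` instead of `(.+)` would reject "en-de good morning", which is the kind of multi-word term the commit message describes.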
* notify the user that use_mobile_ui parameter exist (Emilien Devos, 2022-06-11)
|
* [fix] prepare for pylint 2.14.0 (Markus Heiser, 2022-06-03)
|
|   Remove issues reported by Pylint 2.14.0:
|
|   - no-self-use: has been moved to optional extension [1]
|   - The refactoring checker now also raises 'consider-using-generator'
|     messages for max(), min() and sum(). [2]
|
|   .pylintrc:
|
|   - <option name>-hint has been removed long ago; Pylint 2.14.0 raises an
|     error on invalid options
|   - bad-continuation and bad-whitespace have been removed [3]
|
|   [1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
|   [2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
|   [3] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [enh] implement a OnlineUrlSearchProcessor (Markus Heiser, 2022-01-30)
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [typing] add type hints for dictionaries (Martin Fischer, 2022-01-17)
|
* [format.python] initial formatting of the python code (Markus Heiser, 2021-12-27)
|
|   This patch was generated by black [1]::
|
|       make format.python
|
|   [1] https://github.com/psf/black
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [format.python] disable py code formatting for some hunks of code (Markus Heiser, 2021-12-27)
|
|   Disable the python code formatting from python-black where the readability
|   of the code suffers from formatting.
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [pylint] fix global-variable-not-assigned issues (Markus Heiser, 2021-09-17)
|
|   If there is no write access, there is no need for global. Remove global
|   statement if there is no assignment.
|
|   global-variable-not-assigned:
|     Using global for names but no assignment is done. Used when a variable is
|     defined through the "global" statement but no assignment to this variable
|     is done.
|
|   In Pylint 2.11 the global-variable-not-assigned checker now catches global
|   variables that are never reassigned in a local scope and catches
|   (reassigned) functions [1][2]
|
|   [1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
|   [2] https://github.com/PyCQA/pylint/issues/1375
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] searx.metrics & searx.search: use the engine loggers (Alexandre Flament, 2021-09-10)
|
|   metrics & processors use the engine logger
* [pylint] searx: drop no longer needed 'missing-function-docstring' (Markus Heiser, 2021-09-07)
|
|   Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [pylint] prepare for pylint v2.9.3 / fix some (new) pylint issues (Markus Heiser, 2021-07-03)
|
|   The upgrade from pylint v2.8.3 to 2.9.3 raises some new issues::
|
|       searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
|       searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
|       searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import)
|       searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg)
|       searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup)
|       searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items)
|       searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items)
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
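Two of the messages listed in the commit above (use-maxsplit-arg and consider-using-dict-items) have short idiomatic fixes. The sketch below is not the code from searx/preferences.py or searx/webapp.py; the function names and sample data are hypothetical and only show the pattern each message asks for.

```python
def base_language(locale_tag: str) -> str:
    """use-maxsplit-arg: only the first field is needed, so stop after one split."""
    return locale_tag.split('-', maxsplit=1)[0]


def describe(settings: dict) -> list:
    """consider-using-dict-items: iterate over key/value pairs directly
    instead of looping over keys and indexing the dict again."""
    return [f"{name}={value}" for name, value in settings.items()]
```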
* [fix] typo: online_dictionnary --> online_dictionary (Markus Heiser, 2021-06-04)
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] multithreading only in searx.search.* packages (Alexandre Flament, 2021-05-05)
|
|   This prepares the new architecture change: everything about multithreading
|   is moved into the searx.search.* packages.
|
|   Previously the call to the "init" function of the engines was done in
|   searx.engines:
|   * the network was not set (request not sent using the defined proxy)
|   * it required monkey patching the code to avoid HTTP requests during the
|     tests
* [lint] pylint searx/search/processors files / BTW add some doc-strings (Markus Heiser, 2021-04-27)
|
|   Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] processors: show identical error messages on /search and /stats (Alexandre Flament, 2021-04-27)
|
* [mod] metrics: add secondary parameter (Alexandre Flament, 2021-04-21)
|
|   Some errors won't stop the engine:
|   * additional HTTP redirects, for example
|   * some invalid results
|
|   secondary=True allows flagging these errors as not important.
* [enh] rewrite and enhance metrics (Alexandre Flament, 2021-04-21)
|
* [mod] refactoring: processors (Alexandre Flament, 2021-04-21)
|
|   Report to the user suspended engines.
|
|   searx.search.processor.abstract:
|   * manages suspend time (per network).
|   * reports suspended time to the ResultContainer (method
|     extend_container_if_suspended)
|   * adds the results to the ResultContainer (method extend_container)
|   * handles exceptions (method handle_exception)
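The suspend-time bookkeeping listed in the commit above can be pictured as follows. This is a sketch under stated assumptions, not the implementation in searx.search.processors: the class `SuspendTracker`, its methods, and the injectable clock are all hypothetical. It only illustrates the idea of recording when a suspension ends and reporting the remaining time.

```python
import time


class SuspendTracker:
    """Hypothetical helper: remember until when an engine (or network) is suspended."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behaviour can be checked without sleeping.
        self._clock = clock
        self._suspended_until = 0.0

    def suspend(self, seconds: float) -> None:
        """Suspend for `seconds`, counted from now."""
        self._suspended_until = self._clock() + seconds

    def remaining(self) -> float:
        """Seconds left until the suspension ends (0.0 when not suspended)."""
        return max(0.0, self._suspended_until - self._clock())

    def is_suspended(self) -> bool:
        return self.remaining() > 0
```

A caller would check `is_suspended()` before sending a request and, if it returns True, report `remaining()` to the user instead of querying the engine.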
* [httpx] replace searx.poolrequests by searx.network (Alexandre Flament, 2021-04-12)
|
|   settings.yml:
|
|   * outgoing.networks:
|     * can contain network definitions
|     * properties: enable_http, verify, http2, max_connections,
|       max_keepalive_connections, keepalive_expiry, local_addresses,
|       support_ipv4, support_ipv6, proxies, max_redirects, retries
|     * retries: 0 by default, number of times searx retries to send the HTTP
|       request (using a different IP & proxy each time)
|     * local_addresses can be "192.168.0.1/24" (it supports IPv6)
|     * support_ipv4 & support_ipv6: both True by default
|       see https://github.com/searx/searx/pull/1034
|   * each engine can define a "network" section:
|     * either a full network description
|     * or a reference to an existing network
|   * all HTTP requests of an engine use the same HTTP configuration (it was
|     not the case before, see proxy configuration in master)
* [enh] replace requests by httpx (Alexandre Flament, 2021-04-10)
|
* [fix] checker: various bug fixes (Alexandre Flament, 2021-03-25)
|
|   * initialize engine_data (youtube engine)
|   * don't crash if an engine doesn't set result['url']
* [mod] by default allow only HTTPS, not HTTP (Alexandre Flament, 2021-03-08)
|
|   Related to https://github.com/searx/searx/pull/2373
* [mod] update currencies.json and fetch_currencies.py (Alexandre Flament, 2021-02-23)
|
|   Use a SPARQL request on Wikidata to get the list of currencies.
|   currencies.json contains the translation for all supported searx languages.
|
|   Supersedes #993
* [fix] duckduckgo engine: "!ddg !g" do not redirect to google (Alexandre Flament, 2021-02-12)
|
|   * searx understands "!ddg !g time" as: send "!g time" to DDG
|   * !g is a DDG bang for Google: DDG returns an HTTP redirect to Google
|
|   This commit adds the allows_redirect param to not follow HTTP redirects.
|   The DDG engine returns an empty result as before, without an HTTP redirect.
* Fix: activate raise_for_error by default (Alexandre Flament, 2021-02-09)
|
|   Fix commit d703119d3a313a406482b121ee94c6afee3bc307:
|   Some engines need to parse the HTTP error but raise_for_error is always set
|   to False in the "request" function.
* [fix] checker: minor fix about language detection (Alexandre Flament, 2021-01-19)
|
* [fix] checker: fix engine statistics (Alexandre Flament, 2021-01-18)
|
|   Without this commit, the URL /stats/errors shows percentages above 100%
|   after the checker has run.
* [mod] checker: minor adjustements on the default tests (Alexandre Flament, 2021-01-12)
|
|   The query "time" is convenient because most search engines will return some
|   results, but some engines in the general category will return documentation
|   about the HTML tags <time> or <input type="time">.
* [enh] add checker (Alexandre Flament, 2021-01-12)
|
* [fix] fix of PR #2225 (Alexandre Flament, 2020-12-17)
|
* [mod] split searx.search into different processors (Alexandre Flament, 2020-12-17)

    see searx.search.processors.abstract.EngineProcessor

    searx first calls the get_params method. If the return value is not None,
    searx then calls the search method.
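The two-step call sequence described in the last commit can be sketched as below. This is a minimal illustration, not the actual EngineProcessor API: the class `EchoProcessor`, the function `run`, and every signature here are assumptions made for the example. Only the control flow (call `get_params`, skip `search` when it returns None) comes from the commit message.

```python
from typing import Optional


class EchoProcessor:
    """Hypothetical stand-in for an engine processor."""

    def __init__(self, supports_paging: bool):
        self.supports_paging = supports_paging

    def get_params(self, query: str, pageno: int) -> Optional[dict]:
        # Returning None signals that this engine cannot serve the request.
        if pageno > 1 and not self.supports_paging:
            return None
        return {"query": query, "pageno": pageno}

    def search(self, params: dict) -> list:
        # A real processor would query the engine; this one echoes its input.
        return [f"{params['query']} (page {params['pageno']})"]


def run(processor, query: str, pageno: int = 1) -> list:
    """Call get_params first; call search only if params is not None."""
    params = processor.get_params(query, pageno)
    if params is None:
        return []
    return processor.search(params)
```

Splitting the work this way lets a processor opt out of a request cheaply, before any network traffic happens.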