path: root/searx/search
Commit message / Author / Age
* [fix] SyntaxWarning: invalid escape sequence '\>'Markus Heiser2024-01-15
| This patch fixes issue reported by ``make test.unit``::
|
|     searx/search/checker/impl.py:39: SyntaxWarning: invalid escape sequence '\>'
|       rep = ['<' + tag + '[^\>]*>' for tag in HTML_TAGS]
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
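A minimal sketch of the kind of change that silences this warning, assuming `HTML_TAGS` is a plain list of tag names. The list and the `strip_tags()` helper below are illustrative and not taken from `impl.py`. Inside a character class `>` needs no backslash, so the escape can simply be dropped.

```python
import re

# Illustrative subset; the real HTML_TAGS list in impl.py is not shown in this log.
HTML_TAGS = ["b", "i", "span"]

# '[^>]*' matches everything up to the closing '>' of the tag; no '\>' is needed.
rep = ["<" + tag + "[^>]*>" for tag in HTML_TAGS]
rep += ["</" + tag + ">" for tag in HTML_TAGS]
pattern = re.compile("|".join(rep))


def strip_tags(text: str) -> str:
    """Remove the listed opening and closing tags from text (hypothetical helper)."""
    return pattern.sub("", text)
```

An alternative, when a backslash really is needed in a pattern, is a raw string literal (`r'...'`), which keeps Python from interpreting the escape at all.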
* Searx to SearXNG change error messageÉmilien (perso)2023-12-31
|
* [mod] add option max_pageMarkus Heiser2023-12-03
| Related: https://github.com/searxng/searxng/issues/2982
| Closes: https://github.com/searxng/searxng/issues/2972
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
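How a page limit of this kind might be enforced can be sketched as follows. The function name and the convention that `0` disables the limit are assumptions made for illustration; the commit message itself does not describe the implementation.

```python
def page_allowed(pageno: int, max_page: int = 0) -> bool:
    """Return True if an engine should be queried for page `pageno`.

    Assumption for this sketch: max_page == 0 means "no limit".
    """
    if pageno < 1:
        # page numbers are 1-based in this sketch
        return False
    return max_page == 0 or pageno <= max_page
```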
* [feat] implement feeling lucky featureBnyro2023-09-19
|
* [fix] spellingjazzzooo2023-09-18
|
* [fix] make flask_babel.gettext() work in engine modules (L10n & threads)Markus Heiser2023-08-09
| incident:
|   flask_babel.gettext() does not work in the engine modules.
|
| cause:
|   The request() and response() functions of the engine modules run in the
|   processor, whose search() method runs in a thread, and in the threads the
|   context of the Flask app does not exist.  The context of the Flask app is
|   needed by the gettext() function for the L10n.
|
| Solution:
|   Copy the context of the Flask app into the threads. [1]
|
| special case:
|   We cannot equip the search() method of the processors with the decorator
|   [1], because the decorator requires a context (Flask app) that does not yet
|   exist at the time of the initialization of the processors (the
|   initialization of the processors is part of the initialization of the
|   Flask app).
|
| [1] https://flask.palletsprojects.com/en/2.3.x/api/#flask.copy_current_request_context
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
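The idea of copying a context into a worker thread can be illustrated with the standard library alone. The sketch below is an analogy built on `contextvars`, not the Flask decorator named in [1] and not SearXNG code; `gettext_stub()` and `run_in_thread()` are hypothetical helpers.

```python
import contextvars
import threading

# Stand-in for the per-request state that a translation function depends on.
locale = contextvars.ContextVar("locale", default=None)


def gettext_stub(msg: str) -> str:
    """Hypothetical gettext(): fails when no locale is set in the current context."""
    lang = locale.get()
    if lang is None:
        raise RuntimeError("no request context")
    return f"[{lang}] {msg}"


def run_in_thread(func, *args) -> dict:
    """Run func(*args) in a new thread and collect its result or error."""
    result = {}

    def target():
        try:
            result["value"] = func(*args)
        except RuntimeError as exc:
            result["error"] = str(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result


locale.set("de")

# Copy the caller's context and run the function inside that copy in the thread.
ctx = contextvars.copy_context()
with_ctx = run_in_thread(ctx.run, gettext_stub, "hello")
```

Calling `run_in_thread(gettext_stub, "hello")` without the copied context typically ends in the "no request context" error, because a new thread usually starts with an empty context; that default depends on the Python build, so the sketch does not rely on it.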
* use logger.warningpankaj2023-05-19
| logger.warn() is deprecated.
| logger.warning() is already being used in some files.
* [mod] move language recognition to get_search_query_from_webappMarkus Heiser2023-04-15
| To set the language from language recognition and hold the value selected by
| the client, the previous implementation creates a copy of the SearchQuery
| object and manipulates the SearchQuery object by calling function
| replace_auto_language().
|
| This patch tries to implement a similar functionality in a more central place,
| in function get_search_query_from_webapp(), when the SearchQuery object is
| built.
|
| Additionally, this patch uses the language preferred by the client if language
| recognition does not have a match.  The existing implementation does not care
| about client preferences and uses 'all' in case of no match.
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
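The fallback order described in this message can be sketched as a small pure function. The name `resolve_language()`, its parameters and the `"auto"` marker are assumptions for illustration; the real logic lives in get_search_query_from_webapp() and is not reproduced here.

```python
from typing import Optional


def resolve_language(selected: str, detected: Optional[str], client_preferred: str) -> str:
    """Pick the language for a query (hypothetical helper).

    - an explicit selection by the client always wins
    - with "auto", use the recognized language if there is a match
    - otherwise fall back to the language preferred by the client
      (the older behaviour described above fell back to 'all')
    """
    if selected != "auto":
        return selected
    return detected if detected else client_preferred
```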
* [fix] searxng_extra/update/update_engine_descriptions.py (part 1)Markus Heiser2023-04-15
| Follow up of #2269
|
| The script to update the descriptions of the engines no longer works since
| PR #2269 has been merged.
|
| searx/engines/wikipedia.py
| ==========================
|
| 1. There was a misuse of zh-classical.wikipedia.org:
|
|    - `zh-classical` is dedicated to classical Chinese [1] which is not
|      traditional Chinese [2].
|
|    - zh.wikipedia.org has LanguageConverter enabled [3] and is going to
|      dynamically show simplified or traditional Chinese according to the
|      HTTP Accept-Language header.
|
| 2. The update_engine_descriptions.py needs a list of all wikipedias.  The
|    implementation from #2269 included only a reduced list:
|
|    - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
|    - https://meta.wikimedia.org/wiki/List_of_Wikipedias
|
| searxng_extra/update/update_engine_descriptions.py
| ==================================================
|
| Before PR #2269 there was a match_language() function that did an
| approximation using various methods.  With PR #2269 there are only the types
| in the data model of the languages, which can be recognized by babel.  The
| approximation methods, which are needed (only here) in the determination of
| the descriptions, must be replaced by other methods.
|
| [1] https://en.wikipedia.org/wiki/Classical_Chinese
| [2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
| [3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter
|
| Closes: https://github.com/searxng/searxng/issues/2330
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] remove obsolete EngineTraits.supported_languagesMarkus Heiser2023-03-24
| All engines have been migrated from ``supported_languages`` to the
| ``fetch_traits`` concept.  There is no longer a need for the obsolete code that
| implements the ``supported_languages`` concept.
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] Google: reversed engineered & upgrade to data_type: traits_v1Markus Heiser2023-03-24
| Partial reverse engineering of the Google engines including an improved
| language and region handling based on the engine.traits_v1 data.
|
| Whenever possible the implementations of the Google engines try to make use
| of the async REST APIs.  The get_lang_info() has been generalized to a
| get_google_info() function; especially the region handling has been improved
| by adding the cr parameter.
|
| searx/data/engine_traits.json
|   Add data type "traits_v1" generated by the fetch_traits() functions from:
|
|   - Google (WEB),
|   - Google images,
|   - Google news,
|   - Google scholar and
|   - Google videos
|
|   and remove data from obsolete data type "supported_languages".
|
|   A traits.custom type that maps region codes to *supported_domains* is
|   fetched from https://www.google.com/supported_domains
|
| searx/autocomplete.py:
|   Reverse engineered autocomplete from Google WEB.  Supports Google's
|   languages and subdomains.  The old API suggestqueries.google.com/complete
|   has been replaced by the async REST API:
|   https://{subdomain}/complete/search?{args}
|
| searx/engines/google.py
|   Reverse engineering and extensive testing ..
|
|   - fetch_traits(): Fetch languages & regions from Google properties.
|   - always use the async REST API (formerly known as 'use_mobile_ui')
|   - use *supported_domains* from traits
|   - improved the result list by fetching './/div[@data-content-feature]'
|     and parsing the type of the various *content features* --> thumbnails
|     are added
|
| searx/engines/google_images.py
|   Reverse engineering and extensive testing ..
|
|   - fetch_traits(): Fetch languages & regions from Google properties.
|   - use *supported_domains* from traits
|   - if exists, freshness_date is added to the result
|   - issue 1864: result list has been improved a lot (due to the new cr
|     parameter)
|
| searx/engines/google_news.py
|   Reverse engineering and extensive testing ..
|
|   - fetch_traits(): Fetch languages & regions from Google properties.
|     *supported_domains* is not needed but a ceid list has been added.
|   - different region handling compared to Google WEB
|   - fixed for various languages & regions (due to the new ceid parameter) /
|     avoid CONSENT page
|   - Google News no longer supports time range
|   - result list has been fixed: XPath of pub_date and pub_origin
|
| searx/engines/google_videos.py
|   - fetch_traits(): Fetch languages & regions from Google properties.
|   - use *supported_domains* from traits
|   - add paging support
|   - implement an async request ('asearch': 'arc' & 'async':
|     'use_ac:true,_fmt:html')
|   - simplified code (thanks to '_fmt:html' request)
|   - issue 1359: fixed xpath of video length data
|
| searx/engines/google_scholar.py
|   - fetch_traits(): Fetch languages & regions from Google properties.
|   - use *supported_domains* from traits
|   - request(): include patents & citations
|   - response(): fixed CAPTCHA detection (Scholar has its own CAPTCHA manager)
|   - hardening XPath to iterate over results
|   - fixed XPath of pub_type (has been changed from gs_ct1 to gs_cgt2 class)
|   - issue 1769 fixed: new request implementation is no longer incompatible
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] replace engines_languages.json by engines_traits.jsonMarkus Heiser2023-03-24
| Implementations of the *traits* of the engines.
|
| Engine's traits are fetched from the origin engine and stored in a JSON file
| in the *data folder*.  Most often traits are languages and region codes and
| their mapping from SearXNG's representation to the representation in the
| origin search engine.
|
| To load traits from the persistence::
|
|     searx.enginelib.traits.EngineTraitsMap.from_data()
|
| For new traits new properties can be added to the class::
|
|     searx.enginelib.traits.EngineTraits
|
| .. hint::
|
|    Implementation is downward compatible to the deprecated
|    *supported_languages method* from the vintage implementation.  The vintage
|    code is tagged as *deprecated* and can be removed when all engines have
|    been ported to the *traits method*.
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
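The mapping idea (SearXNG's language and region codes on one side, the origin engine's codes on the other, persisted as JSON) can be sketched with the standard library. `TraitsSketch`, `load_traits()` and the JSON layout below are hypothetical and much reduced; they are not the actual `EngineTraits` / `EngineTraitsMap` classes or the real file format.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TraitsSketch:
    """Hypothetical, reduced stand-in for the traits of one engine."""

    languages: Dict[str, str] = field(default_factory=dict)
    regions: Dict[str, str] = field(default_factory=dict)

    def get_language(self, searxng_locale: str, default: Optional[str] = None) -> Optional[str]:
        # Exact match first, then fall back to the bare language ('de-AT' -> 'de').
        if searxng_locale in self.languages:
            return self.languages[searxng_locale]
        language = searxng_locale.split("-", maxsplit=1)[0]
        return self.languages.get(language, default)


def load_traits(raw: str) -> Dict[str, TraitsSketch]:
    """Build one TraitsSketch per engine from a JSON document."""
    return {name: TraitsSketch(**entry) for name, entry in json.loads(raw).items()}


# Made-up data in a made-up layout, only to exercise the sketch.
RAW = '{"example": {"languages": {"de": "lang_de"}, "regions": {"de-AT": "AT"}}}'
traits = load_traits(RAW)
```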
* [fix] fix threshold in replace_auto_languageMarkus Heiser2023-03-05
| [1] https://github.com/searxng/searxng/pull/2027#pullrequestreview-1322157677
| [2] https://github.com/searxng/searxng/pull/1969#issuecomment-1345354529
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* Add "Auto-detected" as a language.Alexandre Flament2023-02-17
| When the user chooses "Auto-detected", the choice remains on the following
| queries.  The detected language is displayed.
|
| For example "Auto-detected (en)":
| * the next query language is going to be auto detected
| * for the current query, the detected language is English.
|
| This replaces the autodetect_search_language plugin.
* [mod] make python code pylint 2.16.1 compliantMarkus Heiser2023-02-10
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* Merge branch 'master' into fasttextArtikusHG2022-12-16
|\
| * move searx.shared.redisdb to searx.redisdbAlexandre Flament2022-12-10
| |
* | Replace langdetect with fasttextArtikusHG2022-12-16
|/
* Initialize Redis in searx/webapp.pyAlexandre FLAMENT2022-11-05
| settings.yml:
| * The default URL was unix:///usr/local/searxng-redis/run/redis.sock?db=0
| * The default URL is now "false"
|
| The default URL makes the log difficult to deal with: if the admin didn't
| install a Redis instance, the logs record a false error.
|
| It worked before because SearXNG initialized the Redis connection when the
| limiter started.
|
| In this commit, SearXNG initializes Redis in searx/webapp.py so various
| components can use Redis without taking care of the initialization step.
* The checker requires RedisAlexandre Flament2022-11-05
| Remove the abstraction in searx.shared.SharedDict.
| Implement a basic and dedicated scheduler for the checker using a Redis script.
* searx.network: add "verify" option to the networksAlexandre Flament2022-10-14
| Each network can define a verify option:
| * false to disable certificate verification
| * a path to an existing certificate.
|
| SearXNG uses SSL_CERT_FILE and SSL_CERT_DIR when they are defined,
| see https://www.python-httpx.org/environment_variables/#ssl_cert_file
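The forms of the option could be normalized along these lines. This is a standard-library sketch under the assumption that a third value, `True`, means "use the defaults"; `resolve_verify()` is a hypothetical helper, not part of searx.network.

```python
import ssl
from typing import Union


def resolve_verify(verify: Union[bool, str]) -> Union[bool, ssl.SSLContext]:
    """Translate a network's `verify` setting into something an HTTP client can use.

    - False: certificate verification is disabled
    - True:  leave verification to the client's defaults (assumed here to be
             where SSL_CERT_FILE / SSL_CERT_DIR are taken into account)
    - str:   path to a certificate file, loaded into a dedicated SSL context
    """
    if isinstance(verify, bool):
        return verify
    # Raises an OSError subclass if the path does not exist or is not a valid file.
    return ssl.create_default_context(cafile=verify)
```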
* [fix] typos / reported by @kianmeng in searx PR-3366Markus Heiser2022-09-27
| [PR-3366] https://github.com/searx/searx/pull/3366
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] add 'Accept-Language' HTTP header to online processoresMarkus Heiser2022-08-01
| Most engines that support languages (and regions) use the Accept-Language
| header sent by the WEB browser to build a response that fits the language
| (and region).
|
| - add new engine option: send_accept_language_header
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
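What a processor might send when such an option is enabled can be sketched as below. The helper name and the chosen quality values are assumptions for illustration; the commit message does not spell out the exact header value.

```python
from typing import Optional


def accept_language(language: str, territory: Optional[str] = None) -> str:
    """Build an Accept-Language header value (hypothetical helper).

    With a territory the regional variant is preferred, followed by the bare
    language and finally a wildcard; the q-values are illustrative.
    """
    if territory:
        return f"{language}-{territory},{language};q=0.9,*;q=0.5"
    return f"{language},*;q=0.5"


headers = {"Accept-Language": accept_language("de", "AT")}
```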
* [doc] add description of method EngineProcessor.get_params()Markus Heiser2022-08-01
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [fix] pyright repported errorsAlexandre Flament2022-07-30
| The errors make pyright usage useless since a new error won't be seen [1].
|
| [1] https://github.com/searxng/searxng/pull/1569
|
| ```
| searx/compat.py:11:27 - error: Expression of type "Type[cached_property[_T@cached_property]]" cannot be assigned to declared type "Type[cached_property]"
|     "Type[cached_property[_T@cached_property]]" is incompatible with "Type[cached_property]"
|       Type "Type[cached_property[_T@cached_property]]" cannot be assigned to type "Type[cached_property]" (reportGeneralTypeIssues)
| searx/utils.py:69:36 - error: Expression of type "None" cannot be assigned to parameter of type "str"
|     Type "None" cannot be assigned to type "str" (reportGeneralTypeIssues)
| searx/utils.py:573:85 - error: Expression of type "None" cannot be assigned to parameter of type "int"
|     Type "None" cannot be assigned to type "int" (reportGeneralTypeIssues)
| searx/webapp.py:1306:22 - error: Argument of type "str" cannot be assigned to parameter "__a" of type "BytesPath" in function "join"
|     Type "str" cannot be assigned to type "BytesPath"
|       "str" is incompatible with "bytes"
|       "str" is incompatible with protocol "PathLike[bytes]"
|         "__fspath__" is not present (reportGeneralTypeIssues)
| searx/webapp.py:1306:68 - error: Argument of type "Literal['themes']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
|     Type "Literal['themes']" cannot be assigned to type "BytesPath"
|       "Literal['themes']" is incompatible with "bytes"
|       "Literal['themes']" is incompatible with protocol "PathLike[bytes]"
|         "__fspath__" is not present (reportGeneralTypeIssues)
| searx/webapp.py:1306:78 - error: Argument of type "str | Any | None" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
|     Type "str | Any | None" cannot be assigned to type "BytesPath"
|       Type "str" cannot be assigned to type "BytesPath"
|         "str" is incompatible with "bytes"
|         "str" is incompatible with protocol "PathLike[bytes]"
|           "__fspath__" is not present (reportGeneralTypeIssues)
| searx/webapp.py:1306:85 - error: Argument of type "Literal['img']" cannot be assigned to parameter "paths" of type "BytesPath" in function "join"
|     Type "Literal['img']" cannot be assigned to type "BytesPath"
|       "Literal['img']" is incompatible with "bytes"
|       "Literal['img']" is incompatible with protocol "PathLike[bytes]"
|         "__fspath__" is not present (reportGeneralTypeIssues)
| searx/engines/mongodb.py:8:6 - warning: Import "pymongo" could not be resolved (reportMissingImports)
| searx/engines/mysql_server.py:9:8 - warning: Import "mysql.connector" could not be resolved (reportMissingImports)
| searx/engines/postgresql.py:9:8 - warning: Import "psycopg2" could not be resolved from source (reportMissingModuleSource)
| searx/engines/xpath.py:187:28 - warning: "categories" is not defined (reportUndefinedVariable)
| searx/search/__init__.py:184:82 - warning: "flask" is not defined (reportUndefinedVariable)
| searx/search/checker/background.py:19:26 - error: Type of "schedule" is partially unknown
|     Type of "schedule" is "(delay: Any, func: Any, *args: Any) -> Literal[True]" (reportUnknownVariableType)
| searx/shared/__init__.py:8:12 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
| searx/shared/shared_uwsgi.py:5:8 - warning: Import "uwsgi" could not be resolved (reportMissingImports)
| ```
* Merge pull request #1443 from return42/fix-online_dictionaryMarkus Heiser2022-07-07
|\
| | [fix] online_dictionary: regular expression
| * [fix] online_dictionary: regular expressionMarkus Heiser2022-07-07
| | The query term of an engine-type `online_dictionary` can consist of more
| | than one word.
| |
| | Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
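A regular expression that accepts a multi-word term after the language pair could look like the sketch below. The pattern and `parse_query()` are illustrative assumptions about the query shape (`<from>-<to> <term>`), not the expression used by the processor.

```python
import re
from typing import Optional, Tuple

# '<from>-<to> <term>', where <term> may contain spaces ('(.+)' instead of '(\w+)').
PARSER_RE = re.compile(r"([a-z]+)-([a-z]+)\s+(.+)$", re.IGNORECASE)


def parse_query(query: str) -> Optional[Tuple[str, str, str]]:
    """Split a dictionary query into (from_lang, to_lang, term), or return None."""
    match = PARSER_RE.match(query.strip())
    if match is None:
        return None
    from_lang, to_lang, term = match.groups()
    return from_lang.lower(), to_lang.lower(), term
```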
* | Better explanation for the use of use_mobile_uiÉmilien Devos2022-07-06
|/
* notify the user that use_mobile_ui parameter existEmilien Devos2022-06-11
|
* [fix] prepare for pylint 2.14.0Markus Heiser2022-06-03
| Remove issues reported by Pylint 2.14.0:
|
| - no-self-use: has been moved to optional extension [1]
| - The refactoring checker now also raises 'consider-using-generator' messages
|   for max(), min() and sum(). [2]
|
| .pylintrc:
|
| - <option name>-hint was removed long ago, Pylint 2.14.0 raises an error on
|   invalid options
| - bad-continuation and bad-whitespace have been removed [3]
|
| [1] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/summary.html#removed-checkers
| [2] https://pylint.pycqa.org/en/latest/whatsnew/2/2.14/full.html#what-s-new-in-pylint-2-14-0
| [3] https://pylint.pycqa.org/en/latest/whatsnew/2/2.6/summary.html#summary-release-highlights
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [enh] implement a OnlineUrlSearchProcessorMarkus Heiser2022-01-30
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [pyright:strict] searx.search.checker.backgroundMartin Fischer2022-01-27
|
* [fix] checker: fix image fetchAlexandre Flament2022-01-22
| Since https://github.com/searxng/searxng/pull/354
| the searx.network.stream(...) returns a tuple.
|
| This commit updates the checker code according to this function signature
| change.
* [typing] add type hints for dictionariesMartin Fischer2022-01-17
|
* [enh] settings.yml: implement general.enable_metricsAlexandre Flament2022-01-05
| * allow not to record metrics (response time, etc...)
| * this commit doesn't change the UI.  If the metrics are disabled,
|   /stats and /stats/errors will return an empty response;
|   in /preferences, the columns response time and reliability will be empty.
* [format.python] initial formatting of the python codeMarkus Heiser2021-12-27
| This patch was generated by black [1]::
|
|     make format.python
|
| [1] https://github.com/psf/black
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [format.python] disable py code formatting for some hunks of codeMarkus Heiser2021-12-27
| Disable the python code formatting from python-black where the readability of
| the code would suffer from formatting.
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [enh] verify that Tor proxy works every time searx startsAlexandre Flament2021-10-12
| based on @MarcAbonce commit on searx
* [fix] searx.network.stream: fix memory leakAlexandre Flament2021-09-28
|
* [fix] checker: fix memory usageAlexandre Flament2021-09-28
| * download images using the "image_proxy" network (HTTP/1 instead of HTTP/2)
| * don't cache data: URL (reduce memory usage)
| * after each test: purge image URL cache then call garbage collector
| * download only the first 64kb of images
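The last point, reading only the head of an image, can be sketched against any file-like object. `read_head()` and its chunk size are hypothetical; the checker's real code streams over the network and is not shown in this log.

```python
import io

MAX_IMAGE_BYTES = 64 * 1024  # "the first 64kb", read here as 64 KiB


def read_head(stream, limit: int = MAX_IMAGE_BYTES, chunk_size: int = 8192) -> bytes:
    """Read at most `limit` bytes from a binary stream, chunk by chunk."""
    buf = bytearray()
    while len(buf) < limit:
        # Never ask for more than what is still missing up to the limit.
        chunk = stream.read(min(chunk_size, limit - len(buf)))
        if not chunk:
            break  # end of stream reached before the limit
        buf.extend(chunk)
    return bytes(buf)


head = read_head(io.BytesIO(b"x" * 100_000))
```

Capping the read keeps memory usage bounded no matter how large the remote image is, which is enough for a checker that only needs to recognize the image type.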
* [pylint] fix global-variable-not-assigned issuesMarkus Heiser2021-09-17
| If there is no write access, there is no need for global.  Remove global
| statement if there is no assignment.
|
| global-variable-not-assigned:
|   Using global for names but no assignment is done.  Used when a variable is
|   defined through the "global" statement but no assignment to this variable
|   is done.
|
| In Pylint 2.11 the global-variable-not-assigned checker now catches global
| variables that are never reassigned in a local scope and catches (reassigned)
| functions [1][2]
|
| [1] https://pylint.pycqa.org/en/latest/whatsnew/2.11.html
| [2] https://github.com/PyCQA/pylint/issues/1375
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
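The distinction the checker draws can be shown with a few lines; all names below are made up for illustration. Reading or mutating a module-level object needs no `global` statement, only rebinding the name does.

```python
COUNTER = 0
SETTINGS = {"debug": False}


def read_debug() -> bool:
    # Read access only: no 'global' statement needed.
    return SETTINGS["debug"]


def enable_debug() -> None:
    # Mutating the dict is not an assignment to the name SETTINGS,
    # so 'global SETTINGS' here would be flagged by the checker.
    SETTINGS["debug"] = True


def increment() -> int:
    # Rebinding the module-level name does require 'global'.
    global COUNTER
    COUNTER += 1
    return COUNTER
```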
* [mod] searx.metrics & searx.search: use the engine loggersAlexandre Flament2021-09-10
| metrics & processors use the engine logger
* [doc] update docs/dev/plugins.rstAlexandre Flament2021-09-10
|
* [mod] plugin: call on_result after each engine from the ResultContainerAlexandre Flament2021-09-09
| Currently, searx.search.Search calls on_result once the engine results have
| been merged (ResultContainer.order_results).
|
| on_result plugins can rewrite the results: once the URL(s) are modified, even
| if the results could then be merged, they won't be, since
| ResultContainer.order_results has already been called.
|
| This commit calls on_result for each result of each engine.
| In addition, the on_result function can return False to remove the result.
|
| Note: the on_result function now runs on the engine thread instead of the
| Flask thread.
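The per-result hook with a "return False to drop" convention can be sketched as follows. The function and plugin names are hypothetical, and the real plugin signature in SearXNG takes more arguments than shown here.

```python
from typing import Callable, Dict, Iterable, List

Result = Dict[str, str]
Plugin = Callable[[Result], object]


def filter_engine_results(results: Iterable[Result], plugins: List[Plugin]) -> List[Result]:
    """Run every plugin on every result before merging; drop a result on False."""
    kept = []
    for result in results:
        # all() short-circuits: later plugins are skipped once one returns False.
        # Any other return value (including None) keeps the result.
        if all(plugin(result) is not False for plugin in plugins):
            kept.append(result)
    return kept


def strip_tracking(result: Result) -> bool:
    """Example plugin: rewrite the URL in place, keep the result."""
    result["url"] = result["url"].split("?utm_", 1)[0]
    return True


def drop_example_org(result: Result) -> bool:
    """Example plugin: remove results from one host."""
    return "example.org" not in result["url"]


raw = [{"url": "https://a.test/x?utm_source=y"}, {"url": "https://example.org/"}]
cleaned = filter_engine_results(raw, [strip_tracking, drop_example_org])
```

Because the rewrite happens before any merge step, two results whose URLs become identical after `strip_tracking()` could still be merged afterwards, which is the point the commit message makes.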
* [pylint] searx: drop no longer needed 'missing-function-docstring'Markus Heiser2021-09-07
| Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [fix] searx.search.checker.get_result() always return a dictAlexandre Flament2021-08-16
| So checker_results['status'] == 'ok' is enough to check the checker result.
| See searx/webapp.py, /preferences endpoint
* [pylint] prepare for pylint v2.9.3 / fix some (new) pylint issuesMarkus Heiser2021-07-03
| Upgrade from pylint v2.8.3 to 2.9.3 raises some new issues::
|
|     searx/search/checker/__main__.py:37:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
|     searx/search/checker/__main__.py:38:26: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
|     searx/search/processors/__init__.py:20:0: R0402: Use 'from searx import engines' instead (consider-using-from-import)
|     searx/preferences.py:182:19: C0207: Use data.split('-', maxsplit=1)[0] instead (use-maxsplit-arg)
|     searx/preferences.py:506:15: R1733: Unnecessary dictionary index lookup, use 'user_setting' instead (unnecessary-dict-index-lookup)
|     searx/webapp.py:436:0: C0206: Consider iterating with .items() (consider-using-dict-items)
|     searx/webapp.py:950:4: C0206: Consider iterating with .items() (consider-using-dict-items)
|
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [fix] typo: online_dictionnary --> online_dictionaryMarkus Heiser2021-06-04
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [mod] settings_default: remove searx.search.max_request_timeout global variableAlexandre Flament2021-06-01
|
* [pylint] searx/search/__init__.py & replace lic-text by SPDX tagMarkus Heiser2021-05-21
| Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>