2024-02-09

202402

Clickhouse 纯手工迁移表
https://www.cnblogs.com/hdpdriver/p/16088755.html

ClickHouse 建表create table时primary by与order by
https://blog.csdn.net/qq_36951116/article/details/106260189

ORDER BY的作用，负责分区内数据排序;
PRIMARY KEY的作用，负责一级索引生成;
Merge 的逻辑，分区内数据排序后，找到相邻的数据，做特殊处理。
- 只有在触发合并之后，才能触发特殊逻辑。以去重为例，在没有合并的时候，还是会出现重复数据。
- 只对同一分区内的数据有效。以去重为例，只有属于相同分区的数据才能去重，跨越不同分区的重复数据不能去重。
通常只有在使用 SummingMergeTree 或 AggregatingMergeTree 的时候，才需要同时设置ORDER BY与PRIMARY KEY。
- 显式的设置 PRIMARY KEY，是为了将主键和排序键设置成不同的值，是进一步优化的体现。
- 如果 ORDER BY 与 PRIMARY KEY 不同，PRIMARY KEY 必须是 ORDER BY 的前缀(为了保证分区内数据和主键的有序性)。

ClickHouse 查询优化详细介绍
https://mp.weixin.qq.com/s/38RMVbw25P3iuE4IIuxdog

https://clickhouse.com/docs/zh/engines/table-engines/mergetree-family/mergetree

日志分析下ES/ClickHouse/Loki比较与思考
https://mp.weixin.qq.com/s/n2I94X6tz2jOABzl1djxYg

方案A：Local Storage + Pssh扫描派（代表作：跳板机上各种脚本）
方案B：Central Storage + Inverted Index派（代表作：ES）
方案C：Central Storage + Column Storage + MR派（代表作：Hive）
方案D：Central Storage + Column Storage + MPP派（代表作：ClickHouse）
方案E：Central Storage + 扫描类（代表作：Grafana-Loki）

可观测性数据收集集大成者 Vector 介绍
https://blog.csdn.net/n9ecommunity/article/details/133810461

核心就是 pipeline 的处理，有 Source 端做采集，有中间的 Transform 环节做数据加工处理，有 Sink 端做数据转发。

超级快速可靠：Vector采用Rust构建，速度极快，内存效率高，旨在处理最苛刻的工作负载
端到端：Vector 致力于成为从 A 到 B 获取可观测性数据所需的唯一工具，并作为守护程序、边车或聚合器进行部署
统一：Vector 支持日志和指标，使您可以轻松收集和处理所有可观测性数据
供应商中立：Vector 不偏向任何特定的供应商平台，并以您的最佳利益为出发点，培育公平、开放的生态系统。免锁定且面向未来
可编程转换：Vector 的高度可配置转换为您提供可编程运行时的全部功能。无限制地处理复杂的用例

快手、携程等公司转战到 ClickHouse，ES 难道不行了？
https://mp.weixin.qq.com/s/hP0ocT-cBCeIl9n1wL_HBg

https://zhuanlan.zhihu.com/p/547100507
ClickHouse现在是云原生的，支持分层存储。如果你关注它，会看到它的一条演进轨迹。最开始是单机的，单机即可实现很多高性能查询；然后演进到分布式，利用了比如复制表、分布式表，巧妙地变成了一个分布式架构；再往后，大家在讲云原生，也是可以分层存储、存算分离，有一些存储可以放到S3上，也可以放到HDFS上面去；后来也支持了OSS，目前也是通过原生的分层存储方式向云原生再迈进了一步。在此之前，虽然ClickHouse支持把一些冷数据，或者是部分的数据放到像S3这样的对象存储上面去，但是它的实现比较粗暴。

mysql 加密

mysql> select hex('给我狗子');
+--------------------------+
| hex('给我狗子')          |
+--------------------------+
| E7BB99E68891E78B97E5AD90 |
+--------------------------+
1 row in set (0.00 sec)

mysql> select unhex(hex('给我狗子'));
+----------------------------+
| unhex(hex('给我狗子'))     |
+----------------------------+
| 给我狗子                   |
+----------------------------+
1 row in set (0.00 sec)


mysql> select hex(AES_ENCRYPT('13884331246','abc123')) encrypt_text;
+----------------------------------+
| encrypt_text                     |
+----------------------------------+
| 61F56849292176EAD44B26FCDB52C791 |
+----------------------------------+
1 row in set (0.00 sec)

mysql> select AES_DECRYPT(unhex('7EDEB1877EE7AD6BD30BE668ECF924A4'),'password') plain_text;
+-------------+
| plain_text  |
+-------------+
| 13884331246 |
+-------------+
1 row in set (0.00 sec)

ClickHouse自定义函数实例教程
https://blog.csdn.net/neweastsun/article/details/130235194

ck 随机取 30% 的数据

where rand32()<pow(2,32)*0.3

ClickHouse性能调优之排序和数据类型
https://www.yii666.com/blog/499218.html

通过排序键可以让内存使用大幅减少，尤其是select查询中按排序键排序。
对于已存在的表，排序表达式仅可以使用新增列
正确使用排序键可以提升压缩因子20多倍，重复值相较于随机值更有利于压缩。
在处理大型表并寻找最佳性能查询时，需要仔细选择数据类型。
- 不要把整形设置为float型
- 对数值设置合适的精度，精度越低越好
- 对于文本类型尽可能使用LowCardinality(String) 或FixedString

clickhouse里的数组数据类型与相关使用介绍
https://blog.csdn.net/u010882234/article/details/130464938

从 ClickHouse 到 Apache Doris，腾讯音乐内容库数据平台架构演进实践
https://www.infoq.cn/article/nybtjqs07zcrqqnc0xwt

浅谈ClickHouse聚合和窗口函数
https://blog.csdn.net/weixin_59801183/article/details/134186433

Uber 如何使用MySQL + Redis提供4000 万/秒的读取请求
https://mp.weixin.qq.com/s/cneQcz_uEwChMFuWtoVolw

ClickHouse 到底有多神？ - leiysky的回答 - 知乎
https://www.zhihu.com/question/505958148/answer/3341039818

mysql 提高写入性能，写入完毕后要注释掉

skip-log-bin
innodb_doublewrite = 0
innodb_log_buffer_size = 32M

SPL 实践：单节点实现每日百亿时序数据实时写入和秒级统计
https://c.raqsoft.com.cn/article/1705410891469

doris-10亿数据和100万表join高并发测试
https://www.cnblogs.com/lilei2blog/p/15524029.html

How to merge large tables in ClickHouse using join
https://datachild.net/data/clickhouse-join-large-tables-on-column

Optimizing ClickHouse Joins for Performance: A Deep Dive into Nested-Loop and Merge-Scan Joins with Practical Examples
https://chistadata.com/optimizing-clickhouse-joins-for-performance-a-deep-dive-into-nested-loop-and-merge-scan-joins-with-practical-examples/

MergeJoin是一种基于排序的连接算法，它要求参与连接的表在连接字段上进行排序。Merge Join 的原理如下:

对参与连接的表按照连接字段进行排序，确保两个表的连接字段是有序的。
使用两个指针分别指向两个表的第一个记录。
比较两个指针所指向的记录的连接字段的值，如果相等，则将这两条记录合并为一条，并输出。
如果两个指针所指向的记录的连接字段的值不相等，则将连接字段较小的记录的指针向后移动一位，然后继续比较。
重复步骤 3 和步骤 4，直到其中一个表的记录全部被处理完

Merge Join 的优势在于它只需要对参与连接的表进行一次排序，并且可以并行处理多个连接操作，从而提高查询的效率

数据排序为了使用 Merge Join，参与连接的表必须在连接字段上进行排序。如果表没有按照连接字段排序，可以使用ClickHouse 提供的ORDER BY 语句对表进行排序。
数据类型 Merge Join 要求连接字段的数据类型必须相同，否则无法进行连接。在进行连接操作之前，需要确保连接字段的数据类型一致。
内存限制 Merge Join 使用了一定的内存来存储连接字段的值，如果连接字段的值较大或者连接的表的数据量很大，可能会导致内存不足。在使用 Merge Join 时，需要根据实际情况调整 ClickHouse 的内存配置，确保有足够的内存来执行连接操作。
多表连接 Merge Join 可以连接两个或多个表。当连接多个表时，需要保证每个表的连接字段都进行了排序，并且连接字段的数据类型相同。

原文链接：https://blog.csdn.net/weixin_42754420/article/details/132794340

ClickHouse Joins Under the Hood - Full Sorting Merge Join, Partial Merge Join - MergingSortedTransform
https://clickhouse.com/blog/clickhouse-fully-supports-joins-full-sort-partial-merge-part3

查看 raid 信息

$ sudo mdadm --detail /dev/md5
/dev/md5:
           Version : 1.2
     Creation Time : Thu Dec 22 23:42:37 2022
        Raid Level : raid10
        Array Size : 7394613248 (6.89 TiB 7.57 TB)
     Used Dev Size : 3697306624 (3.44 TiB 3.79 TB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

Redis数据结构：Stream类型全面解析
https://blog.csdn.net/weixin_45187434/article/details/132593271

group_concat 和 over partition

mysql> select * from a;
+------+-------+
| a_id | a_age |
+------+-------+
|    1 |     2 |
|    2 |     3 |
|    3 |     4 |
+------+-------+
3 rows in set (0.00 sec)

mysql> select * from b;
+------+-------+
| b_id | b_age |
+------+-------+
|    1 |     5 |
|    1 |     6 |
|    3 |     6 |
+------+-------+
3 rows in set (0.00 sec)

mysql> select a_id,a_age,group_concat(b_age,',') from a left join b on a_id = b_id group by a_id,a_age;
+------+-------+-------------------------+
| a_id | a_age | group_concat(b_age,',') |
+------+-------+-------------------------+
|    1 |     2 | 6,,5,                   |
|    2 |     3 | NULL                    |
|    3 |     4 | 6,                      |
+------+-------+-------------------------+
3 rows in set (0.04 sec)

mysql> select b_age,a.*, count(*) over(partition by b_age) match_count,row_number() over(partition by b_age) match_number from b left join a on b_id = a_id;
+-------+------+-------+-------------+--------------+
| b_age | a_id | a_age | match_count | match_number |
+-------+------+-------+-------------+--------------+
|     5 |    1 |     2 |           1 |            1 |
|     6 |    1 |     2 |           2 |            1 |
|     6 |    3 |     4 |           2 |            2 |
+-------+------+-------+-------------+--------------+
3 rows in set (0.00 sec)

Appending to a File from Multiple Processes
https://nullprogram.com/blog/2016/08/03/

MONGODB 内存使用分析与判断内存是否缺少
https://cloud.tencent.com/developer/article/2047328

ilickhouse has everything to be one of the most used databases in the future. But today it is not yet possible to be used, only for data already processed and only use simple select and simple queries.

It really is an incredibly fast database, but it lacks features much used in other banks, such as subselect. With subselect we can use almost everything without relying on its own functions (UDF). Without both functions its use is complicated.

I will try to work Clickhouse together as MonetDB and post here if the result was good.

csv文件中每隔 100 行取一行,保留第一行的 headers

awk '(NR%100==0 || NR==1){print $0}' result.csv

How to Do IP Address Geolocation Lookups on Linux
https://www.maketecheasier.com/ip-address-geolocation-lookups-linux/

sudo apt-get install geoip-bin
geoiplookup 8.8.4.4
https://dev.maxmind.com/geoip/geolite2-free-geolocation-data
sudo apt install mmdb-bin

企业运维实践-Nginx使用geoip2模块并利用MaxMind的GeoIP2数据库实现处理不同国家或城市的访问最佳实践指南
https://www.zhihu.com/tardis/bd/art/547045377

JAVASCRIPT & CSS WORLD MAP
https://www.cssscript.com/tag/world-map/

https://www.cssscript.com/interactive-vector-map/

只需提供一个视频主题或关键词，就可以全自动生成视频文案、视频素材、视频字幕、视频背景音乐，然后合成一个高清的短视频
https://github.com/harry0703/MoneyPrinterTurbo

nginx-geoip2

nginx 集成 IP 地理位置

$ nginx -v
nginx version: nginx/1.24.0
$ cd ~/download/
$ wget http://nginx.org/download/nginx-1.24.0.tar.gz
$ tar xf nginx-1.24.0.tar.gz
$ wget -c https://github.com/leev/ngx_http_geoip2_module/archive/refs/tags/3.4.tar.gz -O ngx_http_geoip2_module-3.4.tar.gz
$ tar xf ngx_http_geoip2_module-3.4.tar.gz
$ cd nginx-1.24.0/
$ which nginx
/usr/sbin/nginx
$ nginx -V
nginx version: nginx/1.24.0
built by gcc 9.3.0 (Ubuntu 9.3.0-10ubuntu2)
built with OpenSSL 1.1.1f  31 Mar 2020
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.24.0/debian/debuild-base/nginx-1.24.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
$ ./configure --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.24.0/debian/debuild-base/nginx-1.24.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie' --add-dynamic-module=../ngx_http_geoip2_module-3.4
$ ls objs/*.so
objs/ngx_http_geoip2_module.so  objs/ngx_stream_geoip2_module.so
$ sudo cp -a objs/*.so /etc/nginx/modules
$ ls /etc/nginx/modules
ngx_http_geoip2_module.so  ngx_http_js_module-debug.so  ngx_http_js_module.so  ngx_stream_geoip2_module.so  ngx_stream_js_module-debug.so  ngx_stream_js_module.so
$ sudo nginx -s stop
$ sudo cp -a objs/nginx /usr/sbin/nginx
$ sudo nginx
$ sudo vi /etc/nginx/nginx.conf

    load_module modules/ngx_http_geoip2_module.so;
    http {
        log_format  main  '$remote_addr $geoip2_data_country_code [$time_local] $request_time $request '
            '$status $host $http_accept_encoding $body_bytes_sent "$http_referer" '
            '"$http_user_agent" "$http_x_forwarded_for" ';

        geoip2 /usr/local/GeoIP2/GeoLite2-City.mmdb {
            $geoip2_data_country_name country names en;
            $geoip2_data_country_code default=CN source=$remote_addr country iso_code;
        }

        map $geoip2_data_country_code $lang_ch {
            CN yes;
            TW yes;
            HK yes;
            MO yes;
            default no;
        }

        access_log  /var/log/nginx/access.log main;

        server {
            location /geotest {
                default_type text/html;
                if ($lang_ch = no) {
                    return 403 "Access denied!IP [ $remote_addr ] $geoip2_data_country_code";
                }
                return 200 "Welcome to you! IP [ $remote_addr ] $geoip2_data_country_code";
            }
        }
    }

$ tail -f /var/log/nginx/access.log