```python
import matplotlib.pyplot as plt

resolution = 1000  # Hz; 1 / resolution = 1 ms, so each value in the list spans 1 ms

def pwm(freq, duty_cycle):
    # generate an endless stream of 0s and 1s
    period = int(1 / freq * resolution)
    on = int(period * duty_cycle)
    off = int(period * (1 - duty_cycle))
    print(on, off)
    while True:
        for i in range(on):
            yield 1
        for i in range(off):
            yield 0

data = []
for i in range(30, 100, 10):
    gen = pwm(10, i / 100)
    data += [next(gen) for j in range(1000)]
for i in range(100, 30, -10):
    gen = pwm(10, i / 100)
    data += [next(gen) for j in range(1000)]

# plot
xs = [x for x in range(len(data))]
print(len(data))
plt.plot(xs, data)
plt.show()
plt.close()
```
Looking back, I did consider at the time whether to start from "language feel" — the usual advice of soaking your ears in the language. But how calloused do your ears have to get before it pays off? That is impossible to measure. Understanding "Hey", "What's up", "How you doing" doesn't mean you can follow exam listening. Compare the listening transcripts with everyday speech carefully and you'll find they differ in every respect. Exam listening isn't fast and doesn't use slang; its defining trait is moderate speed but much higher information density. As non-native speakers, digesting a line of everyday speech isn't hard — catching even 50% of it is often enough to get the point — but digesting one sentence of exam listening takes much longer, so it feels extremely fast, like you can't keep up. In fact, measured purely in words per minute, it isn't fast at all. So back then I chose to grind through real past-exam listening, because what I needed was to get used to extracting usable information from it to answer questions — pulling not-so-obvious information out at that pace. Listening to the same passage over and over, slowed down and at normal speed, was genuinely torturous, mainly because I'm someone who hates repetition. Fortunately the exam was motivation enough and I stuck with it; my final listening score exceeded my expectations, which I suppose proves the choice was right.
In order for the consistent hash function to be balanced, ch(k, 2) will have to stay at 0 for half the keys, k, while it will have to jump to 1 for the other half. In general, ch(k, n+1) has to stay the same as ch(k, n) for n/(n+1) of the keys, and jump to n for the other 1/(n+1) of the keys.

ch(k, 2) means giving half the traffic to the newly added bucket 1. More generally, comparing ch(k, n+1) against ch(k, n): n/(n+1) of the keys must not change at all, while the remaining 1/(n+1) of the keys have their bucket changed to n. (The n/(n+1) portion has to stay completely unchanged — no bucket value may move — otherwise it no longer counts as consistent hashing; it's just an arbitrary algorithm.) And since keys must still end up evenly spread across buckets, naturally each existing bucket has to give up a share of its keys to the new bucket.
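This property is exactly what Lamping and Veach's jump consistent hash is built around. A minimal Python sketch — the multiplier is the 64-bit LCG constant from their paper, and the name `ch` follows the notation above:

```python
def ch(key: int, num_buckets: int) -> int:
    """Jump consistent hash: map key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step (constant from the paper)
        key = (key * 2862933555777941757 + 1) % (1 << 64)
        # jump ahead; (key >> 33) + 1 is a pseudo-random value in [1, 2**31]
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Adding a bucket moves exactly the keys that land in the new bucket n;
# every other key keeps its old bucket.
for k in range(1000):
    assert ch(k, 5) in (ch(k, 4), 4)
```

By construction, going from n to n+1 buckets either leaves ch(k, n) untouched or jumps the key to bucket n — the 1/(n+1) redistribution described above.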
One simple implementation of a message queue uses Redis. There is a Python implementation of Redis Simple Message Queue (RSMQ) called PyRSMQ. RSMQ provides Amazon SQS-like functionality: a named queue backed by Redis. Once the queue is created, producers can place messages in the queue and consumers can retrieve them.
Here’s an example of how you can use PyRSMQ to implement a simple message queue:
```python
from rsmq import RedisSMQ

# Create a new instance of RedisSMQ
queue = RedisSMQ(host="localhost", port=6379, qname="myqueue")

# Create the queue itself (required before first use)
queue.createQueue().execute()

# Create a new message in the queue
queue.sendMessage().message("Hello World").execute()

# Receive the next message from the queue
msg = queue.receiveMessage().execute()
print(msg['message'])

# Delete the message from the queue
queue.deleteMessage(id=msg['id']).execute()
```
So a log, rather than a simple single-value register, is the more natural abstraction. A log can represent a sequence of requests — or, put differently, a sequence of changes — and it is an abstraction that matches reality well. Programs mostly deal with a continuous stream of requests, not a single one. The single-request case is the simple one: a pure mathematical computation, independent of everything else and stateless — same input, same output. A cache is similar: just get/set, first come first served. But real systems are rarely that simple.
Consider the relationship between a table and a log: the table is the state at the current instant, while the log is a sequence containing every change — in other words, it preserves every historical version. "Historical versions" naturally brings source code version control to mind: each commit records not the entire codebase at that moment, but a change.
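To make the table-vs-log duality concrete, here is a toy sketch (the names and data are mine, not from the article) that rebuilds the current table by replaying a log of changes — the same way a VCS rebuilds a file from commits, with every historical version still reachable:

```python
# A log of changes: each entry is a delta, never a full snapshot.
log = [
    ("set", "name", "alice"),
    ("set", "city", "NYC"),
    ("set", "city", "SF"),
    ("del", "name", None),
]

def replay(log, upto=None):
    """Fold the log (optionally only a prefix) into a table.
    Any historical version is recoverable by choosing where to stop."""
    table = {}
    for op, key, value in (log if upto is None else log[:upto]):
        if op == "set":
            table[key] = value
        else:
            table.pop(key, None)
    return table

print(replay(log))          # current state: {'city': 'SF'}
print(replay(log, upto=2))  # historical version: {'name': 'alice', 'city': 'NYC'}
```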
The article introduces three areas and how each uses the log:

- Data Integration — Making all of an organization's data easily available in all its storage and processing systems.
- Real-time data processing — Computing derived data streams.
- Distributed system design — How practical systems can be simplified with a log-centric design.
Data Integration

By analogy with Maslow's hierarchy of needs, data is the bottom layer of the pyramid. (This sounds a bit mystical, but it isn't wrong.) In many real systems the data simply isn't ready: data integration is done poorly, and analytics and modeling get built on top anyway — like building a house on a shaky foundation.

Why is data integration hard? The author attributes it to two trends: the explosion of event data, and data systems becoming highly specialized. The second is easy to understand: many real-world challenges can no longer be solved by any single data system, so many niche domains and systems with different trade-offs have emerged — OLAP, OLTP, online, batch, streaming, graph, and so on. On event data my understanding is shallower. My current reading is that event data, in contrast to entity data, describes events and behavior rather than the final shape of some thing — a click at some moment, a login at some moment — while entity data such as username, id, or region is more static. The point of distinguishing them is that there is no silver bullet; the distinction lets each be handled more appropriately. Event data carved out on its own is hard for a traditional DB to sustain, which spawned yet more varieties of system. Data systems today really are numerous: they cannot be treated as one whole, and we constantly have to deal with the question of which data flows from which system to which.
So how do different data systems synchronize data? It can all be abstracted as log-structured data flow: the data source writes its changes to a log, and every other data system acts as a subscriber, reading the log and catching up on the changes itself. The model works no matter what kind of system the subscriber is. Even if the subscriber is a cache, which doesn't store a log itself, it can still synchronize through one. Moreover, the log carries a built-in notion of versions, so you can tell precisely whether what you read from a subscriber is up to date. Another point: the log can act as a buffer. Different subscribers consume at different speeds, and a subscriber may be down at some moment — the log is still there, and after a restart the subscriber can catch up. As the diagram also shows, the log is a decoupling concept: a subscriber only needs to understand the log, not what the data source looks like, which is another advantage.
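A minimal sketch of that decoupling (all names are mine): the log is an append-only list, and each subscriber just remembers its own read offset — which is also what makes the log act as a buffer for slow or restarted consumers:

```python
class Log:
    """Append-only sequence of (key, value) change entries."""
    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)

    def read_from(self, offset):
        return self.entries[offset:]

class Subscriber:
    """Any downstream system: it only understands the log, not the source."""
    def __init__(self, log):
        self.log = log
        self.offset = 0   # per-subscriber position; survives "restarts"
        self.state = {}

    def catch_up(self):
        for key, value in self.log.read_from(self.offset):
            self.state[key] = value
            self.offset += 1

log = Log()
fast, slow = Subscriber(log), Subscriber(log)
log.append(("a", 1))
fast.catch_up()              # the fast consumer stays up to date
log.append(("a", 2))
log.append(("b", 3))
fast.catch_up()
slow.catch_up()              # the slow consumer catches up later, from its own offset
assert fast.state == slow.state == {"a": 2, "b": 3}
```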
At the author's company, LinkedIn, the actual scenario looked like this:
With the log, the architecture becomes:
A very clean architecture, which shows the value of the log abstraction. (This is essentially a simplified diagram of Kafka's architecture.)
On top of this architecture there are still a few things to consider, namely ETL. Not all data is meaningful, and its structure may not be uniform; in many scenarios only data that has gone through ETL (cleanup or transformation) is worth using or storing (e.g. loading into a data warehouse). The ETL'd data must still serve subscribers that may be a batch-oriented warehouse or some real-time system. The earlier architecture diagram doesn't show the concrete roles, so a somewhat more detailed picture like the one below is better:
Stream processing has nothing to do with SQL. Nor is it limited to real-time processing.
Yes, same here — I also think real-time and streaming are not inherently bound together.
I see stream processing as something much broader: infrastructure for continuous data processing. I think the computational model can be as general as MapReduce or other distributed processing frameworks, but with the ability to produce low-latency results.
For batch vs. stream he also gives an example: the US census. Population is counted by going door to door, recording on paper; then this batch of records is sent to a central office, which adds everything up. It is not the case that every region keeps a journal of births and deaths from which the current population — or even the population in any given year — could be computed. The point is not that the journal form is superior; each has its place. Some data arrives in batches, and you should process it in batches; some data is a continuous feed, and then you naturally want to process the continuous data smoothly rather than converting it into batches. That is the natural response — though of course, if analysis shows batch works fine, batch processing is also fine. One more thing about processing data smoothly: if data arrives at one record per second and your stream processor takes 5 s per record, the unconsumed backlog will pile up until next year — which is why stream processing usually demands low latency.
```python
def partition(a, left, right):
    piv = a[left]
    lmark, rmark = left + 1, right
    while True:
        while lmark <= rmark and a[lmark] <= piv:
            lmark += 1
        while lmark <= rmark and a[rmark] >= piv:
            rmark -= 1
        if lmark > rmark:
            break
        # exchange the out-of-place pair
        a[lmark], a[rmark] = a[rmark], a[lmark]
    # exchange rmark and left: place the pivot in its final position
    a[left], a[rmark] = a[rmark], a[left]
    return rmark
```
```python
cur = root
stack = []
result = []
while cur or stack:
    while cur:
        stack.append(cur)
        cur = cur.left
    # no more left child; cur is None now
    top = stack.pop()
    result.append(top.val)
    cur = top.right  # the next iteration traverses the right subtree
```
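Wrapped into a function, with a minimal TreeNode class assumed (both names are mine), for a quick sanity check:

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder(root):
    """Iterative in-order traversal: push the left spine, pop, go right."""
    result, stack = [], []
    cur = root
    while cur or stack:
        while cur:
            stack.append(cur)
            cur = cur.left
        top = stack.pop()
        result.append(top.val)
        cur = top.right
    return result

#     2
#    / \
#   1   3
root = TreeNode(2, TreeNode(1), TreeNode(3))
print(inorder(root))  # [1, 2, 3]
```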
Pasanen, Tomi (November 1996). Elementary Average Case Analysis of Floyd’s Algorithm to Construct Heaps (Technical report). Turku Centre for Computer Science. CiteSeerX 10.1.1.15.9526. ISBN 951-650-888-X. TUCS Technical Report No. 64. Note that this paper uses Floyd’s original terminology “siftup” for what is now called sifting down. https://en.wikipedia.org/wiki/Binary_heap#cite_ref-13
We could break out of the loop as soon as we find a pos where newitem <= both its children, but turns out that’s not a good idea, and despite that many books write the algorithm that way. During a heap pop, the last array element is sifted in, and that tends to be large, so that comparing it against values starting from the root usually doesn’t pay (= usually doesn’t get us out of the loop early). See Knuth, Volume 3, where this is explained and quantified in an exercise.
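For reference, here is the textbook "early exit" pop that the comment argues against — a generic list-based min-heap sketch, not heapq's actual code. The early `break` fires when the sifted-in item is already <= its smaller child; since that item comes from the end of the array and tends to be large, the branch rarely pays near the root:

```python
import heapq
import random

def pop_early_break(heap):
    """Pop the min from a list-based min-heap, sifting the last element
    down with an early exit once it is <= both of its children."""
    last = heap.pop()
    if not heap:
        return last
    root, heap[0] = heap[0], last
    pos, n = 0, len(heap)
    while True:
        child = 2 * pos + 1
        if child >= n:
            break
        # pick the smaller child
        if child + 1 < n and heap[child + 1] < heap[child]:
            child += 1
        if heap[pos] <= heap[child]:
            break  # the early exit in question: rarely taken, since `last` is large
        heap[pos], heap[child] = heap[child], heap[pos]
        pos = child
    return root

data = random.sample(range(1000), 64)
h = list(data)
heapq.heapify(h)
assert [pop_early_break(h) for _ in range(len(data))] == sorted(data)
```

heapq instead sifts the item all the way down to a leaf and then back up, trading the per-level "am I done?" comparison for fewer comparisons overall.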
```python
class Solution:
    def nthUglyNumber(self, n: int) -> int:
        a = [1]
        L = [0, 0, 0, 0, 0, 0]  # L[i] is pointer p_i; the candidate is a[L[i]] * i, for i in (2, 3, 5)
        while len(a) < n:
            # advance any pointer whose candidate equals the last value,
            # so duplicates are skipped before picking the next ugly number
            if a[L[2]] * 2 == a[-1]: L[2] += 1
            if a[L[3]] * 3 == a[-1]: L[3] += 1
            if a[L[5]] * 5 == a[-1]: L[5] += 1
            a.append(min(a[L[2]] * 2, a[L[3]] * 3, a[L[5]] * 5))
        return a[-1]
```
```python
while end < n:
    dic[s[end]] += 1
    while len(dic) > k:
        dic[s[start]] -= 1
        if dic[s[start]] == 0:
            del dic[s[start]]
        start += 1
    longest = max(longest, end - start + 1)
    end += 1
```
```python
while end < n:
    dic[s[end]] += 1
    if len(dic) > k:
        dic[s[start]] -= 1
        if dic[s[start]] == 0:
            del dic[s[start]]
        start += 1
    longest = max(longest, end - start + 1)
    end += 1
```
What exactly does this one change buy us? On the surface, the two cursors start and end only ever move forward in both versions; neither the 2-while nor the while-if variant makes them drift back and forth. But because the 2-while variant insists that the window be valid after every step, its start cursor can end up further right. An extreme example: suppose the longest window is [0, n-2]. The next extension of end reaches the last index n-1. The while-if variant notices the window is invalid, shrinks once, and is done. The 2-while variant, however, insists on making [start, n-1] valid, so start may march from 0 all the way to n-1 before stopping — in this example, half of all the work is wasted. A concrete instance is s = AAAAB, k = 1.
This optimization is not easy to read, and I wouldn't recommend writing everyday code this trickily. In this particular problem it only saves len lookups and dict updates — both already cheap operations, with a bounded count (since start moves at most to the end, at most n operations) — so the gain is merely icing on the cake. But turning the while into an if can work wonders elsewhere: when the loop condition itself is expensive to evaluate, the savings become significant. See the problem Longest Substring with Same Letters after Replacement.
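For completeness, the 2-while snippet above wrapped into a self-contained function (the function name is mine), checked on the AAAAB, k = 1 example:

```python
from collections import defaultdict

def longest_k_distinct(s, k):
    """Length of the longest substring of s with at most k distinct characters."""
    dic = defaultdict(int)
    start = end = longest = 0
    n = len(s)
    while end < n:
        dic[s[end]] += 1
        while len(dic) > k:      # shrink until the window is valid again
            dic[s[start]] -= 1
            if dic[s[start]] == 0:
                del dic[s[start]]
            start += 1
        longest = max(longest, end - start + 1)
        end += 1
    return longest

print(longest_k_distinct("AAAAB", 1))   # 4  ("AAAA")
print(longest_k_distinct("araaci", 2))  # 4  ("araa")
```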
We have a string s of length n consisting only of uppercase characters, and a number k (always greater than 0 and less than n). We can make at most k changes to the string; the goal is the maximum length of a substring whose characters can all be made identical.
Example 1:
```
Input: s = "ABAB", k = 2
Output: 4
Explanation: Change 2 'B' into 'A'.
```
Example 2:
```
Input: s = "ABCD", k = 1
Output: 2
Explanation: Change one 'B' into 'A'.
```
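A sketch of the standard sliding-window solution to this problem (the common approach, not necessarily the one this post develops): a window is valid when window_size - max_freq <= k, and — tying back to the while-vs-if discussion above — a single if-style shrink per step suffices:

```python
from collections import defaultdict

def character_replacement(s, k):
    """Longest substring that can be made all-identical with at most k changes."""
    counts = defaultdict(int)
    start = max_freq = longest = 0
    for end in range(len(s)):
        counts[s[end]] += 1
        # max_freq tracks the best single-character frequency seen in any
        # window so far; it never needs recomputing when the window shrinks,
        # because `longest` can only grow when max_freq grows
        max_freq = max(max_freq, counts[s[end]])
        if end - start + 1 - max_freq > k:   # too many replacements needed
            counts[s[start]] -= 1
            start += 1
        longest = max(longest, end - start + 1)
    return longest

print(character_replacement("ABAB", 2))  # 4
print(character_replacement("ABCD", 1))  # 2
```

The expensive part of a 2-while condition here would be recomputing max_freq after every shrink; the if-style window sidesteps that entirely.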