The Lattice iCE40 FPGA saga

January 2nd, 2019
asciilifeform: in other noose, asciilifeform built 'icestorm' , 'arachne-pnr', with plain gcc 4.9 ( the only concession to idiocy on the test machine was python3 ) . and even MOST of 'yosys' ( the last step in the ice40 open sores fpga toolchain ) built. in fact, whole thing built, but linker barfs
asciilifeform: tcl , as i understand, caught the plague many years ago. this must be the 7th or 8th popular proggy that none of asciilifeform's boxes will build, purely on account of tcl rot.
asciilifeform: 'yosys' is that chain's verilog synth; the whole shebang is quite dead in the water without it.
asciilifeform: i did finally get ^ going . had to edit makefile and ENABLE_TCL := 0
asciilifeform: so can haz ice40!!!! [...] synthesis appears to work.
asciilifeform: ( not even substantially slower than xilinx chain. though i have currently nfi re output quality in re path delays )
asciilifeform: at some point will have to see re depython3ization.
asciilifeform: examples/icestick/example.v BUILDS and programs and runs... blinkenlichten blink in the specified order.
asciilifeform: ( not a high bar, but probably enough to say now that next FUCKGOATS will NOT feature a xilinx no moar )
asciilifeform: 'icestick' is a notbad student devboard btw [...] thumbdrive-shaped, no cables needed [...] (carries own jtag-programmator along)
asciilifeform: it isn't the only compat. devboard , by any means. but probably the smallest and most self-contained [...] has ftdi thing for usb i/o as well as programming [...] and buncha lights, and a pin header.
asciilifeform: i since tested the j1forth thing on same board. builds, runs.
mircea_popescu: nice.
asciilifeform: 'the lady goes'
asciilifeform: this is a 20bux board that gives fullspeed usb , lamps, ir i/o (why -- idk ) and various else [...] 1k LUTs.
asciilifeform: upstack : in case it wasn't obvious : it is possible to make ddr2 sdram controller , nic, etc.
asciilifeform: ice40 series tops out at 250Mhz iirc, so it'd be a modest thing. but working.
asciilifeform: ( and that's the global clock net max, rather than 'hey i can make a pentium pro'. LUT delay is 10ns+ iirc )
asciilifeform: you ~can~ make a i386. or a m68k. mips. 'ivory'... etc
mircea_popescu: right, you can make a lisp machine.
asciilifeform: aha [...] 1 that runs on 'aa' battery [...] ( lattice co.'s whole niche is micropower fpga )
spyked: phf, asciilifeform: re "phf: fwiw, if the goal is to put an existing lisp machine onto an fpga, then i don't think macivory is a particularly good target. the goal would be to run Genera, which is severely lacking sources for critical components." , "asciilifeform: failing this, could start with cadr and slowly backport bolix envir." <-- thanks for all the refs. my initial plan was to start from whatever SECD papers I could find and better understand architecture specifics.
spyked: also, is there any worth in trying to "physicalize" the virtual lisp machine stuff? genera runs on that from what I read.
asciilifeform: ^ to get the fuck off unix and pc. permanently [...] rather than merely paper over them [...] a sane comp is a box with NO unix in it, anywhere.
spyked: ok, so to sum up; 1. get ice40 fpga; 2. run fpga lisp machine (cadr?); work from that towards symbolics/ivory, or the other way around starting from symbolics.
asciilifeform: spyked: we have loooooong way to go - try and make a ddr controller first
asciilifeform: the other aspect, vlm is a massive bucket of c liquishit, and not at all a compact description of the arch
asciilifeform: to do this, will need to make world's first ice40+ddr board -- none exist [...] and no , NOT a matter of 'take opencores hdl and port'
asciilifeform: you gotta minimize the delay and specify gates MANUALLY for the specific fpga [...] after this, comes ethernet
[amended later...]
asciilifeform: on second thought, you probably could put this chore off, olimex sells a ice40-8k (largest available) with 512k of sram glued on. and this is theoretically enough to prototype . the more pressing matter is ethernet. ( afaik nobody sells an ice40 + ethernetmagnetics . and just as with ddr dram, answer is 'lattice wants you to use their larger fpgas, with THEIR toolchain'
spyked: was trying to get a perspective on things. I haven't dabbled in hardware since 3rd uni year (that was almost 8 years ago). and even then... well, I got a lot to learn.
asciilifeform: spyked: you will find that none of the existing literature is of any help
spyked: asciilifeform, what do you think of minimal baremetal implementation of Lisp (RISC assembly only) on something like a MIPS core? I might be thinking this in too abstract terms, it's definitely not that easy. but I'm trying to find a middle way between working FPGA Lisp machine and Lisp on unix.
asciilifeform: spyked: i tried this, it's a dead end because papers over idiot hardware rather than replaces. and does not yield hardware sovereignty-- you are at the mercy of the chipmaker's continued making of 100% compatible cpu & peripherals
asciilifeform: and there must be NO non-taggedwords code physically anywhere in the machine.
asciilifeform: and not a single flipflop of state that we didn't consciously put there.
asciilifeform: spyked: i spent 7+ yrs on this approach, and my verdict is that 'middle way' is a DEAD END
asciilifeform: you're welcome to spend 7 more. alternatively you can learn from my work.
asciilifeform: a spoonful of unix in a barrel of honey -> barrel of shit .
spyked: asciilifeform, I see. I thought this would be a way to make the problem easier in the short/medium term. but I've had to deal with being at the mercy of X myself at another level so this makes sense.
asciilifeform: spyked: consider, the amount of arbitrary stupidity you import even ~by having pci bus~ is gargantuan
asciilifeform: !#s dma
asciilifeform: and elsewhere, "took the chance made by the death of ssd in 'zoolag' to attempt netbsd. result : no boot. ( with or without 'no acpi' ) option, hangs at usb init." , "possibly one could try building a netbsd that doesn't try to touch the usb chip at all. then enjoy setting up sans keyboard."
asciilifeform: spyked: as for 'use off the shelf iron', see thread "pc engines 'apu2' (the board with the intel nics - vs. 'apu1', with realtek) , turns out, is crippled, hdt probe barfs with it, the cpu is reputed to have a drm fuse set." and elsewhere.
asciilifeform: see also the sad story of movitz
asciilifeform: !#s movitz
spyked: asciilifeform, lol, yeah, that's why I gave MIPS as an example. but actually, MIPS on FPGA + MIPS Lisp machine implementation might be more work than starting from CADR. that is, not even accounting for RAM and peripherals
phf: "asciilifeform: whereas physical 'ivory' happily did multi year uptime" << another little known fact, there are race conditions in genera itself, that manifest in higher clock environments
spyked: I'm still in the process of grokking the DMA+interrupts discussion.
phf: i spent (mostly another whisperer and myself did) on getting vlm stable, and i'm unconvinced that some of the issues we encountered were purely "buggy vlm". there is, for example, a crash in floating point instruction that happens when you load document examiner on stock piratebay opengenera. i have no explanation for it still, because vlm code ~seems to do the right thing~. there are other similar instances
asciilifeform: spyked: forget mips [...] and other nontagged cpu [...] they don't belong in the future.
asciilifeform: phf: iron floatingpoint Must Die.
phf: now i didn't find out about race conditions myself, that data point came from dks, they discovered race conditions as part of the emulator rewrite, but they have the benefit of having access to the necessary low level bits
asciilifeform: ( and yes genera used it and yes you need it for historically accurate ivorytron. but dun belong in sanecomp. )
asciilifeform: spyked: designing a reasonably sane ( a la 'scheme79' ) cpu is not actually the hard part.
asciilifeform: reliable controller for usefully large ram, and peripherals ( the only strictly necessary periphs are nic and a reasonable ssd persistent storage ) [...] are the hard part.
spyked: "phf: take movitz, take lice, add tcpip stack to it, add irc. what else one needs for tmsr-ing" <-- ah. precisely the discussion.
asciilifeform: spyked: didja miss the part where it runs on no modern nic ? or the thread where ALL modern nics are built to use dma, which gives the nic vendor access to every byte of your ram ?
spyked: asciilifeform, yeah, it's right below phf's initial reply. the whole thing
asciilifeform: sane comp is to be a fully granular description. i.e. if you can still get transistors, you can still build ENTIRE THING. no exceptions.
asciilifeform: no magic 'available until we reversed it, then magically evaporates from market' chips.
asciilifeform: ( as for 'ice' itself, if it were to vanish, we can have clones rolling off conveyor in 6mo or so for coupla hundred btc. not so for magical stateful nic etc )
spyked: yeah, that makes sense.
asciilifeform: phf: what you & dks et al observed is called complexity shitstack collapse.
asciilifeform: it is not curable.
phf: i still stand by this point, but with a disclaimer. i think that one of the advantages early hackers had was that they were working from inside their systems. linus was dogfooding linux, symbolics were using every lisp machine at their disposal to build better lisp machines. i think there's inherent folly in planning a revolution from the comfort of our bourgeois machines :p
asciilifeform: phf: the notion is to build a box with sufficient ram, horse, capacity to drive xterm, so that you can sit down on it and edit the fpga config per se.
asciilifeform: closing the ouroboros.
phf: right, but you can already do that with, say, fpga cadr. it's not necessarily a shiny experience though
asciilifeform: cadr sucks because it was designed around off the shelf alu and misc glue of the time.
asciilifeform: nongranular.
asciilifeform: the designers were crippled by the extreme transistorpoverty of the time.
asciilifeform: there's NO reason to emulate, e.g., amd am2900, for all time.
phf: sure, but chinual is extremely detailed and the ~architecture~ can be improved incrementally. for example brad's cpu is, yes, implemented as an emulator for a discrete circuit. but at the same time it can be isolated from the bus, put into a deterministic harness, and rewritten from the cpu spec in chinual.
phf: to some extent something like that was done in a transition from 36xx to ivory
asciilifeform: since we're on subj, asciilifeform got the recently released ice40-8k (largest in the series) going. ( there's only 1 decent dev board for the 8k, the one released by olimex ~2wks ago )
asciilifeform: it worx. ( blinkenlichts, couplea hundred universal i/o pins, schematic published ) [...] 512k of sram (16 bits wide). [...] comes with 100MHz clock gen, but has pll, tunable.
asciilifeform: ( rather like xilinx, if you're a reformed xilinxist )
asciilifeform: phf: at some point ( and by this i mean when finished ffa / released 'p' ... ) i'ma have a large board made, with, say, 8 ice40-8k's, and row of dimm-holders...
asciilifeform: ... picture an a4-sized plinth, of, e.g., 32 dimm slots. each can contain a card of sram, or alternatively of 4 ice40-8k's, or some peripheral ( e.g. nic magnetics. )
asciilifeform: in very very other noose : the vendor's vga and ps/2 kbd demo verilog for ice40 builds and WORX [...] display syncs and displays moving (via arrow keys) sprite.
asciilifeform: ( olimex sells a little adapter that bolts vga db15 plug and minidin ps/2 to the ice40-1k and -8k boardz )
asciilifeform: vga bouncyball is a 'pons asinorum' of sorts, in fpgadom [...] << subj
asciilifeform: ( << iron component of subj )
asciilifeform: in other 'news', it is apparently impossible to fit even ONE 4096-bit adder into an ice40-8k ( the largest in the series )
asciilifeform: essentially, 'yes we put srams in the fabric but they can't shift, fuckyou'
asciilifeform: ( likbez : all you need for the mythical holy grail, 'fast iron rsa', is a very large-bitnessed adder-cum-barrelshifter and a few storage registers that can be programmatically shuffled between. )
asciilifeform: ~all ffa ops can be reduced to 'add-with-possible-2scomplement' and 'shift' [...] ( or rather, some finite sequence of these )
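
A minimal sketch of the reduction described above, assuming a fixed 4096-bit word: multiplication built from nothing but a fixed-width adder (with optional two's complement of one operand) and single-bit shifts. The names WIDTH, add, shl, shr and mul are hypothetical, and this is not FFA's actual code; in particular the loop below branches on the data, so it is not constant-time.

```python
WIDTH = 4096                 # assumed word width, matching the 4096-bit adder discussed above
MASK = (1 << WIDTH) - 1      # keeps every result inside WIDTH bits

def add(a, b, negate_b=False):
    """Fixed-width add. With negate_b set, adds the two's complement of b (a - b mod 2^WIDTH)."""
    if negate_b:
        b = (~b & MASK) + 1  # invert all bits, then add one
    return (a + b) & MASK    # any carry out of the top bit is discarded

def shl(a, n=1):
    """Shift left by n bits, dropping bits that fall off the top."""
    return (a << n) & MASK

def shr(a, n=1):
    """Shift right by n bits."""
    return a >> n

def mul(a, b):
    """Schoolbook multiply using only add and shift (low WIDTH bits of the product)."""
    acc = 0
    while b:
        if b & 1:            # data-dependent branch: fine for a sketch, not constant-time
            acc = add(acc, a)
        a = shl(a)
        b = shr(b)
    return acc
```

Subtraction falls out of the same adder via negate_b, which is the 'add-with-possible-2scomplement' half of the claim; everything else is a finite sequence of add and shift calls.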
asciilifeform: asciilifeform in particular would like a gnat for mips, given as the latter actually fits in an ice40
asciilifeform: i took a shot at building one back in 2016, but ran into a buncha gnarl, which ave1 at this point seems to have mostly resolved
ave1: yes, did you ever hear anything back from the russian mips producers?
asciilifeform: nope [...] not 1 answered.
ave1: It seems that the American one (Cavium?) is also unobtainable
asciilifeform: the ice40 breakthrough however means that we can be own mips producer.
asciilifeform: it won't be world's fastest, but will work.
asciilifeform: ( and possibly can roll in not only mips, but e.g. a custom instruction for bounds checking )
ave1: I could see if I can port gnat to qemu-mips
asciilifeform: << example. schoolbook mips
ave1: thx! now if anyone can be bother to design a board with multiple ice40
asciilifeform: 1 will suffice.
asciilifeform: the simplest (text i/o only) mips emul asciilifeform likes, is 'gxemul' , should suffice to test.
asciilifeform: no particular need for qemu.
ave1: thx, I will look into it
ave1: I see, I'll have to learn verilog, I did not know you could implement a processor with so little code
asciilifeform: ave1: FUCKGOATS src makes as good an intro to verilogism as any imho
asciilifeform: ( illustrates a (write-only) serial uart, for instance )
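
For readers who have not met one, a write-only serial uart amounts to a shift register pushing out a framed byte. A small sketch of the framing, assuming the common 8N1 format (one start bit, eight data bits sent least significant first, one stop bit); the function name uart_frame is hypothetical and nothing here is taken from the FUCKGOATS source, which may use different parameters.

```python
def uart_frame(byte):
    """Return the line levels for one assumed-8N1 frame of the given byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data = [(byte >> i) & 1 for i in range(8)]  # data bits, least significant first
    return [0] + data + [1]                     # start bit (low), data, stop bit (high)
```

In hardware the same list would be clocked out one level per baud period by a small state machine.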
asciilifeform: and state machine, etc [...] approx same level of complexity as that little mips.
asciilifeform: it eats 71 of the 72 logic cells in the old xilinx cpld.
asciilifeform: ( ice40 is about a dozen times larger )
asciilifeform: ice40, unlike the xilinx cplds, also includes 32kB of onboard sram. so possibly can have small cache, or extra registers, or some other useful item.
asciilifeform: "and possibly can roll in not only mips, but e.g. a custom instruction for bounds checking" << to expand on this, with the linked mips example, already can haz e.g. 32bit cpu with no pipeline, no cache, etc ~but~ fits in ice40.
asciilifeform: ice40 + a coupla MB of sram -- and you can (not very quickly, but) crypto [...] should even be enuff gates left over to roll in FG logic.
mircea_popescu: so it's your expert opinion you can't actually make a better [Juniper router] ?
asciilifeform: not with the currently available means, no
asciilifeform: it'd cost a couplea mil (orcbux)
asciilifeform: would need 1) fab capability 2) substantial time, unless mircea_popescu has a coupla qualified pairs of hands up his sleeve to assist
asciilifeform: thing needs to eat packets, parse fields, sort'em into tables, parallelize lookups ( and below all of this, do such things as driving the sdram , the nic PHYs , shuttle data b/w processors )
mircea_popescu: this doesn't sound like more than a few hundred adalines.
asciilifeform: if you want 'modern' (Gb/s+) throughputs, it aint 'ada lines', but transistors. coupla mil of'em.
mircea_popescu: "asciilifeform: 'industrial' telco gear is pretty much 'bsd box with array of GB nics soldered in' + some shitware" << i got a bsd box right here btw ; cost me nothing, took in as junk.
asciilifeform: mircea_popescu: entirely so, but these won't "'competition' box routes 1G/s from 48 jacks, daisy-chains with 10GB/s snakes, compiles ip filter rules into 1mil+ gate fpga fabric"
mircea_popescu: so work with tiny ones.
asciilifeform: the ice40 tops out at 250MHz (and drops rapidly when you fill it up, from switch fabric propagation delay)
mircea_popescu: but you can have 1k of these lined up.
asciilifeform: it is also pretty cramped sizewise (recall, i was not able to fit a single 4kbit adder into it) [...] 1k of these will eat 5kWatt [...] and occupy entire cabinet.
mircea_popescu: that was the idea yes, a cabinet.
asciilifeform: so we want to make and sell a cray-1 ?
mircea_popescu: actually, i was thinking, the tiny ices you use could be an intermediate step -- think alf, instead of fabric-of-transistors, fabric-of-ice
asciilifeform: well yes, i've wanted 1 for ages. but if you add up the cost of a dozen of these, you could instead get equiv fabbed into single die.
mircea_popescu: if it can process 100Gp/s rather than juniper's mahahahaybe 10Gp, nobody cares it eats a kw.
mircea_popescu: asciilifeform well, part of the reason i've been working on getting some chinese ppls is that i entirely don't believe single die fabbing is as expensive today as you think.
mircea_popescu: however, before you make dies you gotta know what you put in ; and i know of no other way to find out.
asciilifeform: ^ iirc i answered this in the past, but this thread makes it even moar obvious what the pill is : make a hypertrophied ice40 (i.e. homogeneous lattice of gates.) with these, can bake alt-juniper, alt-pc, crypto, pretty much anyffing you like.
asciilifeform: and can chain'em into 'cray' at will.
asciilifeform: "if we had any fab capacity to speak of, these'd be the priority items : 1) large homogeneous fpga 2) otp roms 3) 1+2" and elsewhere
mircea_popescu: mno, because drops rapidly when you fill it up
asciilifeform: not mega-problem if you have large fabric, leaves lotsa room for optimal connection
asciilifeform: it's an egregious problem in cramped fpga.
mircea_popescu: from experience large items are cramped.
mircea_popescu: that's why town is tight and village loose.
asciilifeform: not if they're 5cents and you can matrix'em together.
asciilifeform: usg.fpga is expensive because 'intellectual property' derpitude.
mircea_popescu: all these lofty considerations aside...
asciilifeform: imho 'sane fpga' is closest thing to 'philosopher's stone' accessible with current tech.
mircea_popescu: are you making the alt juniper or arent you ?
asciilifeform: if problem is defined in such a way that i can honestly say that i have from what to make it, and can be made to work to spec -- will make. otherwise not.
mircea_popescu: alrite.
asciilifeform: 'are you launching capsule into orbit or not' 'plox to 1st say, where is orbit, how big capsule, tovarisch stalin'
asciilifeform: 'and do we have rockets or only slingshots'
asciilifeform: for instance, i might like to bake a box with ice40 as 'mips cpu' and "10ns ecc sram [...] 100MHz [...] ~15 usd / megabyte" for main memory. and then for ~2k usd you can have... 16MB . and what to run on that.
asciilifeform: observe that box with 1990s level of immunity to 'cachebleeds', 'rowhammers', etc. still costs 1990s price ... 250bux / 16MB.
asciilifeform: ( why to do this ? just as in other cases of 'i can't believe it's not X!', dram is not actually random-access -- all currently sold drams only achieve their rated speed in 'burst mode'; and from that it follows that they are only ever read to fill a cache line; and from this, trivial timing leak etc. and the joys of 'rowhammer', bonus. )
asciilifeform: Mocky: in the past i attempted a fpga rsa also. sadly the 'ice40' would need to be about 250x bigger, for it to be bakeable
asciilifeform: ( as it is, ice40 won't even hold ~one~ 4096bit adder ! )
asciilifeform: and i'ma never recommend to anyone the use of heathen 'rsa chips'. not even because they all, without exception, work with only 'toy' key lengths, but because srsly wtf.
asciilifeform: mircea_popescu: let's suppose we make the req'd contact. what wouldja want to fab 1st ?
mircea_popescu: asciilifeform what, the phb is supposed to come up with what ? you come up with three things let me pick, how about that!
asciilifeform: mircea_popescu: i've outlined several items, historically. will summarize for the l0gz, in order of descending ( per asciilifeform's lights ) universality : 1) sane fpga 2) sane minimal cpu 3) 8192-bit arithmetizer ( a la ye olde weitek! but for ints ) 4) 2+3 , if somehow can be fit into 1 die 5) 1chip carrierless radio ( per thread ) 6) sane ethernet controller .
asciilifeform: possibly incomplete list, but roughly it.
mircea_popescu: i dunno "fpga" is something that may be sane.
asciilifeform: why not ? it's the simplest item, and theoretically the others can be made from it.
mircea_popescu: a universal tsmr cpu, even if nothing more than miniaturized/updated z80, would prolly be the one gain here. so we end up with a commodity part to put in things.
mircea_popescu: asciilifeform define "fpga" for me.
asciilifeform: and pretty much the ideal 'nonspecificity of diddling' platform, it is quite impossible to meaningfully boobytrap fpga fabric if you don't have foreknowledge of what will go into it and precisely where.
asciilifeform: mircea_popescu: recall ice40 ? simple grid of LUTs, + matrix of programmable interconnects.
asciilifeform: for that matter current FG is baked on fpga, from evil old xilinx.
mircea_popescu: this is not a definition. define it.
asciilifeform: buncha gates, as many as can fit, and a programmable switching matrix, a la old telco , look up tables made of 4-6 bits of sram that turn a given unit into 'and' , 'or', 'xor', half-adder, straight wire, whichever is necessary. i dun know how to more rigorously define, it is one of the simplest devices, straight homogeneous grid of sram cells plus a couple hundred (thousand, in larger devices) 'express lanes' made of straight metal, [...] to use as bus
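
The look-up-table part of that description can be sketched in a few lines. This assumes a 4-input LUT whose behaviour is fixed entirely by a 16-bit truth table (the 'sram' contents); make_lut4, table_for and half_adder are hypothetical names for illustration, and the programmable interconnect is not modelled at all.

```python
def make_lut4(init):
    """Build a 4-input LUT from a 16-bit truth table (stands in for the sram cells)."""
    def lut(a, b=0, c=0, d=0):
        index = a | (b << 1) | (c << 2) | (d << 3)  # the inputs select one table bit
        return (init >> index) & 1
    return lut

def table_for(fn):
    """Compute the 16-bit table that makes a LUT4 behave like fn(a, b, c, d)."""
    init = 0
    for index in range(16):
        a, b, c, d = [(index >> k) & 1 for k in range(4)]
        if fn(a, b, c, d):
            init |= 1 << index
    return init

# The same physical unit becomes a different 'gate' purely by loading a different table.
AND2 = make_lut4(table_for(lambda a, b, c, d: a & b))
XOR2 = make_lut4(table_for(lambda a, b, c, d: a ^ b))
WIRE = make_lut4(table_for(lambda a, b, c, d: a))   # 'straight wire': output follows input a

def half_adder(a, b):
    """Two LUTs side by side: returns (sum, carry)."""
    return XOR2(a, b), AND2(a, b)
```

Nothing in the unit itself knows whether it is an 'and', an 'xor' or a wire; that lives only in the loaded bits.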
mircea_popescu: i can well define a hammer, one of the simplest devices.
asciilifeform: 'heavy iron head on a wooden stem'
asciilifeform: same level of description
mircea_popescu: there's two possible reasons you don't have a definition for a fpga you're happy with : either we're not yet enough advanced for one (to use, to make, whatever), or that it is outright an escher object.
mircea_popescu: and i suspect the latter.
asciilifeform: FG is baked on fpga.
asciilifeform: for 'escher object' it worx pretty well.
mircea_popescu: but anyway, for my own use, fpga=wrapper around industrial poverty, somewhat like a painting that came with crayons.
mircea_popescu: asciilifeform the problem with escher objects is that "perpetuum mobile -- also works pretty well". it's what the fan always says, because 'to my eyes' "asciilifeform: quite likely thinking of the bulk of the b00k, which consists of blockcipher liquishit which is complicated for no reason at all other than the religion where 'it is confusing to ME, author, and therefore Must Be Hard To Break'" sorta thing.
mircea_popescu: i believe attempting to go "everything's a fpga because fg worked ok on one" is learning the wrong lesson from fg, in the "smbx had perverse incentives (usg funding that appeared bottomless - until it died suddenly. reagan's 'star wars.') << best way to sink a good start-up is a bad revenue source early on." sense.
asciilifeform: mircea_popescu: for design that actually fits inside, you end with exactly 'slow asic', with the added win that it's a homogeneous object with no e.g. 'and here is where he will rsa and here is where the low bit of multiplier will live' sabotage target available to enemy mole in vendor plant.
mircea_popescu: because no, "every picture comes with crayons now" is not very smart ; and it's perversely, recursively nonsmart ("can't make polaroid, no way to produce attachable crayons -- maybe 3d print them ???")
asciilifeform: i disagree -- fpga is analogous to gutenberg's movable type; classical 'asic per design' to chinese whole-plate.
mircea_popescu: asciilifeform there's better approaches to hanging moles than putting an ok button on every movement of every rifle.
mircea_popescu: well, this dispute will have to be resolved cuz it's fucking important.
mircea_popescu: i'm not particularly invested in being right about it ; but i'd better not be right and we end up with the wrong thing.
asciilifeform: thing's existed since mid-80s, the pluses and minuses of it are well-documented
mircea_popescu: i don't suspect they're well understood.
asciilifeform: ( orig ancestor was the PAL. there were PALs in yer ro 'spectrum' clone. )
mircea_popescu: neither do you -- the minuses of the linux-c stack were actually not thoroughly understood until tmsr either.
asciilifeform: there's 2 well-known minuses. 1 is that yer making circuit out of immovable parts, connected by drawing line though multiple elements ( bus lines are generally few ) , this gives you much slower circuit with many fewer logical elements than if you had made the device physically from scratch .
asciilifeform: the other is political, all of the existing vendors obfuscate and keep seekrit the necessary docs to actually program the thing. ice40 happens to have been reversed, but it is ruinously small ( still ~150x bigger than the miniature xilinx i baked FG from, however , but too small even for 4096bit adder )
mircea_popescu: the third is technological -- you learn to walk with crutch.
mircea_popescu: this is my concern here.
asciilifeform: what's the 'crutch' ? not spending a $3mil + 1yr delay if there's bug in layout , like 1970s folx had to ?
mircea_popescu: now, a 4096 bit native fpga, specifically for rsa-ing and rsa-likes-ing, THAT might be very useful, because there the s-o-d item is major win.
mircea_popescu: asciilifeform yes alf, that's what's always the crutch. "give just a little spring in your step for insurance against toads in the roads." that's precisely the crutch.
asciilifeform: there's no 'bitness' in fpga, it's a bag of gates, if you have enuff of them you can make n-bit adder, divider, whatever one likes
mircea_popescu: no but i mean, pre-bake it in 4096 bit chunks
mircea_popescu: no bit. byte, of 4096 bit size. make n-byte adder, sure.
asciilifeform: mircea_popescu: this is actually how existing ic industry worx, a good half of the 'asics' are actually 'hard copy fpga', recall the early miner derps threads.
asciilifeform: they prototype on ordinary, sram-based one, then pay to have it metallized.
mircea_popescu: so it's how the "industry" works. is how the "music industry" works.
mircea_popescu: asciilifeform you realise "soviets imported windowze" was precisely s. s. sovietovski saying "this is how industry works" in 1980.
asciilifeform: mircea_popescu: the sad bit is that conventional asic process , as available today in cn , tw, etc, is also like this. you are forced to use 'standard cells' supplied by vendor.
asciilifeform: they giv'em under nda, too.
mircea_popescu: yes well. how about we bake out of this, rather than into this.
asciilifeform: as i currently understand, that means vertical integration, i.e. building the plant.
asciilifeform: $B.
mircea_popescu: ie, i can't fucking have an arbitrary chip made.
mircea_popescu: if i wanted the center caret of z80 rotated 90 degrees and printed, i could not get this done.
asciilifeform: i wouldn't go so far as 'can't', but we're talking 'lease $B plant for 6mo.' sort of figure.
asciilifeform: it is not available as off-the-shelf service anywhere, afaik, nope.
mircea_popescu: the exact digits in there are the question here.
asciilifeform: i found this out the last time we had 'let's bake ic' thread, and it was thoroughly depressing, put me off subj for 2y..
mircea_popescu: yes. but it's been 2 years.
mircea_popescu: are you saying the reeval cycle is too tight ?
asciilifeform: lol
asciilifeform: the exact figs can only be obtained by a cn-speaking emissary, i suspect.
asciilifeform: and their magnitude will depend, i also suspect, on how well he plays his cards.
asciilifeform: btw the reason, afaik, why erry fab house forces 'standard cells', is that they have proprietary tweaks to their process , and have lib of cells ( kept seekrit ) that are known to work with said process.
asciilifeform: the actual physical procedure of baking the ic is not as standardized as i previously (to last thread) thought.
mircea_popescu: indeed it is not.
asciilifeform: it is at the level of 18th c. cannon-forging, roughly. erry house has 'seekrit sauce'.
mircea_popescu: moreover, very 1820s steam engine airs hang about the entire barn
mircea_popescu: oh im sorry, "industry"
asciilifeform: noshit
asciilifeform: imho the race for 'smallest transistor' has been a disaster of incaization -- in '70s there were thousands of ic makers, in '80s -- hundreds, in '90s -- dozens, today maybe 10 .
asciilifeform: the plant gets ruinously costlier, per erry 'shrunk nanometre', and somehow gotta be amortized, and the competition gets thinner an' thinner
asciilifeform: iirc we had a thread re 'ic is deeply incatronic tech' hypothesis
mircea_popescu: asciilifeform aha.
BingoBoingo: Kinda suggests the 2+3 option seems like it could be had sooner than a neutral field of gates FPGA
asciilifeform: BingoBoingo: i doubt any of it belongs in same sentence with word 'soon', we're speaking of just short of mars colony.
BingoBoingo: That's an exaggeration. This is more of a blank slate nitromethane internal combustion engine for freight hauling.
asciilifeform: BingoBoingo: vehehery different class of nre cost, vs engine.
asciilifeform: much moar comparable to satellite biz. a 1-bit mistake costs you coupla $mil.
asciilifeform: and the mistake can be in anyffing, incl. a physical interaction between unrelated components that you did not know were possible.
mircea_popescu: incidentally, this bitmistake/milcost is the exact reason human genome is some % garbage.
mircea_popescu: what exact % -- numeric application from those priors.
asciilifeform: to briefly revisit upstack, asciilifeform's interest in ic fab largely revolves around "for the sake of thread-completeness, what would the ~alternative~ to this story look like? i suggest -- it'd be a process which does to ic fab what 'polaroid' process did to colour photography. find way of etching the circuit from prefab 'sandwich' without caustic baths, sputtering, etc..."
asciilifeform: presently i have nfi whether this is physically possible, or how in particular -- could be fpga-like device where somehow the components actually ~move~ into position ; or sumthing where you can optically burn away the unused tracks through 'window' ; or some yet entirely unknown trick.
asciilifeform: imho the classical fab is an overwhelmingly incatronic tech, it centralizes unhealthily.
mircea_popescu: incidentally... there's all these LAYERS, because we're essentially making books, ie, 3d object out of 2d implementations
asciilifeform: that's part of what makes the trad process cost what it does, yes
mircea_popescu: maybe the trick is to make 6-connected cubic matrix and burn away connexions via ion pump or similar.
asciilifeform: the etches, the masks, the elemental fluorine gas and other joys
asciilifeform: mircea_popescu: i'd even settle for something entirely like ice40 but with fuse/antifuse bridges
asciilifeform: in '80s folx briefly made , then somehow evaporated.
asciilifeform: incidentally , baking such box doesn't marry to serpent, can replace the ice40's feed rom whenever, with whatever one likes
asciilifeform: so long as it sits down in 8k gates
asciilifeform: ice40 eats config from a 8-legged spi rom thing, can socket it.
asciilifeform: (unlike the xl9572 , incidentally, which has baked-in eeprom )
asciilifeform: mircea_popescu: nope, as in fact noted in the head of thread, "in re these lulz, at one point asciilifeform dug for 'anybody ever verilog-ified serpent?' and found a stack of 'papers'. any src ? mno. but plenty of 'discussion' of supposed 'implementation', in the traditional nadia henninger style ."
asciilifeform: it needs that 1 magick trick.
mircea_popescu: speaking of which -- an ada-to-verilog item would prolly be very fucking useful
mircea_popescu: looks to me like about half of what we write, we'll end up baking eventually.
asciilifeform: mircea_popescu: they're sorta fundamentally immiscible, verilog is not a procedural/algorithmic lang
asciilifeform: it's a wiring diagrammator, if you like.
mircea_popescu: yes but how strong is that sorta ?
asciilifeform: all the lines 'execute at once'
mircea_popescu: recursive and functional also "sorta immiscible", at least until bright kid
asciilifeform: it compiles into a gate netlist, rather than sequence of instructions for von neumann cpu.
mircea_popescu: no dood i understand the differences.
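The "all the lines execute at once" point above can be illustrated with a toy model. This is a minimal Python sketch, not Verilog and not anything from the thread: the register names `a` and `b` and the helpers `clock_edge` and `run_sequentially` are hypothetical, chosen only to contrast the two evaluation orders.

```python
# Toy model of one clock edge: every register's next value is computed
# from the OLD state, and only then are all registers updated together.

def clock_edge(state, rules):
    """Evaluate every rule against the old state, then commit all at once."""
    return {name: rule(state) for name, rule in rules.items()}

def run_sequentially(state, rules):
    """Contrast: CPU-style execution, where each line sees earlier writes."""
    state = dict(state)
    for name, rule in rules.items():
        state[name] = rule(state)
    return state

# Two hypothetical registers, each wired to the other's output.
rules = {
    "a": lambda s: s["b"],
    "b": lambda s: s["a"],
}
start = {"a": 0, "b": 1}

parallel = clock_edge(start, rules)          # the two values swap
sequential = run_sequentially(start, rules)  # the first write clobbers 'a'
```

In the wiring-diagram reading the two registers swap; in the instruction-sequence reading the second assignment sees the result of the first, so both end up holding the same value.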
asciilifeform: there actually exists an ada-flavoured variant, 'vhdl', but i never saw any win from it, loox rather like simply a moar verbose verilog. but! to be fair, that was 10y ago when i last dug, it was prior to asciilifeform's getting into adaism.
asciilifeform: most gate compilers support both.
mircea_popescu: well, so in actionables : probably dusting off vhdl worth your time, see how it feels. possibly baking serpentdisk worth your time, tho at this point seems kinda soso.
asciilifeform: ( i was initially testing rk pilot plant to run off sd, discarded on acct of meh speed vs usb3 )
asciilifeform: vhdl is prolly worth a 2nd look, tho i currently suspect that it vs verilog aint a 'ada vs c' win, simply longer text that does same thing ( the only unit of data in fpgaism is really the bit, so 'types' dun exist )
asciilifeform: and the q of 'would serpent fit in ice40' is imho also worth answering. i'ma put it in the pipe.
asciilifeform: "mircea_popescu: probably dusting off vhdl worth your time, see how it feels. possibly baking serpentdisk worth your time, tho at this point seems kinda soso." << this quickly led to dead end, incidentally -- the ice40 'icestorm' proggy dun seem to eat vhdl...
asciilifeform: ( suxx when there is only 1 working example of a thing... )
asciilifeform: there's a converter, but it smacks of ye olde c2fortran
deedbot: Loper OS - Can the Serpent Cipher fit in the ICE40 FPGA?
mircea_popescu: asciilifeform basically, if it fits in 1/3 of the chip ?
asciilifeform: approx, yes ( tho keep in mind that said chip, in order to do useful work, gotta have at least a bit of room for other things, unless one were to equip board with >1 ( not end of the world, they're, what, 8bux ) )
asciilifeform: mircea_popescu: observe also that the sbox mechanism is 'bitsliced' (i.e. the bits move only 'vertically' there ) so potentially it can be shrunk at expense of speed . so the real puzzler isn't 'does serpent fit', it can almost certainly be shoehorned, but 'with how little/much unrollage' i.e. what resulting eating bitrate.
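A minimal Python sketch of what "bitsliced" means here, under stated assumptions: four 32-bit words are treated as 32 independent 4-bit columns, and the same boolean equations are applied to whole words, so the bits only move "vertically". The table `S0` is quoted from memory of the Serpent specification and should be checked against it; the column bit-ordering and the helper names (`anf`, `bitsliced_sbox`, `column`) are assumptions of this sketch. The equations are derived mechanically as XOR-of-AND terms (algebraic normal form) and are not the hand-simplified equations discussed in the thread.

```python
# S0 as recalled from the Serpent specification (verify before relying on it).
S0 = [3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12]
MASK = 0xFFFFFFFF  # 32 columns processed at once

def anf(truth):
    """Moebius transform: 16-entry truth table -> XOR-of-AND coefficients."""
    c = list(truth)
    for i in range(4):
        step = 1 << i
        for x in range(16):
            if x & step:
                c[x] ^= c[x ^ step]
    return c

def bitsliced_sbox(table, w0, w1, w2, w3):
    """Apply a 4-bit sbox to all 32 columns of four words simultaneously."""
    words = (w0, w1, w2, w3)
    out = []
    for k in range(4):  # one boolean equation per output bit
        coeffs = anf([(table[x] >> k) & 1 for x in range(16)])
        acc = 0
        for mono, present in enumerate(coeffs):
            if not present:
                continue
            term = MASK  # empty product: the constant 1 in every column
            for j in range(4):
                if (mono >> j) & 1:
                    term &= words[j]
            acc ^= term
        out.append(acc)
    return tuple(out)

def column(words, c):
    """Read column c as a 4-bit value (bit j comes from word j)."""
    return sum(((w >> c) & 1) << j for j, w in enumerate(words))
```

Each output word depends only on same-position bits of the inputs, which is why one physical copy of the equations can be reused over narrower slices at the cost of more clocks.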
asciilifeform: it is also possible that the equations can be simplified further, i did a fairly surface job of it, mostly by hand
asciilifeform: literally 2hr's evening wurk.
asciilifeform: btw, spoiler : i put the thing in an ice40-8k , simply did not have time to write up yet, and the fwd sbox in fact eats roughly 1/4 of the gates . which leaves the orig question wide open...
asciilifeform: in other minutiae, the terms i left in xor-containing form, can of course be expressed in not/and/or , but this resulted in seven-term ORs , which i assumed is a greater delay than to let it use a xor LUT; but this is not experimentally confirmed, and one might conceivably get better throughput if all of the terms were rewritten in the and/or/not form.
asciilifeform: 'yosys' ( 'icestorm'-'s synthesizer ) suggests a max clock rate of ~65MHz for the posted form.
mircea_popescu: asciilifeform so did you measure throughput of this thing ?
asciilifeform: mircea_popescu: as in, whether it actually sboxates at the stated 65MHz ? notyet, gotta write a serial i/o thing for it, to do this. possibly later today.
asciilifeform: i expect the sbox won't actually be the bottleneck in a full serpentron tho
asciilifeform: rather, it'll be the rotational transforms.
asciilifeform: those are blocking, i.e. take multiple clocks ea.
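For reference, a Python sketch of the Serpent linear transformation (the "rotational transforms" above), as recalled from the published specification; the rotation and shift constants should be checked against the spec before use. The function names are this sketch's own. It says nothing about how many clocks any particular gate-level implementation needs; it only shows the chain of data dependencies between the ten steps.

```python
MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK

def rotr(x, n):
    """Rotate a 32-bit word right by n (0 < n < 32)."""
    return rotl(x, 32 - n)

def linear_transform(x0, x1, x2, x3):
    # Each step consumes results of the steps before it.
    x0 = rotl(x0, 13)
    x2 = rotl(x2, 3)
    x1 ^= x0 ^ x2
    x3 ^= x2 ^ ((x0 << 3) & MASK)
    x1 = rotl(x1, 1)
    x3 = rotl(x3, 7)
    x0 ^= x1 ^ x3
    x2 ^= x3 ^ ((x1 << 7) & MASK)
    x0 = rotl(x0, 5)
    x2 = rotl(x2, 22)
    return x0, x1, x2, x3

def inverse_linear_transform(x0, x1, x2, x3):
    # The same steps undone in reverse order.
    x2 = rotr(x2, 22)
    x0 = rotr(x0, 5)
    x2 ^= x3 ^ ((x1 << 7) & MASK)
    x0 ^= x1 ^ x3
    x3 = rotr(x3, 7)
    x1 = rotr(x1, 1)
    x3 ^= x2 ^ ((x0 << 3) & MASK)
    x1 ^= x0 ^ x2
    x2 = rotr(x2, 3)
    x0 = rotr(x0, 13)
    return x0, x1, x2, x3
```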
asciilifeform: imho, if an ice40 can be coaxed into serpenting at , say, 1MB/s, it's worth sumthing, otherwise iffy
asciilifeform: ( and conceivably, worth sumthing even if it takes having ~two~ on the board; problem is that i dun presently have a board with 2 , to actually try )
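A back-of-envelope Python sketch of what the 1MB/s target implies, assuming (this is an assumption, the thread only has a synthesizer estimate for the sbox alone) that the whole design could clock at the ~65MHz figure, and reading "1MB/s" as 10^6 bytes per second. Serpent's 128-bit block and 32 rounds are from the cipher's definition; the function name is hypothetical.

```python
BLOCK_BYTES = 16   # Serpent block: 128 bits
ROUNDS = 32        # Serpent round count

def cycles_per_block(clock_hz, target_bytes_per_s):
    """Clock cycles available per block at a given clock and throughput."""
    blocks_per_s = target_bytes_per_s / BLOCK_BYTES
    return clock_hz / blocks_per_s

# Assumed figures: 65 MHz clock, 10**6 bytes/s target.
budget = cycles_per_block(65e6, 1e6)   # clocks available per 128-bit block
per_round = budget / ROUNDS            # clocks available per round
```

Under those assumptions the budget comes to about a thousand clocks per block, roughly thirty per round, which is the room a heavily serialized design would have to fit into.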
asciilifeform: believe or not, seems like nobody has ever publicly baked a board with >1
asciilifeform: i've gathered afaik all of the commercial demo boards with ice40, they all have 1 ea.
asciilifeform: if i were baking asic ( not sure why anybody would blow 'orbit' moneys on serpent asic, but for the sake of arg ) would unroll the sbox invocation the way it is unrolled in the pc serpent diana_coman is using, there'd be no reason not to have 128 or what, independent copies. but in the tight space of ice40 this is out of the question.
asciilifeform: err, 32
asciilifeform: is the actual parallelism of the algo. the rotator would likewise win from having 32 physical instances, as obvious from
asciilifeform: so from that point it becomes a q of the actual gate delays. in principle a serpentron that does coupla 100MB/s is physically possible. ( just not on my desk, lol )
asciilifeform: i admit, the seekrit reason asciilifeform could even be arsed to pick the thing up, is that to write serpent in maximally algebraic form might tell us sumthing useful re the weakness.
asciilifeform: ( the orig author, to be fair, did write it algebraically, but in imho somewhat cryptic form )
deedbot: Loper OS - Serpent in ICE40, Part 2.
asciilifeform: quasi-relatedly: asciilifeform found out that it is actually possible to fit an rsatron into ice40, if one uses a bit-serial multiplier into external sram. a 4096x4096 mul would then take 8192 clock cycles ( 16384 if counting all load/stores. ) but we can come back to this item laters.
asciilifeform: ( it doesn't win 'vs pc' other than as proof of concept )
asciilifeform: 1980s algo, from the days of 1uMism.
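The thread does not say which bit-serial scheme is meant, so the following Python sketch is only one common arrangement (serial-parallel shift-and-add), shown to make the 2n-clock figure concrete: one multiplier bit is consumed and one product bit is emitted per clock, so an n x n multiply takes 2n clocks, i.e. 8192 for n = 4096. The function name and the clock counter are this sketch's own, and it models the arithmetic only, not the external-sram load/stores.

```python
def serial_multiply(a, b, n):
    """Multiply two n-bit integers, emitting one product bit per clock.

    Returns (product, clocks). Models a serial-parallel shift-and-add
    multiplier: 'b' is fed in LSB first, the product comes out LSB first.
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    acc = 0        # running partial sum held in the multiplier
    product = 0    # product bits collected as they are shifted out
    clocks = 0
    for i in range(2 * n):
        if (b >> i) & 1:       # bits past position n-1 are zero: flush phase
            acc += a
        product |= (acc & 1) << i  # lowest bit is final, shift it out
        acc >>= 1
        clocks += 1
    return product, clocks
```

After the last of the 2n clocks the accumulator is empty and all 2n product bits have been shifted out.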
mircea_popescu: i'd so love to see my "byte is 4096bit" asics...
asciilifeform: 8192!
asciilifeform: thinkbig.
mircea_popescu: yaok.
asciilifeform: i'd buy a 8192b-reg mips right nao.
asciilifeform: the only seekrit ingredient missing is some way to actually lay the physical circuit down.
asciilifeform: it's a 100% solved, in the mathematical sense, problem.
asciilifeform: doesn't even need 10nm or anyffing of the kind
asciilifeform: 1995-era process would entirely suffice.
asciilifeform: ( bake it in ECL, then get 'for phree' not only constant-time, but constant-current.)
asciilifeform: ( very sadly, there is not a vhdl eater for ice40. but when looking over subj 2w ago, i actually came to like vhdl, it's ~exactly ada syntax. )
asciilifeform: in other notes from last wk's dig, turns out that ice40 contains onboard otp rom , enuff to store whole config ( somehow i read the docs '9000' times prev. and missed this ).
amberglint: asciilifeform: the reverse-engineered toolchain for the iCE40 now supports a larger chip, the ECP5 with 85k LUTs (vs ~8k LUTs of iCE40), possibly worth taking a look at:
amberglint: perhaps the rsatron will fit into this larger chip?
asciilifeform: amberglint: i cannot immediately say that it won't fit! at least from first principle. ( the chain's place-and-route is pretty rough tho, vs the commercial ones of old, so it is possible that not fit )
asciilifeform: datasheet also reveals a 'Dedicated DDR2/DDR3 and LPDDR2/LPDDR3 memory support with DQS logic, up to 800 Mb/s data-rate', wonder if this is supported in the ice40 chain yet
asciilifeform: seems to have iron multiplier, also
asciilifeform: also loox like has iron high-speed uarts, for e.g. GB nic - baking.
asciilifeform: not even outrageously expensive ic, they're ~30bux ea.
amberglint: there's also a presentation about it, haven't found the transcript: