See also: AutomataTheory, NPComplete

------

Hi Joseph, Tom, Sean,

I suppose I hope to get your attention. I think I have thought
through a novel way to approach automata theory, one that opens up new
domains in mathematics, and also metaphysics.

Tom Munnecke wrote to the Blue Oxen collaboratory, presumed copyright:
http://collab.blueoxen.net/forums/yak/2003-11/msg00110.html
"Re: Chomsky, transformational grammars, etc. I have a fuzzy notion
(actually all of this is fuzzy) of a "maturity model" for domains of
discourse that they begin with finite state models (pigeonholing
thoughts into predefined addresses - poke the address and out comes
the thought)... then move to generative grammars (common deep
structure with arbitrary number of transformations to a "surface
structure.") then moving to self-referential evolutionary structures
(crossing an "organic threshold" of being cat-like instead of
toaster-like)."

At UCSD, I took a graduate course in automata theory, based on
"Introduction to Automata Theory, Languages, and Computation" by John
E. Hopcroft and Jeffrey D. Ullman, and was quite intrigued by the
Chomsky hierarchy of the classes of languages generated by different
kinds of automata.

Philosophically, I rejected the hodge podge approach to defining the
classes, and looked for a universal, elegant, and metaphysically
profound way to distinguish the classes, and the kinds of automata.

I believed that you could actually look at the production rules for a
language and say: here the automata is, qualitatively, behaving like a
finite automata, or a push-down automata, or a Turing machine. I
believed that you could do this qualitatively, for each production
rule. For example, you could point to a Turing machine and say: here
it's only behaving as a finite automata. My teacher rejected this
approach on the pedantic grounds that the definitions only made sense
for the language as a whole, not for particular calculations.

I have found a way to formalize my intuition!

I use matrices. Only three.
One matrix X for the variables. It's a diagonal n x n matrix, so the
off-diagonal entries are zero. The diagonal entries are either 0 or
1, but for now they are n variables, the n states X(ii).
Another n x n matrix A to define the alphabet of the language. I
realized that, for my purposes here, by introducing as many variables
as needed and working with matrices as large as needed, I could assume
that the production rules for a finite automata are all of the form
X(ii) -> A(ij)X(jj), where A(ij) is a unique letter in the alphabet,
and may also be 0.

In my formalism, we will see that all of the words that result are
strings that concatenate the letters A(ij) and always according to the
rules of matrix multiplication!

The secret is in the interpretation! The four classes are generated
by the following rules for matrix multiplication:

X -> AX finite automata, regular languages
X -> AXA push-down automata, context free languages
XA -> AX linear bounded automata, context sensitive languages
XA -> X Turing machines, recursively enumerable languages

In other words, thinking of A as the generic fixed instructions that
take you from one state X(ii) to another X(jj) by the unique and
deterministic step A(ij), and which you might specialize by setting
A(ij) to 1 or 0, the above rules for matrix multiplication define the
production rules, and thereby the character of the language, both
locally and globally.

Here in matrix multiplication I'm using + to mean "or", and the
multiplication is to be understood noncommutatively, so that it is
generating all manner of strings. The actual language generated will
be given by a subset of strings taken by setting X(ii) = 0 and A(ij) =
0 for the nonexistent states and paths. We set the remaining states
X(ii) = 1 for those that are actually used by the particular automata,
and then the strings of A(ij) that result will be understood as those
strings that may be accepted by the automata and are part of the
language it generates.

Finite automata are exactly like matrix multiplication X -> AX in that
they can only look ahead to see where the current state might take it
next, and accept a letter in traveling to the next state. Note we
only allow multiplication from one side. We can see clearly what kind
of elements will be created, anything of the form AAAAAAAA which is to
say, things like A(ij)A(jk)A(kl), etc.: these are paths.
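
To make that concrete, here is a small Python sketch of the X -> AX
step. The encoding is just an illustrative choice, not essential:
matrix entries are sets of strings, the sum in matrix multiplication is
read as "or" (union), and multiplication of entries is concatenation.

    def mat_mult(M, N):
        # (M N)(ik) = "or" over j of M(ij) concatenated with N(jk)
        n = len(M)
        return [[{a + b for j in range(n) for a in M[i][j] for b in N[j][k]}
                 for k in range(n)] for i in range(n)]

    # A(ij) is the unique letter naming the step from state i to state j
    A = [[{"a11"}, {"a12"}],
         [{"a21"}, {"a22"}]]
    # X is diagonal; its diagonal entries are just markers for the states
    X = [[{"x1"}, set()],
         [set(), {"x2"}]]

    AX = mat_mult(A, X)
    AAX = mat_mult(A, AX)
    print(AAX[0][0])   # {'a11a11x1', 'a12a21x1'}: the walks 1->1->1 and 1->2->1

Setting an entry A(ij) to the empty set plays the role of setting it to
0, and cuts the corresponding paths out of the language.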

Pushdown automata make calculations based on whatever is the top tray
on the stack, but they can take trays off, not just put them on. They
are equivalent to context free languages which have production rules
of the form Y -> cZd, which is to say, a rule like Y -> 0Y1 can
generate languages like 01, 0011, 000111, 00001111, which finite
automata can't because the latter don't have any memory. I noticed
that in my context I could express such rules as X(ii) ->
A(ij)X(jj)A(ji), or for example even X(ii) -> A(ii)X(ii)A(ii) as here
I am using the location of the X as the cursor position to let me know
that the lefthand A(ii) is a c, and the righthand A(ii) is a d, and
distinguish them. So this should give me the full power of context
free languages, and is compatible with what we have for finite
automata. Again, this is deterministic, I can work backwards and see
how the calculation came about. And we're still dealing with matrix
multiplication, pretty much in the normal sense, except that now there
is a cursor located somewhere within the middle of the path, not just
at the right end.
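
A tiny Python sketch of that two-sided rule, again just illustrative:
apply X -> AXA repeatedly, reading the left-hand copy of the letter as
"0" and the right-hand copy as "1", the way the cursor position lets us
distinguish the c from the d.

    def apply_AXA(word, left="0", right="1"):
        # replace the cursor X by  left + X + right
        return word.replace("X", left + "X" + right, 1)

    word = "X"
    for k in range(4):
        word = apply_AXA(word)
        print(word.replace("X", ""))   # drop the cursor to read off the string
    # prints: 01, 0011, 000111, 00001111 -- the language 0^n 1^n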

Linear bounded automata are a special case of Turing machines. What
they have in common is the ability for the cursor to carry letters to
another location in the string, which is to say, shift them over to a
workspace. I couldn't find it in the book, but when I was a graduate
student I somewhere found out that the key production rule is of the
form Xcd -> dXc, which, if you notice, is a three-cycle, and its purpose
and power is to move the c across a bunch of d's, so that Xcddddde
yields dddddXce. So now the X has moved the c over next to the e,
into a different context. That certainly complicates things with
regard to our matrix multiplication! But it's actually not a
disaster, given the right interpretation, if we know the cursor
location! This is because I think we can assume that the cursor only
moves right! and never back! This certainly gives us something more
powerful than just the pushdown automata, and I believe as powerful as
linear bounded automata. You have production rules of the form X ->
dXd that let you increase the size of your workspace as much as you
like, and as complicated you like, on your right. And you can keep
moving further to the right. The trick now is to notice that X is a
diagonal matrix. This means that from the point of view of matrix
multiplication, AX and XA define the same paths! if thought of as
locations visited. I mean to say that a term A(12)X(22) and
X(11)A(12) have the same information in terms of path, that
information is not lost! Note that the value of the edge from A does
not change! So no information is created! Only the value of the
cursor changes! All we've done with our strings is add an
"interpretation" that X can move right as far as it likes and then
takes on the value required by its location. Also, certainly, the
complication is greater, we are able to move the cursor and therefore
generate new kinds of strings, and also we are here now finally able
to generate the same string but in more than one way. Still, matrix
multiplication continues to generate correctly the set of strings that
are accepted.
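
Here is a small sketch, illustrative only, of that three-cycle rule
Xcd -> dXc as string rewriting, carrying the c rightward across the d's:

    def shift_once(word):
        # apply  X c d -> d X c  at the cursor, if the pattern is there
        i = word.index("X")
        if i + 2 < len(word) and word[i+1] == "c" and word[i+2] == "d":
            return word[:i] + "dXc" + word[i+3:]
        return word

    word = "Xcddddde"
    while True:
        new = shift_once(word)
        if new == word:
            break
        word = new
        print(word)
    # ends at dddddXce: the cursor has carried the c over next to the e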

Finally, the extra power of Turing machines is that they allow us to
delete letters, to destroy information! Generally, in the theory, if
we have a production rule from one string of letters and variables:
w1...wN -> v1...vM then so long as the strings keep getting longer, we
can only have the complexity of the linear bounded automata. In order
to have greater complexity, we have to have shrinking strings! In
other words, we have to have deletion, erasure. Well, the simple way
to do that is XA -> X, simply allow a letter next to a cursor to
disappear. Note, we can even do this only on the right, on our simple
side. The reason is that it destroys our knowledge of where our
cursor was. Earlier we were able to use the information that we were
generating as AXA as a kind of memory to let us figure out where the
cursor was supposed to be centered originally, and also distinguish
what was created by the finite automata and by the push down automata.
But now everything collapses. We have all these locations where the
information we've been needing for global orientation has disappeared.
We can no longer keep track of what's up, the evidence can be
destroyed. But note that everything that we're generating is being
accepted! There's nothing wrong with the matrix multiplication in
that regard. It's simply that we're not able to necessarily tell if a
particular word is generated by the production rules, because it may
have been shrunken from something very complicated. But certainly we
know that the production rules, as interpreted above as matrix
multiplication, are generating them!
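
A minimal sketch, again just illustrative, of the deletion rule XA -> X,
showing how it destroys the evidence: two different histories collapse
to the same word.

    def delete_right(word):
        # erase the single letter just to the right of the cursor X
        i = word.index("X")
        return word[:i+1] + word[i+2:] if i + 1 < len(word) else word

    w1 = "abXcd"   # built one way
    w2 = "abXed"   # built another way
    print(delete_right(w1))   # abXd
    print(delete_right(w2))   # abXd -- the two histories can no longer be told apart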

Writing this up, there's a lot of details to prove that might not be
hard for an expert of the field, but if this viewpoint is correct,
it's a lot more elegant and unified than the book I have. And it
opens up the use of some very powerful mathematical tools,
coincidentally, from my thesis.

My thesis was on Symmetric Functions of the Eigenvalues of a Matrix.
I noticed that the determinant of a matrix is the product e1 x e2...x
eN of its eigenvalues, and the trace is the sum e1 + e2 +... + eN, and
these are both symmetric functions. Well, symmetric functions are the
foundation of combinatorics, the math of counting, listing, generating
objects. The basement of mathematics, at least that's why I studied
it. For a general matrix, you can't know the eigenvalues, but... you
can always calculate the determinant, trace, and in fact, all of the
symmetric functions of the eigenvalues. And I noticed they generate
very basic combinatorial objects: walks, cycles, Lyndon words, but all
built from edges from i to j, instead of blocks i. In fact, setting
the off-diagonal edges to 0 lets you recover the eigenvalues, they
become the diagonal elements, and so you get as a "trivial" special
case, all of the combinatorics of the usual symmetric functions.
Furthermore, traditional theorems for matrix combinatorics all seemed
to be interpretations of trivial statements arising from this perspective.
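
Those two facts are easy to check numerically; here is a quick sketch
with numpy (the particular matrix is arbitrary):

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    eig = np.linalg.eigvals(A)
    print(np.isclose(np.prod(eig), np.linalg.det(A)))   # True: det = e1 x ... x eN
    print(np.isclose(np.sum(eig), np.trace(A)))         # True: trace = e1 + ... + eN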

Well, here we're only dealing with two very general matrices, A and X!
And everything we're ultimately generating is of the form
AAAAXAAAAAA. Well, that is a matrix whose edges are interpreted as
paths. And along that path is a cursor! given by the diagonal matrix
X. So we get the usual kinds of objects but replacing them with paths
with cursors. Note also that the eigenvalues of AB are the same as
BA, and we know the eigenvalues of X!

Another thing is that this approach can be taken up with other classes
of automata, for example, the P = NP problem. Note that here the
length of an input word is very clear, it is the number of letters.
Whereas the complexity is, I imagine, given by the number of
production rules. Well we see immediately that the finite automata and
pushdown automata can't cause any great problems. Nor can, for that
matter, the linear bounded automata, because we can move to the right
only to the extent that we've been generating letters on the right
with the help of the pushdown automata. So the real source of
complexity comes from deletion! In fact, in a sense it is given by
the number of deletions, because it's equal to the extra number of
additions, and by the number of extra lateral movements between deletions.

Well, the matrices X, AX, AAX, AAAX, etc. all have different sets of
eigenvalues. But lateral movement doesn't affect that. So this
approach offers a way to keep those two issues separate, which they
need to be, if we're to do good counting.
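
This, too, can be checked numerically. The coefficients of the
characteristic polynomial are, up to sign, the elementary symmetric
functions of the eigenvalues, so comparing them compares the eigenvalue
sets without worrying about their order. A sketch with numpy; the
particular A and X are arbitrary choices.

    import numpy as np

    A = np.array([[0., 1., 1., 0.],
                  [1., 0., 1., 1.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
    X = np.diag([1., 1., 0., 1.])

    # lateral movement: A X and X A have the same characteristic polynomial
    print(np.allclose(np.poly(A @ X), np.poly(X @ A)))   # True
    # but X, A X, A A X are in general all different
    print(np.poly(X))
    print(np.poly(A @ X))
    print(np.poly(A @ A @ X))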

I feel good about this. And if it is the least bit sound, then it has
a lot of potential for the metaphysics that I work on, because these
production rules have a very similar kind of form to basic operations,
structures.

Am I on track? or way off?

----

Actually, I only used two. (Sorry!)

The reason I wrote three was that I was thinking of inserting an
identity matrix to mark the deletions. Then it can be ignored later.
But for my purposes here I realized it was simpler just to do without it.

----

I noticed a mistake...
in the Turing machine part, the deletion rule XA -> X may not accord
with what's happening on the element level.
Or maybe.
On the element level we have
X(ii)A(ij) -> X(ij) which is nonzero only if i = j.
But that may not be a problem.
It's only a problem if X(ii)A(ii) -> X(ii) is not strong enough to
create full strength Turing machine behavior.
An alternative would be to define an operation that would collapse,
delete not one element, but the neighboring cycle. That would be very
powerful, but of course, messier. It would fit well, though, with the
eigenvalue approach, which is all objects built from cycles.

So the overall idea of relying on matrix multiplication still stands,
I think.

Also, I forgot to mention that typically the P and NP problems are all
about relationships between nodes, and so might have very natural
expression in terms of the edges of matrices.

----

The set-up makes it straightforward to distinguish between
Nondeterministic machines (possibly multiple ways out of each state)
and Deterministic machines (at most one way out of each state). I've
set up for the nondeterministic case, and I can get the deterministic
case by restricting myself to the machines that are of the shape of
permutation matrices. In the eigenvalue theory this is quite a
profound restriction because it reduces the homogeneous symmetric
functions (which generate products of all strings) to products of
disjoint cycles (which is what is generated by the elementary
symmetric functions, but the latter are signed). In other words, an
arbitrary string of length n gets mapped, collapsed to an n-cycle.

The correct way to define the Turing machine effect, I now think, is
deletion of the cycle to the right. This is a very simple and
elegant operation in this combinatorial context, and this kind of
operation comes up in my thesis a lot. The cycle may be of arbitrary
length. I imagine that each cycle deletion may, from the point of
view of complexity, be like increasing the complexity by a power, the
next polynomial up.

Shannon, so I've found a unified way to think of four different kinds
of production rules. They have different roles in the dynamics of
language:
- The finite automata are the "natural laws" that keep generating
stuff left of the cursor.
- The pushdown automata, context free grammar, offer the basic parsing
capability for dealing with pairs of markers, like left and right
parentheses, working in a tree like fashion. They let us break down a
meaning, in the primitive sense. In doing so, they also open up a
workspace on the right, balanced with what they are generating on the
left.
- The linear bounded automata, context sensitive language, serve to
move the cursor into the workspace on the right. So this means that
the parsing ability can be moved to a new context. In other words, a
meaning can depend on a context.
- The Turing machine gets its power from its ability to delete a cycle
on the right. Any such cycle was created by the pushdown automata,
which in this set up is the only thing that ever creates something on
the right. So on the right we have combinations of creating cycles,
moving cursors and deleting cycles. The net result is that we get
situations where we may break down one cycle and build up another. We
may, with each production rule, be "overspeaking" or "underspeaking",
which is to say, building on the last deletion, or building towards
the next deletion. In this way, we have the natural kind of ambiguity
in language between underspoken and overspoken.

Nobody argues about the power of Turing machines in terms of sheer
computational reach. I think we're mostly lacking in a cognitively
elegant way to present Turing machines. That's the kind of thing that
intrigues me, how to relate the intuitive understanding behind each
kind of computation with different kinds of metaphysical first principles.

Joseph, for example, I imagine the above distinctions are relevant to
"responsibility" as we apply it in life. There is a kind of "finite
automata" responsibility that is one-step at a time, where we are
simply forced to choose, or even not choose, at each step, what to do
in the current situation. There is a "context free grammar" type of
responsibility that pushes out for us a workspace, so that we can
"take on responsibility", connect an act of intent that will be our
mark in the past, with an intended action that we mark out in the
future and will become real at some point. Note that it is a two-step
responsibility. There is a "linear bounded automata" type of
responsibility by which we ourselves move into that future we've
projected, where the three-cycle "take a stand, follow through,
reflect" works to move our state (our stand) adapted but pretty much
intact across many experiences, until we may end up in an entirely
different context where completely other things may be relevant.
Finally, there is a "Turing machine" type of responsibility that says
that we can have many long chains of thought that we simply delete,
they will be truly forgotten, even though they may have counterparts
in our past. They are subroutines that have done their job and we
don't want to keep track of them.

So this all shows how we can think of time as what is on the right of
our cursor. Finite automata (production rules) have time flow
at us one-step at a time, and recede to the left, into the past.
Context free grammars let us construct our own time as a workspace
where we can posit intended actions that match our acts of intent in
the past. Linear bounded automata let us move with our principles into
that workspace of the future. Turing machines let us delete portions
of that future as simply thoughts to be forgotten and erased, although
leaving counterpart experiences in the past, the acts of intention
with which those thoughts were generated. In each case, we have
thoughts in the future to the right of the cursor, and actions that
have happened in the past, to the left of the cursor.

All of this with just two matrices, X(ii) and A(ij) and four
production rules:
X -> AX
X -> AXA
XA -> AX
XA -> delete cycle to the right

These are all just the right simplicity to be expressed metaphysically
from first principles.
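
A toy Python sketch of those four rules as string rewriting, with "X"
as the cursor. This is illustrative only, and the cycle deletion is
simplified here to erasing the single letter to the right of the cursor.

    def generate(word, letter):        # X -> AX : grow the past on the left
        i = word.index("X")
        return word[:i] + letter + word[i:]

    def deposit(word, left, right):    # X -> AXA : open a workspace on the right
        i = word.index("X")
        return word[:i] + left + "X" + right + word[i+1:]

    def shift(word):                   # XA -> AX : move the cursor to the right
        i = word.index("X")
        return word[:i] + word[i+1] + "X" + word[i+2:] if i + 1 < len(word) else word

    def erase(word):                   # XA -> X : the (simplified) deletion
        i = word.index("X")
        return word[:i+1] + word[i+2:] if i + 1 < len(word) else word

    w = "X"
    w = generate(w, "a")       # aX
    w = deposit(w, "b", "c")   # abXc
    w = deposit(w, "d", "e")   # abdXec
    w = shift(w)               # abdeXc
    w = erase(w)               # abdeX
    print(w)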

I think of God as the unity of representations of everything. And one
property of everything is that it is the simplest algorithm, the one
that accepts all things. So I wonder what that means here.

----

My purpose is, of course, to understand life, and so my thoughts on
automata are simply one more means towards that end. Certainly,
"algorithmic" thinking is part of human life, not necessarily all of
life. But the qualitative distinctions of the kind I made about
responsibility:
- dealing with things "now" as they come
- depositing "notes" that you will deal with something later
- taking up and holding "principles" that you carry across changes in
context
- being able to put aside chains of thinking that have come full circle.
These are ways of thinking about human experience that predate automata,
and instead, express some of the ways that we as humans have dealt with
life systematically, and in the historical sense, are plausibly what
have manifested themselves in the kinds of cultural systems that we find
ourselves living in.

The model I'm working out brings all of the above together in a unified
framework, and it already has some nice consequences. One of them is
that I have a Turing machine equivalent that makes no use of an infinite
tape! Instead, the "tape" is whatever workspace is rolled out by the
context-free-grammar that is depositing the notes. So this resource is
something that is explicitly accounted for.

Also, I have a Turing machine equivalent where the "cursor" only moves
to the right, never to the left! And the only elements that can arise
on the right are generated by the context-free-grammar. This means that
the "future" is extremely small. It is constructed only by "depositing
notes". And the complexity of the future is increased by reducing those
notes, either by "moving into the future" and dealing with them
proactively, or by striking out whole circles of them. The "smallness"
of my "personally constructed" future seems a very real factor of human
life.

The "interaction" is there. It's not between people. It's between the
individual and their environment, much of which is personally
constructed. What the model says is that the "real world" only affects
us in ways that a "finite automata" experiences, ways that we can "deal
with right now". All the rest is personally constructed, personally
accepted. Any future that we acknowledge is only because of
self-obligations that we have posited. And the way that we may simplify
our future is A) to deal with those obligations by moving into them with
a principle, or B) to dismiss sets of obligations that we find go
nowhere, go back to where we started, cancel themselves out.

That may not be all of life, nor all of experience, but it seems a
pretty straightforward account of a lot of down-to-earth reality. I
think, compatible, with Buddhism, from what I read.

Another fantastic feature of this framework, if it hangs together, is
that for the automata it defines, it deals with ALL automata at the same
time. In other words, it is saying what the FULL automata would
generate, and then every particular automata could be gotten by setting
A(ij) = 0 for any nonexistent production rules. Furthermore, it
maintains, on the left, a copy of every rule ever applied! (Just as in
real life, regarding the past!) All of the power of the linear bounded
automata is gotten just by shifting the cursor, and all of the power of
the Turing machine is gotten by deleting cycles, which is all to say, by
erasing certain elements, thanks to the ambiguity that results.

Furthermore, that a word has been "accepted" is given as soon as it is
generated. This fits nicely with the picture of partial recursive
functions, where we have to keep waiting and typically never know that
they will "halt" for a particular word, which is to say, we never know
if a small word will not be generated later, simply because it may
result from deletions from a very large word.

If we think of the amount of resources used, then this framework does a
good job of keeping on top of that, they are simply the number of
production rules used. I should clarify something, that because each
word is keeping a copy of all the stuff generated on the left, then a
word can hardly be said to be "short", they keep getting longer,
unless we restrict our thinking to the nub on the right. But we can
have another kind of "acceptance" that says a word is accepted if it
ever appears to the right of the cursor. And this is perhaps the
framework in which to study the use of resources: Given n, and automata
A, what is the function f(n) that gives the number of production rules
needed to have generated all of the words of length n that will be
generated?

Also, we can specialize the matrix in various ways, for example, plug in
"1" for various matrix edges at the end, which completely changes the
set of words accepted, and can collapse them into sets that are more
recognizable.

Metaphysically, the matrix is about as simple, pure, powerful, and basic
an object as there can be, if we think of A(ij) as a shift from i to j.
Note that we don't even have to have a notion of integer.

Regarding modeling humans, I want to add that I think theoretical
automata are much more directly "human", and reflective of human
thinking and experience, than any practical contraptions, or networks of
contraptions, by which we make computations. The theoretical automata
are pure constructs of the human mind, reflections of it. The practical
contraptions are much more distant in that regard. The theory clearly
shows that the power of computation comes from deleting information, not
from generating more. This means that the intelligence of contraptions
can arise only from their "failures", or their ability to write something
off, allowing something else to deal with it.

Joseph, Thank you for pointing me to Husserl and Heidegger. I have
Heidegger's "Being and Time" which every so often I pick up and put back
down again, so it's good to left bracket that. Saturday at the
bookstore I bought Karol Wojtyla's "Acting Person (Analecta
Husserliana)" translated in Lithuanian. (He's since Pope John Paul II,
he was no slouch as an existentialist philosopher. There's a question,
though, of how much of this book was rounded out by an editor.)

----

I try to write my thoughts more simply. And I also include some
speculations on the connection between "removing cycles" and "polynomial
use of resources" especially given that the tough problems seem to be
the ones without "sub-cycles". Andrius, http://www.ms.lt

It is cool, something nice to have in your mental tool set.
Joseph wrote a bit skeptically, and he is in the heart of computer
science, so I can't trump him in that space.
But I can draw on wider things, and a lot of imagination.

It's exciting when any discipline, including computer science,
introduces concepts and observations that touch upon our intuitions
about human life. In that sense, we are modeling our experience.

In developing computer science, it has been observed that there are,
intuitively, several different ways to think about computation, and they
can be considered in classes of different capability.

A simple computer is one such that, given state A, you find fixed
instructions that tell you what states B you can go to. For example, a
state may be a valid position on a tic-tac-toe board. You can describe
a "finite state automata" that would tell you, starting from the empty
board, how to make valid moves to valid boards, and ultimately get to
all the final boards. You can also describe a "finite state automata"
that would play well for one side, or for another side, or in whatever
particular way, so long as it is fixed. If there is one or no choice at
the state, then it is called deterministic, and if you allow more than
one choice, it is called nondeterministic.

It was noted that we can study the "languages" that consist of all
possible paths through the machine; we can talk about the paths
"accepted" by the machine. These are called regular languages. The set of
all tic-tac-toe games is a regular language, and so, I think, is the set
of all chess games. And even the set of all chess games where one or both
players play without mistakes. Simply because there are only so many
chess positions. You
can return to the same chess position, but these would simply be loops,
cycles.
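
For a toy example of such a machine (my own, much smaller than
tic-tac-toe): a finite state automaton with two states that accepts
exactly the strings of a's and b's with an even number of a's, which is
a typical regular language.

    DELTA = {("even", "a"): "odd",  ("even", "b"): "even",
             ("odd",  "a"): "even", ("odd",  "b"): "odd"}

    def accepts(word, start="even", final=("even",)):
        state = start
        for letter in word:
            state = DELTA[(state, letter)]   # follow the fixed instructions
        return state in final

    print(accepts("abba"))   # True: two a's
    print(accepts("ba"))     # False: one a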

It turns out, that we can imagine computers that are smarter than that!
For example, consider the set of parentheses expressions, consisting of
left and right parentheses that are balanced, such as (()()). It can
be shown that this is not a regular language! Basically, in order to
"accept" such a string, you need memory, you need to remember to close
each parentheses that you open. And there is a different kind of
computer that can do this, it is more powerful. One formulation for it
is a "push-down automata". I don't understand it too well, but you have
a "stack" of memory, like a stack of cafeteria trays, that you build by
laying down on top of each other, and then they are in "memory" so that
you can go back to them, as you take them off. So, for example, (()())
might be generated by saying that ( means put down a tray, and ) means
pick it back up. So (()()) would be: put down, put down, pick up, put
down, pick up, pick up. And in this spirit, each tray can be a
different color and come with instructions that tell you what you are
allowed to do next, for example, pick up or put down trays if they
happen to be a certain color. The languages are given by the sequences
of trays you go through, and are called context-free languages. Chomsky
noted that there is something similar going on in human language, in the
way we build sentences and have to keep track of what we're up to
syntactically. This is also important for building parsers, for parsing
commands. But it's also important in everyday life: I can "make a
commitment", which opens a left parentheses that waits for a right
parentheses.
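
Here is the tray picture as a few lines of Python (illustrative): "("
puts a tray down, ")" picks one up, and the expression is balanced
exactly when every tray gets picked back up.

    def balanced(s):
        stack = []
        for ch in s:
            if ch == "(":
                stack.append(ch)      # put down a tray
            elif ch == ")":
                if not stack:
                    return False      # nothing left to pick up
                stack.pop()           # pick the top tray back up
        return not stack              # all trays picked back up

    print(balanced("(()())"))   # True
    print(balanced("(()"))      # False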

You have to go down that stack in order. (Just a side note: Jesus of
Nazareth had a cryptic saying, "The first will be last, and the last
will be first", and that's pretty much what's going on here: the first
tray put down will be the last tray taken off, and the last one put down
will be the first one taken off, unless you put even more down.)

Well, there is a more powerful computer, a "linear bounded automata",
which we may say lets you move your position on that stack. We may
think of this as a magic tray that you can move up and down the stack.
What this does is allow you to reference how big your stack is.
You can have that magic tray go into a "gliding" state where it is
effectively bringing with it another tray of a particular color, perhaps
leaving it at the bottom. In this way, it can effectively "count" the
number of trays in the stack, and effectively void certain numbers of
trays by entering an "unhappy" state. This means that you can generate
languages such as "a to the 2 to the i": {aa, aaaa,
aaaaaaaa, aaaaaaaaaaaaaaaa, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,...} which
is growing exponentially. A normal pushdown automata couldn't do that
because it observes the "pumping lemma". Note that all of these
computers have only finitely many states and instructions. But they may
typically accept strings of arbitrary length. The pumping lemma
observes that, when a simple computer, like a finite state automata, or
a pushdown automata, is found to accept a sufficiently long string, then
we can show there must be some pattern in that string that can be
repeated, like a do loop, so that all such are likewise accepted. But
in the case of the linear bounded automata, it can manage to break any
linear pattern, any rhythm, by going into its own gliding modes, which
lets it do its own internal counting. The languages it accepts are
called "context sensitive languages".

Finally, the most powerful computer known, which has all manner of
equivalent definitions, is called a "Turing machine". It can model
absolutely every computation that we can describe. And there are many
that it models that we can't even describe for practical purposes. It
can go on calculating and we can never be sure if it will actually halt
or not, which is to say, we may never be sure if it actually accepts a
word, or not. The heart of the power of the Turing Machine is that it
can destroy evidence. Imagine being allowed to delete part of the
history of the trays that you've laid down. That pretty much makes it
effectively impossible to keep track of all of the ways things may have
been developing. It means that a very simple situation may come from a
very complicated situation. Whereas with the "linear bounded automata"
you always know that things are growing in complexity, time flows in one
way, so to speak, and you know when you can rule things out, when
they're surely not going to happen, especially for short strings. With
Turing machines it's possible that you may never know for sure.

Mathematicians approach this by talking about a hierarchy of classes of
languages. But, as somebody modeling life, I know that sophistication is
a locally evident quality, not only a globally evident one. To the
extent that this is
helpful to model life, we should be able to say that "here the machine
is acting with less sophistication, and here with more sophistication",
or "this is a less sophisticated instruction, this is a more
sophisticated instruction". Mathematicians don't do this, they see that
as a value judgement. So that is a major goal of mine here. (For
example, the "perfect chess player" may be coded with a finite state
automata, but a lot more intelligence may be revealed as a pushdown
automata, or linear bounded automata, or Turing machine. And I think
that distinction in intelligence can be observed instruction by
instruction.)

In order to do that, we need a shared framework for the different kinds
of automata. Now, I haven't been exact or classical in terms of
defining the different kinds of machines here. But the upshot is that,
I don't have to be, because for each class there are many ways to define
the associated machines, and we can choose the definitions that work
best for us.

I look for the "most natural" framework. Metaphysically, in my
experience, the most basic and general objects are matrices. A matrix
A(ij) defines a relationship from each state i to each state j. This is
metaphysically very basic, like our mind moving from one thought to
another. We don't even need to define numbers to work with matrices,
they can simply be understood to relate states.

Let us define the states with a matrix X(ij) where the states are given
by X(ii) and when i does not equal j, we have zeroes. Then we can use
the following operations to model the different kinds of instructions:

Matrix multiplication X -> AX gives what finite automata do:
generating walks along the edges of A.
Two-sided multiplication X -> AXA yields what the pushdown automata can do.
The replacement XA -> AX turns out to leave the terms from A unaffected,
and simply "shift" the cursor recorded by X.
And then, we can have an operation that lets us remove a cycle of terms
from A.

My instinct is that these building blocks can be used to define
equivalents for all of the machines that we have discussed above.

This cycle removal operation is very interesting and I suspect related
to an outstanding math problem which asks if there are problems that can
be solved using a polynomial amount of resources (such as time) by a
nondeterministic machine (many calculation branchings allowed), but not
any deterministic one (no branchings allowed in the calculation).
Basically, allowing a calculation to branch is like guessing, and is in
a sense a multiplication of the calculational power, a sort of
exponentiation. But if that is only internal, and the external bounds
apply, it's not clear if that is any real help or not.

A sample problem is to take a graph (perhaps coded by the edges of a
matrix) and ask, does it have any cycle of edges that go through all the
nodes? If you allow for guessing, branching, then you can design an
efficient algorithm that just quickly branches through all sets of
edges, and then in each case it does not take a lot of resources to
check if it works. (Maybe n-squared?) Whereas if you don't allow for
the branching, then nobody has found any efficient algorithm; it seems
that it's exponential.
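
A small sketch of that contrast (my own illustration): checking one
proposed cycle through all the nodes takes only about N steps, while
searching without any guessing falls back on trying orderings, of which
there are about N factorial.

    from itertools import permutations

    def is_cycle_through_all(adj, order):
        # cheap check: do consecutive nodes of the ordering share an edge?
        n = len(adj)
        return all(adj[order[i]][order[(i + 1) % n]] for i in range(n))

    def has_cycle_through_all(adj):
        # brute force without guessing: try the (N-1)! orderings
        n = len(adj)
        return any(is_cycle_through_all(adj, (0,) + p)
                   for p in permutations(range(1, n)))

    adj = [[0, 1, 0, 1],   # a 4-node ring: 0-1-2-3-0
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [1, 0, 1, 0]]
    print(is_cycle_through_all(adj, (0, 1, 2, 3)))   # True, checked in 4 steps
    print(has_cycle_through_all(adj))                # True, after trying orderings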

A down-to-earth way to think about this is, if you add another node, say
to the graph, so that the size goes from N to N+1, then how does that
affect your computation? How does it compare with all that you've
already done? If it's the same order of magnitude, or bigger, then it's
exponential growth! That seems to be the case for analyzing those
cycles. If it's a smaller order of magnitude, that is more like
polynomial growth. For example, if you're adding the numbers from 1 to
N, then that's roughly N-squared, which is significantly bigger than
N+1, especially when N gets large.

The funky thing about the universal framework that I've set up is that
it seems to generate all languages. You can specialize it by setting
particular values for A and X. But, in a sense, by the basic
operations, it is representing (and tracking!) every single possible
machine. Furthermore, with the exception of the "Turing machine
operations", it only grows in complexity. Strings are "accepted" simply
by being generated. So we have a very clear way of seeing the linear
boundedness of the resources. It is simply the number of operations
compared with the length of the string.

Now my hunch is that each time that we remove a cycle, we are adding to
the resources as if by a factor of the length of the cycle. And, in
defining the operation, we don't do it for particular cycles, the cycle
may be of any length, so we must assume that the cycle is as long as it
might be. All of the operations are quite general! which is odd but
powerful. In this set up, we can select the particular machine at the
very end, but for now we are dealing with all of them at once.

If we look at the kind of problems that are considered "tough", they may
be all thought to be looking for structures that lack "sub-cycles".
They are looking for machines that "cannot be pumped", that have no
regular pattern generator. So, for example, if in a graph with N nodes,
we are looking for a cycle through all the nodes, my point is that the
cycle, by definition, doesn't have any sub-cycles.

Well, in our set-up, these kinds of problems are very easy to set up.
We simply code the matrix A accordingly, putting 0 for nonexisting
edges. And we can do this at the end! The question is: Can we use our
basic operations to generate a language that would answer our question?
The answer is Yes. We can generate a language where the objects of
length N are precisely those, if any, which do not have any sub-cycles. But
the only way we can restrict ourselves is if we use our cycle
eliminator! And we have to do this N-1 times if we want to eliminate
any subcycles. And for this particular algorithm it seems like we are
using exponential resources, something like N factorial.

So this says that we have found one bad algorithm. But does that say
anything about the good ones?

It might! That is because our set-up, although powerful enough to
capture the different kinds of automata, is extremely restrictive. So
we might be able to show that, in this set-up, there really isn't any
other way to get the language we want. The only tool that we have is
the cycle eliminator.

We may be able to show that, in our framework, for the resources that we
have defined, there is basically no simpler approach to the problem.
This might involve showing that any restatement of the problem is
actually a transformation of this statement. As a transformation, it
may have essentially the same eigenvalues. And having the same
eigenvalues, there may be no way of getting it to simplify.

We may also be able to show that there is a transformation from this
framework into a more classical one with the traditional definitions for
time resources.

It's interesting what this might mean metaphysically, if anything. But
there seems to be a connection between the exponential power of our
ability to make guesses (allow for several options to travel down) and
the Turing machine's exponential power to destroy evidence (forcing us
to travel down several options to track what is going on).

Of course, very speculative, but I think, imaginative in a useful way,
at least for me. I invite such speculation on subjects that matter to us.
