LeCun Blasts LLMs and Hinton, Leaves Meta: "JEPA Will Rule in 5 Years"

Yann LeCun breaks with Hinton and Bengio, argues that LLMs are not the path to AGI, has left Meta, and has started AMI and Tapestry. In a long interview, he covers world models, the limits of LLMs, and his new ventures.

LeCun really went after Hinton this time…

Hinton had never paid special attention to LLMs before. Then GPT-4 came out in 2023, and he suddenly had an epiphany:

"Oh my god, these models are already very close to human intelligence; they might have subjective experiences…"

Of this shift, LeCun says:

I completely disagree; it's hard to understand.

I feel he just wants to give up: "Alright, this is what we need, I can declare victory."

"Yeah, I can retire now. Then go around giving speeches about the dangers of AI."

Then, turning the conversation, he pointed the finger at another Turing Award winner.

Actually, I said a lot of these things years ago; Hinton only realized them recently.

Bengio's situation is similar.

That's why, when the host asked LeCun why he was such an outlier, he answered:

It's not that I parted ways with Hinton and Bengio at some point; they're the ones who changed.

Since we're talking about his old colleagues, of course we can't avoid talking about his old company.

By early 2024, and especially in 2025, FAIR no longer met the conditions I think are needed to sustain innovation, research and breakthroughs.

A lot of good people left.

As for why, LeCun said Zuckerberg was great and that leadership had supported him a lot. It's just that once Meta joined the LLM race, there was no way to focus purely on research.

LeCun expressed regret about this.

Because, in his view, producing breakthrough research "is actually very simple."

Just hire the best people. These people have a nose for it; they know what to do. You give them the resources they need to succeed, then…

Get out of the way, and don't interfere.

But the host was still not satisfied, asking "why?" repeatedly.

Key suspect: Alex Wang.

Host:

Was the Scale AI acquisition one of the catalysts for this pure LLM focus?

LeCun's answer was very candid; he really did answer whatever was asked.

Definitely yes. But I'm not sure I have enough internal information to comment.

Zuckerberg may have seen a kind of successor in Alex Wang, a younger version of himself.

Besides all this, of course, his signature routine made an appearance too.

With a bit of a teasing tone, LeCun once again challenged the LLM camp.

JEPA-like world models will dominate AI in five years. (laughs)

This is LeCun's latest podcast interview. He talked with the host for almost an hour and a half about world models, JEPA, why he left Meta, why LLMs won't get us to AGI…

It's been a long time since I listened to an interview word for word, and honestly I'm a bit exhausted.

I didn't dare skip any part, and there were no boring parts; LeCun was dropping hot takes the whole time.

The full interview is attached below.

To ensure readability, QbitAI has partially adjusted the content without changing the original meaning.

Enjoy.

Host: Back then you bet on neural networks when everyone doubted you, and you turned out to be right.

Now you're doing something similar, betting against LLMs and mainstream generative architectures.

You also recently started a new company, AMI, around this direction. What is AMI doing?

LeCun: First of all, let me be clear: there's nothing wrong with LLMs.

LLMs are the foundation of many very useful AI products. I use them myself; they're great and do what they're supposed to do.

But LLMs are not the path to human-level intelligence, or even animal-level intelligence.

Host: You even helped make one of the earliest major open-source LLMs.

LeCun: That's right. So what is AMI? AMI stands for Advanced Machine Intelligence, and our positioning is AI for the real world.

The AI technology everyone is familiar with today is good at language manipulation.

Language is a very special thing; it is especially well suited to these successful architectures.

But what about the real world? It's high-dimensional, continuous, noisy and chaotic. The difficulty is on a completely different level.

This is what I've worked on for most of my career. The work accelerated over the past five or six years, and we've made substantial progress in the past two.

By the end of last year, it was obvious that Meta was no longer the right place to advance this project, so I left and founded AMI.

Host: This seems to be an industry trend: more and more people are leaving big companies or research labs to build startups around their exciting research directions.

LeCun: This is indeed a very strange trade-off.

There are two modes. One is lots of exploratory research, with many directions pursued in parallel. Then, once something seems to work, you need to push it forward, and that's no longer research.

The people doing this are called researchers, at least by the media, but what they actually do is engineering and productization.

This happened several times at Meta.

In early 2023, Llama 1, developed by FAIR, looked very promising, so Meta created the Gen AI organization specifically to turn it into a real product. Later came Llama 2, Llama 3 and Llama 4.

Llama 4 was a bit disappointing. Zuckerberg was dissatisfied with it, so he restructured the whole organization and changed the people in charge.

But what really happened in the past year is that Meta realized it was behind and refocused its strategy on catching up with the rest of the industry.

The side effect is that a lot of exploratory research was deprioritized.

My work on JEPA and world models wasn't affected, but other parts of the company focused entirely on LLMs.

This made it clear to me that Meta was no longer the right place to advance this project.

We had preliminary results and needed to shift from research to real technology development, scaling and productization.

At the same time, we also realized that Meta wasn't very interested in most of the application areas, such as manufacturing.

Host: You're pursuing the broad direction of world models. But other people are approaching world models from a more generative angle: Google's Genie, various video models, VLA (vision-language-action) models, and Fei-Fei Li's 3D spatial models… How do JEPA models compare with these approaches?

LeCun: "World models" is quickly becoming a buzzword, already in research and now starting in industry.

I won't say much about VLA. That path is now generally considered not to work: it isn't reliable enough and requires too much training data.

So what is a world model? Fundamentally, a world model lets an agent predict the consequences of its own actions.

I can't imagine how you could build an agentic system that can't predict the consequences of its own actions. If humans acted without considering consequences, people would think we were idiots.

So that's all a world model is: the ability to predict the consequences of your own actions, so that you can plan a sequence of actions to complete a task or achieve a goal.

You do this through planning, reasoning, search and optimization, not through autoregressive prediction of one token after another like an LLM. You're searching for an optimal action sequence to complete the task.
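To make the contrast with token-by-token generation concrete, here is a minimal sketch assuming a hypothetical one-dimensional toy world model; `world_model`, `rollout` and `plan` are illustrative names, not code from AMI or Meta. The planner imagines the outcome of every candidate action sequence, scores each predicted result with a cost, and keeps the best sequence before acting.

```python
from itertools import product

# Hypothetical toy world model: the state is a position on a line and an
# action moves it by -1, 0 or +1. A learned model would stand in for this.
def world_model(state, action):
    return state + action

def rollout(state, actions):
    # Imagine the consequences of a whole action sequence before acting.
    for action in actions:
        state = world_model(state, action)
    return state

def plan(state, goal, horizon, action_set=(-1, 0, 1), effort_weight=0.01):
    # Exhaustive search over action sequences: score each imagined outcome
    # (distance to goal plus a small effort penalty) and keep the cheapest.
    best_seq, best_cost = None, float("inf")
    for seq in product(action_set, repeat=horizon):
        cost = abs(rollout(state, seq) - goal) + effort_weight * sum(abs(a) for a in seq)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost

if __name__ == "__main__":
    seq, cost = plan(state=0, goal=2, horizon=3)
    print(seq, rollout(0, seq))  # → (0, 1, 1) 2
```

Brute-force enumeration grows as 3^horizon, so practical planners would swap in sampling or gradient-based optimization; the predict, score, search structure stays the same.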

LLMs can't predict the consequences of their own actions, nor do they have real planning ability, because their reasoning is just next-token prediction, not search.

So intelligent behavior requires three things.

First, the ability to predict the consequences of actions.

Second, the ability to plan through optimization and search, finding the action sequence that produces the desired result.

Third, a way of predicting those consequences at the right level of representation.

For example, there's an open water bottle in front of me. If I push the bottom of the bottle, it will slide on the table. If I push the top of the bottle, it might tip over.

But we can't predict exactly which direction the bottle will tip. We can't predict such details at the pixel level.

The world model in our brain predicts at an abstract level of representation.

Host: The design of this architecture is largely inspired by the human brain?

LeCun: At least inspired by cognitive science. There's a big gap between that and directly translating it into a specific neural network architecture.

Cognitive science is indeed a motivation. System 2 in psychology is exactly this: when you behave in a thoughtful, reflective way, you imagine and predict the consequences of your actions, then plan accordingly. That's different from the instinctive, reactive behavior of System 1.

So there are sources of inspiration, but there's also a lot of empirical evidence that you shouldn't generate pixels.

I've been interested in building world models through prediction for a long time.

About five years ago I had an epiphany: I realized that all the architectures that successfully learned good image and video representations were non-generative.

VAEs (variational autoencoders), or autoencoders more broadly, intuitively seem like a natural way to learn abstract representations of inputs. You feed an image into a neural network and train it to reconstruct the input at the output.

But if you do this naively with a big neural network, nothing interesting happens: it just learns the identity function, which is completely useless.

Learning image representations with VAEs gets you something, but the results really aren't good. The same goes for sparse autoencoders.

There's another category of techniques called denoising autoencoders; MAE is a variant, and BERT in NLP uses a similar idea. You corrupt part of an image, then train a neural network to restore the original image.

FAIR once had a big project doing this and invested a lot of compute; the results were very disappointing.
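A minimal sketch of why the plain reconstruction objective is degenerate and how the denoising variant changes it, assuming short vectors of floats in place of images and zeroing as the corruption; the helper names are hypothetical. The identity map scores a perfect zero on plain reconstruction, but not once part of the input is hidden.

```python
import random

def identity(x):
    # The trivial "model" a large enough autoencoder can collapse to.
    return list(x)

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def reconstruction_loss(model, x):
    # Plain autoencoder objective: reproduce the input at the output.
    return mse(model(x), x)

def denoising_loss(model, x, mask_prob, rng):
    # Denoising objective (the idea behind MAE- and BERT-style training):
    # zero out a random subset of entries, then score against the clean input.
    corrupted = [0.0 if rng.random() < mask_prob else xi for xi in x]
    return mse(model(corrupted), x)

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(reconstruction_loss(identity, x))                    # → 0.0
    print(denoising_loss(identity, x, 1.0, random.Random(0)))  # → 7.5
```

The denoising loss forces the model to infer the hidden content, but it is still computed in input space, which is what separates it from the JEPA approach described next.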

But at the same time, some of the same people, along with others in Paris and New York, were working on a different set of techniques based on non-generative architectures.

You take an image and corrupt it, feed both versions into encoders, then use a predictor to predict the representation of the original version from the representation of the corrupted one.

This is JEPA. One encoder encodes one observation, another encoder encodes another observation, and a predictor predicts the first one's representation from the second one's.
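The data flow described above can be sketched as a single loss computation, assuming hypothetical linear maps (plain lists of lists) for the two encoders and the predictor, and zero-masking as the corruption. It is an illustration of the structure, not Meta's or AMI's implementation; in particular it omits what stops the encoders from collapsing to a constant (I-JEPA, for instance, keeps the target encoder as a moving average of the context encoder).

```python
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mask(x, mask_prob, rng):
    # One simple corruption: zero out a random subset of input entries.
    return [0.0 if rng.random() < mask_prob else xi for xi in x]

def jepa_loss(x, context_encoder, target_encoder, predictor, rng, mask_prob=0.5):
    # Encode the corrupted view and the original view separately.
    s_context = matvec(context_encoder, mask(x, mask_prob, rng))
    s_target = matvec(target_encoder, x)
    # Predict the original view's representation from the corrupted view's.
    s_pred = matvec(predictor, s_context)
    # The error lives in representation space; no pixels are generated.
    # Note: constant (collapsed) encoders would make this trivially zero.
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_target)) / len(s_target)

if __name__ == "__main__":
    enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # hypothetical 3-d input -> 2-d representation
    pred = [[1.0, 0.0], [0.0, 1.0]]
    x = [1.0, 2.0, 3.0]
    print(jepa_loss(x, enc, enc, pred, random.Random(0), mask_prob=0.0))  # → 0.0
    print(jepa_loss(x, enc, enc, pred, random.Random(0), mask_prob=1.0))  # → 13.0
```

Training would minimize this loss over all three maps; the second call shows the error the predictor faces when the whole input is hidden.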

Host: Many robotics companies are now releasing more and more impressive demos that seem to show some planning and reasoning ability, and they can execute even in unseen rooms or on new variations of a task. What do you think?

LeCun: There's real progress, and some demos are really impressive. But these systems need massive amounts of training data, collected either through teleoperation or by manually guiding the end-effectors…

They're mainly trained by imitation learning, plus a little reinforcement learning in simulation.

The problem is that imitation learning requires a lot of data, and you have to collect data separately for every task you want the robot to do. It's costly and also quite fragile.

Whereas if the system has a world model and can predict the outcomes of actions, it can directly plan actions to complete a new task, with no need to be trained specifically for that task.

The generalization that world models bring is much greater: they can cover a wider range of tasks with less training data.

There are indeed synergies between tasks, the more tasks you train the system to do, the less data it needs to learn new tasks.

But the hope with world models is that they can solve new tasks zero-shot. The goal is to solve lots of problems with little or no training data, perhaps plus a little RL-style fine-tuning.

Humans absolutely have this ability, and so do many animals.

A 17-year-old needs only ten to twenty hours to learn to drive. We have millions of hours of driving data and still don't have L5 autonomous driving.

Imitation learning can't even handle autonomous driving.

Host: There's an idea of using video models to generate lots of synthetic data for simulation, which, even if not physically perfect, can improve robot performance in the real world. What do you think?

LeCun: It's the same question again: why can a 17-year-old learn to drive in 20 hours?

You don't need millions of hours of demonstration data, nor synthetic data.

If we crack this problem, we won't need to generate data.

We may still need to train in simulation, but without the amount of data and number of trials that current systems require.

Host: An interesting point is that if you're OpenAI and you know things will keep getting better as long as you keep scaling, then from a business perspective you don't have much incentive to do more data-efficient things.

LeCun: Other companies have no incentive to do anything different either; no one can afford to fall behind their competitors. It's a Silicon Valley herd effect: everyone is digging the same trench.

That's also why I put AMI's headquarters in Paris, with the US office in New York, not in Silicon Valley.

Host: What's the application direction of AMI technology you're most excited about?

LeCun: AI for the real world. Home robots, L5 autonomous driving.

Host: When can I have a home robot?

LeCun: That's still several years away. Even though there are a lot of companies making robots, no one really knows how to make them smart enough.

Host: And you can't trust them to work in a home with a baby.

LeCun: That definitely won't work. Even for relatively narrow manufacturing tasks, imitation learning can only handle a few tasks, and no one really knows how to make them work reliably.

In the short term, there's a huge range of applications in industry.

You need an intelligent system that can predict what will happen if you change a certain control variable of a complex system: jet engines, chemical plants, power plants, production lines, human bodies, human cells…

These systems are too complex to model with a few equations; traditional modeling methods won't work.

What you need to do is train a model from data with deep learning to capture the system's dynamic behavior. What you get is a phenomenological model.

If it's action-conditioned, then you have a world model of that system, which can be used for optimal control.

The number of such applications is amazing.
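LeCun describes training a deep network on logged data; as the simplest possible stand-in, this sketch fits a linear, action-conditioned model x_next = a·x + b·u by least squares and then uses it to choose a control. The "plant", its coefficients and all function names are hypothetical, and one-step (deadbeat) control is a swapped-in illustration of "using the model for control", not a full optimal-control method.

```python
def fit_action_conditioned_model(states, controls, next_states):
    # Least-squares fit of x_next = a * x + b * u (no intercept) from logged
    # data, solved via the 2x2 normal equations (Cramer's rule).
    sxx = sum(x * x for x in states)
    sxu = sum(x * u for x, u in zip(states, controls))
    suu = sum(u * u for u in controls)
    sxy = sum(x * y for x, y in zip(states, next_states))
    suy = sum(u * y for u, y in zip(controls, next_states))
    det = sxx * suu - sxu * sxu
    if det == 0:
        raise ValueError("data does not excite both state and control")
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b

def one_step_control(a, b, state, target):
    # Pick the control whose *predicted* next state lands on the target.
    return (target - a * state) / b

if __name__ == "__main__":
    # Hypothetical plant standing in for a real system: x_next = 0.9 x + 0.5 u.
    states = [1.0, 2.0, -1.0, 0.5]
    controls = [0.0, 1.0, 2.0, -1.0]
    next_states = [0.9 * x + 0.5 * u for x, u in zip(states, controls)]
    a, b = fit_action_conditioned_model(states, controls, next_states)
    u = one_step_control(a, b, state=1.0, target=0.0)
    print(round(a, 6), round(b, 6), round(u, 6))  # → 0.9 0.5 -1.8
```

The point of the action-conditioned form is the second function: once the model predicts the effect of a control, choosing controls becomes an optimization over predictions rather than trial and error on the real system.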

Host: How far do you think JEPA models will develop in the next few years?

LeCun: Five years.

Within five years, it will completely rule the world.

Host: Alright, five years to rule the world. (laughs)

LeCun: Just kidding.

That's a quote from Linus Torvalds. Someone once asked him what Linux's goal was, and he said "total world domination." And he actually achieved it.

But I do think JEPA-like world models are the blueprint for future intelligent systems.

LLMs will still have a small place, as a language interface.

But what we're designing are systems that can think. They might not speak or listen at first, but they will think, then you can add speaking and listening abilities on top.

Host: You've been here before: you made an extremely contrarian bet on neural networks that history eventually proved correct.

When do you think everyone will realize you're right again?

LeCun: I think it will come faster than expected.

Many people realize that VLA won't work and that LLMs can't handle real-world data. Awareness of a paradigm shift is growing. By early 2027, this will be completely obvious to everyone.

Host: Changing topics, let's talk about Tapestry, which you're also working on.

LeCun: This is a bit orthogonal to AMI Labs.

Host: Seems like AMI alone isn't enough to keep you busy.

LeCun: This is an idea I've slowly formed over the past three years or so.

People are increasingly using AI assistants for all kinds of things. Use of traditional search engines is declining; everyone just asks their AI assistant.

If the smart-device plans of Meta and other companies come true, with smart glasses and so on, you'll basically talk to your AI assistant by voice. All of your access to information will be mediated by an AI assistant.

Then here's the problem.

If you're from a country other than China or the US, the AI assistant you use is made by a company in Silicon Valley or Beijing, and it actually doesn't serve you very well.

Your language isn't taken seriously at all.

These AI companies don't understand your culture.

Your values are barely reflected in the publicly available training data on the internet.

How do you solve this problem?

You need a platform based on an open, free foundation model, Llama-style, that anyone can fine-tune to adapt it to a specific language or culture.

This is the core of Tapestry. Contributors around the world participate in training a global model, and this model is essentially a repository of the world's knowledge and culture.

Contributors contribute data and compute while retaining control over their data. They don't need to share data with other contributors; they contribute parameter vectors.

This is a federated learning idea.

There's a bunch of data centers, and each gets the parameter vector of a global consensus model, which you can think of as the average of all contributors' parameter vectors. All contributors regularly exchange parameter vectors through a central server.

Local workers update their own parameters while trying to keep them close to the global consensus vector. As training progresses, all the parameters converge to a consensus model, which has the same effect as training on all the data.
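The consensus scheme described above can be sketched as follows, under loud assumptions: each contributor's private data is replaced by a hypothetical quadratic loss with a known optimum, and "keeping close to the consensus" is modeled as a proximal penalty, similar in spirit to FedProx or elastic averaging SGD. This is not Tapestry's actual algorithm, and all names are illustrative.

```python
def average(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def local_gradient(w, local_optimum, consensus, mu):
    # Gradient of a stand-in private loss 0.5*||w - local_optimum||^2 plus a
    # proximal term 0.5*mu*||w - consensus||^2 that pulls w toward the consensus.
    return [(wi - oi) + mu * (wi - ci) for wi, oi, ci in zip(w, local_optimum, consensus)]

def train_consensus(local_optima, mu=1.0, lr=0.1, local_steps=5, rounds=200):
    dim = len(local_optima[0])
    workers = [[0.0] * dim for _ in local_optima]
    consensus = average(workers)
    for _ in range(rounds):
        for k, optimum in enumerate(local_optima):
            w = workers[k]
            for _ in range(local_steps):  # each contributor uses only its own data
                g = local_gradient(w, optimum, consensus, mu)
                w = [wi - lr * gi for wi, gi in zip(w, g)]
            workers[k] = w
        consensus = average(workers)  # the server only ever sees parameter vectors
    return consensus, workers

if __name__ == "__main__":
    optima = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]  # hypothetical per-contributor optima
    consensus, workers = train_consensus(optima)
    print([round(c, 6) for c in consensus])  # → [3.0, 2.0]
```

In this toy, the consensus converges to the average of the local optima, which is exactly the minimizer of the summed losses, i.e. what training on all the data together would give. Individual workers settle near, but not exactly at, the consensus; a larger `mu` shrinks that gap.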

Now you have an open model that's as good as one trained on the world's data, and you can fine-tune it for your own purposes.

I think there's a natural force pushing this to happen.

AI is quickly becoming a platform, and platforms naturally tend toward openness.

Linux is like this, and so are the internet's software infrastructure and wireless networking. They all started out proprietary and were later replaced by open source.

Host: This is indeed a very smart way to fight the shrinking of open source. Many people worry that closed-source models are getting stronger and stronger and will be used to train the next generation, letting closed source pull away for good.

LeCun: Remember who the big players in internet infrastructure were in 1996?

Sun Microsystems, HP, Dell. Sun paired Solaris with its proprietary hardware; HP paired its hardware with HP-UX.

Unix was much more reliable than Windows; you wouldn't run a web server on Windows.

But now who runs web servers on Windows NT? They were all killed by Linux. The whole internet runs on Linux, even Azure; Microsoft itself runs on Linux.

So today's OpenAI and Anthropic are yesterday's Sun Microsystems and HP-UX.

Host: That implies a judgment about the ceiling on these models' capabilities: open source will catch up sooner or later.

LeCun: They've already run out of data.

Publicly available, valuable text data has all been used up; there's no more. What these companies are doing now is buying licenses for copyrighted commercial data, or training on synthetic data.

Host: But there have been some impressive results in the past few years, achieved on top of large-scale pre-training: IMO gold medals, and steadily improving benchmark scores.

LeCun: This is very interesting.

Think about these two fields, math and code. What do they have in common?

Language itself is a carrier of reasoning. Not the only one, but when you do formal mathematical deduction on paper, you're manipulating language, and LLMs are really good at that. Proving theorems and so on: LLMs are good at it.

But LLMs aren't very good at coming up with good concepts and good definitions. They can't do creative work, and math isn't just problem-solving; most of it is actually creative.

Code is the same.

LLMs are good programmers, but they're not software architects or computer scientists. They can help us write code, but they can't replace humans yet.

It changes the role of humans.

Humans now move up one level of abstraction: our job is to decide what to build, and LLMs can help with the building.

Host: What would LLMs need to do to convince you to change your mind?

LeCun: Zero-shot agentic tasks.

Give it a brand-new problem that it hasn't been trained to solve and has no script for. Can it complete this task it was never trained on?

Not unless the system can predict the consequences of its actions and use that ability to plan.

Maybe a greatly enhanced LLM could, one with added search and planning abilities.

Current math LLMs are already doing this: they search for token sequences that complete specific tasks, and they can run code or check whether proofs are correct, so there's a way to verify whether outputs are right.

But this isn't an efficient way to plan, and it only works in domains where search can be done in token space.
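A minimal sketch of search in token space with a verifier, assuming a hypothetical five-token vocabulary and a handful of input/output examples as the check. It enumerates candidate "programs" shortest-first and returns the first one the checker accepts. The verifier is what makes the search work at all, and the number of candidates grows exponentially with length, which illustrates the inefficiency described above.

```python
from itertools import product

TOKENS = ["x", "1", "2", "+", "*"]  # hypothetical toy vocabulary

def verify(expr, examples):
    # The checker: run the candidate and compare against known input/output pairs.
    # eval is acceptable here only because the vocabulary is fixed and builtins
    # are disabled; it is not a pattern for untrusted input.
    try:
        return all(eval(expr, {"__builtins__": {}}, {"x": x}) == y for x, y in examples)
    except SyntaxError:
        return False  # most token sequences are not even well-formed

def search(examples, max_len=5):
    # Brute-force search over token sequences, shortest first.
    for length in range(1, max_len + 1):
        for seq in product(TOKENS, repeat=length):
            expr = " ".join(seq)
            if verify(expr, examples):
                return expr
    return None

if __name__ == "__main__":
    examples = [(0, 1), (1, 3), (2, 5)]  # target behavior: f(x) = 2x + 1
    print(search(examples))  # → x + x + 1
```

Without `verify`, the search would have no way to tell a correct sequence from a plausible-looking one; in domains without such a checker, this approach has nothing to search against.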

With JEPA, what I'm talking about is doing this not in token space but in an abstract thinking space.

Host: Some listeners might think that even if it's inefficient, what works in token space already covers a large part of the economy.

LeCun: Right.

Use LLMs for what they're good at; that's completely fine.

I'm just saying it's not the path to AGI. And the range of domains AGI will cover is huge.

Host: It sounds like you think LLMs will hit a ceiling before becoming software architects.

LeCun: They won't hit a ceiling. But it will become harder and harder to deploy these systems in more and more application scenarios, because every scenario requires collecting a lot of training data.

And you can't make these systems fully reliable: no hallucinations, no dangerous behavior.

Host: You share this honor with the other two Turing Award winners. But they seem to have completely different views on LLMs' potential, and on their potential threats and safety risks. When did you start to diverge?

LeCun: 2023.

Host: What drove this difference?

LeCun: It's not that I changed my mind, they changed theirs.

Hinton wasn't like this before; he had never paid special attention to LLMs.

Then in 2023 when GPT-4 came out, he suddenly had an epiphany:

Oh my god, these systems are already very close to human-level intelligence, they might have subjective experiences.

I know his reasoning went roughly like this:

Human cortex has about 16 billion neurons.

Take something like backpropagation: the brain doesn't do backpropagation directly, but if it does some kind of gradient estimation to optimize an objective function, you probably need a circuit of several real neurons to replicate the function of one virtual neuron.

So suppose you need 10 real neurons to replicate the function of one backpropagation neuron; then your cortex is equivalent to only 1.6 billion neurons.

Then he reasoned:

Oh my god, GPT-4 is already very close to this number! Maybe it will become as smart as humans.

I completely disagree with this statement.

I feel he just wants to give up, then go around giving speeches about the hopes and dangers of AI.

Alright, I can retire now, I can declare victory.

I've spent my whole life looking for the learning algorithm of the cortex. Maybe I didn't find out exactly what it is, but backpropagation seems like a good substitute; it works really well.

Yeah, this is what we need, I can retire.

(laughs)

But his voice on AI dangers is much quieter now than it was a year or two ago.

I think he realized a few things.

First, current LLMs aren't that smart.

Second, before reaching human-level intelligence, some conceptual breakthroughs are still needed.

Third, the blueprint of these systems will be very different from LLM, and we probably have ways to make them controllable.

I said these things years ago; Hinton only realized them recently.

Bengio's situation is similar.

I think what the two of them are really worried about is whether our social systems can ensure that the benefits of AI are maximized and that AI doesn't just exacerbate inequality.

It's not the doomsday scenario of AI taking over the world; it's more a problem of misuse by bad actors.

Host: But that risk already exists with today's LLMs.

LeCun: Indeed. But I don't think it's as dire as some people claim. Definitely not as dire as Anthropic claims.

Anthropic is trying to use fear to push AI regulation. I completely disagree with this approach.

Host: They seem to really believe it.

LeCun: I think they really do believe it. But I also think they have some good business reasons to believe it.

Host: Speaking of new architectures: you're quite sure LLMs aren't the endgame, and you're also quite aggressive about the timeline for new architectures. What do you think about safety if these new architectures really do bring breakthroughs?

LeCun: I'm going to say something that might be controversial. My Meta colleagues definitely won't like me saying this.

LLMs can't be made reliable, because you can't stop them from hallucinating.

Host: By your account, aren't you surprised they can complete those 15-hour programming tasks?

LeCun: Code is a domain you can verify. Whether the code you generate meets the specification, this can be checked.

But not everything is code. There are already examples of coding agents wiping people's hard drives, or doing stupid things that cost a lot of data or money.

You give it a prompt and it completes the corresponding task, but only because training taught it to do the right thing for that prompt. There are no hard constraints forcing it to complete the task, and no mechanism for it to predict whether the task has been completed correctly.

And they have no common sense. Take the car wash joke that was circulating a month ago: I tried it again two weeks ago, and every model said you should walk, except Gemini.

Host: Then Gemini was probably trained on the video where you told this example before.

LeCun: Not my video; I didn't invent this example. But this does happen: I say LLMs can't do something, and six months later they can.

The reason is simple: after I say on a podcast that LLMs can't do this, everyone naturally goes and types the question into ChatGPT, so it becomes part of the training set, and of course the next version can answer it.

But that's not because it suddenly got smarter; it's just because it was trained on that question.

I don't think there's a way to fix this under the current paradigm.

The architecture I'm proposing is objective-driven AI. You give the AI system a goal, namely to complete a task.

How does the system know it will complete this task?

It has a world model, and it predicts the results of a sequence of imagined actions.

It then evaluates those results with a cost function that measures how well the task is completed. The system works by optimization: it finds an action sequence that completes the task by minimizing the cost.

Of course, many things can still go wrong.

The cost function might be inaccurate: you think it measures task completion, but maybe it doesn't.

The world model might be inaccurate: the system's predictions of the consequences of actions might be wrong.

This system will still make mistakes, but at least it can predict the consequences of actions to some extent, which I think is indispensable for any agentic system.

You can also add not just one cost function that ensures task completion, but a bunch of other objective functions, cost functions, and even constraints.

You can specify these at an abstract level, or have low-level objective functions, and combine them to ensure the system isn't dangerous. The system is constructed so that it can't violate these conditions.
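One way to read "constructed so that it can't violate these conditions" is projected optimization. This minimal sketch assumes a hypothetical 1-D world model x_{t+1} = x_t + a_t, a squared-error task cost, an energy penalty as an extra objective, and a bound |a| ≤ a_max as the hard constraint. The planner runs gradient descent on the whole action sequence and clips after every step, so every candidate plan satisfies the bound by construction. It illustrates the idea only; it is not LeCun's architecture.

```python
def predict_final_state(x0, actions):
    # Hypothetical differentiable world model: x_{t+1} = x_t + a_t.
    return x0 + sum(actions)

def total_cost(x0, actions, goal, energy_weight):
    task_cost = (predict_final_state(x0, actions) - goal) ** 2    # task objective
    guardrail_cost = energy_weight * sum(a * a for a in actions)  # extra objective
    return task_cost + guardrail_cost

def plan_with_constraints(x0, goal, horizon, a_max, energy_weight=0.01, lr=0.05, steps=500):
    actions = [0.0] * horizon
    for _ in range(steps):
        err = predict_final_state(x0, actions) - goal
        # Analytic gradient of total_cost with respect to each action.
        grads = [2.0 * err + 2.0 * energy_weight * a for a in actions]
        actions = [a - lr * g for a, g in zip(actions, grads)]
        # Hard constraint |a| <= a_max, enforced by projection (clipping) after
        # every step, so no iterate can ever violate it.
        actions = [max(-a_max, min(a_max, a)) for a in actions]
    return actions

if __name__ == "__main__":
    reachable = plan_with_constraints(0.0, 3.0, horizon=5, a_max=1.0)
    blocked = plan_with_constraints(0.0, 10.0, horizon=5, a_max=1.0)
    print(round(predict_final_state(0.0, reachable), 3))  # → 2.994
    print(max(abs(a) for a in blocked))                    # → 1.0
```

When the goal is reachable, the planner lands just short of 3.0 because the energy penalty trades a little accuracy for smaller actions. When it is out of reach under the bound, the actions saturate at exactly 1.0 and the plan stops at 5.0 rather than breaking the constraint.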

LLM can't do this.

LLMs can always slip past their guardrails. There's always a gap between training error and test error, so there will always be some prompt that makes the system do something very stupid.

Host: Can we talk about a specific field? Many people are also using LLMs in medicine. What can't LLMs do in medicine, where you'd need a model that really understands the world?

LeCun: For example, designing treatment plans for patients with chronic diseases, or even non-chronic ones, especially when a patient's situation doesn't fully fit the templates you've seen before. If you have a good mental model of the patient's physiological dynamics, you might be able to design a treatment plan that truly brings the patient to a good state.

The patient can also be a cell.

How to make a stem cell become a pancreatic beta cell that can produce insulin?

In a type 1 diabetes patient, the immune system has attacked the body's own beta cells. How do you keep producing new beta cells?

Do you have a model of human cells that lets you figure out what sequence of signals to send to a stem cell to turn it into a beta cell?

What LLMs can do is repeat the knowledge you can read in books.

But you can't become a doctor just by reading books. You have to do a residency; you have to listen to hearts and press on abdomens to make a diagnosis.

Host: You were at Meta for more than a decade and built one of the world's most respected research labs. You recently left. Looking back, what do you think you did right, and what did you get wrong?

LeCun: What I did right was building a top research lab that really innovated and produced a lot of foundational methods, scientific results and tools, like PyTorch.

Also an open culture that respects the scientific process, which I think is necessary for breakthrough innovation.

There's a whole chain of innovation. At the very front is blue-sky research, brand-new concepts. Most of it happens in universities, and a small part in advanced industrial research labs, which you can count on one hand.

Google has a good one, and FAIR used to be one. I hope it will continue to be.

The next step is: this is a good idea, let's push it forward and see if it can become useful.

But that's still research, in the sense that we don't fool ourselves with a solution that only works for this one problem. We see whether the technology can be made practical, not necessarily product-grade, but at least shown to break a record on some task or benchmark.

The next step is when the company says: okay, we need to invest a lot of engineering effort to push this forward.

This is where a lot of projects fail, and also where a lot of companies drop the ball.

Meta was actually okay in this respect, but far from perfect.

Partly it's an organizational issue.

You need a team that's close to research, but not entirely a product organization, to take things over. Not an organization with a three-month product deadline, but one that can keep pushing the technology forward.

We once had such an organization, then lost it. FAIR became isolated within the company, and many ideas were never picked up.

In 2023 the Gen AI organization was established, pulling 60 to 70 scientists and engineers out of FAIR, and it later grew.

But it faced too much short-term pressure and had no time to communicate with FAIR. As a result, Gen AI, which was supposed to stay cutting-edge and innovative on LLMs, could only focus on short-term goals and became very conservative. There was a gap between research and product.

Host: Was Llama 4 like this?

LeCun: It actually started as early as Llama 3. Llama 1 was a small project inside FAIR from 2022 to early 2023.

Then the Gen AI organization was established, and the Llama people were transferred over and started working on Llama 2.

Then a bunch of people realized they could go out and start a business.

That's the origin of Mistral: two authors of Llama 1 and someone from Google co-founded it.

During that time, quite a few people left Meta.

The Gen AI organization that took over Llama's subsequent work faced huge short-term pressure and became very conservative.

There was pressure from leadership, and also problems within the team itself. Things can go wrong in many ways; you can't blame it on one person.

Host: Many organizations now face this kind of short-term pressure. Do you think a pure research environment like FAIR back then is still possible in today's industry? Or is the only way out to leave and start your own company?

LeCun: I think there are still a few places inside Google Research and DeepMind doing real research. But the industry as a whole is becoming more closed.

Google is tightening up, and Meta and FAIR are heading in the same direction. There are now more restrictions on publishing papers: if what you're working on is related to the company's business in the medium term, they'll tell you not to discuss it publicly.

This atmosphere is not good for breakthrough research.

It's a pity, because getting breakthrough research is actually very simple. Just hire the best people, these people have a nose, know what projects to do.

You give them the resources they need to succeed, then…

Get out of the way, don't get in the way.

Host: What does this mean for the broader research community? One of FAIR's legacies is that it trained a lot of researchers who are now spread throughout the ecosystem. But young people entering the field now might be thrown into a short-term-oriented environment from the start.

LeCun: People willing to work with me usually have two characteristics.

First, they're crazy enough.

Second, they agree with one idea: during a PhD in academia, you should work on the next generation of AI systems, not the current one.

Honestly, working on LLMs in academia right now is very boring. You're basically studying why LLMs work, how they work, and what their limitations are. That's descriptive science, not very creative. Not interesting.

And if you really want to build something new with LLMs, you simply can't get the GPUs you need at a university.

So forget it.

If you're doing a PhD, don't work on LLMs. There's no point; you can't make a real contribution.

Host: How did you know it was time to leave Meta?

LeCun: It's a combination of multiple factors.

A lot of people have a completely wrong perception of my role at Facebook and Meta. I joined at the end of 2013 and really started in early 2014. For the first four and a half years I was director of FAIR: I built its organizational structure, established the culture, hired the core people, and managed the whole team.

After those four and a half years I stepped down from that role and became Chief AI Scientist.

Partly, I was almost sixty and simply didn't want to do management anymore. I was willing to do it for a while to build the organization, but I'm not good at it.

I'm more of a scientific or technical visionary, an engineering scientist.

After becoming Chief AI Scientist, I reported to the CTO. I started pushing a research project I thought was necessary, because FAIR's ambition had always been to build intelligent systems. While I was running FAIR, I had put my own research aside; I didn't have time for it.

By then I had already formed a concept: the architecture would be based on self-supervised learning, on making predictions from perceptual signals such as video. These are the ideas behind world models.

In 2016 I gave a keynote at NeurIPS and said AI research should go in this direction: world models that predict the consequences of actions, which are then used to plan.

I said RL wouldn't take us there because it's too inefficient, and supervised learning had already shown its limitations. The future is self-supervised learning and world models.

So how to do self-supervised learning and world models?

I started several projects, and some directions didn't work out. After doing some work on video prediction, I arrived at this concept:

You can do self-supervised training on video, but the system must make its predictions in representation space, not in pixel space.

This is the core idea of JEPA.
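The difference between predicting in representation space and predicting in pixel space can be illustrated with a toy objective. What follows is a minimal sketch, not LeCun's or Meta's implementation: the linear-plus-tanh encoders, the weight names (`W_ctx`, `W_tgt`, `W_pred`, `W_dec`), the sizes, and the EMA-updated target encoder are all illustrative assumptions. It only computes the two losses side by side; there is no training loop, and a real JEPA would typically add masking, deep encoders, and a stop-gradient on the target branch.

```python
import numpy as np

# Hypothetical sizes: a "frame" is a flattened 64-pixel vector, embeddings have 8 dims.
N_PIXELS, EMB_DIM = 64, 8


def encode(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Toy encoder (hypothetical): one linear layer plus tanh, frame -> embedding."""
    return np.tanh(x @ W)


def jepa_loss(W_ctx, W_tgt, W_pred, x_context, x_target) -> float:
    """JEPA-style objective: predict the target's embedding, not its pixels."""
    s_ctx = encode(W_ctx, x_context)  # embedding of the visible context (e.g. earlier frames)
    s_tgt = encode(W_tgt, x_target)   # embedding of the hidden target, used as a fixed label here
    s_hat = s_ctx @ W_pred            # predictor maps context embedding -> predicted target embedding
    return float(np.mean((s_hat - s_tgt) ** 2))  # error measured in representation space


def pixel_loss(W_ctx, W_dec, x_context, x_target) -> float:
    """Generative baseline for contrast: reconstruct every pixel of the target."""
    s_ctx = encode(W_ctx, x_context)
    x_hat = s_ctx @ W_dec             # decoder must output all N_PIXELS values
    return float(np.mean((x_hat - x_target) ** 2))  # error measured in pixel space


def ema_update(W_tgt, W_ctx, tau: float = 0.99) -> np.ndarray:
    """Assumed design choice: target encoder slowly tracks the context encoder."""
    return tau * W_tgt + (1.0 - tau) * W_ctx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_context = rng.normal(size=(4, N_PIXELS))  # batch of 4 "context" frames
    x_target = rng.normal(size=(4, N_PIXELS))   # the frames the model must predict
    W_ctx = rng.normal(scale=0.1, size=(N_PIXELS, EMB_DIM))
    W_tgt = W_ctx.copy()
    W_pred = rng.normal(scale=0.1, size=(EMB_DIM, EMB_DIM))
    W_dec = rng.normal(scale=0.1, size=(EMB_DIM, N_PIXELS))
    print("representation-space loss:", jepa_loss(W_ctx, W_tgt, W_pred, x_context, x_target))
    print("pixel-space loss:", pixel_loss(W_ctx, W_dec, x_context, x_target))
    W_tgt = ema_update(W_tgt, W_ctx)  # one EMA step for the target encoder
```

The sketch is meant to show the contrast LeCun describes: `pixel_loss` forces the model to account for every pixel of the target, including detail that cannot be predicted, while `jepa_loss` only asks the predictor to match an abstract embedding, leaving the encoder free to discard unpredictable detail.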

The idea roughly took shape in 2020. In 2022 I wrote a very long vision paper laying out my whole vision. I put all the secrets out in the open; I didn't care. I hoped it would pull a group of people into this direction.

It actually worked.

It attracted a group of students at NYU and in Paris who wanted to work in this direction.

There was also a whole team inside FAIR that said, this is what we want to do. Then Joelle Pineau said this should become one of FAIR's major missions, and we called it Advanced Machine Intelligence.

Host: And then they let you leave and start a company under that name.

LeCun: Right. Zuckerberg read that paper, understood what it was saying, and supported the project. So did CTO Andrew Bosworth, the former CTO, and the CPO. The leadership gave this project a lot of support.

But then the company refocused all its energy on LLMs.

Despite leadership support, the levels below didn't really buy in.

And as for applications of JEPA world models: although there were use cases in wearable agents and robotics, Meta's robotics research group was cut.

So the environment wasn't right.

Most of JEPA's application areas were in industrial fields Meta wasn't interested in. FAIR was increasingly asked to help with LLMs.

Host: Was the Scale AI acquisition one of the catalysts for this pure LLM focus?

LeCun: Definitely, yes, though there may be other reasons too. I'm not sure I have enough inside information to comment, but it's possible Zuckerberg saw in Alex Wang some kind of successor, a younger version of himself.

Host: A lot of the media narrative is that after Alex Wang arrived, pure research organizations became harder to run.

LeCun: There's a big misunderstanding here about my role, my relationship with Alex Wang, and how AI worked at Meta.

My technical contribution to Llama is zero, absolutely nothing. My only contribution to Llama was strongly advocating for open-sourcing Llama 2.

At the time there was a big internal debate. It was a very high-level discussion, two hours every week with about 40 people from Zuckerberg on down, and it lasted for several months.

Boz and I argued very clearly that the safety risks were exaggerated, that the opportunity to create an industry was huge, and that open-sourcing Llama 2 would kick-start the whole AI industry. That turned out to be true.

But to Llama itself, my technical contribution is zero. I neither pushed it forward nor hindered or slowed it down.

A lot of people inside FAIR were working on LLMs, and that's good. I never opposed it; I just said it isn't the path to human-level intelligence. But it's useful, as useful as speech recognition or translation.

Especially after I stepped down as FAIR director in 2018, I had no direct influence on what other people were doing. I just put out my vision and then recruited people to my project.

They worked with me because they wanted to, not because I was their boss.

By early 2024, and especially in 2025, FAIR's direction and management style no longer met the conditions I think are needed to sustain innovation, research, and breakthroughs.

A lot of good people left.

Podcast link: https://unsupervised-learning.simplecast.com/episodes/ep-86-yann-lecun-on-leaving-meta-breaking-the-llm-paradigm-why-hinton-is-wrong-rZ6fpa_8

Reference link: [1] https://x.com/jacobeffron/status/2055279354821607551


— End —
